2024-09-13 17:02:02,345 INFO [train.py:1266] (1/2) Training started
2024-09-13 17:02:02,346 INFO [train.py:1276] (1/2) Device: cuda:1
2024-09-13 17:02:02,364 INFO [train.py:1307] (1/2) Using dtype=torch.float16
2024-09-13 17:02:02,365 INFO [train.py:1308] (1/2) Use AMP=True
2024-09-13 17:02:02,365 INFO [train.py:1310] (1/2) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'ignore_id': -1, 'label_smoothing': 0.1, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '44a9d5682af9fd3ef77074777e15278ec6d390eb', 'k2-git-date': 'Wed Sep 27 11:22:55 2023', 'lhotse-version': '1.17.0.dev+git.ccfc5b2c.dirty', 'torch-version': '1.10.0+cu102', 'torch-cuda-available': True, 'torch-cuda-version': '10.2', 'python-version': '3.8', 'icefall-git-branch': 'cr-ctc', 'icefall-git-sha1': 'a6eead6c-clean', 'icefall-git-date': 'Mon Sep 9 10:10:08 2024', 'icefall-path': '/star-zw/workspace/zipformer/icefall_cr_ctc', 'k2-path': '/star-zw/workspace/k2/k2/k2/python/k2/__init__.py', 'lhotse-path': '/star-zw/workspace/lhotse/lhotse/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-3-0904151514-6f47fc7cf9-27nzd', 'IP address': '10.30.28.42'}, 'world_size': 2, 'master_port': 12348, 'tensorboard': True, 'num_epochs': 50, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1'), 'bpe_model': 'data/lang_bpe_500/bpe.model', 'base_lr': 0.04, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'context_size': 2, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'ctc_loss_scale': 1.0, 'cr_loss_scale': 0.2, 'time_mask_ratio': 2.5, 'cr_loss_masked_scale': 1.0, 'attention_decoder_loss_scale': 0.8, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'use_bf16': False, 'num_encoder_layers': '2,2,2,2,2,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,768,768,768,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,256,256,256,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,192,192,192,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'attention_decoder_dim': 512, 'attention_decoder_num_layers': 6, 'attention_decoder_attention_dim': 512, 'attention_decoder_num_heads': 8, 'attention_decoder_feedforward_dim': 2048, 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'use_transducer': False, 'use_ctc': True, 'use_attention_decoder': False, 'use_cr_ctc': True, 'full_libri': True, 'mini_libri': False, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 850, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'blank_id': 0, 'sos_id': 1, 'eos_id': 1, 'vocab_size': 500, 'dtype': torch.float16, 'use_autocast': True}
2024-09-13 17:02:02,365 INFO [train.py:1312] (1/2) About to create model
2024-09-13 17:02:03,032 INFO [train.py:1316] (1/2) Number of model parameters: 22118279
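The parameter count in the line above is the usual PyTorch sum over all parameter tensors. A minimal sketch of how such a figure is computed; the Linear module is only a stand-in for the Zipformer model this recipe actually builds:

```python
import torch

def count_parameters(model: torch.nn.Module) -> int:
    # Total number of elements across all parameter tensors; this is the
    # conventional way a "Number of model parameters" figure is produced.
    return sum(p.numel() for p in model.parameters())

# Stand-in module for illustration; the recipe constructs a Zipformer here.
model = torch.nn.Linear(80, 500)
print("Number of model parameters:", count_parameters(model))
```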
2024-09-13 17:02:03,033 INFO [train.py:752] (1/2) num_frame_masks: 25, max_frames_mask_fraction: 0.375
2024-09-13 17:02:11,375 INFO [train.py:1338] (1/2) Using DDP
2024-09-13 17:02:12,234 INFO [asr_datamodule.py:436] (1/2) About to get the shuffled train-clean-100, train-clean-360 and train-other-500 cuts
2024-09-13 17:02:13,758 INFO [asr_datamodule.py:232] (1/2) Enable MUSAN
2024-09-13 17:02:13,758 INFO [asr_datamodule.py:233] (1/2) About to get Musan cuts
2024-09-13 17:02:15,592 INFO [asr_datamodule.py:279] (1/2) Disable SpecAugment
2024-09-13 17:02:15,592 INFO [asr_datamodule.py:281] (1/2) About to create train dataset
2024-09-13 17:02:15,593 INFO [asr_datamodule.py:308] (1/2) Using DynamicBucketingSampler.
2024-09-13 17:02:40,077 INFO [asr_datamodule.py:325] (1/2) About to create train dataloader
2024-09-13 17:02:40,079 INFO [asr_datamodule.py:453] (1/2) About to get dev-clean cuts
2024-09-13 17:02:40,080 INFO [asr_datamodule.py:460] (1/2) About to get dev-other cuts
2024-09-13 17:02:40,080 INFO [asr_datamodule.py:356] (1/2) About to create dev dataset
2024-09-13 17:02:40,280 INFO [asr_datamodule.py:373] (1/2) About to create dev dataloader
2024-09-13 17:02:40,280 INFO [train.py:1545] (1/2) Sanity check -- see if any of the batches in epoch 1 would cause OOM.
2024-09-13 17:06:53,814 INFO [train.py:1576] (1/2) Maximum memory allocated so far is 17162MB
2024-09-13 17:06:55,758 INFO [train.py:1576] (1/2) Maximum memory allocated so far is 18352MB
2024-09-13 17:06:58,027 INFO [train.py:1576] (1/2) Maximum memory allocated so far is 18352MB
2024-09-13 17:07:00,100 INFO [train.py:1576] (1/2) Maximum memory allocated so far is 18352MB
2024-09-13 17:07:02,845 INFO [train.py:1576] (1/2) Maximum memory allocated so far is 18352MB
2024-09-13 17:07:05,086 INFO [train.py:1576] (1/2) Maximum memory allocated so far is 19829MB
2024-09-13 17:07:57,902 INFO [train.py:1198] (1/2) Epoch 1, batch 0, loss[loss=4.938, ctc_loss=4.74, cr_loss=0.9914, over 21007.00 frames. ], tot_loss[loss=4.938, ctc_loss=4.74, cr_loss=0.9914, over 21007.00 frames. ], batch size: 61, lr: 2.00e-02, grad_scale: 2.0
2024-09-13 17:07:57,902 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-13 17:08:16,729 INFO [train.py:1230] (1/2) Epoch 1, validation: loss=4.672, ctc_loss=4.672, cr_loss=3.997e-15, over 944034.00 frames.
2024-09-13 17:08:16,729 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 19829MB
2024-09-13 17:08:19,165 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=3.0
2024-09-13 17:08:20,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=0.0, ans=0.5
2024-09-13 17:08:21,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=0.0, ans=0.1
2024-09-13 17:08:36,734 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 3.661e+03 5.991e+03 6.509e+03 6.684e+03 1.271e+04, threshold=2.604e+04, percent-clipped=0.0
2024-09-13 17:08:54,100 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.533e+03 4.882e+03 6.179e+03 1.114e+04 2.062e+04, threshold=2.472e+04, percent-clipped=0.0
2024-09-13 17:09:04,886 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.16 vs. limit=7.5425
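The sampler named above is Lhotse's DynamicBucketingSampler, configured from the 'max_duration': 850 and 'num_buckets': 30 values in the parameter dump. A hedged sketch of how such a sampler is typically constructed; the manifest path is illustrative, not this recipe's exact code path:

```python
from lhotse import CutSet
from lhotse.dataset import DynamicBucketingSampler

# Illustrative manifest path; the recipe loads precomputed fbank cuts
# from data/fbank through its asr_datamodule.
cuts = CutSet.from_file("data/fbank/librispeech_cuts_train-clean-100.jsonl.gz")

sampler = DynamicBucketingSampler(
    cuts,
    max_duration=850,  # seconds of audio per batch ('max_duration' above)
    num_buckets=30,    # duration buckets ('num_buckets' above)
    shuffle=True,      # 'shuffle': True
    drop_last=True,    # 'drop_last': True
)
```

Bucketing by duration keeps utterances of similar length in the same batch, which limits padding waste at a fixed max_duration budget.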
2024-09-13 17:09:13,703 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.90 vs. limit=7.56375
2024-09-13 17:09:29,890 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 3.053e+02 1.776e+03 3.339e+03 6.179e+03 2.062e+04, threshold=1.336e+04, percent-clipped=0.0
2024-09-13 17:09:33,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=113.33333333333333, ans=0.4946875
2024-09-13 17:09:34,394 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=17.02 vs. limit=7.5425
2024-09-13 17:09:43,461 INFO [train.py:1198] (1/2) Epoch 1, batch 50, loss[loss=1.313, ctc_loss=1.279, cr_loss=0.169, over 20974.00 frames. ], tot_loss[loss=1.804, ctc_loss=1.741, cr_loss=0.3154, over 923321.60 frames. ], batch size: 58, lr: 2.20e-02, grad_scale: 0.5
2024-09-13 17:09:44,172 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.72 vs. limit=7.60625
2024-09-13 17:09:49,217 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.49 vs. limit=4.056666666666667
2024-09-13 17:09:53,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=141.66666666666666, ans=0.8950416666666667
2024-09-13 17:09:54,263 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.91 vs. limit=7.60625
2024-09-13 17:10:04,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=170.0, ans=0.09893750000000001
2024-09-13 17:10:18,954 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=28.06 vs. limit=5.099166666666667
2024-09-13 17:10:20,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=198.33333333333334, ans=0.490703125
2024-09-13 17:10:22,111 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=18.26 vs. limit=7.574375
2024-09-13 17:10:39,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=226.66666666666666, ans=0.29773333333333335
2024-09-13 17:10:41,078 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=10.09 vs. limit=5.056666666666667
2024-09-13 17:10:44,610 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=45.54 vs. limit=7.585
2024-09-13 17:10:49,729 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=42.85 vs. limit=5.113333333333333
2024-09-13 17:10:55,304 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=24.97 vs. limit=7.595625
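The loss fields in the train.py:1198 lines fit together as a weighted sum: with 'cr_loss_scale': 0.2 from the parameter dump, the reported loss matches ctc_loss + cr_loss_scale * cr_loss (batch 0: 4.74 + 0.2 * 0.9914 ≈ 4.938; the batch 50 tot_loss: 1.741 + 0.2 * 0.3154 ≈ 1.804). The essentially zero cr_loss on the validation pass (3.997e-15) is consistent with this reading: without augmentation the two views the consistency term compares coincide. A small sketch of the combination, reflecting the logged numbers rather than the recipe's exact loss code:

```python
def combined_loss(ctc_loss: float, cr_loss: float, cr_loss_scale: float = 0.2) -> float:
    # Consistency-regularized CTC: the CTC loss plus a scaled consistency
    # (CR) term computed between two differently augmented views.
    return ctc_loss + cr_loss_scale * cr_loss

assert abs(combined_loss(4.74, 0.9914) - 4.938) < 1e-3   # batch 0
assert abs(combined_loss(1.741, 0.3154) - 1.804) < 1e-3  # batch 50 tot_loss
```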
2024-09-13 17:11:01,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=255.0, ans=0.1904375
2024-09-13 17:11:02,611 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=30.05 vs. limit=7.595625
2024-09-13 17:11:03,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=255.0, ans=0.0942625
2024-09-13 17:11:09,704 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=31.05 vs. limit=7.60625
2024-09-13 17:11:10,333 INFO [train.py:1198] (1/2) Epoch 1, batch 100, loss[loss=1.239, ctc_loss=1.217, cr_loss=0.1098, over 21031.00 frames. ], tot_loss[loss=1.481, ctc_loss=1.44, cr_loss=0.2054, over 1635924.76 frames. ], batch size: 63, lr: 2.40e-02, grad_scale: 1.0
2024-09-13 17:11:13,598 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.479e+02 5.147e+02 9.858e+02 2.357e+03 2.062e+04, threshold=1.972e+03, percent-clipped=0.0
2024-09-13 17:11:21,391 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=35.57 vs. limit=7.7125
2024-09-13 17:11:31,629 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=46.79 vs. limit=7.616875
2024-09-13 17:11:33,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=311.6666666666667, ans=5.194791666666667
2024-09-13 17:11:37,218 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.22 vs. limit=7.73375
2024-09-13 17:11:39,092 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.47 vs. limit=7.73375
2024-09-13 17:11:49,504 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=44.24 vs. limit=7.6275
2024-09-13 17:11:53,111 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=34.65 vs. limit=7.6275
2024-09-13 17:11:54,850 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=17.97 vs. limit=7.6275
2024-09-13 17:12:06,803 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=166.71 vs. limit=7.77625
2024-09-13 17:12:24,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=396.6666666666667, ans=0.5
2024-09-13 17:12:28,124 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.14 vs. limit=4.158666666666667
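The optim.py warnings report the min/25%/50%/75%/max of recent gradient norms together with the clipping threshold. In the later warnings the threshold tracks Clipping_scale times the median (e.g. at 17:11:13 above, 2.0 × 9.858e+02 ≈ 1.972e+03); the first few thresholds are larger multiples of the median, suggesting a more permissive start-up period. A toy sketch of median-based clipping under that reading, not icefall's exact ScaledAdam code:

```python
from collections import deque
from statistics import median

class MedianGradClipper:
    """Toy clipping in the spirit of the warnings above, assuming
    threshold = clipping_scale * median of recent gradient norms."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)

    def observe(self, grad_norm: float) -> float:
        """Record a norm; return the factor (<= 1.0) to multiply gradients by."""
        self.norms.append(grad_norm)
        threshold = self.clipping_scale * median(self.norms)
        return min(1.0, threshold / grad_norm) if grad_norm > 0 else 1.0

clipper = MedianGradClipper()
for norm in [985.8, 514.7, 2357.0, 147.9]:
    scale = clipper.observe(norm)  # scale < 1.0 only for outlier norms
```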
2024-09-13 17:12:29,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=396.6666666666667, ans=0.5
2024-09-13 17:12:33,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=396.6666666666667, ans=0.091075
2024-09-13 17:12:37,996 INFO [train.py:1198] (1/2) Epoch 1, batch 150, loss[loss=1.144, ctc_loss=1.124, cr_loss=0.09845, over 14072.00 frames. ], tot_loss[loss=1.359, ctc_loss=1.327, cr_loss=0.1594, over 2172962.80 frames. ], batch size: 150, lr: 2.60e-02, grad_scale: 1.0
2024-09-13 17:12:55,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=425.0, ans=7.659375
2024-09-13 17:13:19,242 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=73.30 vs. limit=7.680625
2024-09-13 17:13:20,858 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=148.14 vs. limit=7.680625
2024-09-13 17:13:29,578 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=105.56 vs. limit=7.680625
2024-09-13 17:13:38,589 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.83 vs. limit=7.8825
2024-09-13 17:13:38,627 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=65.09 vs. limit=5.255
2024-09-13 17:13:43,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=510.0, ans=0.20765
2024-09-13 17:13:50,911 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=42.52 vs. limit=7.701875
2024-09-13 17:13:55,550 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=165.44 vs. limit=7.701875
2024-09-13 17:14:01,108 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=119.36 vs. limit=7.9037500000000005
2024-09-13 17:14:09,306 INFO [train.py:1198] (1/2) Epoch 1, batch 200, loss[loss=1.225, ctc_loss=1.206, cr_loss=0.09713, over 18297.00 frames. ], tot_loss[loss=1.297, ctc_loss=1.27, cr_loss=0.1365, over 2597257.81 frames. ], batch size: 108, lr: 2.80e-02, grad_scale: 2.0
2024-09-13 17:14:12,945 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.501e+02 3.089e+02 4.430e+02 6.680e+02 6.501e+03, threshold=8.860e+02, percent-clipped=1.0
2024-09-13 17:14:14,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=566.6666666666666, ans=0.04822916666666667
2024-09-13 17:14:30,881 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.58 vs. limit=7.94625
2024-09-13 17:14:33,095 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=101.21 vs. limit=7.723125
2024-09-13 17:14:49,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=623.3333333333334, ans=0.47078125
2024-09-13 17:14:51,626 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.90 vs. limit=5.155833333333334
2024-09-13 17:14:53,724 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=86.00 vs. limit=7.73375
2024-09-13 17:15:07,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=651.6666666666666, ans=0.469453125
2024-09-13 17:15:07,600 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=29.28 vs. limit=7.98875
2024-09-13 17:15:21,619 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=74.47 vs. limit=5.34
2024-09-13 17:15:28,989 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.51 vs. limit=8.01
2024-09-13 17:15:40,598 INFO [train.py:1198] (1/2) Epoch 1, batch 250, loss[loss=1.223, ctc_loss=1.199, cr_loss=0.1204, over 14092.00 frames. ], tot_loss[loss=1.262, ctc_loss=1.237, cr_loss=0.1241, over 2916740.64 frames. ], batch size: 149, lr: 3.00e-02, grad_scale: 2.0
2024-09-13 17:15:41,509 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.85 vs. limit=8.03125
2024-09-13 17:15:48,436 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.52 vs. limit=8.03125
2024-09-13 17:15:48,677 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=91.92 vs. limit=7.765625
2024-09-13 17:15:50,102 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=18.24 vs. limit=7.765625
2024-09-13 17:15:56,851 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.92 vs. limit=8.0525
2024-09-13 17:15:57,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=736.6666666666666, ans=5.460416666666666
2024-09-13 17:15:59,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=736.6666666666666, ans=0.46546875
2024-09-13 17:16:08,528 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=15.47 vs. limit=5.368333333333333
2024-09-13 17:16:23,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=765.0, ans=0.1713125
2024-09-13 17:16:32,361 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=140.35 vs. limit=7.7975
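The ScheduledFloat entries print a value ("ans") as a function of batch_count; in the Zipformer codebase these are piecewise-linear schedules over the training batch count, used for quantities such as dropout, skip rates, and the whitening limits (note the *.whitening_limit entry at batch_count=425.0 above). A minimal re-implementation of the idea, with illustrative breakpoints rather than the recipe's actual ones:

```python
from typing import Tuple

class PiecewiseLinearSchedule:
    """Piecewise-linear function of batch_count, in the spirit of ScheduledFloat."""

    def __init__(self, *points: Tuple[float, float]):
        # (batch_count, value) breakpoints, kept sorted by batch_count.
        self.points = sorted(points)

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                # Linear interpolation between the surrounding breakpoints.
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# Illustrative breakpoints (not the recipe's actual schedule): a skip rate
# starting at 0.5 and decaying to 0.02 by batch 4000.
skip_rate = PiecewiseLinearSchedule((0.0, 0.5), (4000.0, 0.02))
print(skip_rate(0.0), skip_rate(2000.0), skip_rate(4000.0))  # 0.5 0.26 0.02
```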
2024-09-13 17:16:45,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=793.3333333333334, ans=0.7579333333333333
2024-09-13 17:16:47,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=821.6666666666666, ans=0.461484375
2024-09-13 17:16:47,962 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=70.69 vs. limit=7.808125
2024-09-13 17:17:06,345 INFO [train.py:1198] (1/2) Epoch 1, batch 300, loss[loss=1.25, ctc_loss=1.221, cr_loss=0.1447, over 20671.00 frames. ], tot_loss[loss=1.247, ctc_loss=1.223, cr_loss=0.1217, over 3179745.56 frames. ], batch size: 66, lr: 3.20e-02, grad_scale: 4.0
2024-09-13 17:17:07,523 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=155.57 vs. limit=7.81875
2024-09-13 17:17:09,036 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=19.36 vs. limit=7.81875
2024-09-13 17:17:09,937 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.667e+02 2.646e+02 3.598e+02 4.970e+02 1.314e+03, threshold=7.197e+02, percent-clipped=2.0
2024-09-13 17:17:12,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=850.0, ans=0.46015625
2024-09-13 17:17:13,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=850.0, ans=0.46015625
2024-09-13 17:17:22,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=878.3333333333334, ans=5.548958333333333
2024-09-13 17:17:29,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=878.3333333333334, ans=0.0802375
2024-09-13 17:17:46,931 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=27.53 vs. limit=7.84
2024-09-13 17:17:47,150 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.97 vs. limit=8.18
2024-09-13 17:18:04,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=935.0, ans=0.0789625
2024-09-13 17:18:16,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=963.3333333333334, ans=0.45484375
2024-09-13 17:18:19,792 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=37.00 vs. limit=8.2225
2024-09-13 17:18:25,382 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.01 vs. limit=8.2225
2024-09-13 17:18:33,240 INFO [train.py:1198] (1/2) Epoch 1, batch 350, loss[loss=1.256, ctc_loss=1.215, cr_loss=0.2015, over 20262.00 frames. ], tot_loss[loss=1.233, ctc_loss=1.206, cr_loss=0.1339, over 3391951.61 frames. ], batch size: 74, lr: 3.40e-02, grad_scale: 4.0
2024-09-13 17:18:53,016 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=18.76 vs. limit=7.8825
2024-09-13 17:19:01,804 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.07 vs. limit=8.265
2024-09-13 17:19:05,325 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=70.61 vs. limit=7.8825
2024-09-13 17:19:05,405 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=7.78 vs. limit=8.265
2024-09-13 17:19:13,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1048.3333333333333, ans=0.450859375
2024-09-13 17:19:22,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1048.3333333333333, ans=0.450859375
2024-09-13 17:19:26,271 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=59.17 vs. limit=7.8931249999999995
2024-09-13 17:19:27,500 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=7.92 vs. limit=8.307500000000001
2024-09-13 17:19:39,566 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.63 vs. limit=7.9037500000000005
2024-09-13 17:20:02,742 INFO [train.py:1198] (1/2) Epoch 1, batch 400, loss[loss=1.203, ctc_loss=1.158, cr_loss=0.2229, over 20927.00 frames. ], tot_loss[loss=1.217, ctc_loss=1.187, cr_loss=0.1496, over 3554539.45 frames. ], batch size: 60, lr: 3.60e-02, grad_scale: 8.0
2024-09-13 17:20:03,417 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=31.47 vs. limit=7.925
2024-09-13 17:20:06,854 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=25.38 vs. limit=8.35
2024-09-13 17:20:07,832 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.611e+02 3.604e+02 4.772e+02 6.857e+02 1.281e+03, threshold=9.545e+02, percent-clipped=18.0
2024-09-13 17:20:10,817 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.06 vs. limit=5.283333333333333
2024-09-13 17:20:18,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1161.6666666666667, ans=0.2883833333333333
2024-09-13 17:20:21,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1161.6666666666667, ans=0.217425
2024-09-13 17:20:24,067 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.19 vs. limit=7.935625
2024-09-13 17:20:30,696 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=16.25 vs. limit=5.5808333333333335
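The scaling.py:1024 "Whitening" lines compare a whitening metric of a module's activations against a limit that is itself scheduled upward over training (the *.whitening_limit ScheduledFloat entries earlier show this). Roughly, such a metric measures how far the channel covariance is from a multiple of the identity, equaling 1.0 for fully decorrelated, equal-variance channels. One natural formulation of that measure, offered as an assumption rather than scaling.py's exact formula:

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels). Returns a value >= 1.0 that equals 1.0
    # exactly when the channel covariance is a multiple of the identity,
    # i.e. when the features are fully "white".
    x = x - x.mean(dim=0)
    cov = (x.t() @ x) / x.shape[0]
    num_channels = cov.shape[0]
    mean_diag = cov.diagonal().mean()
    return (cov ** 2).sum() / (mean_diag ** 2 * num_channels)

print(whitening_metric(torch.randn(1000, 256)))                     # close to 1.0
print(whitening_metric(torch.randn(1000, 1) * torch.ones(1, 256)))  # large: rank-1 features
```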
2024-09-13 17:20:48,385 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=31.63 vs. limit=7.94625
2024-09-13 17:20:51,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1190.0, ans=0.44421875
2024-09-13 17:20:57,262 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=23.02 vs. limit=7.956875
2024-09-13 17:20:57,291 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.57 vs. limit=5.304583333333333
2024-09-13 17:21:03,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1218.3333333333333, ans=0.442890625
2024-09-13 17:21:23,489 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.83 vs. limit=7.9675
2024-09-13 17:21:26,095 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1246.6666666666667, ans=0.3441666666666666
2024-09-13 17:21:31,434 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.59 vs. limit=7.978125
2024-09-13 17:21:32,608 INFO [train.py:1198] (1/2) Epoch 1, batch 450, loss[loss=1.202, ctc_loss=1.155, cr_loss=0.2313, over 20952.00 frames. ], tot_loss[loss=1.202, ctc_loss=1.169, cr_loss=0.1642, over 3683432.15 frames. ], batch size: 58, lr: 3.80e-02, grad_scale: 4.0
2024-09-13 17:21:56,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1303.3333333333333, ans=0.070675
2024-09-13 17:21:58,893 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=26.65 vs. limit=7.98875
2024-09-13 17:22:08,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1331.6666666666667, ans=0.3335416666666666
2024-09-13 17:22:28,754 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.45 vs. limit=8.01
2024-09-13 17:22:49,059 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=21.33 vs. limit=8.020625
2024-09-13 17:22:58,370 INFO [train.py:1198] (1/2) Epoch 1, batch 500, loss[loss=1.218, ctc_loss=1.167, cr_loss=0.2558, over 20266.00 frames. ], tot_loss[loss=1.195, ctc_loss=1.159, cr_loss=0.1813, over 3749653.93 frames. ], batch size: 74, lr: 4.00e-02, grad_scale: 8.0
2024-09-13 17:23:03,385 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.539e+02 3.291e+02 4.274e+02 6.351e+02 1.371e+03, threshold=8.549e+02, percent-clipped=4.0
2024-09-13 17:23:09,197 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.14 vs. limit=4.566666666666666
2024-09-13 17:23:11,258 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.99 vs. limit=8.5625
2024-09-13 17:23:12,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1416.6666666666667, ans=0.43359375
2024-09-13 17:23:21,134 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=40.79 vs. limit=8.041875
2024-09-13 17:23:25,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1445.0, ans=0.432265625
2024-09-13 17:23:34,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1473.3333333333333, ans=0.06684999999999999
2024-09-13 17:23:34,548 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=22.27 vs. limit=8.0525
2024-09-13 17:23:47,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=1501.6666666666667, ans=0.16553125000000002
2024-09-13 17:23:51,533 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=24.91 vs. limit=8.063125
2024-09-13 17:24:01,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1501.6666666666667, ans=0.429609375
2024-09-13 17:24:22,868 INFO [train.py:1198] (1/2) Epoch 1, batch 550, loss[loss=1.163, ctc_loss=1.11, cr_loss=0.2673, over 20021.00 frames. ], tot_loss[loss=1.186, ctc_loss=1.145, cr_loss=0.2014, over 3824535.92 frames. ], batch size: 80, lr: 3.99e-02, grad_scale: 4.0
2024-09-13 17:24:24,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1558.3333333333333, ans=0.06493750000000001
2024-09-13 17:24:32,273 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.58 vs. limit=5.779166666666667
2024-09-13 17:24:32,283 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.84 vs. limit=8.084375
2024-09-13 17:24:34,203 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.19 vs. limit=5.779166666666667
2024-09-13 17:24:45,460 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=24.76 vs. limit=8.095
2024-09-13 17:24:47,177 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.65 vs. limit=8.095
2024-09-13 17:24:48,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1586.6666666666667, ans=0.28413333333333335
2024-09-13 17:25:11,146 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.40 vs. limit=8.105625
2024-09-13 17:25:14,845 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.95 vs. limit=8.7325
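The learning-rate column rises linearly from 2.00e-02 at batch 0 to the configured base_lr of 4.00e-02 at batch 500, then eases off (3.99e-02 by batch 550). That trajectory is consistent with icefall's Eden schedule, which combines a linear warmup with power-law decay driven by the 'lr_batches': 7500 and 'lr_epochs': 3.5 settings. A sketch under that assumption; the exact formula here is inferred from the logged values, not quoted from the recipe:

```python
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5,
            warmup_batches: int = 500) -> float:
    # Assumed form of the Eden schedule: power-law decay in both the batch
    # count and the (fractional) epoch count, with a linear warmup that
    # starts at 0.5 * base_lr and reaches base_lr at warmup_batches.
    warmup = 1.0 if batch >= warmup_batches else 0.5 + 0.5 * batch / warmup_batches
    batch_factor = ((batch**2 + lr_batches**2) / lr_batches**2) ** -0.25
    epoch_factor = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
    return base_lr * batch_factor * epoch_factor * warmup

print(f"{eden_lr(0.04, 0, 0.0):.2e}")    # ~2.00e-02, matching batch 0 above
print(f"{eden_lr(0.04, 550, 0.1):.2e}")  # ~3.99e-02, matching batch 550 above
```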
2024-09-13 17:25:37,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1671.6666666666667, ans=0.28328333333333333
2024-09-13 17:25:46,694 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.57 vs. limit=5.417916666666667
2024-09-13 17:25:50,656 INFO [train.py:1198] (1/2) Epoch 1, batch 600, loss[loss=1.099, ctc_loss=1.027, cr_loss=0.357, over 20878.00 frames. ], tot_loss[loss=1.166, ctc_loss=1.122, cr_loss=0.2242, over 3893797.18 frames. ], batch size: 57, lr: 3.99e-02, grad_scale: 8.0
2024-09-13 17:25:57,438 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.893e+02 3.241e+02 4.356e+02 6.431e+02 1.093e+03, threshold=8.713e+02, percent-clipped=4.0
2024-09-13 17:26:23,630 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.73 vs. limit=8.817499999999999
2024-09-13 17:26:29,272 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.68 vs. limit=8.817499999999999
2024-09-13 17:26:37,773 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.35 vs. limit=5.878333333333334
2024-09-13 17:26:51,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1785.0, ans=0.28215
2024-09-13 17:27:06,219 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.48 vs. limit=5.453333333333333
2024-09-13 17:27:07,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1813.3333333333333, ans=0.415
2024-09-13 17:27:08,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1813.3333333333333, ans=0.415
2024-09-13 17:27:15,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1813.3333333333333, ans=0.8365333333333334
2024-09-13 17:27:18,495 INFO [train.py:1198] (1/2) Epoch 1, batch 650, loss[loss=1.033, ctc_loss=0.9619, cr_loss=0.3546, over 20769.00 frames. ], tot_loss[loss=1.139, ctc_loss=1.09, cr_loss=0.2467, over 3938279.10 frames. ], batch size: 56, lr: 3.99e-02, grad_scale: 8.0
2024-09-13 17:27:23,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1841.6666666666667, ans=0.413671875
2024-09-13 17:27:46,104 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.36 vs. limit=8.9025
2024-09-13 17:27:54,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1898.3333333333333, ans=0.411015625
2024-09-13 17:27:54,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1898.3333333333333, ans=0.411015625
2024-09-13 17:28:03,643 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.07 vs. limit=8.92375
2024-09-13 17:28:09,759 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.13 vs. limit=8.2225
2024-09-13 17:28:27,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1955.0, ans=0.40835937499999997
2024-09-13 17:28:29,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1955.0, ans=0.8315750000000001
2024-09-13 17:28:40,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1983.3333333333333, ans=0.035
2024-09-13 17:28:42,297 INFO [train.py:1198] (1/2) Epoch 1, batch 700, loss[loss=0.8972, ctc_loss=0.8333, cr_loss=0.3193, over 20959.00 frames. ], tot_loss[loss=1.11, ctc_loss=1.056, cr_loss=0.2711, over 3973346.94 frames. ], batch size: 50, lr: 3.99e-02, grad_scale: 8.0
2024-09-13 17:28:49,208 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.032e+02 2.650e+02 3.567e+02 5.343e+02 1.045e+03, threshold=7.135e+02, percent-clipped=3.0
2024-09-13 17:29:12,630 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.556e-03
2024-09-13 17:29:19,462 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.72 vs. limit=4.816
2024-09-13 17:29:43,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2068.3333333333335, ans=0.27931666666666666
2024-09-13 17:29:49,855 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.89 vs. limit=6.048333333333333
2024-09-13 17:30:05,027 INFO [train.py:1198] (1/2) Epoch 1, batch 750, loss[loss=1.028, ctc_loss=0.9526, cr_loss=0.3758, over 20978.00 frames. ], tot_loss[loss=1.079, ctc_loss=1.021, cr_loss=0.2935, over 4004186.31 frames. ], batch size: 64, lr: 3.99e-02, grad_scale: 8.0
2024-09-13 17:30:17,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=2125.0, ans=5.53125
2024-09-13 17:30:57,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2210.0, ans=0.82265
2024-09-13 17:30:57,531 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.14 vs. limit=8.32875
2024-09-13 17:31:04,674 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.16 vs. limit=9.1575
2024-09-13 17:31:17,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2238.3333333333335, ans=0.08601041666666667
2024-09-13 17:31:17,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2238.3333333333335, ans=0.395078125
2024-09-13 17:31:24,512 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=8.86 vs. limit=9.17875
2024-09-13 17:31:28,484 INFO [train.py:1198] (1/2) Epoch 1, batch 800, loss[loss=0.8905, ctc_loss=0.8066, cr_loss=0.4197, over 20994.00 frames. ], tot_loss[loss=1.046, ctc_loss=0.9831, cr_loss=0.3133, over 4021001.33 frames. ], batch size: 55, lr: 3.99e-02, grad_scale: 16.0
2024-09-13 17:31:38,252 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.971e+02 2.948e+02 4.044e+02 5.901e+02 1.230e+03, threshold=8.087e+02, percent-clipped=16.0
2024-09-13 17:31:42,615 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.92 vs. limit=5.566666666666666
2024-09-13 17:32:15,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2323.3333333333335, ans=0.8186833333333333
2024-09-13 17:32:29,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2351.6666666666665, ans=0.389765625
2024-09-13 17:32:31,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2351.6666666666665, ans=0.2764833333333333
2024-09-13 17:32:44,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2380.0, ans=0.2025
2024-09-13 17:32:55,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2408.3333333333335, ans=6.505208333333334
2024-09-13 17:32:57,274 INFO [train.py:1198] (1/2) Epoch 1, batch 850, loss[loss=0.8223, ctc_loss=0.749, cr_loss=0.3666, over 20776.00 frames. ], tot_loss[loss=1.012, ctc_loss=0.9465, cr_loss=0.3276, over 4027889.89 frames. ], batch size: 56, lr: 3.99e-02, grad_scale: 16.0
2024-09-13 17:33:29,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2465.0, ans=0.384453125
2024-09-13 17:33:41,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2465.0, ans=0.27535
2024-09-13 17:34:18,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2521.6666666666665, ans=0.381796875
2024-09-13 17:34:21,212 INFO [train.py:1198] (1/2) Epoch 1, batch 900, loss[loss=0.8739, ctc_loss=0.7891, cr_loss=0.424, over 20996.00 frames. ], tot_loss[loss=0.976, ctc_loss=0.9081, cr_loss=0.3395, over 4037549.00 frames. ], batch size: 61, lr: 3.99e-02, grad_scale: 8.0
2024-09-13 17:34:26,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2550.0, ans=0.042031250000000006
2024-09-13 17:34:29,294 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.284e+02 3.662e+02 5.387e+02 8.238e+02 1.960e+03, threshold=1.077e+03, percent-clipped=27.0
2024-09-13 17:34:46,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2578.3333333333335, ans=0.379140625
2024-09-13 17:35:17,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2635.0, ans=0.376484375
2024-09-13 17:35:22,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2635.0, ans=0.0407125
2024-09-13 17:35:43,702 INFO [train.py:1198] (1/2) Epoch 1, batch 950, loss[loss=0.8231, ctc_loss=0.748, cr_loss=0.3755, over 20670.00 frames. ], tot_loss[loss=0.9393, ctc_loss=0.8695, cr_loss=0.3492, over 4051042.74 frames. ], batch size: 71, lr: 3.98e-02, grad_scale: 8.0
2024-09-13 17:36:29,406 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.15 vs. limit=8.530625
2024-09-13 17:36:40,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2776.6666666666665, ans=0.095875
2024-09-13 17:36:42,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2776.6666666666665, ans=0.36984375
2024-09-13 17:36:55,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2805.0, ans=0.368515625
2024-09-13 17:37:06,969 INFO [train.py:1198] (1/2) Epoch 1, batch 1000, loss[loss=0.7981, ctc_loss=0.7088, cr_loss=0.4467, over 20938.00 frames. ], tot_loss[loss=0.9037, ctc_loss=0.832, cr_loss=0.3587, over 4059401.92 frames. ], batch size: 60, lr: 3.98e-02, grad_scale: 8.0
2024-09-13 17:37:15,308 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.516e+02 4.793e+02 6.874e+02 9.512e+02 2.241e+03, threshold=1.375e+03, percent-clipped=19.0
2024-09-13 17:37:40,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2890.0, ans=0.034975000000000006
2024-09-13 17:38:10,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2918.3333333333335, ans=0.03433749999999999
2024-09-13 17:38:25,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2946.6666666666665, ans=0.361875
2024-09-13 17:38:31,792 INFO [train.py:1198] (1/2) Epoch 1, batch 1050, loss[loss=0.6673, ctc_loss=0.5952, cr_loss=0.3607, over 20978.00 frames. ], tot_loss[loss=0.8731, ctc_loss=0.7996, cr_loss=0.3679, over 4048257.78 frames. ], batch size: 50, lr: 3.98e-02, grad_scale: 8.0
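The grad_scale column moves in powers of two: it drops when a scaled-gradient overflow is detected (2.0 at batch 0 down to 0.5 by batch 50) and grows again after runs of clean steps (up to 16.0 by batch 800, back to 8.0 by batch 900). This matches how PyTorch's dynamic loss scaler behaves under Use AMP=True. A conventional AMP training step under that setup; the model, optimizer, and batch below are placeholders, since the recipe actually trains a Zipformer with ScaledAdam:

```python
import torch
import torch.nn.functional as F

# Placeholders for illustration only.
model = torch.nn.Linear(80, 500).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=0.04)
scaler = torch.cuda.amp.GradScaler()  # owns the dynamic grad_scale value

features = torch.randn(8, 80, device="cuda")
targets = torch.randint(0, 500, (8,), device="cuda")

optimizer.zero_grad()
with torch.cuda.amp.autocast():  # float16 autocast, per dtype=torch.float16
    loss = F.cross_entropy(model(features), targets)

scaler.scale(loss).backward()  # backpropagate the scaled loss
scaler.step(optimizer)         # skips the update if inf/nan grads were found
scaler.update()                # halves the scale on overflow, grows it after
                               # a run of clean steps -- the powers of two above
```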
2024-09-13 17:38:42,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2975.0, ans=0.08140625000000001
2024-09-13 17:39:05,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3003.3333333333335, ans=0.35921875000000003
2024-09-13 17:39:12,360 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.97 vs. limit=8.636875
2024-09-13 17:39:17,112 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.05 vs. limit=9.77375
2024-09-13 17:39:22,109 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.42 vs. limit=9.77375
2024-09-13 17:39:48,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3088.3333333333335, ans=0.355234375
2024-09-13 17:39:51,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3088.3333333333335, ans=0.355234375
2024-09-13 17:39:53,449 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.15 vs. limit=9.81625
2024-09-13 17:39:57,654 INFO [train.py:1198] (1/2) Epoch 1, batch 1100, loss[loss=0.7027, ctc_loss=0.6155, cr_loss=0.436, over 21063.00 frames. ], tot_loss[loss=0.8434, ctc_loss=0.7675, cr_loss=0.3791, over 4050413.02 frames. ], batch size: 53, lr: 3.98e-02, grad_scale: 8.0
2024-09-13 17:40:05,374 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.321e+02 3.634e+02 5.107e+02 7.438e+02 1.793e+03, threshold=1.021e+03, percent-clipped=5.0
2024-09-13 17:40:09,203 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=3116.6666666666665, ans=0.07468749999999999
2024-09-13 17:40:51,792 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3201.6666666666665, ans=0.349921875
2024-09-13 17:40:54,124 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.70 vs. limit=5.280666666666667
2024-09-13 17:41:01,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3230.0, ans=0.078875
2024-09-13 17:41:18,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3258.3333333333335, ans=0.07
2024-09-13 17:41:19,644 INFO [train.py:1198] (1/2) Epoch 1, batch 1150, loss[loss=0.6525, ctc_loss=0.5781, cr_loss=0.3722, over 20976.00 frames. ], tot_loss[loss=0.8157, ctc_loss=0.738, cr_loss=0.3887, over 4039194.25 frames. ], batch size: 51, lr: 3.98e-02, grad_scale: 8.0
2024-09-13 17:41:22,029 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.13 vs. limit=8.721875
2024-09-13 17:41:39,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3286.6666666666665, ans=0.03972916666666667
2024-09-13 17:41:50,391 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3315.0, ans=0.344609375
2024-09-13 17:41:50,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3315.0, ans=0.344609375
2024-09-13 17:41:58,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3315.0, ans=0.344609375
2024-09-13 17:42:04,510 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.29 vs. limit=9.98625
2024-09-13 17:42:41,270 INFO [train.py:1198] (1/2) Epoch 1, batch 1200, loss[loss=0.6301, ctc_loss=0.5517, cr_loss=0.3917, over 20884.00 frames. ], tot_loss[loss=0.7909, ctc_loss=0.711, cr_loss=0.3996, over 4042942.04 frames. ], batch size: 54, lr: 3.97e-02, grad_scale: 16.0
2024-09-13 17:42:47,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3400.0, ans=0.340625
2024-09-13 17:42:50,986 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.657e+02 3.801e+02 5.290e+02 7.416e+02 2.091e+03, threshold=1.058e+03, percent-clipped=7.0
2024-09-13 17:43:02,010 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.57 vs. limit=5.857083333333334
2024-09-13 17:43:20,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3456.6666666666665, ans=0.33796875000000004
2024-09-13 17:43:29,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3485.0, ans=0.26515
2024-09-13 17:43:29,578 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.93 vs. limit=10.11375
2024-09-13 17:43:59,009 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.85 vs. limit=8.817499999999999
2024-09-13 17:44:04,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3541.6666666666665, ans=0.333984375
2024-09-13 17:44:06,190 INFO [train.py:1198] (1/2) Epoch 1, batch 1250, loss[loss=0.6875, ctc_loss=0.5939, cr_loss=0.4678, over 20920.00 frames. ], tot_loss[loss=0.7636, ctc_loss=0.6816, cr_loss=0.4097, over 4058642.22 frames. ], batch size: 60, lr: 3.97e-02, grad_scale: 8.0
2024-09-13 17:44:08,435 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.30 vs. limit=8.828125
2024-09-13 17:44:21,809 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.76 vs. limit=8.838750000000001
2024-09-13 17:45:06,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3626.6666666666665, ans=0.32999999999999996
2024-09-13 17:45:25,965 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.96 vs. limit=10.24125
2024-09-13 17:45:30,177 INFO [train.py:1198] (1/2) Epoch 1, batch 1300, loss[loss=0.7007, ctc_loss=0.6122, cr_loss=0.4427, over 20073.00 frames. ], tot_loss[loss=0.7381, ctc_loss=0.6549, cr_loss=0.4161, over 4072917.70 frames. ], batch size: 80, lr: 3.97e-02, grad_scale: 8.0
2024-09-13 17:45:39,891 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.613e+02 3.656e+02 4.890e+02 6.590e+02 1.194e+03, threshold=9.780e+02, percent-clipped=2.0
2024-09-13 17:46:03,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3740.0, ans=0.2626
2024-09-13 17:46:11,355 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3740.0, ans=0.3246875
2024-09-13 17:46:29,391 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.67 vs. limit=10.32625
2024-09-13 17:46:47,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=3796.6666666666665, ans=0.03643750000000001
2024-09-13 17:46:47,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3796.6666666666665, ans=0.7671166666666667
2024-09-13 17:46:52,041 INFO [train.py:1198] (1/2) Epoch 1, batch 1350, loss[loss=0.6059, ctc_loss=0.5249, cr_loss=0.4052, over 20890.00 frames. ], tot_loss[loss=0.7099, ctc_loss=0.6268, cr_loss=0.4156, over 4081453.28 frames. ], batch size: 57, lr: 3.97e-02, grad_scale: 8.0
2024-09-13 17:46:57,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3825.0, ans=0.013937499999999992
2024-09-13 17:47:18,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3853.3333333333335, ans=0.31937499999999996
2024-09-13 17:47:26,631 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.23 vs. limit=8.955625
2024-09-13 17:47:29,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3881.6666666666665, ans=0.7641416666666667
2024-09-13 17:47:53,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3910.0, ans=0.76315
2024-09-13 17:48:05,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3938.3333333333335, ans=0.052312499999999984
2024-09-13 17:48:07,114 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.67 vs. limit=8.976875
2024-09-13 17:48:12,901 INFO [train.py:1198] (1/2) Epoch 1, batch 1400, loss[loss=0.5531, ctc_loss=0.4768, cr_loss=0.3813, over 20887.00 frames. ], tot_loss[loss=0.6886, ctc_loss=0.6053, cr_loss=0.4165, over 4093240.47 frames. ], batch size: 54, lr: 3.97e-02, grad_scale: 8.0
], tot_loss[loss=0.6886, ctc_loss=0.6053, cr_loss=0.4165, over 4093240.47 frames. ], batch size: 54, lr: 3.97e-02, grad_scale: 8.0 2024-09-13 17:48:14,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3966.6666666666665, ans=0.3140625 2024-09-13 17:48:22,677 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.578e+02 3.553e+02 4.759e+02 7.377e+02 1.558e+03, threshold=9.517e+02, percent-clipped=7.0 2024-09-13 17:48:24,850 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-13 17:48:34,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3995.0, ans=0.312734375 2024-09-13 17:48:41,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=3995.0, ans=0.037515625000000004 2024-09-13 17:49:02,593 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.32 vs. limit=6.0129166666666665 2024-09-13 17:49:09,034 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.46 vs. limit=3.6077500000000002 2024-09-13 17:49:33,870 INFO [train.py:1198] (1/2) Epoch 1, batch 1450, loss[loss=0.688, ctc_loss=0.6015, cr_loss=0.4326, over 14057.00 frames. ], tot_loss[loss=0.6679, ctc_loss=0.5844, cr_loss=0.4175, over 4088507.11 frames. ], batch size: 149, lr: 3.96e-02, grad_scale: 8.0 2024-09-13 17:49:49,483 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.59 vs. limit=10.602500000000001 2024-09-13 17:50:53,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4221.666666666667, ans=0.2577833333333333 2024-09-13 17:51:00,782 INFO [train.py:1198] (1/2) Epoch 1, batch 1500, loss[loss=0.594, ctc_loss=0.5071, cr_loss=0.4346, over 20667.00 frames. ], tot_loss[loss=0.6496, ctc_loss=0.5659, cr_loss=0.4181, over 4085103.51 frames. ], batch size: 66, lr: 3.96e-02, grad_scale: 8.0 2024-09-13 17:51:01,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4250.0, ans=0.30078125 2024-09-13 17:51:10,049 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.01 vs. 
limit=10.6875 2024-09-13 17:51:10,525 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.402e+02 3.618e+02 4.755e+02 6.931e+02 1.256e+03, threshold=9.511e+02, percent-clipped=8.0 2024-09-13 17:51:14,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4250.0, ans=0.2075 2024-09-13 17:51:14,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=4250.0, ans=6.0625 2024-09-13 17:51:30,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4278.333333333333, ans=0.299453125 2024-09-13 17:51:33,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4306.666666666667, ans=0.298125 2024-09-13 17:52:18,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4363.333333333333, ans=0.7472833333333334 2024-09-13 17:52:19,137 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.43 vs. limit=3.6545 2024-09-13 17:52:21,672 INFO [train.py:1198] (1/2) Epoch 1, batch 1550, loss[loss=0.6166, ctc_loss=0.5312, cr_loss=0.4272, over 18222.00 frames. ], tot_loss[loss=0.6334, ctc_loss=0.5496, cr_loss=0.4191, over 4084762.88 frames. ], batch size: 108, lr: 3.96e-02, grad_scale: 8.0 2024-09-13 17:52:26,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4391.666666666667, ans=0.25608333333333333 2024-09-13 17:52:28,771 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.14 vs. limit=9.146875 2024-09-13 17:52:31,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4391.666666666667, ans=0.20608333333333334 2024-09-13 17:53:11,206 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.85 vs. limit=9.17875 2024-09-13 17:53:32,493 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.34 vs. limit=9.189375 2024-09-13 17:53:40,766 INFO [train.py:1198] (1/2) Epoch 1, batch 1600, loss[loss=0.5303, ctc_loss=0.4472, cr_loss=0.4155, over 20937.00 frames. ], tot_loss[loss=0.6185, ctc_loss=0.5342, cr_loss=0.4212, over 4084066.95 frames. 
], batch size: 60, lr: 3.96e-02, grad_scale: 16.0 2024-09-13 17:53:50,403 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.444e+02 3.359e+02 4.265e+02 5.483e+02 1.620e+03, threshold=8.530e+02, percent-clipped=7.0 2024-09-13 17:53:54,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4533.333333333333, ans=7.833333333333333 2024-09-13 17:53:57,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4561.666666666667, ans=0.009877898550724637 2024-09-13 17:54:02,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=4561.666666666667, ans=0.025 2024-09-13 17:54:03,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=4561.666666666667, ans=0.7956166666666666 2024-09-13 17:54:06,901 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-13 17:54:14,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4590.0, ans=0.009871739130434782 2024-09-13 17:54:32,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=4618.333333333333, ans=6.154583333333333 2024-09-13 17:55:00,290 INFO [train.py:1198] (1/2) Epoch 1, batch 1650, loss[loss=0.5796, ctc_loss=0.4974, cr_loss=0.4113, over 21021.00 frames. ], tot_loss[loss=0.6043, ctc_loss=0.5197, cr_loss=0.4227, over 4090581.79 frames. ], batch size: 63, lr: 3.95e-02, grad_scale: 16.0 2024-09-13 17:55:15,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4703.333333333333, ans=0.009847101449275362 2024-09-13 17:55:39,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4731.666666666667, ans=0.2526833333333333 2024-09-13 17:55:56,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4760.0, ans=0.04683333333333334 2024-09-13 17:56:15,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4788.333333333333, ans=0.009828623188405798 2024-09-13 17:56:15,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4788.333333333333, ans=0.27554687499999997 2024-09-13 17:56:20,006 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-13 17:56:22,795 INFO [train.py:1198] (1/2) Epoch 1, batch 1700, loss[loss=0.5929, ctc_loss=0.5107, cr_loss=0.4113, over 20949.00 frames. ], tot_loss[loss=0.5927, ctc_loss=0.5078, cr_loss=0.4246, over 4098817.25 frames. 
], batch size: 60, lr: 3.95e-02, grad_scale: 16.0 2024-09-13 17:56:32,449 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.493e+02 3.320e+02 4.432e+02 5.892e+02 9.131e+02, threshold=8.864e+02, percent-clipped=5.0 2024-09-13 17:57:20,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4901.666666666667, ans=0.270234375 2024-09-13 17:57:30,464 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.04 vs. limit=9.348749999999999 2024-09-13 17:57:45,911 INFO [train.py:1198] (1/2) Epoch 1, batch 1750, loss[loss=0.6051, ctc_loss=0.5144, cr_loss=0.4535, over 20766.00 frames. ], tot_loss[loss=0.5814, ctc_loss=0.4962, cr_loss=0.426, over 4111023.85 frames. ], batch size: 71, lr: 3.95e-02, grad_scale: 16.0 2024-09-13 17:57:46,598 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.52 vs. limit=9.359375 2024-09-13 17:57:47,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4958.333333333333, ans=0.7264583333333334 2024-09-13 17:58:52,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5071.666666666667, ans=0.262265625 2024-09-13 17:59:05,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=5100.0, ans=0.7215 2024-09-13 17:59:06,789 INFO [train.py:1198] (1/2) Epoch 1, batch 1800, loss[loss=0.51, ctc_loss=0.4209, cr_loss=0.4457, over 21027.00 frames. ], tot_loss[loss=0.5713, ctc_loss=0.4858, cr_loss=0.4274, over 4105943.81 frames. ], batch size: 52, lr: 3.94e-02, grad_scale: 16.0 2024-09-13 17:59:15,946 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.339e+02 3.324e+02 4.368e+02 6.109e+02 1.538e+03, threshold=8.735e+02, percent-clipped=7.0 2024-09-13 17:59:45,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5156.666666666667, ans=0.25828125 2024-09-13 17:59:58,410 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.25 vs. limit=6.074 2024-09-13 18:00:09,930 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.05 vs. limit=9.455 2024-09-13 18:00:26,410 INFO [train.py:1198] (1/2) Epoch 1, batch 1850, loss[loss=0.6598, ctc_loss=0.571, cr_loss=0.444, over 14667.00 frames. ], tot_loss[loss=0.5627, ctc_loss=0.477, cr_loss=0.4283, over 4102439.70 frames. ], batch size: 150, lr: 3.94e-02, grad_scale: 16.0 2024-09-13 18:00:29,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5241.666666666667, ans=0.254296875 2024-09-13 18:01:45,268 INFO [train.py:1198] (1/2) Epoch 1, batch 1900, loss[loss=0.5453, ctc_loss=0.4539, cr_loss=0.4567, over 21072.00 frames. ], tot_loss[loss=0.5527, ctc_loss=0.467, cr_loss=0.4282, over 4097542.78 frames. 
], batch size: 56, lr: 3.94e-02, grad_scale: 16.0 2024-09-13 18:01:51,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5383.333333333333, ans=0.24765625000000002 2024-09-13 18:01:58,056 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.446e+02 3.212e+02 4.078e+02 5.502e+02 9.178e+02, threshold=8.156e+02, percent-clipped=1.0 2024-09-13 18:02:09,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5411.666666666667, ans=0.246328125 2024-09-13 18:02:11,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5411.666666666667, ans=0.246328125 2024-09-13 18:02:12,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5411.666666666667, ans=0.246328125 2024-09-13 18:02:22,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=5440.0, ans=0.044000000000000004 2024-09-13 18:02:27,564 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.86 vs. limit=7.720000000000001 2024-09-13 18:02:33,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=5440.0, ans=0.025 2024-09-13 18:02:53,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5496.666666666667, ans=0.24503333333333333 2024-09-13 18:03:10,643 INFO [train.py:1198] (1/2) Epoch 1, batch 1950, loss[loss=0.4679, ctc_loss=0.3862, cr_loss=0.4085, over 19955.00 frames. ], tot_loss[loss=0.5461, ctc_loss=0.4602, cr_loss=0.4295, over 4090490.80 frames. ], batch size: 44, lr: 3.94e-02, grad_scale: 16.0 2024-09-13 18:03:39,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=5553.333333333333, ans=0.04352777777777778 2024-09-13 18:03:59,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=5610.0, ans=0.7036500000000001 2024-09-13 18:04:15,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5638.333333333333, ans=0.235703125 2024-09-13 18:04:30,032 INFO [train.py:1198] (1/2) Epoch 1, batch 2000, loss[loss=0.5059, ctc_loss=0.4257, cr_loss=0.4009, over 20882.00 frames. ], tot_loss[loss=0.5408, ctc_loss=0.4546, cr_loss=0.4308, over 4098869.54 frames. ], batch size: 54, lr: 3.93e-02, grad_scale: 32.0 2024-09-13 18:04:39,334 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.265e+02 3.144e+02 3.828e+02 5.126e+02 9.280e+02, threshold=7.656e+02, percent-clipped=2.0 2024-09-13 18:05:25,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5751.666666666667, ans=0.23039062500000002 2024-09-13 18:05:34,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=5780.0, ans=0.00961304347826087 2024-09-13 18:05:48,742 INFO [train.py:1198] (1/2) Epoch 1, batch 2050, loss[loss=0.4411, ctc_loss=0.3632, cr_loss=0.3896, over 21010.00 frames. 
], tot_loss[loss=0.5344, ctc_loss=0.4479, cr_loss=0.4325, over 4110179.11 frames. ], batch size: 52, lr: 3.93e-02, grad_scale: 16.0 2024-09-13 18:05:58,028 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.14 vs. limit=9.678125 2024-09-13 18:06:23,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=5865.0, ans=0.694725 2024-09-13 18:06:34,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5893.333333333333, ans=0.24106666666666665 2024-09-13 18:06:36,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5893.333333333333, ans=0.22375 2024-09-13 18:06:39,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5893.333333333333, ans=0.22375 2024-09-13 18:06:53,734 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-13 18:07:07,429 INFO [train.py:1198] (1/2) Epoch 1, batch 2100, loss[loss=0.5085, ctc_loss=0.4242, cr_loss=0.4217, over 21007.00 frames. ], tot_loss[loss=0.5281, ctc_loss=0.4414, cr_loss=0.4335, over 4122593.10 frames. ], batch size: 63, lr: 3.93e-02, grad_scale: 16.0 2024-09-13 18:07:14,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5950.0, ans=0.2405 2024-09-13 18:07:18,651 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.330e+02 3.500e+02 4.627e+02 5.729e+02 1.082e+03, threshold=9.254e+02, percent-clipped=7.0 2024-09-13 18:07:57,033 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-13 18:08:16,235 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.85 vs. limit=9.77375 2024-09-13 18:08:29,536 INFO [train.py:1198] (1/2) Epoch 1, batch 2150, loss[loss=0.5712, ctc_loss=0.4733, cr_loss=0.4894, over 20663.00 frames. ], tot_loss[loss=0.5205, ctc_loss=0.434, cr_loss=0.4325, over 4112859.09 frames. ], batch size: 71, lr: 3.92e-02, grad_scale: 16.0 2024-09-13 18:09:08,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=6148.333333333333, ans=0.211796875 2024-09-13 18:09:25,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=6176.666666666667, ans=0.23823333333333332 2024-09-13 18:09:32,558 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.31 vs. limit=9.81625 2024-09-13 18:09:37,349 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.35 vs. limit=9.826875 2024-09-13 18:09:43,787 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.66 vs. limit=9.826875 2024-09-13 18:09:50,805 INFO [train.py:1198] (1/2) Epoch 1, batch 2200, loss[loss=0.5266, ctc_loss=0.4406, cr_loss=0.43, over 20148.00 frames. 
], tot_loss[loss=0.5181, ctc_loss=0.4313, cr_loss=0.4343, over 4102986.91 frames. ], batch size: 80, lr: 3.92e-02, grad_scale: 16.0 2024-09-13 18:09:59,237 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.94 vs. limit=6.493333333333333 2024-09-13 18:10:01,796 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.284e+02 3.094e+02 3.918e+02 5.156e+02 9.762e+02, threshold=7.835e+02, percent-clipped=1.0 2024-09-13 18:10:02,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=6233.333333333333, ans=0.00951449275362319 2024-09-13 18:10:10,441 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.47 vs. limit=9.848125 2024-09-13 18:10:21,269 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-13 18:10:43,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=6318.333333333333, ans=0.20382812500000003 2024-09-13 18:10:43,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=6318.333333333333, ans=0.20382812500000003 2024-09-13 18:10:52,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=6346.666666666667, ans=0.2025 2024-09-13 18:11:08,323 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.07 vs. limit=12.28125 2024-09-13 18:11:09,048 INFO [train.py:1198] (1/2) Epoch 1, batch 2250, loss[loss=0.4728, ctc_loss=0.394, cr_loss=0.3941, over 21046.00 frames. ], tot_loss[loss=0.5127, ctc_loss=0.4258, cr_loss=0.4345, over 4109852.80 frames. ], batch size: 56, lr: 3.91e-02, grad_scale: 16.0 2024-09-13 18:11:40,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=6431.666666666667, ans=0.025 2024-09-13 18:11:51,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=6431.666666666667, ans=0.23568333333333333 2024-09-13 18:12:00,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=6460.0, ans=0.19718750000000002 2024-09-13 18:12:13,693 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.82 vs. limit=9.933125 2024-09-13 18:12:23,024 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.97 vs. limit=9.933125 2024-09-13 18:12:26,880 INFO [train.py:1198] (1/2) Epoch 1, batch 2300, loss[loss=0.5505, ctc_loss=0.456, cr_loss=0.4729, over 21007.00 frames. ], tot_loss[loss=0.5065, ctc_loss=0.4196, cr_loss=0.4343, over 4119750.95 frames. ], batch size: 61, lr: 3.91e-02, grad_scale: 16.0 2024-09-13 18:12:29,212 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.40 vs. 
limit=9.94375 2024-09-13 18:12:37,758 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.210e+02 3.017e+02 3.614e+02 5.021e+02 8.740e+02, threshold=7.228e+02, percent-clipped=1.0 2024-09-13 18:12:45,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=6545.0, ans=0.19320312499999998 2024-09-13 18:13:47,104 INFO [train.py:1198] (1/2) Epoch 1, batch 2350, loss[loss=0.5149, ctc_loss=0.4166, cr_loss=0.4914, over 20642.00 frames. ], tot_loss[loss=0.5024, ctc_loss=0.4155, cr_loss=0.4346, over 4117631.82 frames. ], batch size: 68, lr: 3.91e-02, grad_scale: 16.0 2024-09-13 18:13:57,047 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-13 18:14:18,483 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-13 18:14:37,438 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.40 vs. limit=10.02875 2024-09-13 18:14:38,678 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.75 vs. limit=6.697333333333333 2024-09-13 18:15:07,810 INFO [train.py:1198] (1/2) Epoch 1, batch 2400, loss[loss=0.4971, ctc_loss=0.4027, cr_loss=0.472, over 20840.00 frames. ], tot_loss[loss=0.4999, ctc_loss=0.4126, cr_loss=0.4367, over 4123428.06 frames. ], batch size: 65, lr: 3.90e-02, grad_scale: 32.0 2024-09-13 18:15:18,475 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.319e+02 3.072e+02 3.665e+02 4.911e+02 8.026e+02, threshold=7.330e+02, percent-clipped=1.0 2024-09-13 18:16:24,716 INFO [train.py:1198] (1/2) Epoch 1, batch 2450, loss[loss=0.5576, ctc_loss=0.4562, cr_loss=0.5068, over 19438.00 frames. ], tot_loss[loss=0.4989, ctc_loss=0.4113, cr_loss=0.4379, over 4119165.19 frames. ], batch size: 90, lr: 3.90e-02, grad_scale: 32.0 2024-09-13 18:16:40,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=6970.0, ans=0.2303 2024-09-13 18:17:10,806 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.93 vs. limit=8.513333333333334 2024-09-13 18:17:42,462 INFO [train.py:1198] (1/2) Epoch 1, batch 2500, loss[loss=0.4267, ctc_loss=0.3417, cr_loss=0.4251, over 20977.00 frames. ], tot_loss[loss=0.4966, ctc_loss=0.4087, cr_loss=0.4392, over 4110962.67 frames. 
], batch size: 48, lr: 3.90e-02, grad_scale: 32.0 2024-09-13 18:17:53,490 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.380e+02 3.001e+02 3.886e+02 5.271e+02 1.281e+03, threshold=7.771e+02, percent-clipped=14.0 2024-09-13 18:17:53,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=7083.333333333333, ans=0.16796875 2024-09-13 18:18:01,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=7111.666666666667, ans=0.166640625 2024-09-13 18:18:04,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=7111.666666666667, ans=0.22888333333333333 2024-09-13 18:18:21,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=7140.0, ans=0.30710000000000004 2024-09-13 18:18:35,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=7168.333333333333, ans=0.22831666666666667 2024-09-13 18:18:58,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=7225.0, ans=0.161328125 2024-09-13 18:18:59,479 INFO [train.py:1198] (1/2) Epoch 1, batch 2550, loss[loss=0.5327, ctc_loss=0.4338, cr_loss=0.4945, over 20927.00 frames. ], tot_loss[loss=0.4941, ctc_loss=0.4061, cr_loss=0.4401, over 4095561.91 frames. ], batch size: 60, lr: 3.89e-02, grad_scale: 32.0 2024-09-13 18:19:04,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=7225.0, ans=0.161328125 2024-09-13 18:19:09,615 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.05 vs. limit=6.890000000000001 2024-09-13 18:19:17,523 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.93 vs. limit=4.088 2024-09-13 18:20:12,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=7338.333333333333, ans=0.156015625 2024-09-13 18:20:18,627 INFO [train.py:1198] (1/2) Epoch 1, batch 2600, loss[loss=0.4223, ctc_loss=0.3378, cr_loss=0.4224, over 20944.00 frames. ], tot_loss[loss=0.4909, ctc_loss=0.4028, cr_loss=0.4406, over 4103319.48 frames. ], batch size: 50, lr: 3.89e-02, grad_scale: 32.0 2024-09-13 18:20:29,397 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.212e+02 2.919e+02 3.619e+02 4.799e+02 9.059e+02, threshold=7.237e+02, percent-clipped=3.0 2024-09-13 18:21:03,474 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.10 vs. limit=8.711666666666666 2024-09-13 18:21:15,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=7451.666666666667, ans=0.2254833333333333 2024-09-13 18:21:16,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=7451.666666666667, ans=0.6391916666666666 2024-09-13 18:21:37,848 INFO [train.py:1198] (1/2) Epoch 1, batch 2650, loss[loss=0.517, ctc_loss=0.4302, cr_loss=0.4344, over 18511.00 frames. 
], tot_loss[loss=0.4894, ctc_loss=0.4012, cr_loss=0.4408, over 4083570.00 frames. ], batch size: 108, lr: 3.88e-02, grad_scale: 32.0 2024-09-13 18:21:50,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=7508.333333333333, ans=0.03538194444444445 2024-09-13 18:21:51,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=7536.666666666667, ans=0.009231159420289856 2024-09-13 18:21:52,241 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.72 vs. limit=10.32625 2024-09-13 18:21:54,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=7536.666666666667, ans=0.009231159420289856 2024-09-13 18:22:18,199 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.98 vs. limit=8.7825 2024-09-13 18:22:36,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=7593.333333333333, ans=0.04949747468305833 2024-09-13 18:22:54,214 INFO [train.py:1198] (1/2) Epoch 1, batch 2700, loss[loss=0.5472, ctc_loss=0.4425, cr_loss=0.5232, over 20947.00 frames. ], tot_loss[loss=0.4864, ctc_loss=0.3983, cr_loss=0.4402, over 4092214.56 frames. ], batch size: 67, lr: 3.88e-02, grad_scale: 32.0 2024-09-13 18:23:04,891 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.346e+02 2.928e+02 3.699e+02 4.745e+02 9.041e+02, threshold=7.397e+02, percent-clipped=3.0 2024-09-13 18:23:06,837 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-13 18:23:12,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=7678.333333333333, ans=0.22321666666666667 2024-09-13 18:23:34,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=7706.666666666667, ans=0.034555555555555555 2024-09-13 18:23:44,931 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.90 vs. limit=10.400625 2024-09-13 18:23:47,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=7735.0, ans=0.13742187500000003 2024-09-13 18:23:55,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=7763.333333333333, ans=0.22236666666666666 2024-09-13 18:24:06,678 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.96 vs. limit=10.411249999999999 2024-09-13 18:24:10,415 INFO [train.py:1198] (1/2) Epoch 1, batch 2750, loss[loss=0.4248, ctc_loss=0.3428, cr_loss=0.41, over 20959.00 frames. ], tot_loss[loss=0.4825, ctc_loss=0.3946, cr_loss=0.4393, over 4100221.00 frames. 
], batch size: 49, lr: 3.88e-02, grad_scale: 32.0 2024-09-13 18:24:41,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=7848.333333333333, ans=0.6253083333333334 2024-09-13 18:24:42,711 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-13 18:24:44,460 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.61 vs. limit=4.17725 2024-09-13 18:25:09,846 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-13 18:25:29,556 INFO [train.py:1198] (1/2) Epoch 1, batch 2800, loss[loss=0.446, ctc_loss=0.3668, cr_loss=0.3959, over 20779.00 frames. ], tot_loss[loss=0.4786, ctc_loss=0.3909, cr_loss=0.4389, over 4099223.12 frames. ], batch size: 56, lr: 3.87e-02, grad_scale: 32.0 2024-09-13 18:25:40,316 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.138e+02 2.868e+02 3.680e+02 5.040e+02 8.412e+02, threshold=7.361e+02, percent-clipped=5.0 2024-09-13 18:26:03,964 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.28 vs. limit=10.49625 2024-09-13 18:26:33,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=8046.666666666667, ans=0.8304666666666667 2024-09-13 18:26:47,446 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.90 vs. limit=10.528125 2024-09-13 18:26:48,210 INFO [train.py:1198] (1/2) Epoch 1, batch 2850, loss[loss=0.522, ctc_loss=0.4344, cr_loss=0.4381, over 20157.00 frames. ], tot_loss[loss=0.4763, ctc_loss=0.3884, cr_loss=0.4396, over 4100515.99 frames. ], batch size: 80, lr: 3.87e-02, grad_scale: 32.0 2024-09-13 18:26:54,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=8075.0, ans=0.125 2024-09-13 18:27:07,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=8103.333333333333, ans=0.03290277777777778 2024-09-13 18:28:04,310 INFO [train.py:1198] (1/2) Epoch 1, batch 2900, loss[loss=0.4727, ctc_loss=0.3818, cr_loss=0.4547, over 21067.00 frames. ], tot_loss[loss=0.4749, ctc_loss=0.3867, cr_loss=0.4407, over 4115681.41 frames. 
], batch size: 59, lr: 3.86e-02, grad_scale: 32.0 2024-09-13 18:28:04,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=8216.666666666666, ans=0.125 2024-09-13 18:28:09,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=8216.666666666666, ans=0.125 2024-09-13 18:28:14,832 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.332e+02 3.038e+02 4.095e+02 5.397e+02 9.687e+02, threshold=8.190e+02, percent-clipped=4.0 2024-09-13 18:28:24,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=8245.0, ans=0.21755 2024-09-13 18:28:25,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=8245.0, ans=0.025 2024-09-13 18:28:55,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=8301.666666666666, ans=0.125 2024-09-13 18:28:57,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=8301.666666666666, ans=0.125 2024-09-13 18:29:08,566 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.63 vs. limit=4.2495 2024-09-13 18:29:20,173 INFO [train.py:1198] (1/2) Epoch 1, batch 2950, loss[loss=0.4758, ctc_loss=0.3832, cr_loss=0.463, over 20837.00 frames. ], tot_loss[loss=0.472, ctc_loss=0.3839, cr_loss=0.4402, over 4115426.35 frames. ], batch size: 59, lr: 3.86e-02, grad_scale: 32.0 2024-09-13 18:29:25,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=8358.333333333334, ans=0.125 2024-09-13 18:29:31,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=8358.333333333334, ans=0.125 2024-09-13 18:29:52,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=8415.0, ans=0.025 2024-09-13 18:30:22,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=8471.666666666666, ans=0.03136805555555556 2024-09-13 18:30:31,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=8471.666666666666, ans=0.02 2024-09-13 18:30:35,546 INFO [train.py:1198] (1/2) Epoch 1, batch 3000, loss[loss=0.4592, ctc_loss=0.3665, cr_loss=0.4634, over 21001.00 frames. ], tot_loss[loss=0.4711, ctc_loss=0.383, cr_loss=0.4406, over 4103222.55 frames. ], batch size: 55, lr: 3.85e-02, grad_scale: 32.0 2024-09-13 18:30:35,546 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-13 18:30:57,555 INFO [train.py:1230] (1/2) Epoch 1, validation: loss=0.1613, ctc_loss=0.1613, cr_loss=9.026e-15, over 944034.00 frames. 2024-09-13 18:30:57,556 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-13 18:31:08,241 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.089e+02 2.778e+02 3.342e+02 4.333e+02 9.090e+02, threshold=6.683e+02, percent-clipped=2.0 2024-09-13 18:31:19,477 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.79 vs. 
limit=10.698125000000001 2024-09-13 18:31:43,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=8585.0, ans=0.009003260869565217 2024-09-13 18:32:16,938 INFO [train.py:1198] (1/2) Epoch 1, batch 3050, loss[loss=0.4765, ctc_loss=0.3852, cr_loss=0.4565, over 20691.00 frames. ], tot_loss[loss=0.4684, ctc_loss=0.3803, cr_loss=0.4405, over 4114305.90 frames. ], batch size: 71, lr: 3.85e-02, grad_scale: 32.0 2024-09-13 18:32:21,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=8641.666666666666, ans=0.21358333333333335 2024-09-13 18:32:26,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=8641.666666666666, ans=0.030659722222222227 2024-09-13 18:32:35,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=8670.0, ans=0.125 2024-09-13 18:33:08,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=8726.666666666666, ans=0.125 2024-09-13 18:33:19,559 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.41 vs. limit=7.502 2024-09-13 18:33:32,806 INFO [train.py:1198] (1/2) Epoch 1, batch 3100, loss[loss=0.4604, ctc_loss=0.3745, cr_loss=0.4293, over 20731.00 frames. ], tot_loss[loss=0.4667, ctc_loss=0.3786, cr_loss=0.4405, over 4119442.17 frames. ], batch size: 71, lr: 3.85e-02, grad_scale: 32.0 2024-09-13 18:33:43,718 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.264e+02 3.029e+02 3.778e+02 4.657e+02 9.135e+02, threshold=7.556e+02, percent-clipped=6.0 2024-09-13 18:34:05,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=8840.0, ans=0.029833333333333337 2024-09-13 18:34:12,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=8840.0, ans=0.2116 2024-09-13 18:34:48,949 INFO [train.py:1198] (1/2) Epoch 1, batch 3150, loss[loss=0.4397, ctc_loss=0.3532, cr_loss=0.4329, over 21053.00 frames. ], tot_loss[loss=0.4643, ctc_loss=0.3763, cr_loss=0.4401, over 4116307.96 frames. 
], batch size: 53, lr: 3.84e-02, grad_scale: 32.0 2024-09-13 18:35:10,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=8953.333333333334, ans=0.09899494936611666 2024-09-13 18:35:16,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=8953.333333333334, ans=0.16046666666666665 2024-09-13 18:35:16,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=8953.333333333334, ans=0.125 2024-09-13 18:35:34,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=9010.0, ans=0.2099 2024-09-13 18:35:37,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=9010.0, ans=0.125 2024-09-13 18:35:40,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=9010.0, ans=0.125 2024-09-13 18:36:05,212 INFO [train.py:1198] (1/2) Epoch 1, batch 3200, loss[loss=0.4209, ctc_loss=0.3329, cr_loss=0.4402, over 20965.00 frames. ], tot_loss[loss=0.4607, ctc_loss=0.373, cr_loss=0.4387, over 4116393.62 frames. ], batch size: 51, lr: 3.84e-02, grad_scale: 32.0 2024-09-13 18:36:15,964 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.188e+02 2.896e+02 3.667e+02 4.425e+02 9.257e+02, threshold=7.334e+02, percent-clipped=1.0 2024-09-13 18:36:28,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=9095.0, ans=0.125 2024-09-13 18:36:34,579 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-13 18:36:46,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=9123.333333333334, ans=0.125 2024-09-13 18:36:49,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=9123.333333333334, ans=0.125 2024-09-13 18:36:53,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=9151.666666666666, ans=0.028534722222222225 2024-09-13 18:37:12,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=9180.0, ans=0.02841666666666667 2024-09-13 18:37:21,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=9180.0, ans=0.125 2024-09-13 18:37:24,566 INFO [train.py:1198] (1/2) Epoch 1, batch 3250, loss[loss=0.4732, ctc_loss=0.3809, cr_loss=0.4616, over 20857.00 frames. ], tot_loss[loss=0.4585, ctc_loss=0.3708, cr_loss=0.4382, over 4112067.47 frames. ], batch size: 57, lr: 3.83e-02, grad_scale: 32.0 2024-09-13 18:38:18,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=9293.333333333334, ans=0.125 2024-09-13 18:38:42,161 INFO [train.py:1198] (1/2) Epoch 1, batch 3300, loss[loss=0.4239, ctc_loss=0.3449, cr_loss=0.3948, over 20980.00 frames. ], tot_loss[loss=0.4578, ctc_loss=0.3698, cr_loss=0.4395, over 4108649.25 frames. 
], batch size: 52, lr: 3.83e-02, grad_scale: 32.0 2024-09-13 18:38:52,780 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.254e+02 2.738e+02 3.079e+02 3.871e+02 7.046e+02, threshold=6.159e+02, percent-clipped=0.0 2024-09-13 18:39:30,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=9435.0, ans=10.0 2024-09-13 18:39:43,262 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.54 vs. limit=7.365833333333334 2024-09-13 18:39:53,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=9463.333333333334, ans=0.20536666666666664 2024-09-13 18:39:57,825 INFO [train.py:1198] (1/2) Epoch 1, batch 3350, loss[loss=0.4246, ctc_loss=0.3389, cr_loss=0.4287, over 20869.00 frames. ], tot_loss[loss=0.4548, ctc_loss=0.3671, cr_loss=0.439, over 4119991.69 frames. ], batch size: 57, lr: 3.82e-02, grad_scale: 32.0 2024-09-13 18:39:59,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=9491.666666666666, ans=0.5677916666666667 2024-09-13 18:40:00,031 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.54 vs. limit=11.059375 2024-09-13 18:40:21,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=9520.0, ans=0.20479999999999998 2024-09-13 18:40:35,215 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.45 vs. limit=11.080625 2024-09-13 18:40:37,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=9548.333333333334, ans=0.026881944444444444 2024-09-13 18:40:43,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=9576.666666666666, ans=0.125 2024-09-13 18:40:54,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=9576.666666666666, ans=0.125 2024-09-13 18:41:13,481 INFO [train.py:1198] (1/2) Epoch 1, batch 3400, loss[loss=0.4441, ctc_loss=0.3529, cr_loss=0.4558, over 20995.00 frames. ], tot_loss[loss=0.4536, ctc_loss=0.3658, cr_loss=0.439, over 4127952.23 frames. ], batch size: 63, lr: 3.82e-02, grad_scale: 32.0 2024-09-13 18:41:23,934 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.247e+02 2.740e+02 3.694e+02 4.563e+02 9.142e+02, threshold=7.388e+02, percent-clipped=10.0 2024-09-13 18:42:31,710 INFO [train.py:1198] (1/2) Epoch 1, batch 3450, loss[loss=0.3956, ctc_loss=0.3169, cr_loss=0.3938, over 20972.00 frames. ], tot_loss[loss=0.4534, ctc_loss=0.3656, cr_loss=0.4393, over 4107242.83 frames. 
], batch size: 52, lr: 3.81e-02, grad_scale: 32.0 2024-09-13 18:42:54,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=9803.333333333334, ans=0.5568833333333334 2024-09-13 18:43:16,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=9860.0, ans=0.025583333333333336 2024-09-13 18:43:16,424 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.76 vs. limit=11.1975 2024-09-13 18:43:17,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=9860.0, ans=0.025583333333333336 2024-09-13 18:43:18,014 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.99 vs. limit=7.465 2024-09-13 18:43:50,185 INFO [train.py:1198] (1/2) Epoch 1, batch 3500, loss[loss=0.4443, ctc_loss=0.358, cr_loss=0.4317, over 21063.00 frames. ], tot_loss[loss=0.4508, ctc_loss=0.3631, cr_loss=0.4386, over 4103344.17 frames. ], batch size: 56, lr: 3.81e-02, grad_scale: 32.0 2024-09-13 18:44:00,650 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.299e+02 2.941e+02 3.448e+02 4.907e+02 9.728e+02, threshold=6.896e+02, percent-clipped=8.0 2024-09-13 18:44:04,662 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.67 vs. limit=11.229375000000001 2024-09-13 18:44:46,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=10001.666666666666, ans=0.02499305555555556 2024-09-13 18:45:05,987 INFO [train.py:1198] (1/2) Epoch 1, batch 3550, loss[loss=0.4752, ctc_loss=0.3861, cr_loss=0.4453, over 20833.00 frames. ], tot_loss[loss=0.4501, ctc_loss=0.3624, cr_loss=0.4389, over 4096710.43 frames. ], batch size: 59, lr: 3.80e-02, grad_scale: 32.0 2024-09-13 18:45:42,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=10115.0, ans=0.19885 2024-09-13 18:46:10,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=10171.666666666666, ans=0.125 2024-09-13 18:46:13,517 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.19 vs. limit=8.068666666666665 2024-09-13 18:46:22,260 INFO [train.py:1198] (1/2) Epoch 1, batch 3600, loss[loss=0.4413, ctc_loss=0.3514, cr_loss=0.4494, over 20882.00 frames. ], tot_loss[loss=0.4502, ctc_loss=0.3623, cr_loss=0.4391, over 4088884.52 frames. 
], batch size: 57, lr: 3.80e-02, grad_scale: 32.0 2024-09-13 18:46:32,654 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.266e+02 2.835e+02 3.634e+02 4.353e+02 8.135e+02, threshold=7.267e+02, percent-clipped=1.0 2024-09-13 18:46:36,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=10228.333333333334, ans=0.125 2024-09-13 18:46:48,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=10228.333333333334, ans=0.008646014492753623 2024-09-13 18:47:17,337 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-13 18:47:24,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=10313.333333333334, ans=0.008627536231884059 2024-09-13 18:47:38,066 INFO [train.py:1198] (1/2) Epoch 1, batch 3650, loss[loss=0.461, ctc_loss=0.3721, cr_loss=0.4447, over 20630.00 frames. ], tot_loss[loss=0.4508, ctc_loss=0.3628, cr_loss=0.4403, over 4081761.14 frames. ], batch size: 71, lr: 3.79e-02, grad_scale: 32.0 2024-09-13 18:47:45,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=10341.666666666666, ans=0.125 2024-09-13 18:47:59,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=10370.0, ans=0.125 2024-09-13 18:48:56,668 INFO [train.py:1198] (1/2) Epoch 1, batch 3700, loss[loss=0.455, ctc_loss=0.3604, cr_loss=0.4726, over 20841.00 frames. ], tot_loss[loss=0.4499, ctc_loss=0.3617, cr_loss=0.4412, over 4080122.73 frames. ], batch size: 59, lr: 3.79e-02, grad_scale: 32.0 2024-09-13 18:49:09,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=10483.333333333334, ans=0.125 2024-09-13 18:49:10,260 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.323e+02 2.808e+02 3.446e+02 4.369e+02 7.538e+02, threshold=6.891e+02, percent-clipped=1.0 2024-09-13 18:49:14,085 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.43 vs. limit=4.5767500000000005 2024-09-13 18:49:38,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=10540.0, ans=0.125 2024-09-13 18:49:58,190 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.56 vs. limit=15.42625 2024-09-13 18:50:15,998 INFO [train.py:1198] (1/2) Epoch 1, batch 3750, loss[loss=0.4164, ctc_loss=0.3278, cr_loss=0.4432, over 20997.00 frames. ], tot_loss[loss=0.4454, ctc_loss=0.3576, cr_loss=0.4391, over 4088557.33 frames. 
], batch size: 52, lr: 3.78e-02, grad_scale: 32.0 2024-09-13 18:50:22,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=10625.0, ans=0.125 2024-09-13 18:50:42,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=10653.333333333334, ans=0.022277777777777775 2024-09-13 18:50:44,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=10653.333333333334, ans=0.125 2024-09-13 18:50:46,300 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.88 vs. limit=8.272666666666666 2024-09-13 18:51:00,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=10710.0, ans=0.022041666666666668 2024-09-13 18:51:00,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=10710.0, ans=0.52515 2024-09-13 18:51:08,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=10710.0, ans=0.022041666666666668 2024-09-13 18:51:08,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=10710.0, ans=0.09899494936611666 2024-09-13 18:51:14,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=10710.0, ans=0.36065 2024-09-13 18:51:32,145 INFO [train.py:1198] (1/2) Epoch 1, batch 3800, loss[loss=0.4651, ctc_loss=0.3673, cr_loss=0.489, over 20931.00 frames. ], tot_loss[loss=0.4449, ctc_loss=0.3569, cr_loss=0.4399, over 4092318.23 frames. ], batch size: 60, lr: 3.78e-02, grad_scale: 32.0 2024-09-13 18:51:35,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=10766.666666666666, ans=0.5231666666666668 2024-09-13 18:51:42,522 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.065e+02 2.832e+02 3.344e+02 4.065e+02 7.707e+02, threshold=6.689e+02, percent-clipped=3.0 2024-09-13 18:52:47,784 INFO [train.py:1198] (1/2) Epoch 1, batch 3850, loss[loss=0.4022, ctc_loss=0.3271, cr_loss=0.3754, over 20969.00 frames. ], tot_loss[loss=0.4423, ctc_loss=0.3546, cr_loss=0.4386, over 4100986.91 frames. ], batch size: 51, lr: 3.77e-02, grad_scale: 32.0 2024-09-13 18:53:25,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=10965.0, ans=0.19035 2024-09-13 18:53:53,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=11021.666666666666, ans=0.125 2024-09-13 18:54:06,677 INFO [train.py:1198] (1/2) Epoch 1, batch 3900, loss[loss=0.406, ctc_loss=0.3213, cr_loss=0.4238, over 21061.00 frames. ], tot_loss[loss=0.442, ctc_loss=0.3541, cr_loss=0.4394, over 4106862.47 frames. 
], batch size: 56, lr: 3.77e-02, grad_scale: 32.0 2024-09-13 18:54:15,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=11050.0, ans=11.64375 2024-09-13 18:54:17,389 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.325e+02 2.905e+02 3.463e+02 4.744e+02 8.766e+02, threshold=6.927e+02, percent-clipped=3.0 2024-09-13 18:54:31,632 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.78 vs. limit=15.80875 2024-09-13 18:54:40,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=11106.666666666666, ans=0.125 2024-09-13 18:54:42,362 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.61 vs. limit=11.665 2024-09-13 18:54:55,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=11135.0, ans=0.125 2024-09-13 18:55:13,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=11163.333333333334, ans=0.5092833333333333 2024-09-13 18:55:15,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=11163.333333333334, ans=0.125 2024-09-13 18:55:19,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=11163.333333333334, ans=0.18836666666666668 2024-09-13 18:55:24,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=11191.666666666666, ans=0.020034722222222228 2024-09-13 18:55:25,364 INFO [train.py:1198] (1/2) Epoch 1, batch 3950, loss[loss=0.4057, ctc_loss=0.3146, cr_loss=0.4558, over 20981.00 frames. ], tot_loss[loss=0.4409, ctc_loss=0.3531, cr_loss=0.4389, over 4093589.85 frames. ], batch size: 55, lr: 3.76e-02, grad_scale: 32.0 2024-09-13 18:55:44,356 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.34 vs. limit=7.805 2024-09-13 18:55:48,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=11220.0, ans=0.125 2024-09-13 18:56:35,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=11305.0, ans=0.019562500000000003 2024-09-13 18:56:39,425 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.79 vs. limit=10.6525 2024-09-13 18:56:42,533 INFO [train.py:1198] (1/2) Epoch 1, batch 4000, loss[loss=0.4303, ctc_loss=0.344, cr_loss=0.4318, over 20889.00 frames. ], tot_loss[loss=0.4394, ctc_loss=0.3516, cr_loss=0.4393, over 4105088.02 frames. 
], batch size: 57, lr: 3.76e-02, grad_scale: 32.0 2024-09-13 18:56:53,350 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.193e+02 2.621e+02 3.193e+02 4.122e+02 6.110e+02, threshold=6.387e+02, percent-clipped=0.0 2024-09-13 18:57:18,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=11390.0, ans=0.019208333333333338 2024-09-13 18:57:21,263 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-13 18:57:25,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=11390.0, ans=0.125 2024-09-13 18:57:58,257 INFO [train.py:1198] (1/2) Epoch 1, batch 4050, loss[loss=0.5122, ctc_loss=0.4302, cr_loss=0.4101, over 14295.00 frames. ], tot_loss[loss=0.4412, ctc_loss=0.353, cr_loss=0.4409, over 4084997.62 frames. ], batch size: 149, lr: 3.75e-02, grad_scale: 64.0 2024-09-13 18:58:01,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=11475.0, ans=0.125 2024-09-13 18:58:04,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=11475.0, ans=0.05 2024-09-13 18:58:23,171 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.13 vs. limit=10.751666666666667 2024-09-13 18:59:09,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=11588.333333333334, ans=0.18411666666666665 2024-09-13 18:59:10,387 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.72 vs. limit=10.794166666666667 2024-09-13 18:59:12,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=11616.666666666666, ans=0.018263888888888892 2024-09-13 18:59:14,034 INFO [train.py:1198] (1/2) Epoch 1, batch 4100, loss[loss=0.411, ctc_loss=0.3208, cr_loss=0.4508, over 20992.00 frames. ], tot_loss[loss=0.4367, ctc_loss=0.3489, cr_loss=0.4387, over 4093250.15 frames. ], batch size: 51, lr: 3.75e-02, grad_scale: 64.0 2024-09-13 18:59:27,332 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.290e+02 2.785e+02 3.445e+02 4.527e+02 6.726e+02, threshold=6.889e+02, percent-clipped=4.0 2024-09-13 18:59:35,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=11645.0, ans=0.125 2024-09-13 18:59:53,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=11673.333333333334, ans=0.125 2024-09-13 18:59:57,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=11673.333333333334, ans=0.3751 2024-09-13 19:00:11,393 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-13 19:00:18,003 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.92 vs. 
limit=4.7595 2024-09-13 19:00:20,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=11730.0, ans=0.125 2024-09-13 19:00:35,234 INFO [train.py:1198] (1/2) Epoch 1, batch 4150, loss[loss=0.4506, ctc_loss=0.3622, cr_loss=0.442, over 20677.00 frames. ], tot_loss[loss=0.4363, ctc_loss=0.3484, cr_loss=0.4393, over 4097225.18 frames. ], batch size: 68, lr: 3.74e-02, grad_scale: 64.0 2024-09-13 19:00:56,909 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.22 vs. limit=16.34 2024-09-13 19:01:04,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=11815.0, ans=0.07 2024-09-13 19:01:08,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=11815.0, ans=0.025 2024-09-13 19:01:19,798 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.13 vs. limit=16.3825 2024-09-13 19:01:37,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=11871.666666666666, ans=0.008288768115942029 2024-09-13 19:01:40,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=11871.666666666666, ans=0.0 2024-09-13 19:01:50,540 INFO [train.py:1198] (1/2) Epoch 1, batch 4200, loss[loss=0.4071, ctc_loss=0.3195, cr_loss=0.438, over 21056.00 frames. ], tot_loss[loss=0.4362, ctc_loss=0.3483, cr_loss=0.4399, over 4098245.38 frames. ], batch size: 53, lr: 3.74e-02, grad_scale: 64.0 2024-09-13 19:02:01,172 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.169e+02 2.906e+02 3.671e+02 5.051e+02 9.182e+02, threshold=7.342e+02, percent-clipped=6.0 2024-09-13 19:02:04,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=11928.333333333334, ans=0.18071666666666666 2024-09-13 19:03:05,788 INFO [train.py:1198] (1/2) Epoch 1, batch 4250, loss[loss=0.4659, ctc_loss=0.3794, cr_loss=0.4325, over 20838.00 frames. ], tot_loss[loss=0.4343, ctc_loss=0.3465, cr_loss=0.4389, over 4099923.67 frames. ], batch size: 59, lr: 3.73e-02, grad_scale: 64.0 2024-09-13 19:03:47,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=12098.333333333334, ans=0.17901666666666666 2024-09-13 19:04:21,253 INFO [train.py:1198] (1/2) Epoch 1, batch 4300, loss[loss=0.4697, ctc_loss=0.3831, cr_loss=0.4328, over 18405.00 frames. ], tot_loss[loss=0.4345, ctc_loss=0.3466, cr_loss=0.4394, over 4076465.75 frames. 
], batch size: 108, lr: 3.73e-02, grad_scale: 32.0 2024-09-13 19:04:27,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=12183.333333333334, ans=0.025 2024-09-13 19:04:33,491 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.151e+02 2.684e+02 3.342e+02 4.037e+02 7.738e+02, threshold=6.683e+02, percent-clipped=1.0 2024-09-13 19:04:35,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=12211.666666666666, ans=0.125 2024-09-13 19:05:05,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=12268.333333333334, ans=0.17731666666666668 2024-09-13 19:05:24,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=12296.666666666666, ans=0.125 2024-09-13 19:05:39,980 INFO [train.py:1198] (1/2) Epoch 1, batch 4350, loss[loss=0.397, ctc_loss=0.313, cr_loss=0.4198, over 19922.00 frames. ], tot_loss[loss=0.4322, ctc_loss=0.3446, cr_loss=0.438, over 4080524.26 frames. ], batch size: 44, lr: 3.72e-02, grad_scale: 32.0 2024-09-13 19:06:03,048 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.00 vs. limit=4.853 2024-09-13 19:06:19,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=12381.666666666666, ans=0.125 2024-09-13 19:06:57,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=12466.666666666666, ans=0.125 2024-09-13 19:06:58,206 INFO [train.py:1198] (1/2) Epoch 1, batch 4400, loss[loss=0.4251, ctc_loss=0.3368, cr_loss=0.4416, over 20780.00 frames. ], tot_loss[loss=0.4299, ctc_loss=0.3424, cr_loss=0.4372, over 4080121.21 frames. ], batch size: 53, lr: 3.71e-02, grad_scale: 32.0 2024-09-13 19:07:10,281 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.096e+02 2.621e+02 3.031e+02 4.010e+02 7.445e+02, threshold=6.062e+02, percent-clipped=2.0 2024-09-13 19:08:04,315 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.22 vs. limit=12.217500000000001 2024-09-13 19:08:08,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=12580.0, ans=0.008134782608695653 2024-09-13 19:08:14,420 INFO [train.py:1198] (1/2) Epoch 1, batch 4450, loss[loss=0.4312, ctc_loss=0.3403, cr_loss=0.4545, over 20840.00 frames. ], tot_loss[loss=0.4299, ctc_loss=0.3423, cr_loss=0.438, over 4084397.29 frames. ], batch size: 65, lr: 3.71e-02, grad_scale: 32.0 2024-09-13 19:08:21,218 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.60 vs. 
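limit=11.304166666666667

Note: the recurring "Whitening: name=..., metric=X vs. limit=Y" records from scaling.py:1024 compare a per-module whitening metric against a scheduled limit (corrective pressure is presumably applied only once the metric exceeds the limit). As a rough sketch, and not the actual scaling.py formula, one metric of this kind is the ratio of the mean squared eigenvalue of the feature covariance to the squared mean eigenvalue, which equals 1.0 for perfectly "white" activations and grows as the covariance becomes more anisotropic:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
        # x: (num_frames, num_channels) activations of one module.
        # Splits channels into `num_groups` groups, matching the
        # "num_groups=..., num_channels=..." fields of these records, and
        # measures how far each group's covariance is from a multiple of
        # the identity (1.0 == perfectly white). This is an illustrative
        # definition, not necessarily the one used in scaling.py.
        num_frames, num_channels = x.shape
        x = x.reshape(num_frames, num_groups, num_channels // num_groups)
        x = x - x.mean(dim=0, keepdim=True)  # zero-mean per channel
        cov = torch.einsum("ngc,ngd->gcd", x, x) / num_frames
        d = cov.shape[-1]
        trace = cov.diagonal(dim1=-2, dim2=-1).sum(-1)  # sum of eigenvalues
        trace_of_sq = (cov * cov).sum(dim=(-2, -1))     # sum of squared eigenvalues
        return ((d * trace_of_sq) / (trace * trace + 1e-20)).mean()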
2024-09-13 19:08:23,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=12608.333333333334, ans=0.0 2024-09-13 19:08:30,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=12636.666666666666, ans=0.17363333333333333 2024-09-13 19:08:36,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=12636.666666666666, ans=0.17363333333333333 2024-09-13 19:09:13,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=12721.666666666666, ans=0.8772166666666666 2024-09-13 19:09:29,970 INFO [train.py:1198] (1/2) Epoch 1, batch 4500, loss[loss=0.4704, ctc_loss=0.3809, cr_loss=0.4475, over 20865.00 frames. ], tot_loss[loss=0.4286, ctc_loss=0.3411, cr_loss=0.4372, over 4079553.41 frames. ], batch size: 65, lr: 3.70e-02, grad_scale: 32.0 2024-09-13 19:09:41,740 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.160e+02 2.801e+02 3.289e+02 4.523e+02 8.110e+02, threshold=6.578e+02, percent-clipped=6.0 2024-09-13 19:09:49,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=12778.333333333334, ans=0.125 2024-09-13 19:10:04,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=12806.666666666666, ans=0.125 2024-09-13 19:10:39,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=12863.333333333334, ans=0.125 2024-09-13 19:10:45,623 INFO [train.py:1198] (1/2) Epoch 1, batch 4550, loss[loss=0.4364, ctc_loss=0.3431, cr_loss=0.4663, over 19997.00 frames. ], tot_loss[loss=0.4281, ctc_loss=0.3405, cr_loss=0.438, over 4084136.47 frames. ], batch size: 80, lr: 3.70e-02, grad_scale: 32.0 2024-09-13 19:11:09,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=12920.0, ans=0.125 2024-09-13 19:11:26,426 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.38 vs. limit=4.94225 2024-09-13 19:11:47,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=13005.0, ans=0.008042391304347826 2024-09-13 19:11:47,900 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.72 vs. limit=17.25375 2024-09-13 19:11:50,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=13005.0, ans=0.008042391304347826 2024-09-13 19:11:57,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=13005.0, ans=0.125 2024-09-13 19:11:58,237 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.36 vs. limit=12.376875 2024-09-13 19:12:06,691 INFO [train.py:1198] (1/2) Epoch 1, batch 4600, loss[loss=0.3505, ctc_loss=0.2683, cr_loss=0.4109, over 20958.00 frames.
], tot_loss[loss=0.4268, ctc_loss=0.3392, cr_loss=0.438, over 4090228.04 frames. ], batch size: 48, lr: 3.69e-02, grad_scale: 32.0 2024-09-13 19:12:18,930 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.138e+02 2.656e+02 3.075e+02 3.759e+02 8.333e+02, threshold=6.151e+02, percent-clipped=3.0 2024-09-13 19:12:39,970 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.59 vs. limit=12.408750000000001 2024-09-13 19:12:48,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=13090.0, ans=0.012125000000000004 2024-09-13 19:12:48,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=13090.0, ans=0.012125000000000004 2024-09-13 19:13:08,671 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.75 vs. limit=17.36 2024-09-13 19:13:14,419 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.71 vs. limit=12.43 2024-09-13 19:13:22,999 INFO [train.py:1198] (1/2) Epoch 1, batch 4650, loss[loss=0.4175, ctc_loss=0.3286, cr_loss=0.4446, over 20982.00 frames. ], tot_loss[loss=0.4259, ctc_loss=0.3384, cr_loss=0.4378, over 4096408.92 frames. ], batch size: 55, lr: 3.69e-02, grad_scale: 32.0 2024-09-13 19:13:32,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=13175.0, ans=0.16824999999999998 2024-09-13 19:13:50,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=13203.333333333334, ans=0.00799927536231884 2024-09-13 19:13:56,740 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.19 vs. limit=12.461875 2024-09-13 19:14:10,605 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.33 vs. limit=12.4725 2024-09-13 19:14:25,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=13288.333333333334, ans=0.125 2024-09-13 19:14:39,320 INFO [train.py:1198] (1/2) Epoch 1, batch 4700, loss[loss=0.3984, ctc_loss=0.3209, cr_loss=0.3876, over 20770.00 frames. ], tot_loss[loss=0.4258, ctc_loss=0.3382, cr_loss=0.4377, over 4091211.22 frames. ], batch size: 56, lr: 3.68e-02, grad_scale: 32.0 2024-09-13 19:14:51,188 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.210e+02 2.668e+02 3.528e+02 4.722e+02 7.719e+02, threshold=7.056e+02, percent-clipped=7.0 2024-09-13 19:14:59,225 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-13 19:15:05,672 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.35 vs. 
limit=12.504375 2024-09-13 19:15:12,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=13373.333333333334, ans=0.010944444444444444 2024-09-13 19:15:26,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=13401.666666666666, ans=0.4309416666666667 2024-09-13 19:15:54,739 INFO [train.py:1198] (1/2) Epoch 1, batch 4750, loss[loss=0.4409, ctc_loss=0.3485, cr_loss=0.4617, over 19960.00 frames. ], tot_loss[loss=0.4239, ctc_loss=0.3366, cr_loss=0.4366, over 4091445.59 frames. ], batch size: 80, lr: 3.68e-02, grad_scale: 32.0 2024-09-13 19:16:00,163 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.02 vs. limit=12.546875 2024-09-13 19:16:48,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=13543.333333333334, ans=0.125 2024-09-13 19:17:08,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=13571.666666666666, ans=0.01011805555555556 2024-09-13 19:17:13,797 INFO [train.py:1198] (1/2) Epoch 1, batch 4800, loss[loss=0.4163, ctc_loss=0.3268, cr_loss=0.4477, over 20766.00 frames. ], tot_loss[loss=0.4243, ctc_loss=0.337, cr_loss=0.4367, over 4087258.09 frames. ], batch size: 56, lr: 3.67e-02, grad_scale: 32.0 2024-09-13 19:17:25,731 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.230e+02 2.828e+02 3.314e+02 4.204e+02 6.768e+02, threshold=6.629e+02, percent-clipped=0.0 2024-09-13 19:17:56,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=13656.666666666666, ans=0.125 2024-09-13 19:18:32,419 INFO [train.py:1198] (1/2) Epoch 1, batch 4850, loss[loss=0.468, ctc_loss=0.3682, cr_loss=0.4991, over 21020.00 frames. ], tot_loss[loss=0.4233, ctc_loss=0.3359, cr_loss=0.4368, over 4091376.22 frames. ], batch size: 62, lr: 3.67e-02, grad_scale: 32.0 2024-09-13 19:18:33,402 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.09 vs. limit=17.80625 2024-09-13 19:18:35,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=13741.666666666666, ans=0.4190416666666667 2024-09-13 19:18:41,843 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.49 vs. limit=12.653125 2024-09-13 19:18:43,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=13741.666666666666, ans=0.009409722222222229 2024-09-13 19:18:53,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=13770.0, ans=0.00929166666666667 2024-09-13 19:19:16,797 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.65 vs. limit=12.684999999999999 2024-09-13 19:19:31,754 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.09 vs. 
limit=9.542 2024-09-13 19:19:47,747 INFO [train.py:1198] (1/2) Epoch 1, batch 4900, loss[loss=0.415, ctc_loss=0.3236, cr_loss=0.4569, over 20976.00 frames. ], tot_loss[loss=0.4226, ctc_loss=0.3352, cr_loss=0.4372, over 4091427.55 frames. ], batch size: 55, lr: 3.66e-02, grad_scale: 16.0 2024-09-13 19:20:01,185 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.268e+02 2.911e+02 3.326e+02 4.192e+02 7.786e+02, threshold=6.652e+02, percent-clipped=4.0 2024-09-13 19:20:16,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=13940.0, ans=0.0 2024-09-13 19:20:24,514 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.33 vs. limit=8.485 2024-09-13 19:20:33,488 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.93 vs. limit=12.738125 2024-09-13 19:20:38,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=13968.333333333334, ans=0.16031666666666666 2024-09-13 19:21:02,051 INFO [train.py:1198] (1/2) Epoch 1, batch 4950, loss[loss=0.3902, ctc_loss=0.3076, cr_loss=0.4131, over 20787.00 frames. ], tot_loss[loss=0.4228, ctc_loss=0.3355, cr_loss=0.4369, over 4076304.18 frames. ], batch size: 56, lr: 3.65e-02, grad_scale: 16.0 2024-09-13 19:21:22,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=14053.333333333334, ans=0.04949747468305833 2024-09-13 19:21:24,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=14053.333333333334, ans=0.125 2024-09-13 19:21:52,375 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.77 vs. limit=12.79125 2024-09-13 19:22:15,548 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.23 vs. limit=12.801874999999999 2024-09-13 19:22:17,891 INFO [train.py:1198] (1/2) Epoch 1, batch 5000, loss[loss=0.4383, ctc_loss=0.3446, cr_loss=0.4684, over 21042.00 frames. ], tot_loss[loss=0.4218, ctc_loss=0.3345, cr_loss=0.4364, over 4078388.22 frames. 
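], batch size: 61, lr: 3.65e-02, grad_scale: 16.0

Note: throughout these records the printed totals are consistent with loss = ctc_loss + 0.2 * cr_loss; for the batch-5000 entry just above, 0.3446 + 0.2 * 0.4684 = 0.4383 to the shown precision, and the running tot_loss obeys the same identity (0.3345 + 0.2 * 0.4364 ≈ 0.4218). A minimal sketch of that combination (hypothetical names, not the actual train.py code):

    # The 0.2 weight is inferred from the arithmetic of the records above;
    # ctc_loss and cr_loss are assumed to be precomputed per-batch scalars.
    CR_LOSS_SCALE = 0.2

    def combine_losses(ctc_loss: float, cr_loss: float) -> float:
        # CTC loss plus a down-weighted consistency-regularization term.
        return ctc_loss + CR_LOSS_SCALE * cr_loss

    # Reproduces the batch-5000 record above:
    assert abs(combine_losses(0.3446, 0.4684) - 0.4383) < 1e-3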
2024-09-13 19:22:31,795 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.048e+02 2.674e+02 3.166e+02 4.204e+02 7.328e+02, threshold=6.333e+02, percent-clipped=1.0 2024-09-13 19:22:41,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=14195.0, ans=0.15805000000000002 2024-09-13 19:23:15,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=14251.666666666666, ans=0.007771376811594203 2024-09-13 19:23:20,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=14280.0, ans=0.125 2024-09-13 19:23:29,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=14280.0, ans=0.007166666666666668 2024-09-13 19:23:33,086 INFO [train.py:1198] (1/2) Epoch 1, batch 5050, loss[loss=0.4568, ctc_loss=0.3628, cr_loss=0.4698, over 20653.00 frames. ], tot_loss[loss=0.4208, ctc_loss=0.3336, cr_loss=0.4362, over 4092727.84 frames. ], batch size: 71, lr: 3.64e-02, grad_scale: 16.0 2024-09-13 19:23:41,493 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.01 vs. limit=9.723333333333333 2024-09-13 19:23:54,298 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-13 19:24:32,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=14393.333333333334, ans=0.125 2024-09-13 19:24:35,829 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-13 19:24:50,237 INFO [train.py:1198] (1/2) Epoch 1, batch 5100, loss[loss=0.4638, ctc_loss=0.3722, cr_loss=0.4581, over 21048.00 frames. ], tot_loss[loss=0.4204, ctc_loss=0.333, cr_loss=0.4371, over 4098569.46 frames. ], batch size: 62, lr: 3.64e-02, grad_scale: 16.0 2024-09-13 19:24:59,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=14450.0, ans=0.39425 2024-09-13 19:25:03,776 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.319e+02 2.731e+02 3.124e+02 4.100e+02 6.990e+02, threshold=6.248e+02, percent-clipped=3.0 2024-09-13 19:25:04,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=14478.333333333334, ans=0.125 2024-09-13 19:25:14,933 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.42 vs. limit=18.35875 2024-09-13 19:25:20,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=14506.666666666666, ans=0.125 2024-09-13 19:25:23,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=14506.666666666666, ans=0.05 2024-09-13 19:25:33,088 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.45 vs.
limit=12.940000000000001 2024-09-13 19:25:50,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=14563.333333333334, ans=10.0 2024-09-13 19:25:50,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=14563.333333333334, ans=0.007703623188405798 2024-09-13 19:26:03,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=14563.333333333334, ans=0.125 2024-09-13 19:26:05,719 INFO [train.py:1198] (1/2) Epoch 1, batch 5150, loss[loss=0.4192, ctc_loss=0.3307, cr_loss=0.4426, over 20248.00 frames. ], tot_loss[loss=0.4201, ctc_loss=0.3327, cr_loss=0.4373, over 4101895.53 frames. ], batch size: 74, lr: 3.63e-02, grad_scale: 16.0 2024-09-13 19:26:18,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=14591.666666666666, ans=0.125 2024-09-13 19:26:30,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=14620.0, ans=0.15380000000000002 2024-09-13 19:26:32,513 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.05 vs. limit=12.9825 2024-09-13 19:27:22,878 INFO [train.py:1198] (1/2) Epoch 1, batch 5200, loss[loss=0.4192, ctc_loss=0.3364, cr_loss=0.4143, over 20282.00 frames. ], tot_loss[loss=0.4181, ctc_loss=0.3308, cr_loss=0.4367, over 4103268.25 frames. ], batch size: 74, lr: 3.63e-02, grad_scale: 32.0 2024-09-13 19:27:36,071 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.143e+02 2.534e+02 3.157e+02 3.626e+02 7.530e+02, threshold=6.315e+02, percent-clipped=3.0 2024-09-13 19:27:48,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=14761.666666666666, ans=0.0 2024-09-13 19:27:51,923 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.61 vs. limit=8.6975 2024-09-13 19:28:37,334 INFO [train.py:1198] (1/2) Epoch 1, batch 5250, loss[loss=0.4183, ctc_loss=0.3288, cr_loss=0.4476, over 20814.00 frames. ], tot_loss[loss=0.4171, ctc_loss=0.3299, cr_loss=0.4359, over 4093884.18 frames. ], batch size: 59, lr: 3.62e-02, grad_scale: 32.0 2024-09-13 19:28:54,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=14903.333333333334, ans=0.125 2024-09-13 19:29:28,707 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.38 vs. limit=13.11 2024-09-13 19:29:43,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=14988.333333333334, ans=0.004215277777777776 2024-09-13 19:29:44,766 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.75 vs. limit=5.2482500000000005 2024-09-13 19:29:51,361 INFO [train.py:1198] (1/2) Epoch 1, batch 5300, loss[loss=0.3892, ctc_loss=0.3024, cr_loss=0.4343, over 20786.00 frames. ], tot_loss[loss=0.4163, ctc_loss=0.3292, cr_loss=0.4353, over 4089134.40 frames. 
], batch size: 56, lr: 3.61e-02, grad_scale: 16.0 2024-09-13 19:30:05,888 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.236e+02 2.803e+02 3.346e+02 4.130e+02 6.068e+02, threshold=6.692e+02, percent-clipped=0.0 2024-09-13 19:30:22,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=15073.333333333334, ans=0.125 2024-09-13 19:30:31,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=15073.333333333334, ans=0.125 2024-09-13 19:30:51,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=15130.0, ans=10.0 2024-09-13 19:31:05,734 INFO [train.py:1198] (1/2) Epoch 1, batch 5350, loss[loss=0.4103, ctc_loss=0.3192, cr_loss=0.4554, over 21076.00 frames. ], tot_loss[loss=0.4167, ctc_loss=0.3295, cr_loss=0.436, over 4085102.38 frames. ], batch size: 53, lr: 3.61e-02, grad_scale: 16.0 2024-09-13 19:31:17,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=15158.333333333334, ans=0.36945833333333333 2024-09-13 19:31:20,719 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=15186.666666666666, ans=0.025 2024-09-13 19:31:35,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=15215.0, ans=0.125 2024-09-13 19:31:47,873 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-13 19:32:19,508 INFO [train.py:1198] (1/2) Epoch 1, batch 5400, loss[loss=0.3817, ctc_loss=0.2977, cr_loss=0.4198, over 20898.00 frames. ], tot_loss[loss=0.4166, ctc_loss=0.3294, cr_loss=0.4362, over 4067561.11 frames. ], batch size: 54, lr: 3.60e-02, grad_scale: 16.0 2024-09-13 19:32:34,312 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.140e+02 2.674e+02 3.317e+02 3.992e+02 6.991e+02, threshold=6.635e+02, percent-clipped=1.0 2024-09-13 19:32:34,990 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.45 vs. limit=18.99625 2024-09-13 19:32:45,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=15328.333333333334, ans=0.125 2024-09-13 19:33:16,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=15385.0, ans=0.125 2024-09-13 19:33:32,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=15441.666666666666, ans=0.125 2024-09-13 19:33:33,893 INFO [train.py:1198] (1/2) Epoch 1, batch 5450, loss[loss=0.4177, ctc_loss=0.3413, cr_loss=0.3816, over 21016.00 frames. ], tot_loss[loss=0.416, ctc_loss=0.3288, cr_loss=0.4361, over 4073245.19 frames. ], batch size: 61, lr: 3.60e-02, grad_scale: 16.0 2024-09-13 19:34:02,163 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.21 vs. 
limit=8.8675 2024-09-13 19:34:06,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=15498.333333333334, ans=0.3575583333333333 2024-09-13 19:34:20,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=15526.666666666666, ans=0.3565666666666667 2024-09-13 19:34:22,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=15526.666666666666, ans=0.125 2024-09-13 19:34:33,174 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-13 19:34:43,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=15555.0, ans=0.14445000000000002 2024-09-13 19:34:47,489 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.50 vs. limit=7.111000000000001 2024-09-13 19:34:50,819 INFO [train.py:1198] (1/2) Epoch 1, batch 5500, loss[loss=0.372, ctc_loss=0.2919, cr_loss=0.4006, over 20949.00 frames. ], tot_loss[loss=0.4132, ctc_loss=0.3261, cr_loss=0.4356, over 4093285.16 frames. ], batch size: 50, lr: 3.59e-02, grad_scale: 16.0 2024-09-13 19:35:05,964 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.074e+02 2.556e+02 3.075e+02 3.989e+02 8.223e+02, threshold=6.149e+02, percent-clipped=3.0 2024-09-13 19:35:57,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=15696.666666666666, ans=0.125 2024-09-13 19:36:07,836 INFO [train.py:1198] (1/2) Epoch 1, batch 5550, loss[loss=0.4366, ctc_loss=0.3418, cr_loss=0.4741, over 20659.00 frames. ], tot_loss[loss=0.4111, ctc_loss=0.3239, cr_loss=0.4357, over 4103666.02 frames. ], batch size: 71, lr: 3.59e-02, grad_scale: 16.0 2024-09-13 19:36:09,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=15725.0, ans=0.05 2024-09-13 19:36:39,711 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-13 19:36:40,013 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.53 vs. limit=13.418125 2024-09-13 19:36:53,239 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.46 vs. limit=13.42875 2024-09-13 19:37:01,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=15810.0, ans=0.125 2024-09-13 19:37:22,224 INFO [train.py:1198] (1/2) Epoch 1, batch 5600, loss[loss=0.4383, ctc_loss=0.3441, cr_loss=0.4708, over 20757.00 frames. ], tot_loss[loss=0.4113, ctc_loss=0.3242, cr_loss=0.4354, over 4099226.99 frames. 
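], batch size: 71, lr: 3.58e-02, grad_scale: 32.0

Note: the "Clipping_scale=2.0, grad-norm quartiles ... threshold=..., percent-clipped=..." WARNING records from optim.py:487 summarize the recent distribution of gradient norms as five quantiles (min, 25%, median, 75%, max). In every such record the printed threshold equals Clipping_scale times the median, e.g. 2.0 * 3.418e+02 ≈ 6.837e+02 in the record below. A hedged sketch of producing such a summary (not the actual optim.py clipping logic):

    import torch

    def gradnorm_summary(recent_norms: torch.Tensor, clipping_scale: float = 2.0):
        # recent_norms: 1-D float tensor of gradient norms from recent steps.
        quartiles = torch.quantile(
            recent_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0])
        )
        threshold = clipping_scale * quartiles[2]  # 2.0 x median, as in the log
        percent_clipped = 100.0 * (recent_norms > threshold).float().mean()
        return quartiles, threshold, percent_clipped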
2024-09-13 19:37:32,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=15866.666666666666, ans=0.0 2024-09-13 19:37:36,937 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.264e+02 2.772e+02 3.418e+02 4.529e+02 7.633e+02, threshold=6.837e+02, percent-clipped=8.0 2024-09-13 19:37:44,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=15895.0, ans=0.34367500000000006 2024-09-13 19:38:37,298 INFO [train.py:1198] (1/2) Epoch 1, batch 5650, loss[loss=0.3564, ctc_loss=0.2787, cr_loss=0.3882, over 20929.00 frames. ], tot_loss[loss=0.4099, ctc_loss=0.3231, cr_loss=0.434, over 4097373.36 frames. ], batch size: 49, lr: 3.57e-02, grad_scale: 32.0 2024-09-13 19:38:55,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=16036.666666666666, ans=0.13963333333333333 2024-09-13 19:38:57,319 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.60 vs. limit=13.51375 2024-09-13 19:39:05,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=16065.0, ans=0.125 2024-09-13 19:39:05,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=16065.0, ans=0.125 2024-09-13 19:39:24,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=16093.333333333334, ans=0.125 2024-09-13 19:39:40,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=16121.666666666666, ans=0.13878333333333334 2024-09-13 19:39:40,712 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.59 vs. limit=13.545625000000001 2024-09-13 19:39:46,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=16121.666666666666, ans=0.125 2024-09-13 19:39:50,602 INFO [train.py:1198] (1/2) Epoch 1, batch 5700, loss[loss=0.4441, ctc_loss=0.3469, cr_loss=0.486, over 20938.00 frames. ], tot_loss[loss=0.4087, ctc_loss=0.322, cr_loss=0.4337, over 4098707.83 frames. ], batch size: 64, lr: 3.57e-02, grad_scale: 32.0 2024-09-13 19:40:05,383 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.027e+02 2.575e+02 2.972e+02 3.924e+02 7.518e+02, threshold=5.944e+02, percent-clipped=1.0 2024-09-13 19:41:03,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=16291.666666666666, ans=0.13708333333333333 2024-09-13 19:41:04,671 INFO [train.py:1198] (1/2) Epoch 1, batch 5750, loss[loss=0.4282, ctc_loss=0.3317, cr_loss=0.4822, over 20682.00 frames. ], tot_loss[loss=0.4088, ctc_loss=0.3221, cr_loss=0.4337, over 4091670.15 frames.
], batch size: 68, lr: 3.56e-02, grad_scale: 32.0 2024-09-13 19:41:27,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=16320.0, ans=0.125 2024-09-13 19:41:37,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=16348.333333333334, ans=0.0 2024-09-13 19:41:48,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=16376.666666666666, ans=0.44565 2024-09-13 19:42:10,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=16405.0, ans=0.13595000000000002 2024-09-13 19:42:19,127 INFO [train.py:1198] (1/2) Epoch 1, batch 5800, loss[loss=0.35, ctc_loss=0.2753, cr_loss=0.3733, over 20946.00 frames. ], tot_loss[loss=0.4058, ctc_loss=0.3193, cr_loss=0.4326, over 4101366.83 frames. ], batch size: 48, lr: 3.56e-02, grad_scale: 32.0 2024-09-13 19:42:26,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=16433.333333333332, ans=0.125 2024-09-13 19:42:30,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=16433.333333333332, ans=0.125 2024-09-13 19:42:33,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=16433.333333333332, ans=0.125 2024-09-13 19:42:35,908 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.153e+02 2.659e+02 3.253e+02 3.888e+02 7.350e+02, threshold=6.505e+02, percent-clipped=4.0 2024-09-13 19:42:41,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=16461.666666666668, ans=0.0 2024-09-13 19:42:46,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=16461.666666666668, ans=0.04949747468305833 2024-09-13 19:43:17,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=16518.333333333332, ans=0.0 2024-09-13 19:43:20,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=16546.666666666668, ans=0.32086666666666674 2024-09-13 19:43:34,870 INFO [train.py:1198] (1/2) Epoch 1, batch 5850, loss[loss=0.4066, ctc_loss=0.317, cr_loss=0.4475, over 19905.00 frames. ], tot_loss[loss=0.4072, ctc_loss=0.3205, cr_loss=0.4334, over 4083337.58 frames. 
], batch size: 80, lr: 3.55e-02, grad_scale: 16.0 2024-09-13 19:43:49,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=16603.333333333332, ans=0.0 2024-09-13 19:43:52,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=16603.333333333332, ans=0.3188833333333334 2024-09-13 19:44:19,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=16631.666666666668, ans=0.125 2024-09-13 19:44:24,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=16660.0, ans=0.007247826086956522 2024-09-13 19:44:29,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=16660.0, ans=0.125 2024-09-13 19:44:32,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=16660.0, ans=0.125 2024-09-13 19:44:39,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=16688.333333333332, ans=0.31590833333333346 2024-09-13 19:44:51,395 INFO [train.py:1198] (1/2) Epoch 1, batch 5900, loss[loss=0.4078, ctc_loss=0.3231, cr_loss=0.4238, over 20967.00 frames. ], tot_loss[loss=0.4073, ctc_loss=0.3205, cr_loss=0.4344, over 4094310.69 frames. ], batch size: 55, lr: 3.55e-02, grad_scale: 16.0 2024-09-13 19:45:06,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=16745.0, ans=0.13255 2024-09-13 19:45:07,769 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.067e+02 2.487e+02 2.847e+02 3.551e+02 7.104e+02, threshold=5.695e+02, percent-clipped=2.0 2024-09-13 19:45:32,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=16773.333333333332, ans=0.0 2024-09-13 19:45:33,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=16773.333333333332, ans=0.125 2024-09-13 19:45:34,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=16801.666666666668, ans=0.1319833333333333 2024-09-13 19:45:51,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=16830.0, ans=0.1317 2024-09-13 19:46:06,670 INFO [train.py:1198] (1/2) Epoch 1, batch 5950, loss[loss=0.3844, ctc_loss=0.2964, cr_loss=0.44, over 21049.00 frames. ], tot_loss[loss=0.4055, ctc_loss=0.3186, cr_loss=0.4342, over 4109049.47 frames. ], batch size: 56, lr: 3.54e-02, grad_scale: 16.0 2024-09-13 19:46:43,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=16915.0, ans=0.125 2024-09-13 19:46:54,445 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.69 vs. limit=13.853749999999998 2024-09-13 19:47:20,165 INFO [train.py:1198] (1/2) Epoch 1, batch 6000, loss[loss=0.437, ctc_loss=0.3468, cr_loss=0.4507, over 20973.00 frames. ], tot_loss[loss=0.4061, ctc_loss=0.3192, cr_loss=0.4344, over 4090161.45 frames. 
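], batch size: 64, lr: 3.53e-02, grad_scale: 32.0

Note: at batch 6000 the trainer pauses for a validation pass (the train.py:1221/1230 records below, printing loss=0.1271, ctc_loss=0.1271, cr_loss=9.075e-15). The validation cr_loss is effectively zero; the consistency-regularization term compares differently augmented views of the input, so a plain single-view validation pass presumably contributes nothing beyond floating-point noise, leaving the validation loss as pure CTC. A minimal sketch of a periodic, frame-weighted validation pass (illustrative interface, not the actual train.py code):

    import torch

    def validate(model, valid_loader, device, batch_idx, valid_interval):
        # Run validation every `valid_interval` training batches.
        if batch_idx % valid_interval != 0:
            return None
        model.eval()
        tot_loss, tot_frames = 0.0, 0
        with torch.no_grad():
            for batch in valid_loader:
                # `model.compute_loss` is a hypothetical helper returning
                # (summed loss over the batch, number of frames).
                loss, num_frames = model.compute_loss(batch, device)
                tot_loss += loss.item()
                tot_frames += num_frames
        model.train()
        return tot_loss / tot_frames  # frame-weighted average, as in the log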
2024-09-13 19:47:20,165 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-13 19:47:28,078 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.1257, 4.0325, 3.9688, 4.0667], device='cuda:1') 2024-09-13 19:47:37,004 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.2868, 4.8956, 5.0089, 4.7983], device='cuda:1') 2024-09-13 19:47:38,757 INFO [train.py:1230] (1/2) Epoch 1, validation: loss=0.1271, ctc_loss=0.1271, cr_loss=9.075e-15, over 944034.00 frames. 2024-09-13 19:47:38,758 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-13 19:47:45,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=17000.0, ans=0.13 2024-09-13 19:47:55,519 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.209e+02 2.871e+02 3.340e+02 4.119e+02 8.577e+02, threshold=6.680e+02, percent-clipped=5.0 2024-09-13 19:48:26,020 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.16 vs. limit=13.906875 2024-09-13 19:48:53,444 INFO [train.py:1198] (1/2) Epoch 1, batch 6050, loss[loss=0.4305, ctc_loss=0.3359, cr_loss=0.4732, over 21033.00 frames. ], tot_loss[loss=0.4024, ctc_loss=0.316, cr_loss=0.4319, over 4097726.21 frames. ], batch size: 62, lr: 3.53e-02, grad_scale: 32.0 2024-09-13 19:48:55,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=17141.666666666668, ans=0.30004166666666676 2024-09-13 19:48:58,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=17141.666666666668, ans=0.125 2024-09-13 19:49:16,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=17170.0, ans=0.125 2024-09-13 19:49:59,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=17255.0, ans=0.125 2024-09-13 19:50:08,048 INFO [train.py:1198] (1/2) Epoch 1, batch 6100, loss[loss=0.3814, ctc_loss=0.2938, cr_loss=0.4379, over 20964.00 frames. ], tot_loss[loss=0.4007, ctc_loss=0.3143, cr_loss=0.432, over 4110786.37 frames. ], batch size: 52, lr: 3.52e-02, grad_scale: 32.0 2024-09-13 19:50:25,095 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.37 vs. limit=13.991875 2024-09-13 19:50:25,916 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.976e+02 2.620e+02 3.206e+02 3.814e+02 6.311e+02, threshold=6.411e+02, percent-clipped=1.0 2024-09-13 19:51:02,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=17368.333333333332, ans=0.125 2024-09-13 19:51:12,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=17396.666666666668, ans=0.0 2024-09-13 19:51:24,073 INFO [train.py:1198] (1/2) Epoch 1, batch 6150, loss[loss=0.4466, ctc_loss=0.3505, cr_loss=0.4805, over 19984.00 frames. ], tot_loss[loss=0.3998, ctc_loss=0.3137, cr_loss=0.4308, over 4095413.63 frames.
], batch size: 80, lr: 3.52e-02, grad_scale: 32.0 2024-09-13 19:51:40,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=17453.333333333332, ans=0.125 2024-09-13 19:51:56,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=17481.666666666668, ans=0.125 2024-09-13 19:52:17,932 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.94 vs. limit=14.06625 2024-09-13 19:52:39,305 INFO [train.py:1198] (1/2) Epoch 1, batch 6200, loss[loss=0.4579, ctc_loss=0.3724, cr_loss=0.4274, over 17822.00 frames. ], tot_loss[loss=0.401, ctc_loss=0.3148, cr_loss=0.4311, over 4062860.01 frames. ], batch size: 108, lr: 3.51e-02, grad_scale: 32.0 2024-09-13 19:52:39,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=17566.666666666668, ans=0.0 2024-09-13 19:52:50,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=17566.666666666668, ans=20.675 2024-09-13 19:52:55,550 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.080e+02 2.483e+02 2.892e+02 3.680e+02 7.766e+02, threshold=5.783e+02, percent-clipped=2.0 2024-09-13 19:53:31,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=17651.666666666668, ans=0.125 2024-09-13 19:53:53,212 INFO [train.py:1198] (1/2) Epoch 1, batch 6250, loss[loss=0.357, ctc_loss=0.2724, cr_loss=0.4228, over 21008.00 frames. ], tot_loss[loss=0.4046, ctc_loss=0.3181, cr_loss=0.4329, over 4051087.78 frames. ], batch size: 52, lr: 3.51e-02, grad_scale: 32.0 2024-09-13 19:54:01,763 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.06 vs. limit=9.427083333333332 2024-09-13 19:54:17,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=17736.666666666668, ans=0.025 2024-09-13 19:54:27,078 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.05 vs. limit=20.82375 2024-09-13 19:54:46,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=17793.333333333332, ans=0.007001449275362319 2024-09-13 19:55:06,019 INFO [train.py:1198] (1/2) Epoch 1, batch 6300, loss[loss=0.4429, ctc_loss=0.3592, cr_loss=0.4183, over 18271.00 frames. ], tot_loss[loss=0.4094, ctc_loss=0.3225, cr_loss=0.4341, over 3981615.70 frames. ], batch size: 108, lr: 3.50e-02, grad_scale: 16.0 2024-09-13 19:55:17,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=17850.0, ans=0.125 2024-09-13 19:55:21,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=17878.333333333332, ans=0.2742583333333334 2024-09-13 19:55:22,870 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.156e+02 2.696e+02 3.200e+02 4.525e+02 7.555e+02, threshold=6.401e+02, percent-clipped=7.0 2024-09-13 19:56:17,281 INFO [train.py:1198] (1/2) Epoch 1, batch 6350, loss[loss=0.4679, ctc_loss=0.3833, cr_loss=0.4232, over 14008.00 frames. 
], tot_loss[loss=0.4189, ctc_loss=0.3315, cr_loss=0.4371, over 3853320.92 frames. ], batch size: 149, lr: 3.49e-02, grad_scale: 16.0 2024-09-13 19:56:31,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=18020.0, ans=0.2693 2024-09-13 19:58:00,851 INFO [train.py:1198] (1/2) Epoch 2, batch 0, loss[loss=0.3897, ctc_loss=0.302, cr_loss=0.4385, over 20787.00 frames. ], tot_loss[loss=0.3897, ctc_loss=0.302, cr_loss=0.4385, over 20787.00 frames. ], batch size: 56, lr: 3.42e-02, grad_scale: 32.0 2024-09-13 19:58:00,851 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-13 19:58:19,501 INFO [train.py:1230] (1/2) Epoch 2, validation: loss=0.1284, ctc_loss=0.1284, cr_loss=9.944e-15, over 944034.00 frames. 2024-09-13 19:58:19,502 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-13 19:58:19,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=18107.833333333332, ans=0.02 2024-09-13 19:58:51,095 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.996e+02 2.429e+02 2.766e+02 3.397e+02 7.112e+02, threshold=5.531e+02, percent-clipped=1.0 2024-09-13 19:58:52,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=18164.5, ans=0.125 2024-09-13 19:58:53,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=18164.5, ans=0.0 2024-09-13 19:59:08,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=18192.833333333332, ans=0.05 2024-09-13 19:59:08,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=18192.833333333332, ans=0.125 2024-09-13 19:59:12,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=18192.833333333332, ans=0.125 2024-09-13 19:59:34,685 INFO [train.py:1198] (1/2) Epoch 2, batch 50, loss[loss=0.3531, ctc_loss=0.2742, cr_loss=0.3949, over 20956.00 frames. ], tot_loss[loss=0.4017, ctc_loss=0.3158, cr_loss=0.4294, over 916700.10 frames. ], batch size: 49, lr: 3.42e-02, grad_scale: 32.0 2024-09-13 20:00:50,783 INFO [train.py:1198] (1/2) Epoch 2, batch 100, loss[loss=0.3429, ctc_loss=0.2684, cr_loss=0.3726, over 20934.00 frames. ], tot_loss[loss=0.3972, ctc_loss=0.3115, cr_loss=0.4285, over 1622541.89 frames. ], batch size: 51, lr: 3.41e-02, grad_scale: 32.0 2024-09-13 20:00:55,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=18391.166666666668, ans=0.11608833333333332 2024-09-13 20:00:58,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=18391.166666666668, ans=0.0 2024-09-13 20:01:05,188 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.56 vs. 
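limit=14.4073125

Note: the "ScheduledFloat: name=..., batch_count=..., ans=..." records from scaling.py:214 print hyperparameters (dropout rates, skip rates, balancer probabilities, whitening limits) that vary with the training batch count rather than staying constant; ans is the value in force at that batch_count. One plausible reading of these records, not the actual scaling.py implementation, is piecewise-linear interpolation over (batch_count, value) breakpoints:

    def scheduled_float(batch_count: float, points) -> float:
        # points: [(batch_count, value), ...] sorted by batch_count.
        # Linear interpolation between breakpoints, clamped at both ends.
        if batch_count <= points[0][0]:
            return points[0][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        return points[-1][1]

    # e.g. a skip rate decaying from 0.5 at batch 0 to 0.0 at batch 20000,
    # evaluated at batch_count=18419.5 (a count seen in the records below;
    # the breakpoints themselves are invented for illustration):
    scheduled_float(18419.5, [(0.0, 0.5), (20000.0, 0.0)])  # -> ~0.0395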
2024-09-13 20:01:08,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=18419.5, ans=0.125 2024-09-13 20:01:10,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=18419.5, ans=0.02961962500000001 2024-09-13 20:01:14,329 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.87 vs. limit=14.4073125 2024-09-13 20:01:16,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=18419.5, ans=0.125 2024-09-13 20:01:22,598 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.110e+02 2.606e+02 3.253e+02 4.380e+02 9.911e+02, threshold=6.506e+02, percent-clipped=8.0 2024-09-13 20:01:30,687 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.37 vs. limit=14.4179375 2024-09-13 20:01:32,479 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.62 vs. limit=5.767175 2024-09-13 20:01:36,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=18476.166666666668, ans=0.006853007246376811 2024-09-13 20:02:06,278 INFO [train.py:1198] (1/2) Epoch 2, batch 150, loss[loss=0.4952, ctc_loss=0.4035, cr_loss=0.4585, over 14790.00 frames. ], tot_loss[loss=0.3976, ctc_loss=0.3118, cr_loss=0.4291, over 2152372.75 frames. ], batch size: 150, lr: 3.40e-02, grad_scale: 32.0 2024-09-13 20:02:17,405 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.38 vs. limit=14.4498125 2024-09-13 20:02:21,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=18532.833333333332, ans=0.125 2024-09-13 20:02:26,616 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.64 vs. limit=11.424466666666667 2024-09-13 20:02:48,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=18589.5, ans=0.025 2024-09-13 20:02:52,081 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.90 vs. limit=14.4710625 2024-09-13 20:02:55,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=18617.833333333332, ans=0.125 2024-09-13 20:03:06,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=18617.833333333332, ans=0.9361783333333333 2024-09-13 20:03:24,007 INFO [train.py:1198] (1/2) Epoch 2, batch 200, loss[loss=0.3764, ctc_loss=0.2889, cr_loss=0.4374, over 21027.00 frames. ], tot_loss[loss=0.3958, ctc_loss=0.3099, cr_loss=0.4297, over 2589836.29 frames.
2024-09-13 20:03:51,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=18702.833333333332, ans=0.2454008333333334
2024-09-13 20:03:55,551 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.047e+02 2.554e+02 3.174e+02 4.190e+02 6.448e+02, threshold=6.348e+02, percent-clipped=0.0
2024-09-13 20:04:11,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=18759.5, ans=0.112405
2024-09-13 20:04:38,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=18787.833333333332, ans=0.125
2024-09-13 20:04:41,760 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.52 vs. limit=21.612125
2024-09-13 20:04:42,698 INFO [train.py:1198] (1/2) Epoch 2, batch 250, loss[loss=0.3654, ctc_loss=0.28, cr_loss=0.4269, over 20926.00 frames. ], tot_loss[loss=0.396, ctc_loss=0.3099, cr_loss=0.4303, over 2917506.30 frames. ], batch size: 49, lr: 3.39e-02, grad_scale: 32.0
2024-09-13 20:05:28,713 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.66 vs. limit=5.835175
2024-09-13 20:05:32,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=18901.166666666668, ans=0.006760615942028985
2024-09-13 20:05:58,164 INFO [train.py:1198] (1/2) Epoch 2, batch 300, loss[loss=0.3774, ctc_loss=0.2946, cr_loss=0.414, over 20874.00 frames. ], tot_loss[loss=0.3977, ctc_loss=0.3113, cr_loss=0.432, over 3181135.39 frames. ], batch size: 57, lr: 3.39e-02, grad_scale: 32.0
2024-09-13 20:06:28,782 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=23.98 vs. limit=14.6304375
2024-09-13 20:06:29,686 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.017e+02 2.370e+02 2.734e+02 3.151e+02 5.678e+02, threshold=5.469e+02, percent-clipped=0.0
2024-09-13 20:07:09,355 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=19071.166666666668, ans=0.05928833333333333
2024-09-13 20:07:13,399 INFO [train.py:1198] (1/2) Epoch 2, batch 350, loss[loss=0.4424, ctc_loss=0.3519, cr_loss=0.4524, over 18476.00 frames. ], tot_loss[loss=0.3961, ctc_loss=0.3096, cr_loss=0.4323, over 3381786.64 frames. ], batch size: 108, lr: 3.38e-02, grad_scale: 32.0
2024-09-13 20:07:31,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=19127.833333333332, ans=0.10872166666666669
2024-09-13 20:07:43,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=19156.166666666668, ans=0.125
2024-09-13 20:08:32,013 INFO [train.py:1198] (1/2) Epoch 2, batch 400, loss[loss=0.3575, ctc_loss=0.2762, cr_loss=0.4066, over 20988.00 frames. ], tot_loss[loss=0.3955, ctc_loss=0.3088, cr_loss=0.4333, over 3550482.79 frames. ], batch size: 55, lr: 3.38e-02, grad_scale: 32.0
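
Each optim.py Clipping_scale warning prints the min/25%/median/75%/max of recent gradient norms together with the active threshold; in the entries above the threshold always equals Clipping_scale (2.0) times the reported median (e.g. 2.0 x 3.174e+02 = 6.348e+02), and percent-clipped is the fraction of recent steps that exceeded it. Below is a toy version of median-based clipping, assuming a fixed-size history of norms; the real optimizer is considerably more elaborate.

from collections import deque
from statistics import quantiles
import torch

class MedianGradClipper:
    """Clip gradient norms to clipping_scale * median of recent norms (sketch)."""

    def __init__(self, clipping_scale: float = 2.0, history: int = 1000):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=history)  # hypothetical history length

    def clip_(self, parameters) -> float:
        params = [p for p in parameters if p.grad is not None]
        norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
        self.norms.append(norm)
        if len(self.norms) < 4:
            return norm  # not enough history to form quartiles yet
        q1, med, q3 = quantiles(self.norms, n=4)  # 25%/50%/75% cut points
        threshold = self.clipping_scale * med
        if norm > threshold:  # such steps feed the "percent-clipped" statistic
            for p in params:
                p.grad.mul_(threshold / norm)
        return norm
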
2024-09-13 20:08:55,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=19269.5, ans=0.0
2024-09-13 20:09:04,115 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.079e+02 2.473e+02 2.858e+02 3.592e+02 5.873e+02, threshold=5.717e+02, percent-clipped=3.0
2024-09-13 20:09:04,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=19297.833333333332, ans=0.025
2024-09-13 20:09:20,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=19326.166666666668, ans=0.125
2024-09-13 20:09:34,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=19354.5, ans=0.125
2024-09-13 20:09:47,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=19354.5, ans=0.125
2024-09-13 20:09:50,756 INFO [train.py:1198] (1/2) Epoch 2, batch 450, loss[loss=0.335, ctc_loss=0.2628, cr_loss=0.3608, over 20976.00 frames. ], tot_loss[loss=0.3952, ctc_loss=0.3085, cr_loss=0.4334, over 3668656.42 frames. ], batch size: 48, lr: 3.37e-02, grad_scale: 32.0
2024-09-13 20:10:18,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=19411.166666666668, ans=0.125
2024-09-13 20:10:19,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=19439.5, ans=0.125
2024-09-13 20:10:51,026 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.66 vs. limit=9.874041666666667
2024-09-13 20:10:57,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=19496.166666666668, ans=0.21763416666666668
2024-09-13 20:11:06,538 INFO [train.py:1198] (1/2) Epoch 2, batch 500, loss[loss=0.3962, ctc_loss=0.307, cr_loss=0.446, over 21086.00 frames. ], tot_loss[loss=0.3927, ctc_loss=0.3065, cr_loss=0.4312, over 3765744.14 frames. ], batch size: 59, lr: 3.37e-02, grad_scale: 32.0
2024-09-13 20:11:15,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=19524.5, ans=0.0
2024-09-13 20:11:22,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=19552.833333333332, ans=0.006618949275362319
2024-09-13 20:11:33,190 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.87 vs. limit=14.776416666666666
2024-09-13 20:11:38,540 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.962e+02 2.462e+02 2.855e+02 3.643e+02 5.655e+02, threshold=5.711e+02, percent-clipped=0.0
2024-09-13 20:11:44,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=19581.166666666668, ans=0.10418833333333333
2024-09-13 20:12:06,165 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.53 vs. limit=22.228375
2024-09-13 20:12:19,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=19637.833333333332, ans=0.10362166666666667
2024-09-13 20:12:22,400 INFO [train.py:1198] (1/2) Epoch 2, batch 550, loss[loss=0.4357, ctc_loss=0.3328, cr_loss=0.5145, over 20954.00 frames. ], tot_loss[loss=0.393, ctc_loss=0.3066, cr_loss=0.4322, over 3842902.87 frames. ], batch size: 67, lr: 3.36e-02, grad_scale: 32.0
2024-09-13 20:12:30,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=19666.166666666668, ans=0.025
2024-09-13 20:12:36,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=19694.5, ans=0.006588152173913044
2024-09-13 20:12:41,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=19694.5, ans=0.21069250000000006
2024-09-13 20:12:41,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=19694.5, ans=0.006588152173913044
2024-09-13 20:12:52,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=19722.833333333332, ans=0.0
2024-09-13 20:13:22,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=19779.5, ans=0.0
2024-09-13 20:13:38,067 INFO [train.py:1198] (1/2) Epoch 2, batch 600, loss[loss=0.3709, ctc_loss=0.2889, cr_loss=0.4101, over 21055.00 frames. ], tot_loss[loss=0.3912, ctc_loss=0.3048, cr_loss=0.4319, over 3906663.58 frames. ], batch size: 56, lr: 3.35e-02, grad_scale: 32.0
2024-09-13 20:13:40,612 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.09 vs. limit=9.951958333333334
2024-09-13 20:13:52,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=19807.833333333332, ans=0.125
2024-09-13 20:14:00,183 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.87 vs. limit=14.9385625
2024-09-13 20:14:12,711 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.130e+02 2.660e+02 3.440e+02 4.172e+02 6.583e+02, threshold=6.880e+02, percent-clipped=8.0
2024-09-13 20:14:20,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=19864.5, ans=0.101355
2024-09-13 20:14:36,761 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.34 vs. limit=7.978566666666667
2024-09-13 20:14:43,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=19921.166666666668, ans=0.125
2024-09-13 20:14:50,792 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-13 20:14:55,281 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-13 20:14:56,378 INFO [train.py:1198] (1/2) Epoch 2, batch 650, loss[loss=0.4194, ctc_loss=0.3306, cr_loss=0.4442, over 21002.00 frames. ], tot_loss[loss=0.3921, ctc_loss=0.3057, cr_loss=0.4322, over 3937625.99 frames. ], batch size: 63, lr: 3.35e-02, grad_scale: 32.0
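
The Whitening lines compare a per-module statistic ("metric") of the activation covariance against a scheduled limit; when the metric drifts above the limit, an auxiliary penalty nudges the activations back toward a whiter (more isotropic) covariance. As a rough illustration only (the actual metric in scaling.py is defined differently), one might measure how non-uniform the covariance eigenvalue spectrum is:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """Illustrative 'how far from white' statistic for activations x of shape
    (num_frames, num_channels): ratio of the mean squared eigenvalue to the
    squared mean eigenvalue of the covariance, per channel group. It equals
    1.0 for a perfectly white (isotropic) covariance. Assumption: this
    mirrors the spirit, not the letter, of the logged metric."""
    metrics = []
    for g in x.chunk(num_groups, dim=1):
        cov = (g.T @ g) / g.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        metrics.append((eigs.pow(2).mean() / eigs.mean().pow(2)).item())
    return max(metrics)

x = torch.randn(1000, 256) @ torch.randn(256, 256)  # correlated channels
print(whitening_metric(x))  # >> 1.0; whitened data would give ~1.0
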
2024-09-13 20:15:40,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=20006.166666666668, ans=0.1
2024-09-13 20:16:15,353 INFO [train.py:1198] (1/2) Epoch 2, batch 700, loss[loss=0.369, ctc_loss=0.2862, cr_loss=0.4143, over 20976.00 frames. ], tot_loss[loss=0.3913, ctc_loss=0.3051, cr_loss=0.4312, over 3964667.98 frames. ], batch size: 51, lr: 3.34e-02, grad_scale: 32.0
2024-09-13 20:16:24,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=20091.166666666668, ans=0.2
2024-09-13 20:16:29,741 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.60 vs. limit=15.0
2024-09-13 20:16:35,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=20119.5, ans=10.0
2024-09-13 20:16:38,671 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.74 vs. limit=12.0
2024-09-13 20:16:47,001 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.108e+02 2.676e+02 3.222e+02 3.947e+02 6.924e+02, threshold=6.443e+02, percent-clipped=1.0
2024-09-13 20:16:48,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=20147.833333333332, ans=0.125
2024-09-13 20:16:53,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=20147.833333333332, ans=0.125
2024-09-13 20:17:31,015 INFO [train.py:1198] (1/2) Epoch 2, batch 750, loss[loss=0.3664, ctc_loss=0.281, cr_loss=0.4272, over 20964.00 frames. ], tot_loss[loss=0.3906, ctc_loss=0.3044, cr_loss=0.431, over 4005985.95 frames. ], batch size: 58, lr: 3.34e-02, grad_scale: 32.0
2024-09-13 20:18:03,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=20289.5, ans=0.2
2024-09-13 20:18:16,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=20317.833333333332, ans=0.0
2024-09-13 20:18:36,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=20346.166666666668, ans=0.025
2024-09-13 20:18:46,838 INFO [train.py:1198] (1/2) Epoch 2, batch 800, loss[loss=0.3592, ctc_loss=0.2723, cr_loss=0.4344, over 20889.00 frames. ], tot_loss[loss=0.3909, ctc_loss=0.3046, cr_loss=0.4314, over 4025300.61 frames. ], batch size: 57, lr: 3.33e-02, grad_scale: 32.0
2024-09-13 20:19:18,903 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.970e+02 2.626e+02 3.213e+02 4.247e+02 9.413e+02, threshold=6.425e+02, percent-clipped=6.0
2024-09-13 20:19:25,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=20431.166666666668, ans=0.1
2024-09-13 20:19:41,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=20459.5, ans=0.125
2024-09-13 20:20:05,919 INFO [train.py:1198] (1/2) Epoch 2, batch 850, loss[loss=0.3501, ctc_loss=0.2675, cr_loss=0.4129, over 20948.00 frames. ], tot_loss[loss=0.3901, ctc_loss=0.3038, cr_loss=0.4312, over 4040669.01 frames. ], batch size: 51, lr: 3.33e-02, grad_scale: 32.0
2024-09-13 20:20:07,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=20516.166666666668, ans=0.0
2024-09-13 20:20:36,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=20572.833333333332, ans=0.0063972101449275365
2024-09-13 20:20:44,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=20572.833333333332, ans=0.125
2024-09-13 20:21:06,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=20629.5, ans=0.1
2024-09-13 20:21:14,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=20629.5, ans=0.025
2024-09-13 20:21:21,525 INFO [train.py:1198] (1/2) Epoch 2, batch 900, loss[loss=0.3933, ctc_loss=0.3019, cr_loss=0.4568, over 20969.00 frames. ], tot_loss[loss=0.3909, ctc_loss=0.3043, cr_loss=0.4329, over 4050485.85 frames. ], batch size: 58, lr: 3.32e-02, grad_scale: 32.0
2024-09-13 20:21:56,538 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.015e+02 2.569e+02 2.898e+02 3.447e+02 5.755e+02, threshold=5.795e+02, percent-clipped=0.0
2024-09-13 20:22:15,635 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.86 vs. limit=15.0
2024-09-13 20:22:40,380 INFO [train.py:1198] (1/2) Epoch 2, batch 950, loss[loss=0.3907, ctc_loss=0.2982, cr_loss=0.4621, over 20767.00 frames. ], tot_loss[loss=0.3891, ctc_loss=0.3026, cr_loss=0.4322, over 4072066.38 frames. ], batch size: 56, lr: 3.32e-02, grad_scale: 32.0
2024-09-13 20:23:09,826 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.17 vs. limit=10.0
2024-09-13 20:23:13,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=20856.166666666668, ans=0.125
2024-09-13 20:23:33,957 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.89 vs. limit=15.0
2024-09-13 20:23:39,139 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-13 20:23:49,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=20912.833333333332, ans=0.025
2024-09-13 20:23:49,903 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.24 vs. limit=15.0
2024-09-13 20:23:55,288 INFO [train.py:1198] (1/2) Epoch 2, batch 1000, loss[loss=0.3167, ctc_loss=0.241, cr_loss=0.3787, over 19993.00 frames. ], tot_loss[loss=0.3902, ctc_loss=0.3037, cr_loss=0.4325, over 4066012.13 frames. ], batch size: 44, lr: 3.31e-02, grad_scale: 32.0
2024-09-13 20:24:09,594 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.37 vs. limit=10.0
2024-09-13 20:24:26,231 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.78 vs. limit=22.5
2024-09-13 20:24:26,896 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.959e+02 2.521e+02 3.115e+02 3.641e+02 5.149e+02, threshold=6.231e+02, percent-clipped=0.0
2024-09-13 20:24:30,676 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.22 vs. limit=15.0
2024-09-13 20:24:31,918 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.69 vs. limit=12.0
2024-09-13 20:24:44,461 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.63 vs. limit=12.0
2024-09-13 20:25:05,156 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.25 vs. limit=15.0
2024-09-13 20:25:10,146 INFO [train.py:1198] (1/2) Epoch 2, batch 1050, loss[loss=0.3438, ctc_loss=0.2586, cr_loss=0.4262, over 20975.00 frames. ], tot_loss[loss=0.3896, ctc_loss=0.3029, cr_loss=0.4332, over 4078998.36 frames. ], batch size: 49, lr: 3.30e-02, grad_scale: 32.0
2024-09-13 20:25:16,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=21082.833333333332, ans=0.0
2024-09-13 20:26:11,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=21167.833333333332, ans=0.125
2024-09-13 20:26:17,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=21196.166666666668, ans=0.125
2024-09-13 20:26:24,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=21196.166666666668, ans=0.125
2024-09-13 20:26:27,785 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=21224.5, ans=0.0
2024-09-13 20:26:28,982 INFO [train.py:1198] (1/2) Epoch 2, batch 1100, loss[loss=0.3903, ctc_loss=0.3022, cr_loss=0.4406, over 20843.00 frames. ], tot_loss[loss=0.3894, ctc_loss=0.3029, cr_loss=0.4327, over 4069703.63 frames. ], batch size: 65, lr: 3.30e-02, grad_scale: 32.0
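
In the train.py:1198 entries, loss[...] is the current batch and tot_loss[...] is a running average over frames; the fractional frame counts (e.g. "over 4069703.63 frames.") suggest the accumulator decays old batches rather than summing them outright. A minimal sketch of such an exponentially-decayed, frame-weighted average follows; the decay constant is a guess, not a value taken from the code.

class RunningLoss:
    """Frame-weighted running average with exponential forgetting (sketch)."""

    def __init__(self, decay: float = 0.999):  # hypothetical per-batch decay
        self.decay = decay
        self.loss_sum = 0.0
        self.frame_sum = 0.0

    def update(self, batch_loss: float, num_frames: float) -> float:
        self.loss_sum = self.decay * self.loss_sum + batch_loss * num_frames
        self.frame_sum = self.decay * self.frame_sum + num_frames
        return self.loss_sum / self.frame_sum  # printed as tot_loss[...]

avg = RunningLoss()
for loss, frames in [(0.3897, 20787.0), (0.3531, 20956.0)]:
    tot = avg.update(loss, frames)
print(tot, avg.frame_sum)  # running loss and the "over N frames" count
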
2024-09-13 20:26:47,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=21252.833333333332, ans=0.0
2024-09-13 20:27:00,764 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.126e+02 2.640e+02 3.406e+02 4.590e+02 7.864e+02, threshold=6.813e+02, percent-clipped=6.0
2024-09-13 20:27:11,902 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-13 20:27:16,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=21309.5, ans=0.05
2024-09-13 20:27:34,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=21337.833333333332, ans=0.006230905797101449
2024-09-13 20:27:47,727 INFO [train.py:1198] (1/2) Epoch 2, batch 1150, loss[loss=0.3404, ctc_loss=0.2604, cr_loss=0.4001, over 19033.00 frames. ], tot_loss[loss=0.388, ctc_loss=0.3015, cr_loss=0.4324, over 4077493.35 frames. ], batch size: 42, lr: 3.29e-02, grad_scale: 32.0
2024-09-13 20:28:20,215 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.67 vs. limit=15.0
2024-09-13 20:29:02,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=21507.833333333332, ans=0.125
2024-09-13 20:29:03,484 INFO [train.py:1198] (1/2) Epoch 2, batch 1200, loss[loss=0.3661, ctc_loss=0.2835, cr_loss=0.4132, over 20896.00 frames. ], tot_loss[loss=0.3878, ctc_loss=0.3015, cr_loss=0.4316, over 4077321.90 frames. ], batch size: 57, lr: 3.29e-02, grad_scale: 32.0
2024-09-13 20:29:16,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=21507.833333333332, ans=0.025
2024-09-13 20:29:35,541 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.096e+02 2.574e+02 2.852e+02 3.470e+02 7.772e+02, threshold=5.704e+02, percent-clipped=1.0
2024-09-13 20:30:13,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=21621.166666666668, ans=0.125
2024-09-13 20:30:19,071 INFO [train.py:1198] (1/2) Epoch 2, batch 1250, loss[loss=0.3813, ctc_loss=0.2987, cr_loss=0.4128, over 21024.00 frames. ], tot_loss[loss=0.3883, ctc_loss=0.3019, cr_loss=0.4319, over 4077958.18 frames. ], batch size: 62, lr: 3.28e-02, grad_scale: 32.0
2024-09-13 20:30:37,543 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.12 vs. limit=15.0
2024-09-13 20:30:53,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=21706.166666666668, ans=0.006150833333333333
2024-09-13 20:31:25,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=21762.833333333332, ans=0.09899494936611666
2024-09-13 20:31:34,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=21762.833333333332, ans=0.125
2024-09-13 20:31:37,300 INFO [train.py:1198] (1/2) Epoch 2, batch 1300, loss[loss=0.4437, ctc_loss=0.353, cr_loss=0.4538, over 20720.00 frames. ], tot_loss[loss=0.386, ctc_loss=0.2999, cr_loss=0.4304, over 4094973.03 frames. ], batch size: 71, lr: 3.28e-02, grad_scale: 32.0
2024-09-13 20:31:42,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=21791.166666666668, ans=0.125
2024-09-13 20:31:45,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=21791.166666666668, ans=0.125
2024-09-13 20:31:48,447 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-13 20:31:54,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=21819.5, ans=0.0
2024-09-13 20:31:56,185 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.16 vs. limit=15.0
2024-09-13 20:32:09,136 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.038e+02 2.667e+02 3.109e+02 3.694e+02 5.570e+02, threshold=6.217e+02, percent-clipped=0.0
2024-09-13 20:32:52,983 INFO [train.py:1198] (1/2) Epoch 2, batch 1350, loss[loss=0.3359, ctc_loss=0.2543, cr_loss=0.4083, over 20992.00 frames. ], tot_loss[loss=0.3846, ctc_loss=0.2987, cr_loss=0.4295, over 4100857.29 frames. ], batch size: 50, lr: 3.27e-02, grad_scale: 32.0
2024-09-13 20:33:05,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=21932.833333333332, ans=0.006101557971014493
2024-09-13 20:33:53,037 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.96 vs. limit=15.0
2024-09-13 20:33:55,612 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.76 vs. limit=15.0
2024-09-13 20:34:11,368 INFO [train.py:1198] (1/2) Epoch 2, batch 1400, loss[loss=0.4144, ctc_loss=0.324, cr_loss=0.452, over 20964.00 frames. ], tot_loss[loss=0.3855, ctc_loss=0.2994, cr_loss=0.4304, over 4105585.57 frames. ], batch size: 58, lr: 3.27e-02, grad_scale: 32.0
2024-09-13 20:34:31,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=22102.833333333332, ans=0.125
2024-09-13 20:34:43,476 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.122e+02 2.664e+02 3.154e+02 3.884e+02 7.129e+02, threshold=6.308e+02, percent-clipped=2.0
2024-09-13 20:34:55,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=22159.5, ans=0.025
2024-09-13 20:35:26,908 INFO [train.py:1198] (1/2) Epoch 2, batch 1450, loss[loss=0.4007, ctc_loss=0.3106, cr_loss=0.4503, over 20711.00 frames. ], tot_loss[loss=0.3862, ctc_loss=0.3002, cr_loss=0.4302, over 4093713.85 frames. ], batch size: 68, lr: 3.26e-02, grad_scale: 32.0
2024-09-13 20:35:58,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=22272.833333333332, ans=0.2
2024-09-13 20:36:06,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=22272.833333333332, ans=0.025
2024-09-13 20:36:12,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=22301.166666666668, ans=0.0
2024-09-13 20:36:22,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=22301.166666666668, ans=0.2
2024-09-13 20:36:28,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=22329.5, ans=0.0060153260869565215
2024-09-13 20:36:33,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=22329.5, ans=0.125
2024-09-13 20:36:36,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=22329.5, ans=0.1
2024-09-13 20:36:42,125 INFO [train.py:1198] (1/2) Epoch 2, batch 1500, loss[loss=0.3287, ctc_loss=0.2525, cr_loss=0.3812, over 20960.00 frames. ], tot_loss[loss=0.3854, ctc_loss=0.2994, cr_loss=0.4301, over 4103580.30 frames. ], batch size: 49, lr: 3.26e-02, grad_scale: 32.0
2024-09-13 20:36:43,911 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=22357.833333333332, ans=0.006009166666666667
2024-09-13 20:36:54,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=22357.833333333332, ans=0.125
2024-09-13 20:37:04,241 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.41 vs. limit=15.0
2024-09-13 20:37:07,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=22386.166666666668, ans=0.2
2024-09-13 20:37:15,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=22414.5, ans=0.125
2024-09-13 20:37:16,344 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.094e+02 2.427e+02 2.958e+02 3.513e+02 7.679e+02, threshold=5.916e+02, percent-clipped=2.0
2024-09-13 20:37:16,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=22414.5, ans=0.1
2024-09-13 20:37:35,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=22442.833333333332, ans=0.125
2024-09-13 20:37:47,785 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=22471.166666666668, ans=0.0
2024-09-13 20:37:47,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=22471.166666666668, ans=0.125
2024-09-13 20:37:50,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=22471.166666666668, ans=0.1
2024-09-13 20:37:59,591 INFO [train.py:1198] (1/2) Epoch 2, batch 1550, loss[loss=0.3463, ctc_loss=0.2663, cr_loss=0.3997, over 20964.00 frames. ], tot_loss[loss=0.3858, ctc_loss=0.2998, cr_loss=0.4303, over 4095113.92 frames. ], batch size: 51, lr: 3.25e-02, grad_scale: 16.0
2024-09-13 20:38:09,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=22499.5, ans=0.2
2024-09-13 20:38:12,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=22499.5, ans=0.125
2024-09-13 20:38:13,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=22527.833333333332, ans=0.125
2024-09-13 20:38:21,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=22527.833333333332, ans=0.025
2024-09-13 20:38:22,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=22527.833333333332, ans=0.025
2024-09-13 20:38:31,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=22556.166666666668, ans=0.125
2024-09-13 20:38:57,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=22584.5, ans=0.0
2024-09-13 20:39:17,596 INFO [train.py:1198] (1/2) Epoch 2, batch 1600, loss[loss=0.362, ctc_loss=0.2762, cr_loss=0.429, over 20968.00 frames. ], tot_loss[loss=0.3872, ctc_loss=0.3008, cr_loss=0.4319, over 4086646.52 frames. ], batch size: 52, lr: 3.24e-02, grad_scale: 32.0
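
grad_scale in the training lines is the dynamic loss-scaling factor used with float16 AMP; note it drops to 16.0 at batch 1550 above (the usual response to a gradient overflow) and is back at 32.0 by batch 1600. The standard PyTorch pattern that produces this behaviour looks roughly like the generic sketch below; the model and batch interfaces are assumptions, not the project's exact loop.

import torch

scaler = torch.cuda.amp.GradScaler()  # maintains grad_scale dynamically

def train_step(model, optimizer, batch, device="cuda"):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():           # fp16 forward pass on CUDA
        loss = model(batch.to(device))        # assumed: model returns a scalar loss
    scaler.scale(loss).backward()             # scale loss to avoid fp16 underflow
    scaler.step(optimizer)                    # skips the step if grads overflowed
    scaler.update()                           # halves the scale after an overflow,
                                              # slowly grows it after stable steps
    return loss.item(), scaler.get_scale()    # get_scale() ~ grad_scale in the log
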
2024-09-13 20:39:29,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=22641.166666666668, ans=0.005947572463768116
2024-09-13 20:39:51,373 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.024e+02 2.460e+02 2.947e+02 3.352e+02 5.392e+02, threshold=5.894e+02, percent-clipped=0.0
2024-09-13 20:39:56,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=22697.833333333332, ans=0.0
2024-09-13 20:40:33,862 INFO [train.py:1198] (1/2) Epoch 2, batch 1650, loss[loss=0.4066, ctc_loss=0.3111, cr_loss=0.4775, over 21031.00 frames. ], tot_loss[loss=0.3852, ctc_loss=0.2991, cr_loss=0.4308, over 4091219.92 frames. ], batch size: 62, lr: 3.24e-02, grad_scale: 32.0
2024-09-13 20:40:57,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=22811.166666666668, ans=0.125
2024-09-13 20:41:13,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=22839.5, ans=0.0
2024-09-13 20:41:15,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=22839.5, ans=0.125
2024-09-13 20:41:49,981 INFO [train.py:1198] (1/2) Epoch 2, batch 1700, loss[loss=0.4006, ctc_loss=0.3106, cr_loss=0.45, over 20944.00 frames. ], tot_loss[loss=0.3853, ctc_loss=0.2991, cr_loss=0.4309, over 4103637.89 frames. ], batch size: 67, lr: 3.23e-02, grad_scale: 32.0
2024-09-13 20:42:19,225 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.98 vs. limit=15.0
2024-09-13 20:42:22,874 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.944e+02 2.632e+02 3.164e+02 4.211e+02 6.486e+02, threshold=6.328e+02, percent-clipped=4.0
2024-09-13 20:42:30,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=22981.166666666668, ans=0.1
2024-09-13 20:43:08,307 INFO [train.py:1198] (1/2) Epoch 2, batch 1750, loss[loss=0.4045, ctc_loss=0.3084, cr_loss=0.4804, over 20274.00 frames. ], tot_loss[loss=0.3866, ctc_loss=0.3002, cr_loss=0.4323, over 4092354.55 frames. ], batch size: 74, lr: 3.23e-02, grad_scale: 32.0
2024-09-13 20:43:50,814 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.47 vs. limit=22.5
2024-09-13 20:44:17,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=23179.5, ans=0.125
2024-09-13 20:44:25,827 INFO [train.py:1198] (1/2) Epoch 2, batch 1800, loss[loss=0.4129, ctc_loss=0.3214, cr_loss=0.4573, over 21024.00 frames. ], tot_loss[loss=0.387, ctc_loss=0.3006, cr_loss=0.432, over 4091892.46 frames. ], batch size: 63, lr: 3.22e-02, grad_scale: 32.0
2024-09-13 20:44:44,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=23236.166666666668, ans=0.2
2024-09-13 20:44:53,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=23236.166666666668, ans=0.125
2024-09-13 20:44:59,714 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.065e+02 2.462e+02 2.981e+02 3.679e+02 7.398e+02, threshold=5.962e+02, percent-clipped=3.0
2024-09-13 20:45:07,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=23264.5, ans=0.125
2024-09-13 20:45:10,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=23292.833333333332, ans=0.0
2024-09-13 20:45:27,540 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.59 vs. limit=15.0
2024-09-13 20:45:30,176 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-13 20:45:41,630 INFO [train.py:1198] (1/2) Epoch 2, batch 1850, loss[loss=0.4117, ctc_loss=0.3175, cr_loss=0.471, over 20652.00 frames. ], tot_loss[loss=0.3855, ctc_loss=0.2992, cr_loss=0.4314, over 4104621.07 frames. ], batch size: 68, lr: 3.22e-02, grad_scale: 32.0
2024-09-13 20:46:01,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=23377.833333333332, ans=0.04949747468305833
2024-09-13 20:46:17,270 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.97 vs. limit=15.0
2024-09-13 20:46:23,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=23406.166666666668, ans=0.125
2024-09-13 20:46:40,268 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.29 vs. limit=10.0
2024-09-13 20:46:57,718 INFO [train.py:1198] (1/2) Epoch 2, batch 1900, loss[loss=0.3969, ctc_loss=0.3014, cr_loss=0.4776, over 20967.00 frames. ], tot_loss[loss=0.3835, ctc_loss=0.2972, cr_loss=0.4315, over 4112673.14 frames. ], batch size: 58, lr: 3.21e-02, grad_scale: 32.0
2024-09-13 20:47:06,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=23491.166666666668, ans=0.125
2024-09-13 20:47:30,522 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.925e+02 2.489e+02 3.030e+02 3.557e+02 7.221e+02, threshold=6.060e+02, percent-clipped=3.0
2024-09-13 20:47:33,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=23547.833333333332, ans=0.0
2024-09-13 20:48:12,885 INFO [train.py:1198] (1/2) Epoch 2, batch 1950, loss[loss=0.3417, ctc_loss=0.262, cr_loss=0.3986, over 20980.00 frames. ], tot_loss[loss=0.3827, ctc_loss=0.2963, cr_loss=0.4318, over 4116820.88 frames. ], batch size: 48, lr: 3.21e-02, grad_scale: 32.0
2024-09-13 20:48:16,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=23632.833333333332, ans=0.05
2024-09-13 20:48:56,149 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.01 vs. limit=12.0
2024-09-13 20:49:02,086 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.47 vs. limit=22.5
2024-09-13 20:49:31,258 INFO [train.py:1198] (1/2) Epoch 2, batch 2000, loss[loss=0.4056, ctc_loss=0.3151, cr_loss=0.4524, over 21065.00 frames. ], tot_loss[loss=0.3841, ctc_loss=0.2974, cr_loss=0.4332, over 4116292.07 frames. ], batch size: 59, lr: 3.20e-02, grad_scale: 32.0
2024-09-13 20:50:02,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=23802.833333333332, ans=0.125
2024-09-13 20:50:07,683 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.989e+02 2.510e+02 2.846e+02 3.575e+02 7.268e+02, threshold=5.692e+02, percent-clipped=1.0
2024-09-13 20:50:07,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=23831.166666666668, ans=0.125
2024-09-13 20:50:38,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=23887.833333333332, ans=0.005676557971014493
2024-09-13 20:50:38,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=23887.833333333332, ans=0.125
2024-09-13 20:50:49,554 INFO [train.py:1198] (1/2) Epoch 2, batch 2050, loss[loss=0.4038, ctc_loss=0.3151, cr_loss=0.4434, over 20950.00 frames. ], tot_loss[loss=0.383, ctc_loss=0.2965, cr_loss=0.4327, over 4099944.21 frames. ], batch size: 60, lr: 3.20e-02, grad_scale: 32.0
2024-09-13 20:50:52,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=23916.166666666668, ans=0.125
2024-09-13 20:50:57,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=23916.166666666668, ans=0.1
2024-09-13 20:51:03,399 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=23944.5, ans=10.0
2024-09-13 20:51:06,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=23944.5, ans=0.1
2024-09-13 20:51:12,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=23944.5, ans=0.025
2024-09-13 20:51:18,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=23972.833333333332, ans=0.125
2024-09-13 20:51:26,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=23972.833333333332, ans=0.125
2024-09-13 20:51:28,276 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.66 vs. limit=15.0
2024-09-13 20:51:59,385 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.14 vs. limit=22.5
2024-09-13 20:52:04,323 INFO [train.py:1198] (1/2) Epoch 2, batch 2100, loss[loss=0.3917, ctc_loss=0.3033, cr_loss=0.4419, over 20972.00 frames. ], tot_loss[loss=0.3821, ctc_loss=0.2958, cr_loss=0.4315, over 4101294.17 frames. ], batch size: 58, lr: 3.19e-02, grad_scale: 32.0
2024-09-13 20:52:37,207 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.983e+02 2.482e+02 2.856e+02 3.518e+02 6.098e+02, threshold=5.713e+02, percent-clipped=2.0
2024-09-13 20:52:39,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=24114.5, ans=0.125
2024-09-13 20:53:19,408 INFO [train.py:1198] (1/2) Epoch 2, batch 2150, loss[loss=0.3354, ctc_loss=0.2564, cr_loss=0.3946, over 20985.00 frames. ], tot_loss[loss=0.3818, ctc_loss=0.2955, cr_loss=0.4318, over 4099963.52 frames. ], batch size: 55, lr: 3.19e-02, grad_scale: 32.0
2024-09-13 20:53:50,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=24256.166666666668, ans=0.125
2024-09-13 20:54:12,372 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.22 vs. limit=10.0
2024-09-13 20:54:37,397 INFO [train.py:1198] (1/2) Epoch 2, batch 2200, loss[loss=0.4406, ctc_loss=0.3444, cr_loss=0.4811, over 20209.00 frames. ], tot_loss[loss=0.3841, ctc_loss=0.2975, cr_loss=0.433, over 4091808.03 frames. ], batch size: 74, lr: 3.18e-02, grad_scale: 32.0
2024-09-13 20:54:37,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=24341.166666666668, ans=0.125
2024-09-13 20:54:46,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=24341.166666666668, ans=0.125
2024-09-13 20:55:09,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=24397.833333333332, ans=0.005565688405797102
2024-09-13 20:55:10,694 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.109e+02 2.531e+02 2.986e+02 3.747e+02 8.423e+02, threshold=5.972e+02, percent-clipped=7.0
2024-09-13 20:55:26,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=24426.166666666668, ans=0.125
2024-09-13 20:55:49,053 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.69 vs. limit=15.0
2024-09-13 20:55:55,516 INFO [train.py:1198] (1/2) Epoch 2, batch 2250, loss[loss=0.3724, ctc_loss=0.2835, cr_loss=0.4447, over 20953.00 frames. ], tot_loss[loss=0.3827, ctc_loss=0.2964, cr_loss=0.4314, over 4094636.89 frames. ], batch size: 60, lr: 3.18e-02, grad_scale: 32.0
2024-09-13 20:55:55,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=24482.833333333332, ans=0.125
2024-09-13 20:56:04,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=24482.833333333332, ans=0.125
2024-09-13 20:56:04,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=24482.833333333332, ans=0.1
2024-09-13 20:56:34,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=24539.5, ans=15.0
2024-09-13 20:56:35,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=24539.5, ans=0.125
2024-09-13 20:56:38,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=24539.5, ans=0.2
2024-09-13 20:56:41,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=24567.833333333332, ans=0.125
2024-09-13 20:56:59,242 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.17 vs. limit=6.0
2024-09-13 20:56:59,250 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.31 vs. limit=15.0
2024-09-13 20:57:10,395 INFO [train.py:1198] (1/2) Epoch 2, batch 2300, loss[loss=0.4064, ctc_loss=0.3178, cr_loss=0.4429, over 20977.00 frames. ], tot_loss[loss=0.3835, ctc_loss=0.2971, cr_loss=0.4318, over 4093625.23 frames. ], batch size: 58, lr: 3.17e-02, grad_scale: 32.0
2024-09-13 20:57:43,428 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.958e+02 2.578e+02 3.081e+02 3.650e+02 6.598e+02, threshold=6.161e+02, percent-clipped=1.0
2024-09-13 20:57:46,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=24681.166666666668, ans=0.07
2024-09-13 20:57:55,059 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.56 vs. limit=6.0
2024-09-13 20:58:03,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=24709.5, ans=0.125
2024-09-13 20:58:14,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=24737.833333333332, ans=0.1
2024-09-13 20:58:25,702 INFO [train.py:1198] (1/2) Epoch 2, batch 2350, loss[loss=0.4095, ctc_loss=0.3203, cr_loss=0.4462, over 21037.00 frames. ], tot_loss[loss=0.3821, ctc_loss=0.2959, cr_loss=0.4314, over 4099499.56 frames. ], batch size: 62, lr: 3.17e-02, grad_scale: 32.0
2024-09-13 20:58:29,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=24766.166666666668, ans=0.2
2024-09-13 20:58:30,823 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.55 vs. limit=12.0
2024-09-13 20:58:35,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=24766.166666666668, ans=0.1
2024-09-13 20:58:55,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=24822.833333333332, ans=0.1
2024-09-13 20:59:00,670 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.90 vs. limit=12.0
2024-09-13 20:59:23,289 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.16 vs. limit=15.0
2024-09-13 20:59:26,625 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.89 vs. limit=15.0
2024-09-13 20:59:34,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=24879.5, ans=0.125
2024-09-13 20:59:42,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=24907.833333333332, ans=0.025
2024-09-13 20:59:43,376 INFO [train.py:1198] (1/2) Epoch 2, batch 2400, loss[loss=0.3374, ctc_loss=0.2587, cr_loss=0.3939, over 20966.00 frames. ], tot_loss[loss=0.3839, ctc_loss=0.2975, cr_loss=0.4323, over 4069493.90 frames. ], batch size: 50, lr: 3.16e-02, grad_scale: 32.0
2024-09-13 20:59:43,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=24907.833333333332, ans=0.1
2024-09-13 20:59:45,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=24907.833333333332, ans=0.125
2024-09-13 21:00:00,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=24936.166666666668, ans=0.125
2024-09-13 21:00:00,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=24936.166666666668, ans=0.1
2024-09-13 21:00:06,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=24936.166666666668, ans=0.1
2024-09-13 21:00:06,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=24936.166666666668, ans=0.125
2024-09-13 21:00:16,359 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.034e+02 2.468e+02 2.899e+02 3.440e+02 7.814e+02, threshold=5.798e+02, percent-clipped=2.0
2024-09-13 21:00:18,890 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.80 vs. limit=15.0
2024-09-13 21:00:27,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=24992.833333333332, ans=0.125
2024-09-13 21:00:41,418 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.29 vs. limit=15.0
2024-09-13 21:00:49,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=25021.166666666668, ans=0.035
2024-09-13 21:00:58,106 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.55 vs. limit=22.5
2024-09-13 21:00:58,961 INFO [train.py:1198] (1/2) Epoch 2, batch 2450, loss[loss=0.34, ctc_loss=0.2628, cr_loss=0.3865, over 20777.00 frames. ], tot_loss[loss=0.382, ctc_loss=0.2959, cr_loss=0.4308, over 4072977.16 frames. ], batch size: 53, lr: 3.15e-02, grad_scale: 32.0
2024-09-13 21:01:23,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=25077.833333333332, ans=0.0
2024-09-13 21:02:00,523 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.27 vs. limit=15.0
2024-09-13 21:02:04,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=25162.833333333332, ans=10.0
2024-09-13 21:02:16,417 INFO [train.py:1198] (1/2) Epoch 2, batch 2500, loss[loss=0.3573, ctc_loss=0.2783, cr_loss=0.3955, over 21071.00 frames. ], tot_loss[loss=0.3825, ctc_loss=0.2964, cr_loss=0.4306, over 4062382.02 frames. ], batch size: 56, lr: 3.15e-02, grad_scale: 32.0
2024-09-13 21:02:16,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=25191.166666666668, ans=0.005393224637681159
2024-09-13 21:02:45,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=25247.833333333332, ans=0.125
2024-09-13 21:02:49,724 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.015e+02 2.439e+02 2.976e+02 3.826e+02 6.695e+02, threshold=5.952e+02, percent-clipped=3.0
2024-09-13 21:03:11,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=25276.166666666668, ans=0.125
2024-09-13 21:03:32,056 INFO [train.py:1198] (1/2) Epoch 2, batch 2550, loss[loss=0.3635, ctc_loss=0.2761, cr_loss=0.4369, over 20940.00 frames. ], tot_loss[loss=0.3815, ctc_loss=0.2954, cr_loss=0.4305, over 4070509.73 frames. ], batch size: 49, lr: 3.14e-02, grad_scale: 32.0
2024-09-13 21:04:28,847 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.94 vs. limit=15.0
2024-09-13 21:04:33,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=25446.166666666668, ans=0.125
2024-09-13 21:04:39,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=25446.166666666668, ans=0.125
2024-09-13 21:04:47,643 INFO [train.py:1198] (1/2) Epoch 2, batch 2600, loss[loss=0.3846, ctc_loss=0.2949, cr_loss=0.4481, over 20880.00 frames. ], tot_loss[loss=0.383, ctc_loss=0.2967, cr_loss=0.4318, over 4073151.87 frames. ], batch size: 57, lr: 3.14e-02, grad_scale: 32.0
2024-09-13 21:04:50,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=25474.5, ans=0.125
2024-09-13 21:05:04,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=25502.833333333332, ans=0.2
2024-09-13 21:05:04,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=25502.833333333332, ans=0.07
2024-09-13 21:05:07,176 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.24 vs. limit=15.0
2024-09-13 21:05:10,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=25502.833333333332, ans=0.125
2024-09-13 21:05:20,961 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.040e+02 2.492e+02 2.894e+02 3.636e+02 6.965e+02, threshold=5.787e+02, percent-clipped=1.0
2024-09-13 21:05:34,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=25559.5, ans=0.5
2024-09-13 21:05:42,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=25559.5, ans=0.0053131521739130435
2024-09-13 21:05:58,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=25587.833333333332, ans=0.125
2024-09-13 21:06:05,897 INFO [train.py:1198] (1/2) Epoch 2, batch 2650, loss[loss=0.3529, ctc_loss=0.2698, cr_loss=0.4156, over 20773.00 frames. ], tot_loss[loss=0.3817, ctc_loss=0.2953, cr_loss=0.4319, over 4080021.28 frames. ], batch size: 53, lr: 3.13e-02, grad_scale: 32.0
2024-09-13 21:06:28,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=25644.5, ans=0.005294673913043478
2024-09-13 21:06:43,172 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.81 vs. limit=8.0
2024-09-13 21:07:02,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=25701.166666666668, ans=0.125
2024-09-13 21:07:20,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=25729.5, ans=0.0
2024-09-13 21:07:24,667 INFO [train.py:1198] (1/2) Epoch 2, batch 2700, loss[loss=0.4035, ctc_loss=0.3177, cr_loss=0.4291, over 20950.00 frames. ], tot_loss[loss=0.3794, ctc_loss=0.2933, cr_loss=0.4306, over 4091011.37 frames. ], batch size: 64, lr: 3.13e-02, grad_scale: 32.0
2024-09-13 21:07:29,942 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.53 vs. limit=6.0
limit=6.0 2024-09-13 21:07:57,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=25814.5, ans=0.125 2024-09-13 21:07:58,091 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.036e+02 2.583e+02 3.121e+02 3.841e+02 6.542e+02, threshold=6.242e+02, percent-clipped=1.0 2024-09-13 21:08:06,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=25814.5, ans=0.005257717391304348 2024-09-13 21:08:27,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=25871.166666666668, ans=0.1 2024-09-13 21:08:30,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=25871.166666666668, ans=0.0 2024-09-13 21:08:40,568 INFO [train.py:1198] (1/2) Epoch 2, batch 2750, loss[loss=0.342, ctc_loss=0.2678, cr_loss=0.3708, over 20944.00 frames. ], tot_loss[loss=0.3781, ctc_loss=0.2921, cr_loss=0.43, over 4085352.73 frames. ], batch size: 49, lr: 3.12e-02, grad_scale: 32.0 2024-09-13 21:08:43,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=25899.5, ans=0.0 2024-09-13 21:09:15,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=25956.166666666668, ans=0.1 2024-09-13 21:09:32,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=25984.5, ans=0.125 2024-09-13 21:09:42,698 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-13 21:09:55,783 INFO [train.py:1198] (1/2) Epoch 2, batch 2800, loss[loss=0.3512, ctc_loss=0.2652, cr_loss=0.43, over 20955.00 frames. ], tot_loss[loss=0.3785, ctc_loss=0.2923, cr_loss=0.431, over 4092400.91 frames. ], batch size: 50, lr: 3.12e-02, grad_scale: 32.0 2024-09-13 21:10:28,887 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.071e+02 2.420e+02 2.869e+02 3.483e+02 5.905e+02, threshold=5.737e+02, percent-clipped=0.0 2024-09-13 21:10:33,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=26097.833333333332, ans=0.0051961231884057974 2024-09-13 21:11:11,187 INFO [train.py:1198] (1/2) Epoch 2, batch 2850, loss[loss=0.3766, ctc_loss=0.2909, cr_loss=0.4282, over 20783.00 frames. ], tot_loss[loss=0.3774, ctc_loss=0.2914, cr_loss=0.4301, over 4089808.45 frames. ], batch size: 56, lr: 3.11e-02, grad_scale: 32.0 2024-09-13 21:11:57,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=26267.833333333332, ans=0.005159166666666667 2024-09-13 21:12:11,847 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.27 vs. limit=15.0 2024-09-13 21:12:29,209 INFO [train.py:1198] (1/2) Epoch 2, batch 2900, loss[loss=0.4249, ctc_loss=0.3276, cr_loss=0.4864, over 18309.00 frames. ], tot_loss[loss=0.3775, ctc_loss=0.2913, cr_loss=0.4311, over 4099888.02 frames. 
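Each optim.py:487 warning summarizes the recent distribution of gradient norms as five quantiles (min, 25%, median, 75%, max). In every entry the reported threshold equals Clipping_scale times the median (here 2.0 * 3.121e+02 = 6.242e+02), and percent-clipped is the share of recent batches whose norm exceeded that threshold. A sketch of that bookkeeping, assuming a plain sliding window of norms (the optimizer may track them differently):

import numpy as np

CLIPPING_SCALE = 2.0  # matches "Clipping_scale=2.0" in the warnings

def clipping_stats(recent_grad_norms):
    """Summarize a window of gradient norms the way the warnings do:
    returns (quartiles, threshold, percent_clipped), with the threshold
    set to CLIPPING_SCALE times the median norm."""
    norms = np.asarray(recent_grad_norms, dtype=np.float64)
    quartiles = np.percentile(norms, [0, 25, 50, 75, 100])
    threshold = CLIPPING_SCALE * quartiles[2]
    percent_clipped = 100.0 * float(np.mean(norms > threshold))
    return quartiles, threshold, percent_clipped

A batch whose gradient norm lands above the threshold would then have its gradients scaled down toward the threshold before the optimizer step, which is what the percent-clipped figure counts.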
], batch size: 108, lr: 3.11e-02, grad_scale: 32.0 2024-09-13 21:12:35,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=26324.5, ans=0.125 2024-09-13 21:13:04,774 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.956e+02 2.375e+02 2.665e+02 3.126e+02 5.503e+02, threshold=5.330e+02, percent-clipped=0.0 2024-09-13 21:13:35,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=26437.833333333332, ans=0.2 2024-09-13 21:13:46,947 INFO [train.py:1198] (1/2) Epoch 2, batch 2950, loss[loss=0.3779, ctc_loss=0.2943, cr_loss=0.418, over 21066.00 frames. ], tot_loss[loss=0.3773, ctc_loss=0.2912, cr_loss=0.4307, over 4099011.24 frames. ], batch size: 56, lr: 3.10e-02, grad_scale: 32.0 2024-09-13 21:14:03,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=26494.5, ans=0.125 2024-09-13 21:14:34,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=26551.166666666668, ans=0.0 2024-09-13 21:14:52,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=26579.5, ans=0.125 2024-09-13 21:14:54,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=26579.5, ans=0.125 2024-09-13 21:15:02,759 INFO [train.py:1198] (1/2) Epoch 2, batch 3000, loss[loss=0.3972, ctc_loss=0.3083, cr_loss=0.4445, over 21006.00 frames. ], tot_loss[loss=0.3787, ctc_loss=0.2923, cr_loss=0.4316, over 4092641.62 frames. ], batch size: 55, lr: 3.10e-02, grad_scale: 32.0 2024-09-13 21:15:02,760 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-13 21:15:29,476 INFO [train.py:1230] (1/2) Epoch 2, validation: loss=0.1081, ctc_loss=0.1081, cr_loss=9.791e-15, over 944034.00 frames. 2024-09-13 21:15:29,477 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-13 21:16:03,219 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.207e+02 2.524e+02 3.002e+02 3.755e+02 5.598e+02, threshold=6.003e+02, percent-clipped=2.0 2024-09-13 21:16:05,672 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.82 vs. limit=22.5 2024-09-13 21:16:09,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=26664.5, ans=0.125 2024-09-13 21:16:45,084 INFO [train.py:1198] (1/2) Epoch 2, batch 3050, loss[loss=0.3985, ctc_loss=0.3114, cr_loss=0.4354, over 20961.00 frames. ], tot_loss[loss=0.3767, ctc_loss=0.2905, cr_loss=0.4309, over 4102772.48 frames. ], batch size: 58, lr: 3.09e-02, grad_scale: 32.0 2024-09-13 21:16:58,030 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.28 vs. limit=6.0 2024-09-13 21:17:03,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=26777.833333333332, ans=0.05 2024-09-13 21:17:17,928 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.43 vs. 
limit=15.0 2024-09-13 21:18:03,496 INFO [train.py:1198] (1/2) Epoch 2, batch 3100, loss[loss=0.3254, ctc_loss=0.2501, cr_loss=0.3763, over 20983.00 frames. ], tot_loss[loss=0.3769, ctc_loss=0.2907, cr_loss=0.4312, over 4105492.16 frames. ], batch size: 50, lr: 3.09e-02, grad_scale: 32.0 2024-09-13 21:18:39,885 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.192e+02 2.723e+02 3.199e+02 3.771e+02 6.385e+02, threshold=6.398e+02, percent-clipped=1.0 2024-09-13 21:19:22,442 INFO [train.py:1198] (1/2) Epoch 2, batch 3150, loss[loss=0.3372, ctc_loss=0.258, cr_loss=0.3961, over 20961.00 frames. ], tot_loss[loss=0.3778, ctc_loss=0.2914, cr_loss=0.4321, over 4103323.58 frames. ], batch size: 49, lr: 3.08e-02, grad_scale: 32.0 2024-09-13 21:20:03,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=27089.5, ans=0.125 2024-09-13 21:20:38,014 INFO [train.py:1198] (1/2) Epoch 2, batch 3200, loss[loss=0.3721, ctc_loss=0.2894, cr_loss=0.4137, over 19923.00 frames. ], tot_loss[loss=0.3774, ctc_loss=0.291, cr_loss=0.4318, over 4102084.26 frames. ], batch size: 44, lr: 3.08e-02, grad_scale: 32.0 2024-09-13 21:20:41,351 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=27174.5, ans=0.125 2024-09-13 21:21:04,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=27202.833333333332, ans=0.004955905797101449 2024-09-13 21:21:08,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=27231.166666666668, ans=0.125 2024-09-13 21:21:10,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=27231.166666666668, ans=0.125 2024-09-13 21:21:11,579 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.190e+02 2.544e+02 3.082e+02 3.821e+02 6.642e+02, threshold=6.165e+02, percent-clipped=1.0 2024-09-13 21:21:26,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=27259.5, ans=0.004943586956521739 2024-09-13 21:21:44,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=27287.833333333332, ans=0.2 2024-09-13 21:21:53,314 INFO [train.py:1198] (1/2) Epoch 2, batch 3250, loss[loss=0.3373, ctc_loss=0.2594, cr_loss=0.3894, over 20931.00 frames. ], tot_loss[loss=0.379, ctc_loss=0.2925, cr_loss=0.4326, over 4089620.47 frames. 
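The batch-3000 validation pass logged above reports loss=0.1081 with ctc_loss=0.1081 and cr_loss~=1e-14 over 944034 dev frames: with no augmentation applied at validation time, the two views that the consistency term compares coincide, so only the CTC term survives and loss equals ctc_loss. A sketch of such a pass; model.compute_ctc_loss is a hypothetical stand-in, not the model's real interface:

import torch

@torch.no_grad()
def run_validation(model, dev_loader, device):
    """Frame-averaged CTC loss over the dev set, mirroring the
    'Computing validation loss' readout. No augmentation is applied,
    so the consistency (cr) term is numerically zero."""
    model.eval()
    loss_sum, frame_sum = 0.0, 0.0
    for batch in dev_loader:
        feats = batch["inputs"].to(device)
        # hypothetical stand-in for the model's actual loss API
        ctc_loss, num_frames = model.compute_ctc_loss(feats, batch["supervisions"])
        loss_sum += float(ctc_loss)
        frame_sum += num_frames
    model.train()
    return loss_sum / frame_sum  # e.g. 0.1081 at batch 3000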
], batch size: 50, lr: 3.07e-02, grad_scale: 32.0 2024-09-13 21:21:58,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=27316.166666666668, ans=0.125 2024-09-13 21:22:28,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=27372.833333333332, ans=0.025 2024-09-13 21:22:32,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=27372.833333333332, ans=0.125 2024-09-13 21:22:52,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=27401.166666666668, ans=0.125 2024-09-13 21:22:55,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=27429.5, ans=0.125 2024-09-13 21:23:11,246 INFO [train.py:1198] (1/2) Epoch 2, batch 3300, loss[loss=0.4315, ctc_loss=0.332, cr_loss=0.4971, over 19554.00 frames. ], tot_loss[loss=0.3792, ctc_loss=0.2926, cr_loss=0.4333, over 4082458.51 frames. ], batch size: 90, lr: 3.07e-02, grad_scale: 16.0 2024-09-13 21:23:13,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=27457.833333333332, ans=0.004900471014492754 2024-09-13 21:23:21,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=27457.833333333332, ans=0.025 2024-09-13 21:23:37,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=27486.166666666668, ans=0.2 2024-09-13 21:23:38,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=27486.166666666668, ans=0.1 2024-09-13 21:23:45,995 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.071e+02 2.393e+02 2.950e+02 3.326e+02 5.687e+02, threshold=5.901e+02, percent-clipped=0.0 2024-09-13 21:24:10,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=27542.833333333332, ans=0.1 2024-09-13 21:24:30,029 INFO [train.py:1198] (1/2) Epoch 2, batch 3350, loss[loss=0.3818, ctc_loss=0.2908, cr_loss=0.4553, over 20944.00 frames. ], tot_loss[loss=0.3778, ctc_loss=0.2913, cr_loss=0.4323, over 4086466.42 frames. ], batch size: 60, lr: 3.06e-02, grad_scale: 16.0 2024-09-13 21:24:36,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=27599.5, ans=0.2 2024-09-13 21:24:45,458 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.43 vs. limit=15.0 2024-09-13 21:25:09,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=27656.166666666668, ans=0.125 2024-09-13 21:25:15,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=27684.5, ans=0.0 2024-09-13 21:25:21,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=27684.5, ans=0.0 2024-09-13 21:25:44,722 INFO [train.py:1198] (1/2) Epoch 2, batch 3400, loss[loss=0.3583, ctc_loss=0.2738, cr_loss=0.4227, over 20902.00 frames. 
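The scaling.py:214 lines trace ScheduledFloat hyper-parameters: per-module knobs (skip rates, balancer probabilities, dropout rates) that are functions of batch_count rather than constants. That is why the same name reports a slightly smaller ans a couple of thousand batches later, e.g. encoder.encoders.2.encoder.layers.1.ff2_skip_rate logging ans=0.005393 at batch_count~=25191 and ans=0.004944 at ~=27259. A minimal sketch of a schedule of that kind; the breakpoints below are illustrative, not the ones used for any parameter here:

# Piecewise-linear value keyed on batch_count, in the spirit of the
# ScheduledFloat readouts above. Breakpoints are made up for illustration.
def scheduled_float(batch_count, schedule=((0.0, 0.5), (20000.0, 0.1), (50000.0, 0.01))):
    x0, y0 = schedule[0]
    if batch_count <= x0:
        return y0
    for x1, y1 in schedule[1:]:
        if batch_count <= x1:
            t = (batch_count - x0) / (x1 - x0)  # interpolate linearly
            return y0 + t * (y1 - y0)
        x0, y0 = x1, y1
    return y0  # clamp beyond the last breakpoint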
], tot_loss[loss=0.377, ctc_loss=0.2907, cr_loss=0.4316, over 4088850.37 frames. ], batch size: 54, lr: 3.06e-02, grad_scale: 16.0 2024-09-13 21:26:10,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=27769.5, ans=0.125 2024-09-13 21:26:11,904 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-13 21:26:19,131 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.117e+02 2.622e+02 3.125e+02 3.859e+02 8.234e+02, threshold=6.251e+02, percent-clipped=1.0 2024-09-13 21:26:51,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=27854.5, ans=0.1 2024-09-13 21:26:51,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=27854.5, ans=0.1 2024-09-13 21:27:00,199 INFO [train.py:1198] (1/2) Epoch 2, batch 3450, loss[loss=0.4156, ctc_loss=0.3214, cr_loss=0.471, over 20669.00 frames. ], tot_loss[loss=0.3782, ctc_loss=0.2914, cr_loss=0.4336, over 4096018.66 frames. ], batch size: 68, lr: 3.05e-02, grad_scale: 16.0 2024-09-13 21:27:12,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=27882.833333333332, ans=0.125 2024-09-13 21:27:26,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=27911.166666666668, ans=0.1 2024-09-13 21:28:02,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=27996.166666666668, ans=0.0 2024-09-13 21:28:02,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=27996.166666666668, ans=0.2 2024-09-13 21:28:16,034 INFO [train.py:1198] (1/2) Epoch 2, batch 3500, loss[loss=0.3801, ctc_loss=0.2911, cr_loss=0.445, over 20976.00 frames. ], tot_loss[loss=0.3771, ctc_loss=0.2906, cr_loss=0.4325, over 4097727.69 frames. ], batch size: 64, lr: 3.05e-02, grad_scale: 16.0 2024-09-13 21:28:50,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=28081.166666666668, ans=0.125 2024-09-13 21:28:53,349 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.082e+02 2.524e+02 3.059e+02 3.692e+02 6.896e+02, threshold=6.118e+02, percent-clipped=2.0 2024-09-13 21:29:02,862 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=28109.5, ans=0.125 2024-09-13 21:29:33,975 INFO [train.py:1198] (1/2) Epoch 2, batch 3550, loss[loss=0.4483, ctc_loss=0.3652, cr_loss=0.4153, over 14448.00 frames. ], tot_loss[loss=0.3767, ctc_loss=0.2903, cr_loss=0.432, over 4085668.62 frames. 
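Batch sizes in these entries swing from ~44 to ~150 cuts while the per-batch frame totals stay inside a much narrower band (the batch-3550 entry ending just above: 14448 frames; most others: ~20000-21000). That is the signature of a duration-budgeted bucketing sampler: each batch packs cuts of similar length up to a fixed total duration, so buckets of short cuts hold many utterances and buckets of long cuts hold few. Rough arithmetic under assumed values (a ~850 s per-batch budget, 100 frames/s features, 4x subsampling; none of these are read from the config here):

# Assumed values for illustration: per-batch duration budget, feature
# frame rate, and encoder subsampling. 850 * 100 / 4 ~= 21250 encoder
# frames, matching the typical per-batch frame totals in the log.
MAX_DURATION_S = 850.0
FRAMES_PER_S = 100.0
SUBSAMPLING = 4

frames_per_batch = MAX_DURATION_S * FRAMES_PER_S / SUBSAMPLING  # ~21250

def cuts_per_batch(mean_cut_seconds):
    """Short cuts -> many per batch, long cuts -> few."""
    return int(MAX_DURATION_S / mean_cut_seconds)

print(cuts_per_batch(15.0))  # ~56, like the mid-length batches here
print(cuts_per_batch(5.7))   # ~149, like the short-cut batches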
], batch size: 149, lr: 3.04e-02, grad_scale: 16.0 2024-09-13 21:29:58,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=28194.5, ans=10.0 2024-09-13 21:29:58,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=28194.5, ans=0.125 2024-09-13 21:30:12,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=28222.833333333332, ans=0.125 2024-09-13 21:30:35,087 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.84 vs. limit=15.0 2024-09-13 21:30:40,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=28279.5, ans=0.125 2024-09-13 21:30:52,729 INFO [train.py:1198] (1/2) Epoch 2, batch 3600, loss[loss=0.3973, ctc_loss=0.2991, cr_loss=0.4908, over 20924.00 frames. ], tot_loss[loss=0.3751, ctc_loss=0.2888, cr_loss=0.4315, over 4097634.06 frames. ], batch size: 60, lr: 3.04e-02, grad_scale: 32.0 2024-09-13 21:31:27,881 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.942e+02 2.592e+02 2.935e+02 3.605e+02 6.016e+02, threshold=5.870e+02, percent-clipped=0.0 2024-09-13 21:32:08,406 INFO [train.py:1198] (1/2) Epoch 2, batch 3650, loss[loss=0.4405, ctc_loss=0.3519, cr_loss=0.4426, over 18463.00 frames. ], tot_loss[loss=0.3745, ctc_loss=0.2883, cr_loss=0.4314, over 4104229.19 frames. ], batch size: 108, lr: 3.03e-02, grad_scale: 32.0 2024-09-13 21:32:24,414 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.68 vs. limit=15.0 2024-09-13 21:32:32,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=28477.833333333332, ans=0.0 2024-09-13 21:32:36,064 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.83 vs. limit=15.0 2024-09-13 21:32:41,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=28506.166666666668, ans=0.2 2024-09-13 21:32:47,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=28506.166666666668, ans=0.1 2024-09-13 21:32:58,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=28534.5, ans=0.0 2024-09-13 21:33:23,180 INFO [train.py:1198] (1/2) Epoch 2, batch 3700, loss[loss=0.4107, ctc_loss=0.3176, cr_loss=0.4652, over 20068.00 frames. ], tot_loss[loss=0.3747, ctc_loss=0.2885, cr_loss=0.4311, over 4099205.29 frames. 
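The grad_scale field is the mixed-precision loss scale: it sits at 32.0, halves to 16.0 when a batch yields inf/nan gradients (between batches 3250 and 3300 above), and is grown back to 32.0 after a stretch of clean steps (by batch 3600). A sketch of that behaviour using PyTorch's stock GradScaler; the growth_interval is illustrative, and the training script may manage the scale itself rather than use this class directly:

import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0,      # the grad_scale seen in most entries
    backoff_factor=0.5,   # halve on overflow: 32 -> 16
    growth_factor=2.0,    # double after enough clean steps: 16 -> 32
    growth_interval=200,  # illustrative only
)

def training_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = loss_fn(model, batch)
    scaler.scale(loss).backward()  # backprop the scaled loss
    scaler.step(optimizer)         # unscales; skips the step on overflow
    scaler.update()                # adjusts the scale, as logged
    return loss.detach(), scaler.get_scale()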
], batch size: 80, lr: 3.03e-02, grad_scale: 32.0 2024-09-13 21:34:01,185 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.080e+02 2.509e+02 2.913e+02 3.469e+02 6.670e+02, threshold=5.827e+02, percent-clipped=1.0 2024-09-13 21:34:01,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=28647.833333333332, ans=0.1 2024-09-13 21:34:16,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=28676.166666666668, ans=0.1 2024-09-13 21:34:23,719 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=28676.166666666668, ans=0.1 2024-09-13 21:34:27,321 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.19 vs. limit=6.0 2024-09-13 21:34:31,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=28704.5, ans=0.00462945652173913 2024-09-13 21:34:41,823 INFO [train.py:1198] (1/2) Epoch 2, batch 3750, loss[loss=0.3467, ctc_loss=0.2699, cr_loss=0.3838, over 20963.00 frames. ], tot_loss[loss=0.3753, ctc_loss=0.2891, cr_loss=0.4308, over 4097866.96 frames. ], batch size: 50, lr: 3.02e-02, grad_scale: 32.0 2024-09-13 21:35:22,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=28789.5, ans=0.025 2024-09-13 21:35:46,620 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.58 vs. limit=10.0 2024-09-13 21:35:47,733 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.43 vs. limit=15.0 2024-09-13 21:36:00,609 INFO [train.py:1198] (1/2) Epoch 2, batch 3800, loss[loss=0.3939, ctc_loss=0.3068, cr_loss=0.4358, over 20972.00 frames. ], tot_loss[loss=0.3737, ctc_loss=0.2877, cr_loss=0.4302, over 4107398.05 frames. ], batch size: 58, lr: 3.02e-02, grad_scale: 32.0 2024-09-13 21:36:05,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=28874.5, ans=0.0 2024-09-13 21:36:17,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=28902.833333333332, ans=0.004586340579710145 2024-09-13 21:36:31,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=28931.166666666668, ans=0.125 2024-09-13 21:36:35,154 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.098e+02 2.554e+02 2.968e+02 3.753e+02 5.956e+02, threshold=5.936e+02, percent-clipped=1.0 2024-09-13 21:36:47,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=28959.5, ans=0.05 2024-09-13 21:36:59,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=28987.833333333332, ans=0.125 2024-09-13 21:37:15,817 INFO [train.py:1198] (1/2) Epoch 2, batch 3850, loss[loss=0.3777, ctc_loss=0.2882, cr_loss=0.4477, over 20841.00 frames. ], tot_loss[loss=0.3757, ctc_loss=0.2892, cr_loss=0.4324, over 4096591.35 frames. 
], batch size: 59, lr: 3.01e-02, grad_scale: 32.0 2024-09-13 21:37:49,689 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.43 vs. limit=6.0 2024-09-13 21:37:55,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=29072.833333333332, ans=0.125 2024-09-13 21:38:10,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=29101.166666666668, ans=0.004543224637681159 2024-09-13 21:38:31,303 INFO [train.py:1198] (1/2) Epoch 2, batch 3900, loss[loss=0.4219, ctc_loss=0.3277, cr_loss=0.4708, over 21008.00 frames. ], tot_loss[loss=0.3741, ctc_loss=0.2878, cr_loss=0.4312, over 4112166.89 frames. ], batch size: 63, lr: 3.01e-02, grad_scale: 32.0 2024-09-13 21:38:37,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=29157.833333333332, ans=0.025 2024-09-13 21:38:51,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=29186.166666666668, ans=0.1 2024-09-13 21:39:06,246 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.071e+02 2.522e+02 2.954e+02 3.504e+02 5.465e+02, threshold=5.909e+02, percent-clipped=0.0 2024-09-13 21:39:18,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=29242.833333333332, ans=0.1 2024-09-13 21:39:47,318 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-13 21:39:50,079 INFO [train.py:1198] (1/2) Epoch 2, batch 3950, loss[loss=0.3785, ctc_loss=0.2916, cr_loss=0.4347, over 20814.00 frames. ], tot_loss[loss=0.3734, ctc_loss=0.2871, cr_loss=0.4315, over 4110383.21 frames. ], batch size: 59, lr: 3.01e-02, grad_scale: 32.0 2024-09-13 21:40:19,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=29356.166666666668, ans=0.0 2024-09-13 21:40:55,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=29412.833333333332, ans=0.125 2024-09-13 21:41:05,505 INFO [train.py:1198] (1/2) Epoch 2, batch 4000, loss[loss=0.3756, ctc_loss=0.2848, cr_loss=0.454, over 20922.00 frames. ], tot_loss[loss=0.3727, ctc_loss=0.2865, cr_loss=0.4311, over 4116563.01 frames. 
], batch size: 60, lr: 3.00e-02, grad_scale: 32.0 2024-09-13 21:41:35,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=29469.5, ans=0.125 2024-09-13 21:41:40,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=29497.833333333332, ans=0.125 2024-09-13 21:41:43,174 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.884e+02 2.438e+02 2.873e+02 3.463e+02 6.218e+02, threshold=5.747e+02, percent-clipped=1.0 2024-09-13 21:42:04,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=29526.166666666668, ans=0.125 2024-09-13 21:42:16,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=29554.5, ans=0.1 2024-09-13 21:42:18,442 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.60 vs. limit=15.0 2024-09-13 21:42:23,818 INFO [train.py:1198] (1/2) Epoch 2, batch 4050, loss[loss=0.371, ctc_loss=0.2906, cr_loss=0.4018, over 20749.00 frames. ], tot_loss[loss=0.3735, ctc_loss=0.2872, cr_loss=0.4313, over 4109854.30 frames. ], batch size: 71, lr: 3.00e-02, grad_scale: 32.0 2024-09-13 21:42:29,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=29582.833333333332, ans=0.125 2024-09-13 21:42:51,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=29611.166666666668, ans=0.125 2024-09-13 21:43:14,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=29667.833333333332, ans=0.125 2024-09-13 21:43:39,653 INFO [train.py:1198] (1/2) Epoch 2, batch 4100, loss[loss=0.3546, ctc_loss=0.2702, cr_loss=0.422, over 20882.00 frames. ], tot_loss[loss=0.374, ctc_loss=0.2877, cr_loss=0.4317, over 4102058.52 frames. ], batch size: 54, lr: 2.99e-02, grad_scale: 32.0 2024-09-13 21:43:44,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=29724.5, ans=0.0 2024-09-13 21:44:09,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=29781.166666666668, ans=0.125 2024-09-13 21:44:14,287 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.993e+02 2.475e+02 2.989e+02 3.810e+02 6.440e+02, threshold=5.979e+02, percent-clipped=4.0 2024-09-13 21:44:28,715 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.61 vs. limit=10.0 2024-09-13 21:44:36,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=29809.5, ans=0.5 2024-09-13 21:44:55,205 INFO [train.py:1198] (1/2) Epoch 2, batch 4150, loss[loss=0.3757, ctc_loss=0.2862, cr_loss=0.4479, over 20833.00 frames. ], tot_loss[loss=0.3751, ctc_loss=0.2885, cr_loss=0.433, over 4086535.05 frames. 
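The scaling.py:1024 Whitening lines compare a per-module anisotropy metric against a limit. Most entries sit below their limit, but feed_forward3.out_whiten above has just crossed its own (metric=16.60 vs. limit=15.0), which is the point at which such a module would start nudging activations back toward a whiter (more isotropic) covariance. One plausible form of the metric, as a sketch rather than the actual implementation:

import torch

def whitening_metric(x):
    """Ratio of the mean squared eigenvalue of the feature covariance
    to the squared mean eigenvalue: 1.0 for perfectly white features,
    larger as the covariance grows anisotropic. A sketch of one metric
    with this behaviour, not icefall's exact code. x: (..., channels)."""
    x = x.reshape(-1, x.shape[-1]).float()
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]           # (C, C) covariance
    d = cov.shape[0]
    mean_eig = torch.diagonal(cov).mean()  # trace(cov) / d
    mean_sq_eig = (cov * cov).sum() / d    # trace(cov @ cov) / d
    return mean_sq_eig / (mean_eig ** 2 + 1e-20)

# A penalty would only bite once the metric exceeds its limit, e.g.:
# penalty = (whitening_metric(acts) - limit).clamp(min=0.0)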
], batch size: 59, lr: 2.99e-02, grad_scale: 32.0 2024-09-13 21:45:00,428 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.30 vs. limit=15.0 2024-09-13 21:45:13,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=29894.5, ans=0.125 2024-09-13 21:45:24,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=29922.833333333332, ans=0.125 2024-09-13 21:45:45,891 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.85 vs. limit=22.5 2024-09-13 21:46:10,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=29979.5, ans=0.0 2024-09-13 21:46:12,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=30007.833333333332, ans=0.125 2024-09-13 21:46:13,561 INFO [train.py:1198] (1/2) Epoch 2, batch 4200, loss[loss=0.395, ctc_loss=0.305, cr_loss=0.4499, over 20688.00 frames. ], tot_loss[loss=0.3765, ctc_loss=0.2895, cr_loss=0.4347, over 4087974.75 frames. ], batch size: 66, lr: 2.98e-02, grad_scale: 32.0 2024-09-13 21:46:51,465 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.095e+02 2.607e+02 3.097e+02 3.773e+02 6.203e+02, threshold=6.195e+02, percent-clipped=1.0 2024-09-13 21:46:51,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=30064.5, ans=0.0 2024-09-13 21:47:32,374 INFO [train.py:1198] (1/2) Epoch 2, batch 4250, loss[loss=0.31, ctc_loss=0.2317, cr_loss=0.3915, over 20963.00 frames. ], tot_loss[loss=0.376, ctc_loss=0.289, cr_loss=0.4348, over 4100921.47 frames. ], batch size: 48, lr: 2.98e-02, grad_scale: 32.0 2024-09-13 21:47:53,729 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-13 21:48:01,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=30206.166666666668, ans=0.1 2024-09-13 21:48:14,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=30206.166666666668, ans=0.1 2024-09-13 21:48:22,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=30234.5, ans=0.004296847826086956 2024-09-13 21:48:47,563 INFO [train.py:1198] (1/2) Epoch 2, batch 4300, loss[loss=0.3973, ctc_loss=0.3087, cr_loss=0.4432, over 20062.00 frames. ], tot_loss[loss=0.3748, ctc_loss=0.2879, cr_loss=0.4344, over 4109155.67 frames. 
], batch size: 80, lr: 2.97e-02, grad_scale: 32.0 2024-09-13 21:48:58,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=30291.166666666668, ans=15.0 2024-09-13 21:49:01,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=30319.5, ans=0.125 2024-09-13 21:49:04,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=30319.5, ans=0.0 2024-09-13 21:49:22,292 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.987e+02 2.448e+02 2.886e+02 3.660e+02 4.843e+02, threshold=5.772e+02, percent-clipped=0.0 2024-09-13 21:49:54,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=30404.5, ans=0.125 2024-09-13 21:49:57,597 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=30404.5, ans=0.1 2024-09-13 21:49:57,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=30404.5, ans=10.0 2024-09-13 21:50:03,213 INFO [train.py:1198] (1/2) Epoch 2, batch 4350, loss[loss=0.382, ctc_loss=0.2918, cr_loss=0.451, over 20845.00 frames. ], tot_loss[loss=0.3744, ctc_loss=0.2876, cr_loss=0.4342, over 4100111.28 frames. ], batch size: 65, lr: 2.97e-02, grad_scale: 32.0 2024-09-13 21:51:01,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=30546.166666666668, ans=0.1 2024-09-13 21:51:06,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=30546.166666666668, ans=0.0 2024-09-13 21:51:17,954 INFO [train.py:1198] (1/2) Epoch 2, batch 4400, loss[loss=0.4069, ctc_loss=0.3142, cr_loss=0.4634, over 21076.00 frames. ], tot_loss[loss=0.3741, ctc_loss=0.2874, cr_loss=0.4336, over 4097732.75 frames. ], batch size: 59, lr: 2.96e-02, grad_scale: 32.0 2024-09-13 21:51:55,783 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.131e+02 2.494e+02 2.792e+02 3.361e+02 5.398e+02, threshold=5.584e+02, percent-clipped=0.0 2024-09-13 21:52:31,148 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=30687.833333333332, ans=0.0 2024-09-13 21:52:37,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=30687.833333333332, ans=0.0 2024-09-13 21:52:39,911 INFO [train.py:1198] (1/2) Epoch 2, batch 4450, loss[loss=0.3843, ctc_loss=0.2986, cr_loss=0.4284, over 21043.00 frames. ], tot_loss[loss=0.3732, ctc_loss=0.2867, cr_loss=0.4328, over 4091432.73 frames. ], batch size: 62, lr: 2.96e-02, grad_scale: 32.0 2024-09-13 21:52:40,829 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.30 vs. limit=15.0 2024-09-13 21:52:43,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=30716.166666666668, ans=10.0 2024-09-13 21:52:43,802 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.11 vs. 
limit=15.0 2024-09-13 21:52:50,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=30716.166666666668, ans=0.125 2024-09-13 21:53:13,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=30772.833333333332, ans=0.0 2024-09-13 21:53:22,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=30772.833333333332, ans=0.1 2024-09-13 21:53:25,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=30801.166666666668, ans=0.004173659420289856 2024-09-13 21:53:31,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=30801.166666666668, ans=0.004173659420289856 2024-09-13 21:53:31,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=30801.166666666668, ans=0.125 2024-09-13 21:53:32,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=30801.166666666668, ans=0.125 2024-09-13 21:53:55,512 INFO [train.py:1198] (1/2) Epoch 2, batch 4500, loss[loss=0.3476, ctc_loss=0.2665, cr_loss=0.4052, over 20294.00 frames. ], tot_loss[loss=0.372, ctc_loss=0.2856, cr_loss=0.4322, over 4087530.98 frames. ], batch size: 45, lr: 2.95e-02, grad_scale: 16.0 2024-09-13 21:54:05,327 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.13 vs. limit=10.0 2024-09-13 21:54:31,702 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.007e+02 2.641e+02 3.317e+02 3.996e+02 5.933e+02, threshold=6.633e+02, percent-clipped=3.0 2024-09-13 21:54:41,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=30942.833333333332, ans=0.125 2024-09-13 21:54:53,597 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=30942.833333333332, ans=0.125 2024-09-13 21:55:02,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=30971.166666666668, ans=0.125 2024-09-13 21:55:03,410 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.48 vs. limit=6.0 2024-09-13 21:55:11,451 INFO [train.py:1198] (1/2) Epoch 2, batch 4550, loss[loss=0.3709, ctc_loss=0.2838, cr_loss=0.4358, over 20929.00 frames. ], tot_loss[loss=0.3693, ctc_loss=0.2832, cr_loss=0.4303, over 4101699.70 frames. ], batch size: 60, lr: 2.95e-02, grad_scale: 16.0 2024-09-13 21:55:18,197 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.20 vs. 
limit=15.0 2024-09-13 21:55:48,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=31056.166666666668, ans=0.125 2024-09-13 21:56:04,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=31084.5, ans=0.1 2024-09-13 21:56:11,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=31112.833333333332, ans=0.1 2024-09-13 21:56:23,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=31112.833333333332, ans=0.125 2024-09-13 21:56:27,109 INFO [train.py:1198] (1/2) Epoch 2, batch 4600, loss[loss=0.3667, ctc_loss=0.288, cr_loss=0.3932, over 20373.00 frames. ], tot_loss[loss=0.3697, ctc_loss=0.2834, cr_loss=0.4312, over 4109075.92 frames. ], batch size: 74, lr: 2.94e-02, grad_scale: 16.0 2024-09-13 21:56:28,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=31141.166666666668, ans=0.125 2024-09-13 21:56:36,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn1.whiten.whitening_limit, batch_count=31141.166666666668, ans=22.5 2024-09-13 21:56:39,668 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-13 21:56:46,534 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.03 vs. limit=15.0 2024-09-13 21:57:02,764 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.35 vs. limit=15.0 2024-09-13 21:57:03,540 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.039e+02 2.477e+02 2.896e+02 3.591e+02 8.040e+02, threshold=5.792e+02, percent-clipped=3.0 2024-09-13 21:57:03,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=31197.833333333332, ans=0.2 2024-09-13 21:57:09,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=31197.833333333332, ans=0.0 2024-09-13 21:57:45,445 INFO [train.py:1198] (1/2) Epoch 2, batch 4650, loss[loss=0.3284, ctc_loss=0.248, cr_loss=0.4021, over 20877.00 frames. ], tot_loss[loss=0.3699, ctc_loss=0.2835, cr_loss=0.4319, over 4106178.28 frames. ], batch size: 57, lr: 2.94e-02, grad_scale: 16.0 2024-09-13 21:57:47,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=31282.833333333332, ans=0.125 2024-09-13 21:58:48,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=31396.166666666668, ans=0.04949747468305833 2024-09-13 21:58:54,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=31396.166666666668, ans=0.125 2024-09-13 21:59:04,604 INFO [train.py:1198] (1/2) Epoch 2, batch 4700, loss[loss=0.417, ctc_loss=0.3176, cr_loss=0.4969, over 20073.00 frames. ], tot_loss[loss=0.3694, ctc_loss=0.2832, cr_loss=0.4311, over 4103148.23 frames. 
], batch size: 80, lr: 2.94e-02, grad_scale: 16.0 2024-09-13 21:59:05,404 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.50 vs. limit=15.0 2024-09-13 21:59:40,797 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.914e+02 2.365e+02 2.733e+02 3.647e+02 6.319e+02, threshold=5.467e+02, percent-clipped=3.0 2024-09-13 22:00:20,081 INFO [train.py:1198] (1/2) Epoch 2, batch 4750, loss[loss=0.3958, ctc_loss=0.3044, cr_loss=0.4571, over 20672.00 frames. ], tot_loss[loss=0.3692, ctc_loss=0.283, cr_loss=0.4311, over 4112198.66 frames. ], batch size: 71, lr: 2.93e-02, grad_scale: 16.0 2024-09-13 22:00:20,819 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.51 vs. limit=15.0 2024-09-13 22:00:22,365 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.50 vs. limit=15.0 2024-09-13 22:00:25,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=31566.166666666668, ans=0.0 2024-09-13 22:00:41,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=31594.5, ans=0.2 2024-09-13 22:00:46,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=31594.5, ans=0.0 2024-09-13 22:00:57,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=31622.833333333332, ans=0.1 2024-09-13 22:00:57,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=31622.833333333332, ans=0.2 2024-09-13 22:01:26,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=31679.5, ans=0.2 2024-09-13 22:01:35,366 INFO [train.py:1198] (1/2) Epoch 2, batch 4800, loss[loss=0.353, ctc_loss=0.2736, cr_loss=0.3965, over 20982.00 frames. ], tot_loss[loss=0.3703, ctc_loss=0.2841, cr_loss=0.4312, over 4094713.74 frames. 
], batch size: 55, lr: 2.93e-02, grad_scale: 32.0 2024-09-13 22:01:41,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=31707.833333333332, ans=0.125 2024-09-13 22:02:01,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=31736.166666666668, ans=0.125 2024-09-13 22:02:06,584 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-13 22:02:09,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=31764.5, ans=0.125 2024-09-13 22:02:12,251 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.986e+02 2.403e+02 2.827e+02 3.591e+02 5.312e+02, threshold=5.653e+02, percent-clipped=0.0 2024-09-13 22:02:23,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=31792.833333333332, ans=0.003958079710144928 2024-09-13 22:02:24,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=31792.833333333332, ans=0.1 2024-09-13 22:02:42,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=31821.166666666668, ans=0.0 2024-09-13 22:02:45,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=31821.166666666668, ans=0.125 2024-09-13 22:02:51,331 INFO [train.py:1198] (1/2) Epoch 2, batch 4850, loss[loss=0.3166, ctc_loss=0.238, cr_loss=0.3926, over 20968.00 frames. ], tot_loss[loss=0.3706, ctc_loss=0.2843, cr_loss=0.4316, over 4096213.05 frames. ], batch size: 51, lr: 2.92e-02, grad_scale: 32.0 2024-09-13 22:02:59,896 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.75 vs. limit=15.0 2024-09-13 22:03:15,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=31877.833333333332, ans=0.1 2024-09-13 22:03:25,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=31906.166666666668, ans=0.025 2024-09-13 22:04:13,416 INFO [train.py:1198] (1/2) Epoch 2, batch 4900, loss[loss=0.4055, ctc_loss=0.3115, cr_loss=0.4703, over 20737.00 frames. ], tot_loss[loss=0.3704, ctc_loss=0.2842, cr_loss=0.4309, over 4080281.09 frames. 
], batch size: 71, lr: 2.92e-02, grad_scale: 32.0 2024-09-13 22:04:38,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=32019.5, ans=0.0 2024-09-13 22:04:49,375 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.121e+02 2.447e+02 2.815e+02 3.133e+02 5.324e+02, threshold=5.630e+02, percent-clipped=0.0 2024-09-13 22:04:59,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=32076.166666666668, ans=0.1 2024-09-13 22:05:05,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=32076.166666666668, ans=0.003896485507246376 2024-09-13 22:05:12,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=32104.5, ans=0.125 2024-09-13 22:05:23,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=32104.5, ans=0.125 2024-09-13 22:05:27,988 INFO [train.py:1198] (1/2) Epoch 2, batch 4950, loss[loss=0.3473, ctc_loss=0.265, cr_loss=0.4117, over 21008.00 frames. ], tot_loss[loss=0.3678, ctc_loss=0.2818, cr_loss=0.4301, over 4095436.26 frames. ], batch size: 61, lr: 2.91e-02, grad_scale: 32.0 2024-09-13 22:06:03,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=32189.5, ans=0.1 2024-09-13 22:06:11,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=32217.833333333332, ans=15.0 2024-09-13 22:06:14,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=32217.833333333332, ans=0.125 2024-09-13 22:06:15,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=32217.833333333332, ans=0.0038656884057971016 2024-09-13 22:06:42,391 INFO [train.py:1198] (1/2) Epoch 2, batch 5000, loss[loss=0.3068, ctc_loss=0.2291, cr_loss=0.3887, over 19936.00 frames. ], tot_loss[loss=0.3681, ctc_loss=0.2821, cr_loss=0.4297, over 4101824.78 frames. ], batch size: 44, lr: 2.91e-02, grad_scale: 32.0 2024-09-13 22:06:42,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=32274.5, ans=0.05 2024-09-13 22:07:11,471 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.27 vs. limit=12.0 2024-09-13 22:07:18,325 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.111e+02 2.453e+02 2.845e+02 3.442e+02 5.343e+02, threshold=5.689e+02, percent-clipped=0.0 2024-09-13 22:07:21,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=32331.166666666668, ans=0.125 2024-09-13 22:07:21,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=32331.166666666668, ans=0.025 2024-09-13 22:07:57,345 INFO [train.py:1198] (1/2) Epoch 2, batch 5050, loss[loss=0.3842, ctc_loss=0.2958, cr_loss=0.4418, over 21033.00 frames. ], tot_loss[loss=0.3692, ctc_loss=0.283, cr_loss=0.4307, over 4103313.96 frames. 
], batch size: 61, lr: 2.90e-02, grad_scale: 32.0 2024-09-13 22:07:59,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=32416.166666666668, ans=0.05 2024-09-13 22:08:52,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=32501.166666666668, ans=0.125 2024-09-13 22:09:10,932 INFO [train.py:1198] (1/2) Epoch 2, batch 5100, loss[loss=0.3495, ctc_loss=0.2602, cr_loss=0.4465, over 20881.00 frames. ], tot_loss[loss=0.3687, ctc_loss=0.2826, cr_loss=0.4306, over 4103680.77 frames. ], batch size: 54, lr: 2.90e-02, grad_scale: 32.0 2024-09-13 22:09:29,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=32586.166666666668, ans=0.0 2024-09-13 22:09:46,715 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.065e+02 2.554e+02 2.891e+02 3.791e+02 6.853e+02, threshold=5.783e+02, percent-clipped=7.0 2024-09-13 22:09:47,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=32614.5, ans=0.0 2024-09-13 22:10:25,634 INFO [train.py:1198] (1/2) Epoch 2, batch 5150, loss[loss=0.3688, ctc_loss=0.2826, cr_loss=0.4314, over 20781.00 frames. ], tot_loss[loss=0.3671, ctc_loss=0.2813, cr_loss=0.4294, over 4100737.97 frames. ], batch size: 53, lr: 2.90e-02, grad_scale: 32.0 2024-09-13 22:10:38,077 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.74 vs. limit=15.0 2024-09-13 22:10:41,053 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.52 vs. limit=15.0 2024-09-13 22:11:02,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=32756.166666666668, ans=0.125 2024-09-13 22:11:08,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=32784.5, ans=0.0037424999999999993 2024-09-13 22:11:38,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=32841.166666666664, ans=0.125 2024-09-13 22:11:39,987 INFO [train.py:1198] (1/2) Epoch 2, batch 5200, loss[loss=0.3537, ctc_loss=0.2681, cr_loss=0.4282, over 20780.00 frames. ], tot_loss[loss=0.3673, ctc_loss=0.2816, cr_loss=0.4287, over 4088520.86 frames. ], batch size: 56, lr: 2.89e-02, grad_scale: 32.0 2024-09-13 22:12:18,430 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.137e+02 2.583e+02 3.016e+02 3.697e+02 5.995e+02, threshold=6.031e+02, percent-clipped=1.0 2024-09-13 22:12:45,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=32954.5, ans=0.2 2024-09-13 22:12:57,136 INFO [train.py:1198] (1/2) Epoch 2, batch 5250, loss[loss=0.4087, ctc_loss=0.3138, cr_loss=0.4743, over 20859.00 frames. ], tot_loss[loss=0.3674, ctc_loss=0.2816, cr_loss=0.4293, over 4091006.26 frames. 
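The balancer fields traced throughout these entries (prob, min_positive, max_abs, min_abs) read like constraints on per-channel activation statistics: a floor on the fraction of positive values, bounds on typical magnitudes, enforced stochastically with probability prob (the frequent ans=0.125). A diagnostic sketch under that reading; the semantics are assumed, not taken from scaling.py:

import torch

def balancer_violations(x, min_positive=0.05, max_abs=10.0, min_abs=0.5):
    """Report which channels violate balancer-style constraints, under
    the assumed reading of the logged fields (not the real code).
    x: (frames, channels). Default thresholds mirror values in the log."""
    frac_positive = (x > 0).float().mean(dim=0)
    mean_abs = x.abs().mean(dim=0)
    return {
        "too_rarely_positive": frac_positive < min_positive,  # cf. min_positive=0.05
        "too_large": mean_abs > max_abs,                      # cf. max_abs=10.0
        "too_small": mean_abs < min_abs,                      # cf. min_abs=0.5
    }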
], batch size: 65, lr: 2.89e-02, grad_scale: 32.0 2024-09-13 22:13:04,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=32982.833333333336, ans=0.2 2024-09-13 22:13:10,803 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.58 vs. limit=15.0 2024-09-13 22:13:32,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=33039.5, ans=0.125 2024-09-13 22:13:51,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=33067.833333333336, ans=0.125 2024-09-13 22:14:13,851 INFO [train.py:1198] (1/2) Epoch 2, batch 5300, loss[loss=0.3843, ctc_loss=0.2969, cr_loss=0.4369, over 20873.00 frames. ], tot_loss[loss=0.3678, ctc_loss=0.2818, cr_loss=0.4298, over 4097208.46 frames. ], batch size: 57, lr: 2.88e-02, grad_scale: 32.0 2024-09-13 22:14:25,776 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-13 22:14:33,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=33152.833333333336, ans=0.0 2024-09-13 22:14:49,499 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.982e+02 2.444e+02 2.819e+02 3.355e+02 5.739e+02, threshold=5.638e+02, percent-clipped=0.0 2024-09-13 22:15:10,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=33209.5, ans=0.125 2024-09-13 22:15:28,413 INFO [train.py:1198] (1/2) Epoch 2, batch 5350, loss[loss=0.3858, ctc_loss=0.293, cr_loss=0.4641, over 20677.00 frames. ], tot_loss[loss=0.3695, ctc_loss=0.283, cr_loss=0.4324, over 4099717.46 frames. ], batch size: 71, lr: 2.88e-02, grad_scale: 32.0 2024-09-13 22:15:46,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=33294.5, ans=0.0 2024-09-13 22:16:23,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=33351.166666666664, ans=0.2 2024-09-13 22:16:24,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=33351.166666666664, ans=0.125 2024-09-13 22:16:26,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=33379.5, ans=0.125 2024-09-13 22:16:42,195 INFO [train.py:1198] (1/2) Epoch 2, batch 5400, loss[loss=0.3718, ctc_loss=0.2828, cr_loss=0.4451, over 20996.00 frames. ], tot_loss[loss=0.366, ctc_loss=0.2802, cr_loss=0.4292, over 4106767.83 frames. 
], batch size: 64, lr: 2.87e-02, grad_scale: 32.0 2024-09-13 22:16:46,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=33407.833333333336, ans=0.003606992753623188 2024-09-13 22:16:54,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=33407.833333333336, ans=0.0 2024-09-13 22:16:57,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=33436.166666666664, ans=0.025 2024-09-13 22:16:57,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=33436.166666666664, ans=10.0 2024-09-13 22:17:17,990 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.020e+02 2.537e+02 2.920e+02 3.595e+02 6.063e+02, threshold=5.839e+02, percent-clipped=1.0 2024-09-13 22:17:42,369 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.14 vs. limit=15.0 2024-09-13 22:17:56,119 INFO [train.py:1198] (1/2) Epoch 2, batch 5450, loss[loss=0.4448, ctc_loss=0.3541, cr_loss=0.4535, over 14146.00 frames. ], tot_loss[loss=0.3672, ctc_loss=0.2813, cr_loss=0.4295, over 4082468.45 frames. ], batch size: 150, lr: 2.87e-02, grad_scale: 32.0 2024-09-13 22:18:06,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=33549.5, ans=0.0 2024-09-13 22:18:20,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=33577.833333333336, ans=0.125 2024-09-13 22:19:04,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=33662.833333333336, ans=0.025 2024-09-13 22:19:05,564 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.58 vs. limit=10.0 2024-09-13 22:19:10,428 INFO [train.py:1198] (1/2) Epoch 2, batch 5500, loss[loss=0.4151, ctc_loss=0.3241, cr_loss=0.4551, over 20665.00 frames. ], tot_loss[loss=0.3676, ctc_loss=0.2815, cr_loss=0.4302, over 4095541.13 frames. ], batch size: 68, lr: 2.86e-02, grad_scale: 32.0 2024-09-13 22:19:40,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=33747.833333333336, ans=0.0 2024-09-13 22:19:46,229 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.048e+02 2.566e+02 3.176e+02 3.818e+02 6.045e+02, threshold=6.352e+02, percent-clipped=1.0 2024-09-13 22:19:51,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=33747.833333333336, ans=0.125 2024-09-13 22:19:59,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=33776.166666666664, ans=0.0 2024-09-13 22:20:23,985 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.36 vs. limit=22.5 2024-09-13 22:20:24,749 INFO [train.py:1198] (1/2) Epoch 2, batch 5550, loss[loss=0.4321, ctc_loss=0.3359, cr_loss=0.481, over 18261.00 frames. ], tot_loss[loss=0.3665, ctc_loss=0.2806, cr_loss=0.4296, over 4097362.23 frames. 
2024-09-13 22:20:24,749 INFO [train.py:1198] (1/2) Epoch 2, batch 5550, loss[loss=0.4321, ctc_loss=0.3359, cr_loss=0.481, over 18261.00 frames. ], tot_loss[loss=0.3665, ctc_loss=0.2806, cr_loss=0.4296, over 4097362.23 frames. ], batch size: 108, lr: 2.86e-02, grad_scale: 32.0
2024-09-13 22:21:02,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=33889.5, ans=0.025
2024-09-13 22:21:19,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=33917.833333333336, ans=0.125
2024-09-13 22:21:41,738 INFO [train.py:1198] (1/2) Epoch 2, batch 5600, loss[loss=0.3192, ctc_loss=0.247, cr_loss=0.3612, over 20938.00 frames. ], tot_loss[loss=0.3671, ctc_loss=0.2811, cr_loss=0.43, over 4096908.90 frames. ], batch size: 50, lr: 2.86e-02, grad_scale: 32.0
2024-09-13 22:22:15,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=34031.166666666664, ans=0.125
2024-09-13 22:22:17,983 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.966e+02 2.493e+02 2.811e+02 3.455e+02 6.224e+02, threshold=5.621e+02, percent-clipped=0.0
2024-09-13 22:22:21,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=34031.166666666664, ans=0.1
2024-09-13 22:22:29,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=34059.5, ans=0.0034653260869565213
2024-09-13 22:22:58,563 INFO [train.py:1198] (1/2) Epoch 2, batch 5650, loss[loss=0.3796, ctc_loss=0.2932, cr_loss=0.4321, over 20684.00 frames. ], tot_loss[loss=0.368, ctc_loss=0.2819, cr_loss=0.4308, over 4102245.55 frames. ], batch size: 66, lr: 2.85e-02, grad_scale: 32.0
2024-09-13 22:23:10,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=34116.166666666664, ans=0.0
2024-09-13 22:23:50,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=34201.166666666664, ans=0.125
2024-09-13 22:24:01,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=34229.5, ans=0.125
2024-09-13 22:24:11,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=34257.833333333336, ans=0.1
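Note on the optim.py:487 warnings: the five numbers read as the min/25%/50%/75%/max of recent gradient norms, and the printed threshold is consistent with clipping_scale times the median (e.g. 2.0 * 2.819e+02 = 5.638e+02 in an earlier entry). A sketch of producing such a summary, under that reading:

    import numpy as np

    # Sketch: quartile summary of recent gradient norms with a
    # median-based clipping threshold (threshold = clipping_scale * median,
    # matching logged pairs such as median 2.819e+02 -> threshold 5.638e+02).
    def clipping_summary(grad_norms, clipping_scale=2.0):
        quartiles = np.quantile(grad_norms, [0.0, 0.25, 0.5, 0.75, 1.0])
        threshold = clipping_scale * quartiles[2]
        percent_clipped = 100.0 * float(np.mean(grad_norms > threshold))
        return quartiles, threshold, percent_clipped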
2024-09-13 22:24:13,018 INFO [train.py:1198] (1/2) Epoch 2, batch 5700, loss[loss=0.3561, ctc_loss=0.2674, cr_loss=0.4434, over 20888.00 frames. ], tot_loss[loss=0.3679, ctc_loss=0.2817, cr_loss=0.4314, over 4092931.22 frames. ], batch size: 57, lr: 2.85e-02, grad_scale: 32.0
2024-09-13 22:24:30,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=34286.166666666664, ans=0.1
2024-09-13 22:24:44,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=34314.5, ans=0.09899494936611666
2024-09-13 22:24:48,577 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.005e+02 2.427e+02 2.786e+02 3.610e+02 6.157e+02, threshold=5.572e+02, percent-clipped=1.0
2024-09-13 22:24:50,329 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-13 22:25:02,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=34342.833333333336, ans=0.125
2024-09-13 22:25:02,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=34342.833333333336, ans=0.125
2024-09-13 22:25:08,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=34342.833333333336, ans=0.125
2024-09-13 22:25:27,249 INFO [train.py:1198] (1/2) Epoch 2, batch 5750, loss[loss=0.3331, ctc_loss=0.2481, cr_loss=0.4249, over 20979.00 frames. ], tot_loss[loss=0.3675, ctc_loss=0.2812, cr_loss=0.4314, over 4093047.52 frames. ], batch size: 52, lr: 2.84e-02, grad_scale: 32.0
2024-09-13 22:25:36,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=34399.5, ans=0.003391413043478261
2024-09-13 22:25:51,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=34427.833333333336, ans=0.125
2024-09-13 22:26:08,518 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.60 vs. limit=15.0
2024-09-13 22:26:19,201 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.56 vs. limit=10.0
2024-09-13 22:26:40,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=34541.166666666664, ans=0.2
2024-09-13 22:26:42,073 INFO [train.py:1198] (1/2) Epoch 2, batch 5800, loss[loss=0.3354, ctc_loss=0.2566, cr_loss=0.3942, over 20944.00 frames. ], tot_loss[loss=0.3672, ctc_loss=0.2808, cr_loss=0.4318, over 4091549.15 frames. ], batch size: 51, lr: 2.84e-02, grad_scale: 32.0
2024-09-13 22:26:48,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=34541.166666666664, ans=0.125
2024-09-13 22:26:49,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=34541.166666666664, ans=0.025
2024-09-13 22:26:58,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=34569.5, ans=0.125
2024-09-13 22:27:17,381 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.101e+02 2.563e+02 3.034e+02 3.601e+02 5.396e+02, threshold=6.068e+02, percent-clipped=0.0
2024-09-13 22:27:22,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=34597.833333333336, ans=0.2
2024-09-13 22:27:44,832 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.36 vs. limit=12.0
2024-09-13 22:27:51,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=34654.5, ans=0.125
2024-09-13 22:27:55,909 INFO [train.py:1198] (1/2) Epoch 2, batch 5850, loss[loss=0.4264, ctc_loss=0.3303, cr_loss=0.4806, over 18269.00 frames. ], tot_loss[loss=0.3671, ctc_loss=0.2808, cr_loss=0.4317, over 4085155.02 frames. ], batch size: 108, lr: 2.83e-02, grad_scale: 32.0
2024-09-13 22:28:06,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=34682.833333333336, ans=0.125
2024-09-13 22:28:09,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=34711.166666666664, ans=0.1
2024-09-13 22:28:35,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=34739.5, ans=0.2
2024-09-13 22:28:54,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=34796.166666666664, ans=0.125
2024-09-13 22:28:55,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=34796.166666666664, ans=0.125
2024-09-13 22:29:10,272 INFO [train.py:1198] (1/2) Epoch 2, batch 5900, loss[loss=0.3529, ctc_loss=0.27, cr_loss=0.4145, over 20971.00 frames. ], tot_loss[loss=0.3674, ctc_loss=0.2812, cr_loss=0.4313, over 4081485.19 frames. ], batch size: 55, lr: 2.83e-02, grad_scale: 32.0
2024-09-13 22:29:24,961 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.58 vs. limit=15.0
2024-09-13 22:29:44,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=34881.166666666664, ans=0.025
2024-09-13 22:29:46,197 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.102e+02 2.494e+02 2.855e+02 3.280e+02 5.796e+02, threshold=5.711e+02, percent-clipped=0.0
2024-09-13 22:30:27,156 INFO [train.py:1198] (1/2) Epoch 2, batch 5950, loss[loss=0.3492, ctc_loss=0.2649, cr_loss=0.4219, over 20864.00 frames. ], tot_loss[loss=0.3651, ctc_loss=0.2792, cr_loss=0.4295, over 4077099.85 frames. ], batch size: 57, lr: 2.83e-02, grad_scale: 32.0
2024-09-13 22:30:37,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=34966.166666666664, ans=0.1
2024-09-13 22:30:39,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=34966.166666666664, ans=0.125
2024-09-13 22:31:43,766 INFO [train.py:1198] (1/2) Epoch 2, batch 6000, loss[loss=0.345, ctc_loss=0.2639, cr_loss=0.4053, over 21071.00 frames. ], tot_loss[loss=0.3656, ctc_loss=0.2797, cr_loss=0.4294, over 4082005.26 frames. ], batch size: 56, lr: 2.82e-02, grad_scale: 32.0
2024-09-13 22:31:43,767 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-13 22:32:10,261 INFO [train.py:1230] (1/2) Epoch 2, validation: loss=0.1047, ctc_loss=0.1047, cr_loss=9.365e-15, over 944034.00 frames.
2024-09-13 22:32:10,262 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB
2024-09-13 22:32:11,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=35107.833333333336, ans=0.95
2024-09-13 22:32:42,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=35164.5, ans=0.125
2024-09-13 22:32:45,565 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.173e+02 2.538e+02 2.936e+02 3.462e+02 7.631e+02, threshold=5.872e+02, percent-clipped=1.0
2024-09-13 22:33:12,009 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.01 vs. limit=6.0
2024-09-13 22:33:24,199 INFO [train.py:1198] (1/2) Epoch 2, batch 6050, loss[loss=0.2981, ctc_loss=0.2229, cr_loss=0.3762, over 20933.00 frames. ], tot_loss[loss=0.3659, ctc_loss=0.2799, cr_loss=0.43, over 4092602.37 frames. ], batch size: 48, lr: 2.82e-02, grad_scale: 32.0
2024-09-13 22:34:30,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=35362.833333333336, ans=0.1
2024-09-13 22:34:39,327 INFO [train.py:1198] (1/2) Epoch 2, batch 6100, loss[loss=0.2901, ctc_loss=0.2161, cr_loss=0.3699, over 20972.00 frames. ], tot_loss[loss=0.3653, ctc_loss=0.2793, cr_loss=0.4301, over 4091546.63 frames. ], batch size: 50, lr: 2.81e-02, grad_scale: 32.0
2024-09-13 22:34:39,809 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-13 22:35:10,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=35447.833333333336, ans=0.125
2024-09-13 22:35:14,676 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.160e+02 2.502e+02 2.994e+02 3.822e+02 5.911e+02, threshold=5.988e+02, percent-clipped=1.0
2024-09-13 22:35:15,321 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.26 vs. limit=12.0
2024-09-13 22:35:42,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=35504.5, ans=0.125
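Note on the validation entries above: cr_loss collapses to ~1e-14 at validation time, plausibly because the consistency term compares two augmented views of each utterance and no augmentation is applied during evaluation. A minimal sketch of such a periodic validation pass (the model/dataloader interface here is a placeholder, not the actual train.py code):

    import torch

    # Sketch of a frame-weighted validation pass like the one logged above.
    def compute_validation_loss(model, valid_dl, device):
        model.eval()
        tot_loss, tot_frames = 0.0, 0.0
        with torch.no_grad():
            for batch in valid_dl:
                # assumed interface: the model returns (summed loss, num frames)
                loss, num_frames = model(batch, device=device)
                tot_loss += float(loss)
                tot_frames += num_frames
        model.train()
        return tot_loss / tot_frames  # e.g. loss=0.1047 over 944034.00 frames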
2024-09-13 22:35:52,642 INFO [train.py:1198] (1/2) Epoch 2, batch 6150, loss[loss=0.4034, ctc_loss=0.3121, cr_loss=0.4566, over 20670.00 frames. ], tot_loss[loss=0.3679, ctc_loss=0.2818, cr_loss=0.4305, over 4053589.06 frames. ], batch size: 66, lr: 2.81e-02, grad_scale: 32.0
2024-09-13 22:36:06,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=35561.166666666664, ans=0.1
2024-09-13 22:36:14,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=35561.166666666664, ans=0.95
2024-09-13 22:36:14,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=35561.166666666664, ans=0.125
2024-09-13 22:36:38,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=35617.833333333336, ans=0.025
2024-09-13 22:37:03,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=35646.166666666664, ans=0.125
2024-09-13 22:37:05,869 INFO [train.py:1198] (1/2) Epoch 2, batch 6200, loss[loss=0.3419, ctc_loss=0.2637, cr_loss=0.3908, over 21079.00 frames. ], tot_loss[loss=0.3697, ctc_loss=0.2832, cr_loss=0.4321, over 4046121.41 frames. ], batch size: 59, lr: 2.81e-02, grad_scale: 32.0
2024-09-13 22:37:35,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=35731.166666666664, ans=0.2
2024-09-13 22:37:41,135 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.106e+02 2.462e+02 2.815e+02 3.577e+02 5.826e+02, threshold=5.630e+02, percent-clipped=0.0
2024-09-13 22:37:44,816 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.27 vs. limit=15.0
2024-09-13 22:38:08,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=35787.833333333336, ans=0.1
2024-09-13 22:38:10,530 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.05 vs. limit=12.0
2024-09-13 22:38:19,808 INFO [train.py:1198] (1/2) Epoch 2, batch 6250, loss[loss=0.3885, ctc_loss=0.2978, cr_loss=0.4536, over 19563.00 frames. ], tot_loss[loss=0.3706, ctc_loss=0.284, cr_loss=0.4329, over 4035265.48 frames. ], batch size: 90, lr: 2.80e-02, grad_scale: 32.0
2024-09-13 22:38:23,506 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.74 vs. limit=15.0
2024-09-13 22:38:31,906 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.55 vs. limit=12.0
2024-09-13 22:39:03,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=35901.166666666664, ans=0.04949747468305833
2024-09-13 22:39:26,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=35929.5, ans=0.05
2024-09-13 22:39:26,605 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.68 vs. limit=15.0
2024-09-13 22:39:27,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=35929.5, ans=0.025
2024-09-13 22:39:33,107 INFO [train.py:1198] (1/2) Epoch 2, batch 6300, loss[loss=0.3765, ctc_loss=0.2943, cr_loss=0.4109, over 20708.00 frames. ], tot_loss[loss=0.3748, ctc_loss=0.2878, cr_loss=0.4348, over 3980489.90 frames. ], batch size: 71, lr: 2.80e-02, grad_scale: 32.0
2024-09-13 22:39:36,790 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.12 vs. limit=15.0
2024-09-13 22:40:03,114 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.43 vs. limit=22.5
2024-09-13 22:40:08,007 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.077e+02 2.583e+02 2.883e+02 3.459e+02 8.852e+02, threshold=5.766e+02, percent-clipped=1.0
2024-09-13 22:40:41,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=36071.166666666664, ans=0.1
2024-09-13 22:40:45,364 INFO [train.py:1198] (1/2) Epoch 2, batch 6350, loss[loss=0.4355, ctc_loss=0.3471, cr_loss=0.4421, over 14054.00 frames. ], tot_loss[loss=0.3838, ctc_loss=0.2965, cr_loss=0.4365, over 3802213.21 frames. ], batch size: 150, lr: 2.79e-02, grad_scale: 32.0
2024-09-13 22:41:04,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=36127.833333333336, ans=0.025
2024-09-13 22:41:38,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=36184.5, ans=0.125
2024-09-13 22:42:31,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=36215.666666666664, ans=0.125
2024-09-13 22:42:32,921 INFO [train.py:1198] (1/2) Epoch 3, batch 0, loss[loss=0.3591, ctc_loss=0.2789, cr_loss=0.4011, over 20948.00 frames. ], tot_loss[loss=0.3591, ctc_loss=0.2789, cr_loss=0.4011, over 20948.00 frames. ], batch size: 49, lr: 2.65e-02, grad_scale: 32.0
2024-09-13 22:42:32,921 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-13 22:42:54,043 INFO [train.py:1230] (1/2) Epoch 3, validation: loss=0.107, ctc_loss=0.107, cr_loss=1.077e-14, over 944034.00 frames.
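Note on the learning rate: it decays slowly within epoch 2 (2.89e-02 down to 2.79e-02 above) and then steps down to 2.65e-02 at the start of epoch 3, the shape produced by a schedule with both a batch-count factor and an epoch factor, such as icefall's Eden. A sketch with illustrative constants (not this run's actual configuration):

    # Eden-style schedule: smooth decay in the number of batches plus a
    # discrete drop as the epoch counter advances. lr_batches/lr_epochs
    # below are illustrative values, not read from this run's config.
    def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor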
2024-09-13 22:42:54,044 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB
2024-09-13 22:42:57,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=36215.666666666664, ans=0.0029965942028985513
2024-09-13 22:43:01,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=36215.666666666664, ans=0.125
2024-09-13 22:43:18,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=36244.0, ans=0.002990434782608696
2024-09-13 22:43:28,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=36272.333333333336, ans=0.1
2024-09-13 22:43:44,192 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.116e+02 2.481e+02 2.860e+02 3.412e+02 5.235e+02, threshold=5.719e+02, percent-clipped=0.0
2024-09-13 22:43:44,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=36300.666666666664, ans=0.0
2024-09-13 22:43:50,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=36300.666666666664, ans=0.125
2024-09-13 22:43:52,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=36300.666666666664, ans=0.0
2024-09-13 22:43:57,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=36329.0, ans=0.125
2024-09-13 22:44:08,021 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.10 vs. limit=15.0
2024-09-13 22:44:09,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=36357.333333333336, ans=0.025
2024-09-13 22:44:10,436 INFO [train.py:1198] (1/2) Epoch 3, batch 50, loss[loss=0.3426, ctc_loss=0.2586, cr_loss=0.4198, over 20834.00 frames. ], tot_loss[loss=0.3669, ctc_loss=0.2809, cr_loss=0.4298, over 925938.20 frames. ], batch size: 59, lr: 2.65e-02, grad_scale: 32.0
2024-09-13 22:45:28,663 INFO [train.py:1198] (1/2) Epoch 3, batch 100, loss[loss=0.4299, ctc_loss=0.3423, cr_loss=0.4381, over 14523.00 frames. ], tot_loss[loss=0.3631, ctc_loss=0.2775, cr_loss=0.4281, over 1611489.76 frames. ], batch size: 149, lr: 2.64e-02, grad_scale: 32.0
2024-09-13 22:45:47,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=36527.333333333336, ans=0.5
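Note on the scaling.py:1024 "Whitening" lines: each compares a per-module whiteness metric against a limit. One standard way to measure this, sketched below, is the ratio mean(eigenvalue^2) / mean(eigenvalue)^2 of the feature covariance, which equals 1.0 for a perfectly white (isotropic) covariance and grows as the spectrum concentrates; whether this matches the exact formula in scaling.py is an assumption:

    import torch

    # Sketch: whiteness metric of a batch of features x (frames x channels).
    def whitening_metric(x: torch.Tensor) -> float:
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]       # channel covariance
        eigs = torch.linalg.eigvalsh(cov)  # eigenvalues, ascending
        return float((eigs ** 2).mean() / eigs.mean() ** 2)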
2024-09-13 22:45:54,803 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.50 vs. limit=15.0
2024-09-13 22:46:08,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=36555.666666666664, ans=0.1
2024-09-13 22:46:11,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=36555.666666666664, ans=0.125
2024-09-13 22:46:23,403 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.975e+02 2.307e+02 2.521e+02 3.027e+02 4.456e+02, threshold=5.041e+02, percent-clipped=0.0
2024-09-13 22:46:47,596 INFO [train.py:1198] (1/2) Epoch 3, batch 150, loss[loss=0.3572, ctc_loss=0.2725, cr_loss=0.4239, over 20767.00 frames. ], tot_loss[loss=0.3656, ctc_loss=0.2792, cr_loss=0.4321, over 2163513.32 frames. ], batch size: 56, lr: 2.64e-02, grad_scale: 32.0
2024-09-13 22:47:01,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=36669.0, ans=0.125
2024-09-13 22:47:02,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=36669.0, ans=0.125
2024-09-13 22:47:45,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=36754.0, ans=0.0028795652173913043
2024-09-13 22:47:59,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=36754.0, ans=0.0
2024-09-13 22:48:02,149 INFO [train.py:1198] (1/2) Epoch 3, batch 200, loss[loss=0.3211, ctc_loss=0.2427, cr_loss=0.3918, over 20961.00 frames. ], tot_loss[loss=0.3615, ctc_loss=0.2756, cr_loss=0.4293, over 2580979.95 frames. ], batch size: 48, lr: 2.64e-02, grad_scale: 32.0
2024-09-13 22:48:12,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=36782.333333333336, ans=0.1
2024-09-13 22:48:43,446 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.94 vs. limit=15.0
2024-09-13 22:48:53,338 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.058e+02 2.432e+02 2.851e+02 3.646e+02 7.141e+02, threshold=5.701e+02, percent-clipped=3.0
2024-09-13 22:49:05,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=36895.666666666664, ans=0.1
2024-09-13 22:49:17,351 INFO [train.py:1198] (1/2) Epoch 3, batch 250, loss[loss=0.3155, ctc_loss=0.2338, cr_loss=0.4081, over 20900.00 frames. ], tot_loss[loss=0.3591, ctc_loss=0.2735, cr_loss=0.4282, over 2915596.12 frames. ], batch size: 54, lr: 2.63e-02, grad_scale: 32.0
2024-09-13 22:49:20,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=36924.0, ans=0.04949747468305833
2024-09-13 22:49:57,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=36980.666666666664, ans=0.125
2024-09-13 22:49:58,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=36980.666666666664, ans=0.125
2024-09-13 22:50:19,700 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.51 vs. limit=22.5
2024-09-13 22:50:31,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=37037.333333333336, ans=0.125
2024-09-13 22:50:33,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=37037.333333333336, ans=15.0
2024-09-13 22:50:37,205 INFO [train.py:1198] (1/2) Epoch 3, batch 300, loss[loss=0.3124, ctc_loss=0.2308, cr_loss=0.4077, over 20977.00 frames. ], tot_loss[loss=0.3583, ctc_loss=0.2727, cr_loss=0.4278, over 3177059.81 frames. ], batch size: 50, lr: 2.63e-02, grad_scale: 32.0
2024-09-13 22:50:37,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=37065.666666666664, ans=0.125
2024-09-13 22:50:38,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=37065.666666666664, ans=0.2
2024-09-13 22:50:49,783 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.64 vs. limit=10.0
2024-09-13 22:50:59,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=37094.0, ans=0.5
2024-09-13 22:51:13,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=37122.333333333336, ans=0.0
2024-09-13 22:51:27,991 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.017e+02 2.474e+02 2.870e+02 3.354e+02 5.882e+02, threshold=5.740e+02, percent-clipped=1.0
2024-09-13 22:51:56,176 INFO [train.py:1198] (1/2) Epoch 3, batch 350, loss[loss=0.3613, ctc_loss=0.2745, cr_loss=0.4341, over 20778.00 frames. ], tot_loss[loss=0.3586, ctc_loss=0.2731, cr_loss=0.4275, over 3362437.88 frames. ], batch size: 56, lr: 2.62e-02, grad_scale: 32.0
2024-09-13 22:52:19,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=37235.666666666664, ans=0.1
2024-09-13 22:52:21,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=37235.666666666664, ans=0.0
2024-09-13 22:52:24,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=37235.666666666664, ans=0.2
2024-09-13 22:53:11,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=37349.0, ans=0.2
2024-09-13 22:53:12,716 INFO [train.py:1198] (1/2) Epoch 3, batch 400, loss[loss=0.3834, ctc_loss=0.2932, cr_loss=0.4506, over 20821.00 frames. ], tot_loss[loss=0.3587, ctc_loss=0.2732, cr_loss=0.4275, over 3536197.70 frames. ], batch size: 65, lr: 2.62e-02, grad_scale: 32.0
2024-09-13 22:53:16,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=37349.0, ans=0.0027502173913043475
2024-09-13 22:53:25,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=37349.0, ans=0.0
2024-09-13 22:53:36,260 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=12.39 vs. limit=12.0
2024-09-13 22:54:04,766 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.971e+02 2.394e+02 2.667e+02 3.254e+02 6.566e+02, threshold=5.335e+02, percent-clipped=1.0
2024-09-13 22:54:12,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=37462.333333333336, ans=0.125
2024-09-13 22:54:12,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=37462.333333333336, ans=0.2
2024-09-13 22:54:21,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=37462.333333333336, ans=0.002725579710144926
2024-09-13 22:54:28,867 INFO [train.py:1198] (1/2) Epoch 3, batch 450, loss[loss=0.4001, ctc_loss=0.3055, cr_loss=0.4733, over 20662.00 frames. ], tot_loss[loss=0.3575, ctc_loss=0.2719, cr_loss=0.4277, over 3667855.94 frames. ], batch size: 66, lr: 2.62e-02, grad_scale: 32.0
2024-09-13 22:54:42,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=37519.0, ans=0.2
2024-09-13 22:55:01,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=37547.333333333336, ans=0.0027071014492753617
2024-09-13 22:55:05,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=37547.333333333336, ans=0.125
2024-09-13 22:55:29,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=37604.0, ans=0.2
2024-09-13 22:55:38,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=37604.0, ans=0.035
2024-09-13 22:55:42,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=37604.0, ans=0.07
2024-09-13 22:55:42,801 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.08 vs. limit=15.0
2024-09-13 22:55:44,645 INFO [train.py:1198] (1/2) Epoch 3, batch 500, loss[loss=0.3986, ctc_loss=0.3077, cr_loss=0.4547, over 21083.00 frames. ], tot_loss[loss=0.3571, ctc_loss=0.2715, cr_loss=0.4279, over 3763378.72 frames. ], batch size: 59, lr: 2.61e-02, grad_scale: 32.0
2024-09-13 22:56:13,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=37660.666666666664, ans=0.125
2024-09-13 22:56:38,690 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.030e+02 2.396e+02 2.755e+02 3.330e+02 7.145e+02, threshold=5.509e+02, percent-clipped=4.0
2024-09-13 22:57:02,957 INFO [train.py:1198] (1/2) Epoch 3, batch 550, loss[loss=0.346, ctc_loss=0.2603, cr_loss=0.4287, over 20756.00 frames. ], tot_loss[loss=0.3577, ctc_loss=0.2721, cr_loss=0.4283, over 3835472.81 frames. ], batch size: 56, lr: 2.61e-02, grad_scale: 32.0
2024-09-13 22:58:13,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=37887.333333333336, ans=0.125
2024-09-13 22:58:14,714 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-13 22:58:21,881 INFO [train.py:1198] (1/2) Epoch 3, batch 600, loss[loss=0.3815, ctc_loss=0.2913, cr_loss=0.451, over 21063.00 frames. ], tot_loss[loss=0.3569, ctc_loss=0.2712, cr_loss=0.4285, over 3898836.56 frames. ], batch size: 59, lr: 2.61e-02, grad_scale: 32.0
2024-09-13 22:58:47,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=37944.0, ans=0.2
2024-09-13 22:58:48,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=37944.0, ans=0.0
2024-09-13 22:59:03,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=37972.333333333336, ans=0.125
2024-09-13 22:59:09,847 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.20 vs. limit=22.5
2024-09-13 22:59:13,336 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.938e+02 2.427e+02 2.740e+02 3.265e+02 5.783e+02, threshold=5.479e+02, percent-clipped=1.0
2024-09-13 22:59:37,816 INFO [train.py:1198] (1/2) Epoch 3, batch 650, loss[loss=0.37, ctc_loss=0.2821, cr_loss=0.4396, over 20638.00 frames. ], tot_loss[loss=0.3563, ctc_loss=0.2706, cr_loss=0.4282, over 3947430.50 frames. ], batch size: 66, lr: 2.60e-02, grad_scale: 32.0
2024-09-13 22:59:58,023 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.42 vs. limit=6.0
2024-09-13 23:00:00,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=38085.666666666664, ans=0.0025900724637681173
2024-09-13 23:00:01,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=38085.666666666664, ans=0.125
2024-09-13 23:00:07,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=38114.0, ans=0.02
2024-09-13 23:00:23,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=38142.333333333336, ans=0.125
2024-09-13 23:00:24,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=38142.333333333336, ans=0.0
2024-09-13 23:00:52,651 INFO [train.py:1198] (1/2) Epoch 3, batch 700, loss[loss=0.3934, ctc_loss=0.3002, cr_loss=0.466, over 21033.00 frames. ], tot_loss[loss=0.3591, ctc_loss=0.2731, cr_loss=0.4303, over 3975354.70 frames. ], batch size: 62, lr: 2.60e-02, grad_scale: 32.0
2024-09-13 23:00:55,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=38199.0, ans=0.2
2024-09-13 23:01:03,965 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.18 vs. limit=6.0
2024-09-13 23:01:43,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=38284.0, ans=0.0025469565217391297
2024-09-13 23:01:44,260 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.956e+02 2.506e+02 2.757e+02 3.512e+02 5.332e+02, threshold=5.515e+02, percent-clipped=0.0
2024-09-13 23:02:11,182 INFO [train.py:1198] (1/2) Epoch 3, batch 750, loss[loss=0.3783, ctc_loss=0.2917, cr_loss=0.4331, over 20357.00 frames. ], tot_loss[loss=0.36, ctc_loss=0.2739, cr_loss=0.4306, over 3988594.83 frames. ], batch size: 74, lr: 2.59e-02, grad_scale: 32.0
2024-09-13 23:02:19,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=38340.666666666664, ans=0.125
2024-09-13 23:02:28,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=38369.0, ans=0.0025284782608695653
2024-09-13 23:02:38,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=38369.0, ans=0.125
2024-09-13 23:02:46,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=38397.333333333336, ans=0.1
2024-09-13 23:02:55,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=38425.666666666664, ans=0.125
2024-09-13 23:02:58,838 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.23 vs. limit=15.0
2024-09-13 23:03:20,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=38454.0, ans=0.2
2024-09-13 23:03:30,717 INFO [train.py:1198] (1/2) Epoch 3, batch 800, loss[loss=0.3735, ctc_loss=0.2841, cr_loss=0.447, over 20967.00 frames. ], tot_loss[loss=0.359, ctc_loss=0.273, cr_loss=0.4303, over 4011214.73 frames. ], batch size: 58, lr: 2.59e-02, grad_scale: 32.0
2024-09-13 23:03:32,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=38482.333333333336, ans=0.125
2024-09-13 23:03:57,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=38510.666666666664, ans=0.0
2024-09-13 23:04:22,759 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.010e+02 2.609e+02 2.977e+02 3.655e+02 5.386e+02, threshold=5.955e+02, percent-clipped=0.0
2024-09-13 23:04:46,689 INFO [train.py:1198] (1/2) Epoch 3, batch 850, loss[loss=0.3553, ctc_loss=0.2701, cr_loss=0.4263, over 20970.00 frames. ], tot_loss[loss=0.3567, ctc_loss=0.271, cr_loss=0.4285, over 4037332.51 frames. ], batch size: 55, lr: 2.59e-02, grad_scale: 32.0
2024-09-13 23:04:54,821 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.89 vs. limit=22.5
2024-09-13 23:05:23,571 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.88 vs. limit=22.5
2024-09-13 23:05:37,465 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.50 vs. limit=22.5
2024-09-13 23:06:00,153 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.56 vs. limit=15.0
2024-09-13 23:06:02,286 INFO [train.py:1198] (1/2) Epoch 3, batch 900, loss[loss=0.3815, ctc_loss=0.29, cr_loss=0.4577, over 21068.00 frames. ], tot_loss[loss=0.3565, ctc_loss=0.2708, cr_loss=0.4283, over 4041943.37 frames. ], batch size: 59, lr: 2.58e-02, grad_scale: 32.0
2024-09-13 23:06:54,429 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.005e+02 2.418e+02 2.830e+02 3.421e+02 5.289e+02, threshold=5.660e+02, percent-clipped=0.0
2024-09-13 23:07:18,712 INFO [train.py:1198] (1/2) Epoch 3, batch 950, loss[loss=0.3942, ctc_loss=0.3088, cr_loss=0.4269, over 19357.00 frames. ], tot_loss[loss=0.3568, ctc_loss=0.2709, cr_loss=0.4295, over 4055226.61 frames. ], batch size: 90, lr: 2.58e-02, grad_scale: 32.0
2024-09-13 23:08:08,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=38992.333333333336, ans=0.125
2024-09-13 23:08:36,989 INFO [train.py:1198] (1/2) Epoch 3, batch 1000, loss[loss=0.3185, ctc_loss=0.2378, cr_loss=0.4033, over 20879.00 frames. ], tot_loss[loss=0.3552, ctc_loss=0.2697, cr_loss=0.4277, over 4071941.98 frames. ], batch size: 54, lr: 2.58e-02, grad_scale: 32.0
2024-09-13 23:08:40,825 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.81 vs. limit=15.0
2024-09-13 23:09:04,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=39077.333333333336, ans=0.025
2024-09-13 23:09:32,052 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.999e+02 2.397e+02 2.774e+02 3.547e+02 5.852e+02, threshold=5.548e+02, percent-clipped=2.0
2024-09-13 23:09:56,171 INFO [train.py:1198] (1/2) Epoch 3, batch 1050, loss[loss=0.343, ctc_loss=0.2587, cr_loss=0.4212, over 20792.00 frames. ], tot_loss[loss=0.3561, ctc_loss=0.2704, cr_loss=0.4289, over 4083209.66 frames. ], batch size: 56, lr: 2.57e-02, grad_scale: 32.0
2024-09-13 23:10:12,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=39219.0, ans=0.0
2024-09-13 23:10:33,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=39247.333333333336, ans=0.125
2024-09-13 23:11:05,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=39304.0, ans=0.125
2024-09-13 23:11:11,299 INFO [train.py:1198] (1/2) Epoch 3, batch 1100, loss[loss=0.3653, ctc_loss=0.2762, cr_loss=0.4457, over 20668.00 frames. ], tot_loss[loss=0.3548, ctc_loss=0.2692, cr_loss=0.4281, over 4088476.54 frames. ], batch size: 68, lr: 2.57e-02, grad_scale: 32.0
2024-09-13 23:11:24,183 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.22 vs. limit=10.0
2024-09-13 23:11:26,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=39360.666666666664, ans=0.0
2024-09-13 23:11:57,240 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.67 vs. limit=6.0
2024-09-13 23:12:02,534 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.009e+02 2.365e+02 2.740e+02 3.415e+02 8.630e+02, threshold=5.481e+02, percent-clipped=5.0
2024-09-13 23:12:26,778 INFO [train.py:1198] (1/2) Epoch 3, batch 1150, loss[loss=0.305, ctc_loss=0.2249, cr_loss=0.4006, over 20964.00 frames. ], tot_loss[loss=0.3557, ctc_loss=0.27, cr_loss=0.4287, over 4077167.26 frames. ], batch size: 48, lr: 2.57e-02, grad_scale: 32.0
2024-09-13 23:12:30,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=39474.0, ans=0.0
2024-09-13 23:12:30,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=39474.0, ans=0.125
2024-09-13 23:13:09,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=39530.666666666664, ans=0.04949747468305833
2024-09-13 23:13:15,502 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=4.28 vs. limit=15.0
2024-09-13 23:13:44,207 INFO [train.py:1198] (1/2) Epoch 3, batch 1200, loss[loss=0.3583, ctc_loss=0.2702, cr_loss=0.4406, over 21032.00 frames. ], tot_loss[loss=0.3565, ctc_loss=0.2705, cr_loss=0.4298, over 4085026.18 frames. ], batch size: 56, lr: 2.56e-02, grad_scale: 32.0
2024-09-13 23:13:50,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=39615.666666666664, ans=0.125
2024-09-13 23:14:38,511 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.039e+02 2.392e+02 2.617e+02 3.564e+02 8.180e+02, threshold=5.235e+02, percent-clipped=5.0
2024-09-13 23:15:02,918 INFO [train.py:1198] (1/2) Epoch 3, batch 1250, loss[loss=0.3798, ctc_loss=0.2963, cr_loss=0.4176, over 19523.00 frames. ], tot_loss[loss=0.3576, ctc_loss=0.2717, cr_loss=0.4297, over 4065599.91 frames. ], batch size: 90, lr: 2.56e-02, grad_scale: 32.0
2024-09-13 23:15:09,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=39757.333333333336, ans=0.1
2024-09-13 23:15:23,862 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=5.21 vs. limit=15.0
2024-09-13 23:15:49,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=39842.333333333336, ans=0.035
2024-09-13 23:16:07,578 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-13 23:16:19,439 INFO [train.py:1198] (1/2) Epoch 3, batch 1300, loss[loss=0.3312, ctc_loss=0.2525, cr_loss=0.3934, over 20876.00 frames. ], tot_loss[loss=0.3548, ctc_loss=0.2691, cr_loss=0.4283, over 4085315.77 frames. ], batch size: 54, lr: 2.56e-02, grad_scale: 32.0
2024-09-13 23:17:03,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=39984.0, ans=0.125
2024-09-13 23:17:10,919 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.057e+02 2.710e+02 3.108e+02 3.849e+02 6.117e+02, threshold=6.215e+02, percent-clipped=3.0
2024-09-13 23:17:18,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=40012.333333333336, ans=0.125
2024-09-13 23:17:35,235 INFO [train.py:1198] (1/2) Epoch 3, batch 1350, loss[loss=0.3364, ctc_loss=0.2481, cr_loss=0.4417, over 20889.00 frames. ], tot_loss[loss=0.3551, ctc_loss=0.2694, cr_loss=0.4286, over 4085110.36 frames. ], batch size: 57, lr: 2.55e-02, grad_scale: 32.0
2024-09-13 23:17:53,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=40069.0, ans=0.1
2024-09-13 23:18:20,997 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-13 23:18:30,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=40125.666666666664, ans=0.0
2024-09-13 23:18:51,041 INFO [train.py:1198] (1/2) Epoch 3, batch 1400, loss[loss=0.384, ctc_loss=0.2961, cr_loss=0.4397, over 20854.00 frames. ], tot_loss[loss=0.3543, ctc_loss=0.2686, cr_loss=0.4285, over 4092492.01 frames. ], batch size: 65, lr: 2.55e-02, grad_scale: 32.0
2024-09-13 23:18:52,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=40182.333333333336, ans=0.0021342753623188396
2024-09-13 23:18:54,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=40182.333333333336, ans=0.125
2024-09-13 23:18:57,744 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.80 vs. limit=15.0
2024-09-13 23:19:28,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=40239.0, ans=0.04949747468305833
2024-09-13 23:19:46,059 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.037e+02 2.428e+02 2.743e+02 3.383e+02 6.412e+02, threshold=5.486e+02, percent-clipped=1.0
2024-09-13 23:19:52,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=40267.333333333336, ans=0.125
2024-09-13 23:20:10,796 INFO [train.py:1198] (1/2) Epoch 3, batch 1450, loss[loss=0.3308, ctc_loss=0.2444, cr_loss=0.4319, over 20885.00 frames. ], tot_loss[loss=0.3543, ctc_loss=0.2688, cr_loss=0.4277, over 4079247.04 frames. ], batch size: 54, lr: 2.54e-02, grad_scale: 32.0
2024-09-13 23:20:43,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=40380.666666666664, ans=0.2
2024-09-13 23:20:57,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=40409.0, ans=0.125
2024-09-13 23:21:21,185 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.75 vs. limit=15.0
2024-09-13 23:21:28,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=40465.666666666664, ans=0.125
2024-09-13 23:21:29,571 INFO [train.py:1198] (1/2) Epoch 3, batch 1500, loss[loss=0.3565, ctc_loss=0.2713, cr_loss=0.4257, over 20662.00 frames. ], tot_loss[loss=0.3556, ctc_loss=0.2697, cr_loss=0.4293, over 4082209.84 frames. ], batch size: 68, lr: 2.54e-02, grad_scale: 32.0
2024-09-13 23:21:32,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=40465.666666666664, ans=0.125
2024-09-13 23:21:54,249 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.13 vs. limit=22.5
2024-09-13 23:22:20,597 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.908e+02 2.297e+02 2.781e+02 3.384e+02 6.120e+02, threshold=5.561e+02, percent-clipped=2.0
2024-09-13 23:22:44,510 INFO [train.py:1198] (1/2) Epoch 3, batch 1550, loss[loss=0.3348, ctc_loss=0.2587, cr_loss=0.3805, over 20871.00 frames. ], tot_loss[loss=0.3561, ctc_loss=0.2701, cr_loss=0.4298, over 4088969.63 frames. ], batch size: 54, lr: 2.54e-02, grad_scale: 32.0
2024-09-13 23:23:10,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=40635.666666666664, ans=0.1
2024-09-13 23:23:32,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=40692.333333333336, ans=0.1
2024-09-13 23:23:37,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=40692.333333333336, ans=0.1
2024-09-13 23:23:41,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=40692.333333333336, ans=0.1
2024-09-13 23:23:44,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=40720.666666666664, ans=0.125
2024-09-13 23:23:46,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=40720.666666666664, ans=0.125
2024-09-13 23:23:59,567 INFO [train.py:1198] (1/2) Epoch 3, batch 1600, loss[loss=0.3243, ctc_loss=0.2447, cr_loss=0.3979, over 19933.00 frames. ], tot_loss[loss=0.3555, ctc_loss=0.2697, cr_loss=0.4287, over 4085689.32 frames. ], batch size: 44, lr: 2.53e-02, grad_scale: 32.0
2024-09-13 23:24:00,346 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.74 vs. limit=22.5
2024-09-13 23:24:40,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=40805.666666666664, ans=0.0019987681159420283
2024-09-13 23:24:53,871 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.920e+02 2.310e+02 2.640e+02 3.137e+02 6.719e+02, threshold=5.281e+02, percent-clipped=4.0
2024-09-13 23:24:58,965 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-13 23:25:03,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=40862.333333333336, ans=0.0019864492753623175
2024-09-13 23:25:09,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=40862.333333333336, ans=0.125
2024-09-13 23:25:14,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=40862.333333333336, ans=0.125
2024-09-13 23:25:18,499 INFO [train.py:1198] (1/2) Epoch 3, batch 1650, loss[loss=0.3823, ctc_loss=0.2957, cr_loss=0.4328, over 20703.00 frames. ], tot_loss[loss=0.3553, ctc_loss=0.2694, cr_loss=0.4294, over 4101796.78 frames. ], batch size: 68, lr: 2.53e-02, grad_scale: 32.0
2024-09-13 23:25:44,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=40919.0, ans=0.2
2024-09-13 23:25:48,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=40947.333333333336, ans=0.125
2024-09-13 23:25:48,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=40947.333333333336, ans=0.125
2024-09-13 23:26:04,773 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.13 vs. limit=15.0
2024-09-13 23:26:07,724 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.34 vs. limit=15.0
2024-09-13 23:26:31,665 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-13 23:26:32,012 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.51 vs. limit=15.0
2024-09-13 23:26:37,255 INFO [train.py:1198] (1/2) Epoch 3, batch 1700, loss[loss=0.3824, ctc_loss=0.2958, cr_loss=0.4328, over 19459.00 frames. ], tot_loss[loss=0.3562, ctc_loss=0.2701, cr_loss=0.4302, over 4088384.24 frames. ], batch size: 90, lr: 2.53e-02, grad_scale: 32.0
2024-09-13 23:27:06,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=41089.0, ans=0.001937173913043478
2024-09-13 23:27:22,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=41117.333333333336, ans=0.0019310144927536226
2024-09-13 23:27:27,907 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.969e+02 2.488e+02 2.861e+02 3.723e+02 5.926e+02, threshold=5.722e+02, percent-clipped=3.0
2024-09-13 23:27:40,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=41145.666666666664, ans=0.0
2024-09-13 23:27:46,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=41145.666666666664, ans=0.125
2024-09-13 23:27:52,425 INFO [train.py:1198] (1/2) Epoch 3, batch 1750, loss[loss=0.3462, ctc_loss=0.2556, cr_loss=0.4529, over 20781.00 frames. ], tot_loss[loss=0.3547, ctc_loss=0.2691, cr_loss=0.4284, over 4095692.23 frames. ], batch size: 56, lr: 2.52e-02, grad_scale: 32.0
2024-09-13 23:27:58,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=41174.0, ans=0.125
2024-09-13 23:28:01,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=41174.0, ans=0.125
2024-09-13 23:28:18,669 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.73 vs. limit=22.5
2024-09-13 23:28:36,727 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-13 23:28:47,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=41259.0, ans=0.125
2024-09-13 23:28:54,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=41287.333333333336, ans=0.025
2024-09-13 23:29:08,220 INFO [train.py:1198] (1/2) Epoch 3, batch 1800, loss[loss=0.4479, ctc_loss=0.3587, cr_loss=0.4461, over 13855.00 frames. ], tot_loss[loss=0.3552, ctc_loss=0.2694, cr_loss=0.4288, over 4078869.37 frames. ], batch size: 149, lr: 2.52e-02, grad_scale: 32.0
2024-09-13 23:29:08,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=41315.666666666664, ans=0.125
2024-09-13 23:29:10,861 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.19 vs. limit=15.0
2024-09-13 23:30:00,292 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.879e+02 2.573e+02 3.061e+02 3.744e+02 6.005e+02, threshold=6.123e+02, percent-clipped=3.0
2024-09-13 23:30:27,775 INFO [train.py:1198] (1/2) Epoch 3, batch 1850, loss[loss=0.3449, ctc_loss=0.2594, cr_loss=0.4274, over 21052.00 frames. ], tot_loss[loss=0.3542, ctc_loss=0.2684, cr_loss=0.4286, over 4079658.39 frames. ], batch size: 56, lr: 2.52e-02, grad_scale: 32.0
2024-09-13 23:31:05,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=41514.0, ans=0.1
2024-09-13 23:31:06,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=41514.0, ans=0.125
2024-09-13 23:31:21,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=41542.333333333336, ans=0.09899494936611666
], batch size: 50, lr: 2.51e-02, grad_scale: 32.0 2024-09-13 23:32:11,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=41627.333333333336, ans=0.1 2024-09-13 23:32:18,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=41655.666666666664, ans=0.125 2024-09-13 23:32:38,076 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.818e+02 2.418e+02 2.994e+02 3.662e+02 6.504e+02, threshold=5.988e+02, percent-clipped=1.0 2024-09-13 23:32:56,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=41712.333333333336, ans=0.2 2024-09-13 23:33:01,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=41740.666666666664, ans=0.125 2024-09-13 23:33:01,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=41740.666666666664, ans=0.001795507246376813 2024-09-13 23:33:02,511 INFO [train.py:1198] (1/2) Epoch 3, batch 1950, loss[loss=0.3348, ctc_loss=0.2562, cr_loss=0.393, over 21065.00 frames. ], tot_loss[loss=0.3517, ctc_loss=0.2663, cr_loss=0.4273, over 4099947.94 frames. ], batch size: 53, lr: 2.51e-02, grad_scale: 32.0 2024-09-13 23:33:22,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=41769.0, ans=0.1 2024-09-13 23:33:25,852 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.58 vs. limit=6.0 2024-09-13 23:33:26,052 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.22 vs. limit=10.0 2024-09-13 23:34:07,529 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=41854.0, ans=0.2 2024-09-13 23:34:17,734 INFO [train.py:1198] (1/2) Epoch 3, batch 2000, loss[loss=0.4386, ctc_loss=0.3485, cr_loss=0.4501, over 14142.00 frames. ], tot_loss[loss=0.3511, ctc_loss=0.2657, cr_loss=0.4271, over 4098830.76 frames. ], batch size: 149, lr: 2.51e-02, grad_scale: 32.0 2024-09-13 23:34:41,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=41910.666666666664, ans=0.0 2024-09-13 23:34:59,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=41939.0, ans=0.2 2024-09-13 23:35:05,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=41967.333333333336, ans=0.0 2024-09-13 23:35:09,862 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.992e+02 2.358e+02 2.581e+02 2.979e+02 5.095e+02, threshold=5.161e+02, percent-clipped=0.0 2024-09-13 23:35:17,799 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.00 vs. 
limit=22.5 2024-09-13 23:35:18,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=41995.666666666664, ans=0.125 2024-09-13 23:35:32,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=42024.0, ans=0.125 2024-09-13 23:35:33,508 INFO [train.py:1198] (1/2) Epoch 3, batch 2050, loss[loss=0.3207, ctc_loss=0.2361, cr_loss=0.4228, over 20898.00 frames. ], tot_loss[loss=0.3501, ctc_loss=0.2649, cr_loss=0.4262, over 4093839.40 frames. ], batch size: 54, lr: 2.50e-02, grad_scale: 32.0 2024-09-13 23:35:52,426 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.16 vs. limit=15.0 2024-09-13 23:36:41,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=42137.333333333336, ans=0.0017092753623188395 2024-09-13 23:36:51,976 INFO [train.py:1198] (1/2) Epoch 3, batch 2100, loss[loss=0.3518, ctc_loss=0.2644, cr_loss=0.4366, over 21000.00 frames. ], tot_loss[loss=0.3488, ctc_loss=0.2638, cr_loss=0.425, over 4092321.77 frames. ], batch size: 61, lr: 2.50e-02, grad_scale: 64.0 2024-09-13 23:36:52,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=42165.666666666664, ans=0.1 2024-09-13 23:36:55,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=42165.666666666664, ans=0.125 2024-09-13 23:37:43,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=42250.666666666664, ans=0.125 2024-09-13 23:37:46,658 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.868e+02 2.498e+02 2.992e+02 3.605e+02 7.205e+02, threshold=5.983e+02, percent-clipped=5.0 2024-09-13 23:38:10,912 INFO [train.py:1198] (1/2) Epoch 3, batch 2150, loss[loss=0.3802, ctc_loss=0.2899, cr_loss=0.4513, over 19695.00 frames. ], tot_loss[loss=0.3498, ctc_loss=0.2645, cr_loss=0.4264, over 4097289.23 frames. ], batch size: 90, lr: 2.50e-02, grad_scale: 64.0 2024-09-13 23:38:24,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=42335.666666666664, ans=0.0016661594202898554 2024-09-13 23:38:29,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=42335.666666666664, ans=0.0 2024-09-13 23:39:13,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=42420.666666666664, ans=0.125 2024-09-13 23:39:27,158 INFO [train.py:1198] (1/2) Epoch 3, batch 2200, loss[loss=0.3847, ctc_loss=0.2926, cr_loss=0.4606, over 20690.00 frames. ], tot_loss[loss=0.349, ctc_loss=0.2638, cr_loss=0.4261, over 4101523.67 frames. ], batch size: 71, lr: 2.49e-02, grad_scale: 16.0 2024-09-13 23:40:22,341 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.914e+02 2.395e+02 2.642e+02 3.226e+02 6.726e+02, threshold=5.285e+02, percent-clipped=3.0 2024-09-13 23:40:28,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=42562.333333333336, ans=0.125 2024-09-13 23:40:43,302 INFO [train.py:1198] (1/2) Epoch 3, batch 2250, loss[loss=0.3188, ctc_loss=0.2384, cr_loss=0.4017, over 20792.00 frames. 
], tot_loss[loss=0.3492, ctc_loss=0.2639, cr_loss=0.4267, over 4101944.05 frames. ], batch size: 53, lr: 2.49e-02, grad_scale: 16.0 2024-09-13 23:41:07,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=42619.0, ans=0.0016045652173913051 2024-09-13 23:41:10,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=42619.0, ans=0.125 2024-09-13 23:41:18,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=42647.333333333336, ans=0.0015984057971014497 2024-09-13 23:41:40,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=42675.666666666664, ans=0.125 2024-09-13 23:41:58,907 INFO [train.py:1198] (1/2) Epoch 3, batch 2300, loss[loss=0.3685, ctc_loss=0.2785, cr_loss=0.4497, over 20064.00 frames. ], tot_loss[loss=0.3486, ctc_loss=0.2635, cr_loss=0.4255, over 4096924.73 frames. ], batch size: 80, lr: 2.49e-02, grad_scale: 16.0 2024-09-13 23:42:02,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=42732.333333333336, ans=0.125 2024-09-13 23:42:03,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=42732.333333333336, ans=0.05 2024-09-13 23:42:32,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=42789.0, ans=0.125 2024-09-13 23:42:56,073 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.018e+02 2.338e+02 2.519e+02 3.097e+02 4.952e+02, threshold=5.037e+02, percent-clipped=0.0 2024-09-13 23:43:05,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=42845.666666666664, ans=0.0015552898550724639 2024-09-13 23:43:19,975 INFO [train.py:1198] (1/2) Epoch 3, batch 2350, loss[loss=0.3071, ctc_loss=0.2275, cr_loss=0.3978, over 19970.00 frames. ], tot_loss[loss=0.3486, ctc_loss=0.2635, cr_loss=0.4258, over 4102991.84 frames. ], batch size: 44, lr: 2.48e-02, grad_scale: 16.0 2024-09-13 23:43:41,808 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.33 vs. limit=15.0 2024-09-13 23:43:48,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=42930.666666666664, ans=0.0 2024-09-13 23:44:13,288 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.85 vs. limit=12.0 2024-09-13 23:44:35,186 INFO [train.py:1198] (1/2) Epoch 3, batch 2400, loss[loss=0.2998, ctc_loss=0.2227, cr_loss=0.3857, over 20404.00 frames. ], tot_loss[loss=0.3488, ctc_loss=0.2635, cr_loss=0.4263, over 4110622.42 frames. 
], batch size: 45, lr: 2.48e-02, grad_scale: 32.0 2024-09-13 23:44:44,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=43015.666666666664, ans=0.125 2024-09-13 23:44:54,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=43044.0, ans=0.125 2024-09-13 23:45:10,216 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.74 vs. limit=15.0 2024-09-13 23:45:12,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=43072.333333333336, ans=0.2 2024-09-13 23:45:29,084 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.953e+02 2.391e+02 2.830e+02 3.681e+02 8.924e+02, threshold=5.660e+02, percent-clipped=3.0 2024-09-13 23:45:30,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=43100.666666666664, ans=0.125 2024-09-13 23:45:50,509 INFO [train.py:1198] (1/2) Epoch 3, batch 2450, loss[loss=0.3414, ctc_loss=0.2558, cr_loss=0.4282, over 20899.00 frames. ], tot_loss[loss=0.3494, ctc_loss=0.264, cr_loss=0.4265, over 4107677.76 frames. ], batch size: 54, lr: 2.48e-02, grad_scale: 32.0 2024-09-13 23:46:16,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=43185.666666666664, ans=0.025 2024-09-13 23:46:17,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=43185.666666666664, ans=0.0 2024-09-13 23:47:05,286 INFO [train.py:1198] (1/2) Epoch 3, batch 2500, loss[loss=0.3451, ctc_loss=0.2604, cr_loss=0.4233, over 21070.00 frames. ], tot_loss[loss=0.3501, ctc_loss=0.2646, cr_loss=0.4276, over 4096573.14 frames. ], batch size: 56, lr: 2.47e-02, grad_scale: 32.0 2024-09-13 23:47:13,056 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-13 23:47:31,124 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-13 23:47:37,545 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.98 vs. limit=12.0 2024-09-13 23:47:43,504 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.40 vs. limit=22.5 2024-09-13 23:48:02,095 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.898e+02 2.453e+02 3.011e+02 3.504e+02 5.504e+02, threshold=6.022e+02, percent-clipped=0.0 2024-09-13 23:48:23,057 INFO [train.py:1198] (1/2) Epoch 3, batch 2550, loss[loss=0.3563, ctc_loss=0.2715, cr_loss=0.4238, over 19507.00 frames. ], tot_loss[loss=0.3517, ctc_loss=0.2659, cr_loss=0.429, over 4092943.55 frames. ], batch size: 90, lr: 2.47e-02, grad_scale: 32.0 2024-09-13 23:49:36,514 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.36 vs. limit=12.0 2024-09-13 23:49:41,877 INFO [train.py:1198] (1/2) Epoch 3, batch 2600, loss[loss=0.3257, ctc_loss=0.243, cr_loss=0.4138, over 20849.00 frames. ], tot_loss[loss=0.3504, ctc_loss=0.2649, cr_loss=0.428, over 4098196.72 frames. 
], batch size: 54, lr: 2.47e-02, grad_scale: 32.0 2024-09-13 23:50:19,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=43639.0, ans=0.1 2024-09-13 23:50:35,643 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.007e+02 2.359e+02 2.647e+02 3.183e+02 6.616e+02, threshold=5.295e+02, percent-clipped=1.0 2024-09-13 23:50:40,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=43695.666666666664, ans=0.125 2024-09-13 23:50:51,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=43695.666666666664, ans=0.125 2024-09-13 23:50:53,030 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.92 vs. limit=6.0 2024-09-13 23:50:56,885 INFO [train.py:1198] (1/2) Epoch 3, batch 2650, loss[loss=0.3497, ctc_loss=0.264, cr_loss=0.4288, over 20950.00 frames. ], tot_loss[loss=0.3536, ctc_loss=0.2677, cr_loss=0.4294, over 4074211.68 frames. ], batch size: 58, lr: 2.46e-02, grad_scale: 32.0 2024-09-13 23:51:00,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=43724.0, ans=0.0 2024-09-13 23:51:03,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=43724.0, ans=0.125 2024-09-13 23:51:52,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=43809.0, ans=0.04949747468305833 2024-09-13 23:52:12,186 INFO [train.py:1198] (1/2) Epoch 3, batch 2700, loss[loss=0.3597, ctc_loss=0.274, cr_loss=0.4285, over 20683.00 frames. ], tot_loss[loss=0.3536, ctc_loss=0.2678, cr_loss=0.429, over 4064465.70 frames. ], batch size: 68, lr: 2.46e-02, grad_scale: 32.0 2024-09-13 23:52:14,333 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.04 vs. limit=6.0 2024-09-13 23:52:59,583 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.92 vs. limit=15.0 2024-09-13 23:53:05,484 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.08 vs. limit=12.0 2024-09-13 23:53:06,396 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.965e+02 2.329e+02 2.737e+02 3.277e+02 1.023e+03, threshold=5.475e+02, percent-clipped=1.0 2024-09-13 23:53:27,732 INFO [train.py:1198] (1/2) Epoch 3, batch 2750, loss[loss=0.4221, ctc_loss=0.3317, cr_loss=0.4522, over 14055.00 frames. ], tot_loss[loss=0.351, ctc_loss=0.2657, cr_loss=0.4267, over 4060942.45 frames. 
], batch size: 149, lr: 2.46e-02, grad_scale: 32.0 2024-09-13 23:53:45,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=44035.666666666664, ans=0.05 2024-09-13 23:54:00,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=44064.0, ans=0.125 2024-09-13 23:54:00,872 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=44064.0, ans=0.2 2024-09-13 23:54:34,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=44120.666666666664, ans=0.0 2024-09-13 23:54:49,033 INFO [train.py:1198] (1/2) Epoch 3, batch 2800, loss[loss=0.331, ctc_loss=0.2477, cr_loss=0.4163, over 20779.00 frames. ], tot_loss[loss=0.3507, ctc_loss=0.2653, cr_loss=0.4271, over 4075860.32 frames. ], batch size: 53, lr: 2.45e-02, grad_scale: 32.0 2024-09-13 23:54:57,186 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.45 vs. limit=22.5 2024-09-13 23:55:19,986 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=23.08 vs. limit=22.5 2024-09-13 23:55:36,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=44234.0, ans=0.04949747468305833 2024-09-13 23:55:43,113 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.013e+02 2.473e+02 2.845e+02 3.511e+02 5.408e+02, threshold=5.690e+02, percent-clipped=0.0 2024-09-13 23:55:49,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=44262.333333333336, ans=0.2 2024-09-13 23:55:52,976 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.75 vs. limit=22.5 2024-09-13 23:55:57,714 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.31 vs. limit=12.0 2024-09-13 23:56:02,233 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.41 vs. limit=15.0 2024-09-13 23:56:04,550 INFO [train.py:1198] (1/2) Epoch 3, batch 2850, loss[loss=0.4488, ctc_loss=0.3648, cr_loss=0.4199, over 14589.00 frames. ], tot_loss[loss=0.3508, ctc_loss=0.2653, cr_loss=0.4275, over 4077588.18 frames. ], batch size: 149, lr: 2.45e-02, grad_scale: 32.0 2024-09-13 23:56:21,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=44319.0, ans=0.125 2024-09-13 23:56:36,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=44347.333333333336, ans=0.125 2024-09-13 23:57:19,503 INFO [train.py:1198] (1/2) Epoch 3, batch 2900, loss[loss=0.3915, ctc_loss=0.2947, cr_loss=0.484, over 20027.00 frames. ], tot_loss[loss=0.3491, ctc_loss=0.2638, cr_loss=0.4265, over 4095148.52 frames. 
], batch size: 80, lr: 2.45e-02, grad_scale: 32.0 2024-09-13 23:57:50,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=44489.0, ans=0.125 2024-09-13 23:57:51,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=44489.0, ans=0.2 2024-09-13 23:57:53,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=44489.0, ans=0.2 2024-09-13 23:58:02,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=44489.0, ans=0.125 2024-09-13 23:58:06,845 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.67 vs. limit=15.0 2024-09-13 23:58:13,694 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.036e+02 2.437e+02 2.698e+02 3.177e+02 4.737e+02, threshold=5.396e+02, percent-clipped=0.0 2024-09-13 23:58:30,768 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=44545.666666666664, ans=0.5 2024-09-13 23:58:34,836 INFO [train.py:1198] (1/2) Epoch 3, batch 2950, loss[loss=0.3719, ctc_loss=0.2797, cr_loss=0.4612, over 19483.00 frames. ], tot_loss[loss=0.3488, ctc_loss=0.2634, cr_loss=0.4269, over 4102660.08 frames. ], batch size: 90, lr: 2.44e-02, grad_scale: 32.0 2024-09-13 23:58:48,932 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-13 23:58:52,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=44602.333333333336, ans=0.125 2024-09-13 23:59:03,173 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.28 vs. limit=15.0 2024-09-13 23:59:30,668 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.72 vs. limit=15.0 2024-09-13 23:59:48,504 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-13 23:59:54,101 INFO [train.py:1198] (1/2) Epoch 3, batch 3000, loss[loss=0.3701, ctc_loss=0.2827, cr_loss=0.437, over 20828.00 frames. ], tot_loss[loss=0.3484, ctc_loss=0.2631, cr_loss=0.4262, over 4107025.99 frames. ], batch size: 59, lr: 2.44e-02, grad_scale: 32.0 2024-09-13 23:59:54,101 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-14 00:00:06,356 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.6846, 3.9647, 4.6362, 4.5600, 4.0300, 4.5961, 3.4795, 3.8826], device='cuda:1') 2024-09-14 00:00:14,198 INFO [train.py:1230] (1/2) Epoch 3, validation: loss=0.09133, ctc_loss=0.09133, cr_loss=9.464e-15, over 944034.00 frames. 2024-09-14 00:00:14,199 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-14 00:00:20,944 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.89 vs. 
limit=15.0 2024-09-14 00:00:39,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=44744.0, ans=0.125 2024-09-14 00:00:47,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=44772.333333333336, ans=0.125 2024-09-14 00:01:08,305 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.975e+02 2.615e+02 2.994e+02 3.669e+02 6.655e+02, threshold=5.989e+02, percent-clipped=5.0 2024-09-14 00:01:29,351 INFO [train.py:1198] (1/2) Epoch 3, batch 3050, loss[loss=0.3719, ctc_loss=0.2787, cr_loss=0.466, over 20978.00 frames. ], tot_loss[loss=0.3499, ctc_loss=0.2644, cr_loss=0.4275, over 4105533.78 frames. ], batch size: 64, lr: 2.44e-02, grad_scale: 32.0 2024-09-14 00:01:31,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=44857.333333333336, ans=0.025 2024-09-14 00:01:49,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=44885.666666666664, ans=0.07 2024-09-14 00:02:25,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=44942.333333333336, ans=0.0 2024-09-14 00:02:33,670 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.09 vs. limit=8.0 2024-09-14 00:02:44,435 INFO [train.py:1198] (1/2) Epoch 3, batch 3100, loss[loss=0.315, ctc_loss=0.2343, cr_loss=0.4038, over 20788.00 frames. ], tot_loss[loss=0.3497, ctc_loss=0.2643, cr_loss=0.427, over 4094023.29 frames. ], batch size: 53, lr: 2.44e-02, grad_scale: 32.0 2024-09-14 00:02:48,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=44999.0, ans=0.125 2024-09-14 00:02:55,884 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.41 vs. limit=15.0 2024-09-14 00:03:10,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=45027.333333333336, ans=0.015 2024-09-14 00:03:30,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=45084.0, ans=0.125 2024-09-14 00:03:38,684 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.997e+02 2.421e+02 2.946e+02 3.449e+02 6.819e+02, threshold=5.892e+02, percent-clipped=1.0 2024-09-14 00:03:59,647 INFO [train.py:1198] (1/2) Epoch 3, batch 3150, loss[loss=0.3499, ctc_loss=0.2656, cr_loss=0.4213, over 20865.00 frames. ], tot_loss[loss=0.3486, ctc_loss=0.2635, cr_loss=0.4258, over 4079888.93 frames. ], batch size: 57, lr: 2.43e-02, grad_scale: 32.0 2024-09-14 00:04:27,994 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.71 vs. 
limit=15.0 2024-09-14 00:04:31,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=45197.333333333336, ans=0.04949747468305833 2024-09-14 00:04:52,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=45225.666666666664, ans=0.2 2024-09-14 00:05:18,113 INFO [train.py:1198] (1/2) Epoch 3, batch 3200, loss[loss=0.2752, ctc_loss=0.2052, cr_loss=0.35, over 19953.00 frames. ], tot_loss[loss=0.3479, ctc_loss=0.2626, cr_loss=0.426, over 4087799.68 frames. ], batch size: 44, lr: 2.43e-02, grad_scale: 32.0 2024-09-14 00:05:56,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=45339.0, ans=0.09899494936611666 2024-09-14 00:05:59,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=45339.0, ans=0.2 2024-09-14 00:06:05,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=45367.333333333336, ans=0.2 2024-09-14 00:06:16,197 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.037e+02 2.452e+02 2.785e+02 3.417e+02 6.516e+02, threshold=5.570e+02, percent-clipped=1.0 2024-09-14 00:06:32,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=45395.666666666664, ans=0.001000942028985508 2024-09-14 00:06:37,166 INFO [train.py:1198] (1/2) Epoch 3, batch 3250, loss[loss=0.2784, ctc_loss=0.2049, cr_loss=0.3674, over 21072.00 frames. ], tot_loss[loss=0.3478, ctc_loss=0.2626, cr_loss=0.426, over 4087141.69 frames. ], batch size: 53, lr: 2.43e-02, grad_scale: 32.0 2024-09-14 00:06:38,096 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.21 vs. limit=15.0 2024-09-14 00:06:43,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=45424.0, ans=0.125 2024-09-14 00:06:45,059 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.08 vs. limit=15.0 2024-09-14 00:07:05,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=45480.666666666664, ans=0.0009824637681159418 2024-09-14 00:07:10,933 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.85 vs. limit=15.0 2024-09-14 00:07:17,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=45480.666666666664, ans=0.2 2024-09-14 00:07:19,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=45480.666666666664, ans=0.125 2024-09-14 00:07:52,700 INFO [train.py:1198] (1/2) Epoch 3, batch 3300, loss[loss=0.3699, ctc_loss=0.2784, cr_loss=0.4574, over 20032.00 frames. ], tot_loss[loss=0.3474, ctc_loss=0.2621, cr_loss=0.4266, over 4089738.98 frames. 
], batch size: 80, lr: 2.42e-02, grad_scale: 32.0 2024-09-14 00:08:36,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=45650.666666666664, ans=0.125 2024-09-14 00:08:39,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=45650.666666666664, ans=0.1 2024-09-14 00:08:46,856 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.941e+02 2.285e+02 2.532e+02 2.921e+02 4.809e+02, threshold=5.064e+02, percent-clipped=0.0 2024-09-14 00:09:07,801 INFO [train.py:1198] (1/2) Epoch 3, batch 3350, loss[loss=0.3746, ctc_loss=0.2856, cr_loss=0.4446, over 21011.00 frames. ], tot_loss[loss=0.3483, ctc_loss=0.2629, cr_loss=0.4268, over 4091043.82 frames. ], batch size: 61, lr: 2.42e-02, grad_scale: 32.0 2024-09-14 00:09:17,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=45707.333333333336, ans=0.0 2024-09-14 00:09:20,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=45707.333333333336, ans=15.0 2024-09-14 00:09:57,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=45792.333333333336, ans=0.125 2024-09-14 00:09:59,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=45792.333333333336, ans=0.2 2024-09-14 00:10:08,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=45820.666666666664, ans=0.1 2024-09-14 00:10:11,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=45820.666666666664, ans=0.0009085507246376825 2024-09-14 00:10:14,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=45820.666666666664, ans=0.95 2024-09-14 00:10:23,220 INFO [train.py:1198] (1/2) Epoch 3, batch 3400, loss[loss=0.334, ctc_loss=0.251, cr_loss=0.415, over 20871.00 frames. ], tot_loss[loss=0.3474, ctc_loss=0.2621, cr_loss=0.4266, over 4098183.00 frames. 
], batch size: 57, lr: 2.42e-02, grad_scale: 32.0 2024-09-14 00:10:35,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=45849.0, ans=0.025 2024-09-14 00:10:35,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=45849.0, ans=0.0009023913043478254 2024-09-14 00:10:36,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=45849.0, ans=0.125 2024-09-14 00:10:48,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=45877.333333333336, ans=0.02 2024-09-14 00:10:59,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=45905.666666666664, ans=0.0008900724637681164 2024-09-14 00:11:05,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=45905.666666666664, ans=0.125 2024-09-14 00:11:20,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=45934.0, ans=0.125 2024-09-14 00:11:22,859 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.948e+02 2.330e+02 2.613e+02 3.141e+02 5.836e+02, threshold=5.227e+02, percent-clipped=3.0 2024-09-14 00:11:44,122 INFO [train.py:1198] (1/2) Epoch 3, batch 3450, loss[loss=0.4017, ctc_loss=0.3071, cr_loss=0.473, over 20020.00 frames. ], tot_loss[loss=0.3473, ctc_loss=0.262, cr_loss=0.4266, over 4102879.01 frames. ], batch size: 80, lr: 2.41e-02, grad_scale: 32.0 2024-09-14 00:12:01,009 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.79 vs. limit=15.0 2024-09-14 00:12:19,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=46047.333333333336, ans=0.0008592753623188395 2024-09-14 00:12:59,159 INFO [train.py:1198] (1/2) Epoch 3, batch 3500, loss[loss=0.4225, ctc_loss=0.3322, cr_loss=0.4515, over 14775.00 frames. ], tot_loss[loss=0.3486, ctc_loss=0.263, cr_loss=0.4276, over 4094369.60 frames. ], batch size: 150, lr: 2.41e-02, grad_scale: 32.0 2024-09-14 00:13:02,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=46132.333333333336, ans=0.1 2024-09-14 00:13:10,257 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.51 vs. limit=15.0 2024-09-14 00:13:46,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=46217.333333333336, ans=0.1 2024-09-14 00:13:53,636 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.980e+02 2.367e+02 2.706e+02 3.266e+02 5.266e+02, threshold=5.411e+02, percent-clipped=2.0 2024-09-14 00:14:09,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=46245.666666666664, ans=0.0 2024-09-14 00:14:14,911 INFO [train.py:1198] (1/2) Epoch 3, batch 3550, loss[loss=0.383, ctc_loss=0.2934, cr_loss=0.4478, over 20655.00 frames. ], tot_loss[loss=0.349, ctc_loss=0.2633, cr_loss=0.4284, over 4090800.26 frames. 
], batch size: 71, lr: 2.41e-02, grad_scale: 32.0 2024-09-14 00:15:29,882 INFO [train.py:1198] (1/2) Epoch 3, batch 3600, loss[loss=0.3149, ctc_loss=0.238, cr_loss=0.3843, over 20981.00 frames. ], tot_loss[loss=0.3486, ctc_loss=0.2631, cr_loss=0.4278, over 4085575.25 frames. ], batch size: 52, lr: 2.40e-02, grad_scale: 32.0 2024-09-14 00:15:59,006 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.27 vs. limit=12.0 2024-09-14 00:16:10,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=46472.333333333336, ans=10.0 2024-09-14 00:16:24,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=46500.666666666664, ans=0.0007607246376811604 2024-09-14 00:16:26,768 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.054e+02 2.371e+02 2.596e+02 2.997e+02 4.705e+02, threshold=5.193e+02, percent-clipped=0.0 2024-09-14 00:16:29,023 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.17 vs. limit=15.0 2024-09-14 00:16:31,970 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.13 vs. limit=6.0 2024-09-14 00:16:50,490 INFO [train.py:1198] (1/2) Epoch 3, batch 3650, loss[loss=0.375, ctc_loss=0.2834, cr_loss=0.4583, over 20660.00 frames. ], tot_loss[loss=0.3492, ctc_loss=0.2634, cr_loss=0.429, over 4090747.88 frames. ], batch size: 68, lr: 2.40e-02, grad_scale: 32.0 2024-09-14 00:17:40,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=46642.333333333336, ans=0.0 2024-09-14 00:17:45,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=46642.333333333336, ans=0.2 2024-09-14 00:17:51,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=46670.666666666664, ans=0.2 2024-09-14 00:18:06,430 INFO [train.py:1198] (1/2) Epoch 3, batch 3700, loss[loss=0.3552, ctc_loss=0.264, cr_loss=0.4559, over 20689.00 frames. ], tot_loss[loss=0.3473, ctc_loss=0.2617, cr_loss=0.4281, over 4095915.84 frames. 
], batch size: 66, lr: 2.40e-02, grad_scale: 32.0 2024-09-14 00:18:38,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=46755.666666666664, ans=0.0007052898550724638 2024-09-14 00:18:40,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=46755.666666666664, ans=0.0007052898550724638 2024-09-14 00:18:55,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=46784.0, ans=0.125 2024-09-14 00:19:01,350 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.074e+02 2.546e+02 2.834e+02 3.481e+02 5.195e+02, threshold=5.668e+02, percent-clipped=1.0 2024-09-14 00:19:04,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=46784.0, ans=0.025 2024-09-14 00:19:12,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=46812.333333333336, ans=0.125 2024-09-14 00:19:22,635 INFO [train.py:1198] (1/2) Epoch 3, batch 3750, loss[loss=0.4432, ctc_loss=0.3484, cr_loss=0.4737, over 14421.00 frames. ], tot_loss[loss=0.3469, ctc_loss=0.2614, cr_loss=0.4274, over 4097543.81 frames. ], batch size: 151, lr: 2.40e-02, grad_scale: 32.0 2024-09-14 00:19:23,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=46840.666666666664, ans=0.1 2024-09-14 00:19:27,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=46840.666666666664, ans=0.1 2024-09-14 00:19:31,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=46840.666666666664, ans=0.0 2024-09-14 00:19:37,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=46869.0, ans=0.125 2024-09-14 00:20:02,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=46897.333333333336, ans=0.125 2024-09-14 00:20:03,302 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.82 vs. limit=15.0 2024-09-14 00:20:06,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=46925.666666666664, ans=0.0006683333333333333 2024-09-14 00:20:33,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=46954.0, ans=0.125 2024-09-14 00:20:37,760 INFO [train.py:1198] (1/2) Epoch 3, batch 3800, loss[loss=0.3134, ctc_loss=0.2337, cr_loss=0.3986, over 20983.00 frames. ], tot_loss[loss=0.3462, ctc_loss=0.2609, cr_loss=0.4267, over 4099415.55 frames. 
], batch size: 50, lr: 2.39e-02, grad_scale: 32.0 2024-09-14 00:20:38,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=46982.333333333336, ans=0.1 2024-09-14 00:20:59,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=47010.666666666664, ans=0.0 2024-09-14 00:21:10,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=47039.0, ans=0.125 2024-09-14 00:21:22,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=47067.333333333336, ans=0.025 2024-09-14 00:21:27,549 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.12 vs. limit=22.5 2024-09-14 00:21:31,470 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.115e+02 2.350e+02 2.718e+02 3.478e+02 5.662e+02, threshold=5.437e+02, percent-clipped=0.0 2024-09-14 00:21:36,747 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.09 vs. limit=15.0 2024-09-14 00:21:48,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=47095.666666666664, ans=0.1 2024-09-14 00:21:54,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=47124.0, ans=0.1 2024-09-14 00:21:55,831 INFO [train.py:1198] (1/2) Epoch 3, batch 3850, loss[loss=0.3975, ctc_loss=0.3066, cr_loss=0.4545, over 20661.00 frames. ], tot_loss[loss=0.3473, ctc_loss=0.2617, cr_loss=0.4279, over 4099106.04 frames. ], batch size: 68, lr: 2.39e-02, grad_scale: 32.0 2024-09-14 00:22:44,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=47209.0, ans=0.125 2024-09-14 00:22:50,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=47209.0, ans=0.0006067391304347813 2024-09-14 00:23:14,314 INFO [train.py:1198] (1/2) Epoch 3, batch 3900, loss[loss=0.338, ctc_loss=0.2523, cr_loss=0.4287, over 20787.00 frames. ], tot_loss[loss=0.3467, ctc_loss=0.2613, cr_loss=0.427, over 4099226.57 frames. ], batch size: 53, lr: 2.39e-02, grad_scale: 32.0 2024-09-14 00:23:25,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=47265.666666666664, ans=0.0 2024-09-14 00:23:26,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=47265.666666666664, ans=0.2 2024-09-14 00:23:51,765 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.46 vs. 
limit=15.0 2024-09-14 00:23:55,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=47322.333333333336, ans=0.04949747468305833 2024-09-14 00:24:08,730 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.968e+02 2.288e+02 2.504e+02 2.863e+02 3.836e+02, threshold=5.007e+02, percent-clipped=0.0 2024-09-14 00:24:09,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=47350.666666666664, ans=0.1 2024-09-14 00:24:14,132 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.28 vs. limit=12.0 2024-09-14 00:24:29,638 INFO [train.py:1198] (1/2) Epoch 3, batch 3950, loss[loss=0.4001, ctc_loss=0.3009, cr_loss=0.4961, over 20985.00 frames. ], tot_loss[loss=0.3466, ctc_loss=0.2611, cr_loss=0.4275, over 4110497.62 frames. ], batch size: 64, lr: 2.38e-02, grad_scale: 32.0 2024-09-14 00:24:50,134 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.54 vs. limit=15.0 2024-09-14 00:25:19,019 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.18 vs. limit=8.0 2024-09-14 00:25:44,699 INFO [train.py:1198] (1/2) Epoch 3, batch 4000, loss[loss=0.3518, ctc_loss=0.2624, cr_loss=0.4468, over 21020.00 frames. ], tot_loss[loss=0.3456, ctc_loss=0.2602, cr_loss=0.4269, over 4107129.41 frames. ], batch size: 61, lr: 2.38e-02, grad_scale: 32.0 2024-09-14 00:26:36,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=47634.0, ans=0.125 2024-09-14 00:26:40,358 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.951e+02 2.380e+02 2.715e+02 3.768e+02 6.856e+02, threshold=5.431e+02, percent-clipped=7.0 2024-09-14 00:26:57,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=47662.333333333336, ans=0.2 2024-09-14 00:27:00,139 INFO [train.py:1198] (1/2) Epoch 3, batch 4050, loss[loss=0.2817, ctc_loss=0.212, cr_loss=0.3488, over 20970.00 frames. ], tot_loss[loss=0.346, ctc_loss=0.2607, cr_loss=0.4264, over 4089683.83 frames. ], batch size: 49, lr: 2.38e-02, grad_scale: 32.0 2024-09-14 00:27:06,962 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.83 vs. limit=22.5 2024-09-14 00:27:13,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=47719.0, ans=0.125 2024-09-14 00:27:20,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=47719.0, ans=0.125 2024-09-14 00:27:25,128 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.04 vs. 
limit=15.0 2024-09-14 00:28:09,822 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 00:28:20,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=47832.333333333336, ans=0.125 2024-09-14 00:28:21,458 INFO [train.py:1198] (1/2) Epoch 3, batch 4100, loss[loss=0.3638, ctc_loss=0.2775, cr_loss=0.4317, over 21011.00 frames. ], tot_loss[loss=0.3462, ctc_loss=0.2606, cr_loss=0.428, over 4100281.17 frames. ], batch size: 63, lr: 2.37e-02, grad_scale: 32.0 2024-09-14 00:28:22,070 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.82 vs. limit=15.0 2024-09-14 00:28:27,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=47832.333333333336, ans=0.1 2024-09-14 00:28:33,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=47832.333333333336, ans=0.125 2024-09-14 00:28:54,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=47889.0, ans=0.125 2024-09-14 00:29:07,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=47917.333333333336, ans=0.1 2024-09-14 00:29:16,810 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.955e+02 2.316e+02 2.497e+02 2.917e+02 4.511e+02, threshold=4.994e+02, percent-clipped=0.0 2024-09-14 00:29:36,583 INFO [train.py:1198] (1/2) Epoch 3, batch 4150, loss[loss=0.2897, ctc_loss=0.213, cr_loss=0.3835, over 20969.00 frames. ], tot_loss[loss=0.3441, ctc_loss=0.2589, cr_loss=0.4256, over 4098922.98 frames. ], batch size: 51, lr: 2.37e-02, grad_scale: 32.0 2024-09-14 00:29:37,324 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.61 vs. limit=15.0 2024-09-14 00:30:00,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=48002.333333333336, ans=0.1 2024-09-14 00:30:09,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=48030.666666666664, ans=0.125 2024-09-14 00:30:38,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=48087.333333333336, ans=0.2 2024-09-14 00:30:51,555 INFO [train.py:1198] (1/2) Epoch 3, batch 4200, loss[loss=0.2995, ctc_loss=0.2204, cr_loss=0.3955, over 20943.00 frames. ], tot_loss[loss=0.344, ctc_loss=0.259, cr_loss=0.4254, over 4088966.22 frames. ], batch size: 48, lr: 2.37e-02, grad_scale: 32.0 2024-09-14 00:31:08,796 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.99 vs. 
limit=15.0 2024-09-14 00:31:32,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=48172.333333333336, ans=0.1 2024-09-14 00:31:46,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=48200.666666666664, ans=0.2 2024-09-14 00:31:47,396 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.907e+02 2.438e+02 2.954e+02 3.531e+02 6.549e+02, threshold=5.908e+02, percent-clipped=3.0 2024-09-14 00:32:06,725 INFO [train.py:1198] (1/2) Epoch 3, batch 4250, loss[loss=0.3544, ctc_loss=0.267, cr_loss=0.4374, over 20975.00 frames. ], tot_loss[loss=0.3447, ctc_loss=0.2595, cr_loss=0.4259, over 4095819.09 frames. ], batch size: 55, lr: 2.37e-02, grad_scale: 32.0 2024-09-14 00:32:31,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=48285.666666666664, ans=0.0003726811594202909 2024-09-14 00:33:13,596 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.76 vs. limit=22.5 2024-09-14 00:33:14,969 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.11 vs. limit=22.5 2024-09-14 00:33:17,929 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.54 vs. limit=15.0 2024-09-14 00:33:25,006 INFO [train.py:1198] (1/2) Epoch 3, batch 4300, loss[loss=0.3881, ctc_loss=0.2973, cr_loss=0.4542, over 20311.00 frames. ], tot_loss[loss=0.3447, ctc_loss=0.2596, cr_loss=0.4259, over 4088661.36 frames. ], batch size: 74, lr: 2.36e-02, grad_scale: 32.0 2024-09-14 00:33:34,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=48399.0, ans=0.1 2024-09-14 00:33:53,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=48427.333333333336, ans=0.0 2024-09-14 00:34:22,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=48484.0, ans=0.0 2024-09-14 00:34:23,531 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.921e+02 2.438e+02 2.813e+02 3.513e+02 6.461e+02, threshold=5.625e+02, percent-clipped=1.0 2024-09-14 00:34:35,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=48512.333333333336, ans=0.1 2024-09-14 00:34:36,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=48512.333333333336, ans=0.2 2024-09-14 00:34:43,238 INFO [train.py:1198] (1/2) Epoch 3, batch 4350, loss[loss=0.3044, ctc_loss=0.2311, cr_loss=0.3668, over 20965.00 frames. ], tot_loss[loss=0.3458, ctc_loss=0.2603, cr_loss=0.4272, over 4087970.76 frames. ], batch size: 51, lr: 2.36e-02, grad_scale: 32.0 2024-09-14 00:35:46,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=48654.0, ans=0.2 2024-09-14 00:35:58,846 INFO [train.py:1198] (1/2) Epoch 3, batch 4400, loss[loss=0.3146, ctc_loss=0.2307, cr_loss=0.4193, over 20783.00 frames. ], tot_loss[loss=0.344, ctc_loss=0.2589, cr_loss=0.4256, over 4090344.84 frames. 
], batch size: 53, lr: 2.36e-02, grad_scale: 32.0
2024-09-14 00:36:03,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=48682.333333333336, ans=0.2
2024-09-14 00:36:05,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=48682.333333333336, ans=0.125
2024-09-14 00:36:54,845 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.023e+02 2.431e+02 2.704e+02 3.193e+02 5.692e+02, threshold=5.407e+02, percent-clipped=1.0
2024-09-14 00:37:11,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=48795.666666666664, ans=0.125
2024-09-14 00:37:13,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=48824.0, ans=0.125
2024-09-14 00:37:14,422 INFO [train.py:1198] (1/2) Epoch 3, batch 4450, loss[loss=0.3291, ctc_loss=0.2411, cr_loss=0.44, over 20887.00 frames. ], tot_loss[loss=0.3434, ctc_loss=0.2583, cr_loss=0.4255, over 4109087.88 frames. ], batch size: 54, lr: 2.35e-02, grad_scale: 32.0
2024-09-14 00:37:20,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=48824.0, ans=0.125
2024-09-14 00:37:27,127 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.70 vs. limit=22.5
2024-09-14 00:37:44,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=48880.666666666664, ans=0.2
2024-09-14 00:37:50,836 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.28 vs. limit=22.5
2024-09-14 00:38:03,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=48909.0, ans=0.125
2024-09-14 00:38:27,852 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.13 vs. limit=12.0
2024-09-14 00:38:28,720 INFO [train.py:1198] (1/2) Epoch 3, batch 4500, loss[loss=0.3975, ctc_loss=0.3156, cr_loss=0.4097, over 14483.00 frames. ], tot_loss[loss=0.3426, ctc_loss=0.2576, cr_loss=0.4251, over 4113171.96 frames. ], batch size: 149, lr: 2.35e-02, grad_scale: 32.0
2024-09-14 00:38:37,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=48965.666666666664, ans=0.125
2024-09-14 00:39:27,294 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.986e+02 2.397e+02 2.714e+02 3.183e+02 5.265e+02, threshold=5.429e+02, percent-clipped=0.0
2024-09-14 00:39:41,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=49079.0, ans=0.1
2024-09-14 00:39:41,565 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.17 vs. limit=15.0
2024-09-14 00:39:49,975 INFO [train.py:1198] (1/2) Epoch 3, batch 4550, loss[loss=0.3103, ctc_loss=0.2304, cr_loss=0.3997, over 20950.00 frames. ], tot_loss[loss=0.3429, ctc_loss=0.2579, cr_loss=0.425, over 4113166.71 frames. ], batch size: 51, lr: 2.35e-02, grad_scale: 32.0
2024-09-14 00:40:02,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=49107.333333333336, ans=0.125
2024-09-14 00:40:12,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=49135.666666666664, ans=0.125
2024-09-14 00:40:50,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=49220.666666666664, ans=0.00016942028985507222
2024-09-14 00:41:05,069 INFO [train.py:1198] (1/2) Epoch 3, batch 4600, loss[loss=0.3054, ctc_loss=0.2334, cr_loss=0.3603, over 20325.00 frames. ], tot_loss[loss=0.3432, ctc_loss=0.2582, cr_loss=0.4247, over 4108967.46 frames. ], batch size: 45, lr: 2.35e-02, grad_scale: 16.0
2024-09-14 00:41:46,321 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.73 vs. limit=15.0
2024-09-14 00:41:47,935 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.96 vs. limit=12.0
2024-09-14 00:41:55,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=49334.0, ans=0.2
2024-09-14 00:42:02,450 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.991e+02 2.427e+02 2.669e+02 3.167e+02 5.749e+02, threshold=5.338e+02, percent-clipped=1.0
2024-09-14 00:42:20,192 INFO [train.py:1198] (1/2) Epoch 3, batch 4650, loss[loss=0.3629, ctc_loss=0.2706, cr_loss=0.4611, over 20936.00 frames. ], tot_loss[loss=0.3436, ctc_loss=0.2586, cr_loss=0.425, over 4104441.64 frames. ], batch size: 67, lr: 2.34e-02, grad_scale: 16.0
2024-09-14 00:42:37,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=49419.0, ans=0.125
2024-09-14 00:42:43,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=49419.0, ans=0.2
2024-09-14 00:42:46,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=49419.0, ans=0.125
2024-09-14 00:43:04,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=49475.666666666664, ans=0.5
2024-09-14 00:43:17,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=49475.666666666664, ans=0.125
2024-09-14 00:43:35,605 INFO [train.py:1198] (1/2) Epoch 3, batch 4700, loss[loss=0.3962, ctc_loss=0.2973, cr_loss=0.4942, over 20954.00 frames. ], tot_loss[loss=0.3445, ctc_loss=0.2592, cr_loss=0.4268, over 4105322.96 frames. ], batch size: 67, lr: 2.34e-02, grad_scale: 16.0
2024-09-14 00:43:39,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=49532.333333333336, ans=0.00010166666666666657
2024-09-14 00:43:43,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=49532.333333333336, ans=0.04949747468305833
2024-09-14 00:43:45,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=49532.333333333336, ans=0.2
2024-09-14 00:44:12,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=49589.0, ans=0.2
2024-09-14 00:44:17,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=49589.0, ans=0.125
2024-09-14 00:44:35,817 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.948e+02 2.377e+02 2.710e+02 3.190e+02 4.745e+02, threshold=5.421e+02, percent-clipped=0.0
2024-09-14 00:44:53,856 INFO [train.py:1198] (1/2) Epoch 3, batch 4750, loss[loss=0.3157, ctc_loss=0.2312, cr_loss=0.4223, over 20980.00 frames. ], tot_loss[loss=0.3435, ctc_loss=0.2584, cr_loss=0.4258, over 4103192.07 frames. ], batch size: 52, lr: 2.34e-02, grad_scale: 16.0
2024-09-14 00:45:33,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=49730.666666666664, ans=0.125
2024-09-14 00:45:51,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=49759.0, ans=0.0
2024-09-14 00:45:56,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=49787.333333333336, ans=0.025
2024-09-14 00:46:12,306 INFO [train.py:1198] (1/2) Epoch 3, batch 4800, loss[loss=0.348, ctc_loss=0.2624, cr_loss=0.4281, over 20638.00 frames. ], tot_loss[loss=0.3421, ctc_loss=0.2571, cr_loss=0.4247, over 4104197.98 frames. ], batch size: 68, lr: 2.33e-02, grad_scale: 32.0
2024-09-14 00:46:15,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=49815.666666666664, ans=0.2
2024-09-14 00:46:18,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=49815.666666666664, ans=0.015
2024-09-14 00:46:26,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=49844.0, ans=0.2
2024-09-14 00:46:45,974 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.44 vs. limit=15.0
2024-09-14 00:47:09,066 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.045e+02 2.313e+02 2.745e+02 3.230e+02 4.598e+02, threshold=5.490e+02, percent-clipped=0.0
2024-09-14 00:47:26,890 INFO [train.py:1198] (1/2) Epoch 3, batch 4850, loss[loss=0.3351, ctc_loss=0.2532, cr_loss=0.4096, over 20846.00 frames. ], tot_loss[loss=0.3423, ctc_loss=0.2575, cr_loss=0.4242, over 4096191.18 frames. ], batch size: 57, lr: 2.33e-02, grad_scale: 32.0
2024-09-14 00:47:33,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=49957.333333333336, ans=0.0
2024-09-14 00:48:26,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=50070.666666666664, ans=0.025
2024-09-14 00:48:30,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=50070.666666666664, ans=0.0
2024-09-14 00:48:42,299 INFO [train.py:1198] (1/2) Epoch 3, batch 4900, loss[loss=0.2777, ctc_loss=0.2025, cr_loss=0.376, over 19918.00 frames. ], tot_loss[loss=0.3404, ctc_loss=0.2559, cr_loss=0.4224, over 4094103.05 frames. ], batch size: 44, lr: 2.33e-02, grad_scale: 32.0
2024-09-14 00:48:42,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=50099.0, ans=0.125
2024-09-14 00:49:28,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=50184.0, ans=0.125
2024-09-14 00:49:38,786 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.941e+02 2.313e+02 2.593e+02 2.985e+02 4.393e+02, threshold=5.187e+02, percent-clipped=0.0
2024-09-14 00:49:52,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=50212.333333333336, ans=0.0
2024-09-14 00:49:54,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=50212.333333333336, ans=0.95
2024-09-14 00:49:54,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=50212.333333333336, ans=0.125
2024-09-14 00:49:56,992 INFO [train.py:1198] (1/2) Epoch 3, batch 4950, loss[loss=0.2926, ctc_loss=0.22, cr_loss=0.3633, over 20946.00 frames. ], tot_loss[loss=0.3413, ctc_loss=0.2565, cr_loss=0.4238, over 4099889.62 frames. ], batch size: 49, lr: 2.33e-02, grad_scale: 32.0
2024-09-14 00:50:31,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=50297.333333333336, ans=0.09899494936611666
2024-09-14 00:51:11,228 INFO [train.py:1198] (1/2) Epoch 3, batch 5000, loss[loss=0.3303, ctc_loss=0.2447, cr_loss=0.4277, over 20897.00 frames. ], tot_loss[loss=0.3406, ctc_loss=0.2559, cr_loss=0.4235, over 4103962.57 frames. ], batch size: 57, lr: 2.32e-02, grad_scale: 32.0
2024-09-14 00:51:56,152 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.31 vs. limit=10.0
2024-09-14 00:51:57,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=50467.333333333336, ans=0.125
2024-09-14 00:52:04,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=50467.333333333336, ans=0.0
2024-09-14 00:52:10,046 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.059e+02 2.312e+02 2.677e+02 3.290e+02 7.428e+02, threshold=5.354e+02, percent-clipped=2.0
2024-09-14 00:52:10,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=50467.333333333336, ans=0.2
2024-09-14 00:52:27,704 INFO [train.py:1198] (1/2) Epoch 3, batch 5050, loss[loss=0.456, ctc_loss=0.3578, cr_loss=0.4911, over 13756.00 frames. ], tot_loss[loss=0.3426, ctc_loss=0.2574, cr_loss=0.4261, over 4104871.49 frames. ], batch size: 149, lr: 2.32e-02, grad_scale: 32.0
2024-09-14 00:53:34,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=50637.333333333336, ans=0.125
2024-09-14 00:53:44,028 INFO [train.py:1198] (1/2) Epoch 3, batch 5100, loss[loss=0.3591, ctc_loss=0.2722, cr_loss=0.4344, over 21018.00 frames. ], tot_loss[loss=0.342, ctc_loss=0.2568, cr_loss=0.4261, over 4102758.75 frames. ], batch size: 61, lr: 2.32e-02, grad_scale: 32.0
2024-09-14 00:53:56,529 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=50665.666666666664, ans=0.125
2024-09-14 00:53:59,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=50694.0, ans=0.1
2024-09-14 00:54:22,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=50722.333333333336, ans=0.125
2024-09-14 00:54:25,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=50722.333333333336, ans=0.0
2024-09-14 00:54:33,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=50750.666666666664, ans=0.125
2024-09-14 00:54:41,872 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.917e+02 2.468e+02 2.944e+02 3.691e+02 6.014e+02, threshold=5.889e+02, percent-clipped=1.0
2024-09-14 00:54:58,510 INFO [train.py:1198] (1/2) Epoch 3, batch 5150, loss[loss=0.3467, ctc_loss=0.261, cr_loss=0.4285, over 20873.00 frames. ], tot_loss[loss=0.3415, ctc_loss=0.2563, cr_loss=0.4261, over 4111859.80 frames. ], batch size: 54, lr: 2.32e-02, grad_scale: 16.0
2024-09-14 00:55:13,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=50835.666666666664, ans=0.2
2024-09-14 00:55:33,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=50864.0, ans=0.1
2024-09-14 00:56:04,608 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-14 00:56:13,022 INFO [train.py:1198] (1/2) Epoch 3, batch 5200, loss[loss=0.309, ctc_loss=0.2305, cr_loss=0.3922, over 20796.00 frames. ], tot_loss[loss=0.3427, ctc_loss=0.2574, cr_loss=0.4267, over 4100183.45 frames. ], batch size: 53, lr: 2.31e-02, grad_scale: 32.0
2024-09-14 00:56:43,458 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.37 vs. limit=12.0
2024-09-14 00:56:46,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=51005.666666666664, ans=0.125
2024-09-14 00:56:58,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=51034.0, ans=0.025
2024-09-14 00:57:11,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=51062.333333333336, ans=0.1
2024-09-14 00:57:12,420 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.836e+02 2.592e+02 2.910e+02 3.799e+02 5.620e+02, threshold=5.820e+02, percent-clipped=0.0
2024-09-14 00:57:27,230 INFO [train.py:1198] (1/2) Epoch 3, batch 5250, loss[loss=0.3443, ctc_loss=0.2548, cr_loss=0.4474, over 20831.00 frames. ], tot_loss[loss=0.3431, ctc_loss=0.2579, cr_loss=0.4259, over 4084770.67 frames. ], batch size: 65, lr: 2.31e-02, grad_scale: 16.0
2024-09-14 00:57:30,488 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=51090.666666666664, ans=0.125
2024-09-14 00:58:06,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=51147.333333333336, ans=0.0
2024-09-14 00:58:25,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=51204.0, ans=0.0
2024-09-14 00:58:40,575 INFO [train.py:1198] (1/2) Epoch 3, batch 5300, loss[loss=0.3404, ctc_loss=0.254, cr_loss=0.432, over 20881.00 frames. ], tot_loss[loss=0.3425, ctc_loss=0.2573, cr_loss=0.4257, over 4089091.25 frames. ], batch size: 57, lr: 2.31e-02, grad_scale: 16.0
2024-09-14 00:58:42,786 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.68 vs. limit=15.0
2024-09-14 00:59:12,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=51289.0, ans=0.125
2024-09-14 00:59:39,995 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.040e+02 2.292e+02 2.469e+02 2.814e+02 4.070e+02, threshold=4.939e+02, percent-clipped=0.0
2024-09-14 00:59:40,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=51345.666666666664, ans=0.125
2024-09-14 00:59:55,179 INFO [train.py:1198] (1/2) Epoch 3, batch 5350, loss[loss=0.3873, ctc_loss=0.293, cr_loss=0.4713, over 20847.00 frames. ], tot_loss[loss=0.3429, ctc_loss=0.2577, cr_loss=0.4263, over 4084880.08 frames. ], batch size: 65, lr: 2.30e-02, grad_scale: 16.0
2024-09-14 01:00:13,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=51402.333333333336, ans=0.025
2024-09-14 01:00:13,747 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.35 vs. limit=15.0
2024-09-14 01:00:46,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=51459.0, ans=0.0
2024-09-14 01:00:55,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=51487.333333333336, ans=0.025
2024-09-14 01:00:58,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=51487.333333333336, ans=0.1
2024-09-14 01:01:10,705 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.34 vs. limit=15.0
2024-09-14 01:01:11,362 INFO [train.py:1198] (1/2) Epoch 3, batch 5400, loss[loss=0.3173, ctc_loss=0.2288, cr_loss=0.4425, over 20213.00 frames. ], tot_loss[loss=0.3426, ctc_loss=0.2574, cr_loss=0.4257, over 4081495.62 frames. ], batch size: 45, lr: 2.30e-02, grad_scale: 16.0
2024-09-14 01:01:13,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=51515.666666666664, ans=0.0
2024-09-14 01:01:20,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=51515.666666666664, ans=0.125
2024-09-14 01:02:04,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=51600.666666666664, ans=0.125
2024-09-14 01:02:07,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=51600.666666666664, ans=0.125
2024-09-14 01:02:09,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=51629.0, ans=0.1
2024-09-14 01:02:10,251 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.995e+02 2.396e+02 2.735e+02 3.180e+02 5.536e+02, threshold=5.470e+02, percent-clipped=4.0
2024-09-14 01:02:11,387 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.37 vs. limit=15.0
2024-09-14 01:02:16,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=51629.0, ans=0.125
2024-09-14 01:02:21,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=51629.0, ans=15.0
2024-09-14 01:02:27,332 INFO [train.py:1198] (1/2) Epoch 3, batch 5450, loss[loss=0.3514, ctc_loss=0.2657, cr_loss=0.4284, over 21055.00 frames. ], tot_loss[loss=0.3419, ctc_loss=0.2569, cr_loss=0.425, over 4084368.48 frames. ], batch size: 62, lr: 2.30e-02, grad_scale: 16.0
2024-09-14 01:03:06,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=51714.0, ans=0.0
2024-09-14 01:03:15,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=51742.333333333336, ans=0.0
2024-09-14 01:03:41,434 INFO [train.py:1198] (1/2) Epoch 3, batch 5500, loss[loss=0.362, ctc_loss=0.2739, cr_loss=0.4408, over 20842.00 frames. ], tot_loss[loss=0.3443, ctc_loss=0.2586, cr_loss=0.4285, over 4098946.14 frames. ], batch size: 65, lr: 2.30e-02, grad_scale: 16.0
2024-09-14 01:04:26,284 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-14 01:04:40,732 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.923e+02 2.225e+02 2.487e+02 2.765e+02 5.183e+02, threshold=4.975e+02, percent-clipped=0.0
2024-09-14 01:04:50,048 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.83 vs. limit=10.0
2024-09-14 01:04:55,305 INFO [train.py:1198] (1/2) Epoch 3, batch 5550, loss[loss=0.3782, ctc_loss=0.2838, cr_loss=0.4717, over 20340.00 frames. ], tot_loss[loss=0.3438, ctc_loss=0.2581, cr_loss=0.4283, over 4101076.15 frames. ], batch size: 74, lr: 2.29e-02, grad_scale: 16.0
2024-09-14 01:04:59,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=51940.666666666664, ans=0.1
2024-09-14 01:05:10,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=51969.0, ans=0.125
2024-09-14 01:06:09,022 INFO [train.py:1198] (1/2) Epoch 3, batch 5600, loss[loss=0.3007, ctc_loss=0.226, cr_loss=0.3735, over 20982.00 frames. ], tot_loss[loss=0.3436, ctc_loss=0.258, cr_loss=0.4279, over 4101724.89 frames. ], batch size: 52, lr: 2.29e-02, grad_scale: 32.0
2024-09-14 01:06:19,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=52082.333333333336, ans=0.015
2024-09-14 01:07:08,547 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.051e+02 2.606e+02 2.917e+02 3.457e+02 7.046e+02, threshold=5.834e+02, percent-clipped=6.0
2024-09-14 01:07:11,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=52195.666666666664, ans=0.07
2024-09-14 01:07:23,317 INFO [train.py:1198] (1/2) Epoch 3, batch 5650, loss[loss=0.3264, ctc_loss=0.2443, cr_loss=0.4106, over 20982.00 frames. ], tot_loss[loss=0.3452, ctc_loss=0.2596, cr_loss=0.428, over 4051552.77 frames. ], batch size: 55, lr: 2.29e-02, grad_scale: 32.0
2024-09-14 01:08:05,148 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=52280.666666666664, ans=0.125
2024-09-14 01:08:21,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=52337.333333333336, ans=0.125
2024-09-14 01:08:37,538 INFO [train.py:1198] (1/2) Epoch 3, batch 5700, loss[loss=0.294, ctc_loss=0.2164, cr_loss=0.3881, over 20954.00 frames. ], tot_loss[loss=0.344, ctc_loss=0.2584, cr_loss=0.4278, over 4059402.80 frames. ], batch size: 49, lr: 2.29e-02, grad_scale: 32.0
2024-09-14 01:08:58,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=52394.0, ans=0.0
2024-09-14 01:09:39,692 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.999e+02 2.391e+02 2.677e+02 3.083e+02 5.761e+02, threshold=5.355e+02, percent-clipped=1.0
2024-09-14 01:09:54,682 INFO [train.py:1198] (1/2) Epoch 3, batch 5750, loss[loss=0.3734, ctc_loss=0.2845, cr_loss=0.4443, over 20292.00 frames. ], tot_loss[loss=0.3433, ctc_loss=0.2578, cr_loss=0.4274, over 4070792.60 frames. ], batch size: 74, lr: 2.28e-02, grad_scale: 32.0
2024-09-14 01:10:00,719 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=52507.333333333336, ans=0.125
2024-09-14 01:10:32,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=52564.0, ans=0.125
2024-09-14 01:10:45,248 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=5.53 vs. limit=15.0
2024-09-14 01:10:50,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=52592.333333333336, ans=0.125
2024-09-14 01:11:10,207 INFO [train.py:1198] (1/2) Epoch 3, batch 5800, loss[loss=0.3819, ctc_loss=0.2916, cr_loss=0.4515, over 18403.00 frames. ], tot_loss[loss=0.3427, ctc_loss=0.2572, cr_loss=0.4274, over 4082128.57 frames. ], batch size: 108, lr: 2.28e-02, grad_scale: 32.0
2024-09-14 01:11:41,582 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=52705.666666666664, ans=0.0
2024-09-14 01:12:09,617 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.000e+02 2.414e+02 2.743e+02 3.288e+02 5.794e+02, threshold=5.486e+02, percent-clipped=3.0
2024-09-14 01:12:11,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=52762.333333333336, ans=0.125
2024-09-14 01:12:20,740 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.94 vs. limit=15.0
2024-09-14 01:12:24,257 INFO [train.py:1198] (1/2) Epoch 3, batch 5850, loss[loss=0.4131, ctc_loss=0.3238, cr_loss=0.4469, over 18114.00 frames. ], tot_loss[loss=0.3429, ctc_loss=0.2574, cr_loss=0.4278, over 4085293.07 frames. ], batch size: 108, lr: 2.28e-02, grad_scale: 32.0
2024-09-14 01:12:30,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=52790.666666666664, ans=0.125
2024-09-14 01:13:00,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=52847.333333333336, ans=0.125
2024-09-14 01:13:07,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=52875.666666666664, ans=0.125
2024-09-14 01:13:14,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=52875.666666666664, ans=0.125
2024-09-14 01:13:35,438 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.21 vs. limit=15.0
2024-09-14 01:13:37,678 INFO [train.py:1198] (1/2) Epoch 3, batch 5900, loss[loss=0.2979, ctc_loss=0.2182, cr_loss=0.3985, over 20996.00 frames. ], tot_loss[loss=0.3434, ctc_loss=0.2578, cr_loss=0.4277, over 4073001.12 frames. ], batch size: 52, lr: 2.27e-02, grad_scale: 32.0
2024-09-14 01:13:58,768 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=52960.666666666664, ans=0.025
2024-09-14 01:14:15,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=52989.0, ans=0.1
2024-09-14 01:14:37,287 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.962e+02 2.295e+02 2.684e+02 3.322e+02 6.255e+02, threshold=5.369e+02, percent-clipped=1.0
2024-09-14 01:14:52,317 INFO [train.py:1198] (1/2) Epoch 3, batch 5950, loss[loss=0.3711, ctc_loss=0.2823, cr_loss=0.4441, over 21065.00 frames. ], tot_loss[loss=0.3449, ctc_loss=0.2591, cr_loss=0.4288, over 4075631.04 frames. ], batch size: 59, lr: 2.27e-02, grad_scale: 32.0
2024-09-14 01:14:52,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=53074.0, ans=0.0
2024-09-14 01:15:14,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=53102.333333333336, ans=0.125
2024-09-14 01:15:42,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=53159.0, ans=0.1
2024-09-14 01:16:01,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=53187.333333333336, ans=0.1
2024-09-14 01:16:05,316 INFO [train.py:1198] (1/2) Epoch 3, batch 6000, loss[loss=0.3386, ctc_loss=0.252, cr_loss=0.4333, over 20941.00 frames. ], tot_loss[loss=0.3453, ctc_loss=0.2592, cr_loss=0.4306, over 4085590.38 frames. ], batch size: 60, lr: 2.27e-02, grad_scale: 32.0
2024-09-14 01:16:05,317 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-14 01:16:23,883 INFO [train.py:1230] (1/2) Epoch 3, validation: loss=0.08739, ctc_loss=0.08739, cr_loss=9.228e-15, over 944034.00 frames.
2024-09-14 01:16:23,884 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB
2024-09-14 01:16:30,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=53215.666666666664, ans=0.2
2024-09-14 01:16:50,176 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.13 vs. limit=15.0
2024-09-14 01:16:51,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=53244.0, ans=0.125
2024-09-14 01:17:00,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=53272.333333333336, ans=0.125
2024-09-14 01:17:06,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=53272.333333333336, ans=0.125
2024-09-14 01:17:25,806 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.939e+02 2.436e+02 2.868e+02 3.699e+02 6.884e+02, threshold=5.737e+02, percent-clipped=5.0
2024-09-14 01:17:38,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=53329.0, ans=0.02
2024-09-14 01:17:41,298 INFO [train.py:1198] (1/2) Epoch 3, batch 6050, loss[loss=0.2874, ctc_loss=0.2163, cr_loss=0.3555, over 20969.00 frames. ], tot_loss[loss=0.3442, ctc_loss=0.2583, cr_loss=0.4295, over 4084909.97 frames. ], batch size: 48, lr: 2.27e-02, grad_scale: 32.0
2024-09-14 01:18:02,985 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=5.70 vs. limit=15.0
2024-09-14 01:18:11,269 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-14 01:18:31,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=53442.333333333336, ans=0.125
2024-09-14 01:18:56,684 INFO [train.py:1198] (1/2) Epoch 3, batch 6100, loss[loss=0.3377, ctc_loss=0.2469, cr_loss=0.454, over 20877.00 frames. ], tot_loss[loss=0.3434, ctc_loss=0.2577, cr_loss=0.4283, over 4086408.43 frames. ], batch size: 57, lr: 2.26e-02, grad_scale: 32.0
2024-09-14 01:19:03,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=53499.0, ans=0.2
2024-09-14 01:19:30,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=53555.666666666664, ans=0.0
2024-09-14 01:19:53,428 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.50 vs. limit=15.0
2024-09-14 01:19:55,694 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.814e+02 2.271e+02 2.566e+02 3.060e+02 5.343e+02, threshold=5.133e+02, percent-clipped=0.0
2024-09-14 01:20:10,720 INFO [train.py:1198] (1/2) Epoch 3, batch 6150, loss[loss=0.3191, ctc_loss=0.2355, cr_loss=0.4179, over 20868.00 frames. ], tot_loss[loss=0.3416, ctc_loss=0.2563, cr_loss=0.4265, over 4081001.25 frames. ], batch size: 57, lr: 2.26e-02, grad_scale: 32.0
2024-09-14 01:20:17,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=53640.666666666664, ans=0.1
2024-09-14 01:20:22,475 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.49 vs. limit=6.0
2024-09-14 01:20:27,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=53669.0, ans=0.2
2024-09-14 01:21:22,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=53782.333333333336, ans=0.125
2024-09-14 01:21:23,992 INFO [train.py:1198] (1/2) Epoch 3, batch 6200, loss[loss=0.3104, ctc_loss=0.2287, cr_loss=0.4085, over 19976.00 frames. ], tot_loss[loss=0.3423, ctc_loss=0.257, cr_loss=0.4265, over 4054170.22 frames. ], batch size: 44, lr: 2.26e-02, grad_scale: 32.0
2024-09-14 01:21:27,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=53782.333333333336, ans=0.125
2024-09-14 01:21:32,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=53782.333333333336, ans=0.04949747468305833
2024-09-14 01:22:07,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=53867.333333333336, ans=0.05
2024-09-14 01:22:22,647 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.051e+02 2.487e+02 3.106e+02 3.866e+02 8.162e+02, threshold=6.213e+02, percent-clipped=4.0
2024-09-14 01:22:38,051 INFO [train.py:1198] (1/2) Epoch 3, batch 6250, loss[loss=0.3747, ctc_loss=0.2855, cr_loss=0.4458, over 19957.00 frames. ], tot_loss[loss=0.3434, ctc_loss=0.2581, cr_loss=0.4268, over 4027392.37 frames. ], batch size: 80, lr: 2.26e-02, grad_scale: 32.0
2024-09-14 01:22:41,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=53924.0, ans=0.125
2024-09-14 01:22:41,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=53924.0, ans=0.2
2024-09-14 01:22:47,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=53924.0, ans=0.2
2024-09-14 01:22:50,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=53924.0, ans=0.2
2024-09-14 01:22:58,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=53952.333333333336, ans=0.125
2024-09-14 01:23:02,307 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.74 vs. limit=15.0
2024-09-14 01:23:50,345 INFO [train.py:1198] (1/2) Epoch 3, batch 6300, loss[loss=0.3659, ctc_loss=0.2769, cr_loss=0.4452, over 20740.00 frames. ], tot_loss[loss=0.3476, ctc_loss=0.262, cr_loss=0.428, over 3967204.99 frames. ], batch size: 71, lr: 2.25e-02, grad_scale: 32.0
2024-09-14 01:23:53,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=54065.666666666664, ans=0.1
2024-09-14 01:24:09,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=54094.0, ans=0.025
2024-09-14 01:24:49,004 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.022e+02 2.498e+02 2.907e+02 3.648e+02 6.562e+02, threshold=5.813e+02, percent-clipped=1.0
2024-09-14 01:24:58,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=54179.0, ans=0.05
2024-09-14 01:25:02,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=54207.333333333336, ans=0.0
2024-09-14 01:25:02,584 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.05 vs. limit=15.0
2024-09-14 01:25:03,502 INFO [train.py:1198] (1/2) Epoch 3, batch 6350, loss[loss=0.3962, ctc_loss=0.3142, cr_loss=0.4099, over 14496.00 frames. ], tot_loss[loss=0.3517, ctc_loss=0.2659, cr_loss=0.4291, over 3881028.91 frames. ], batch size: 151, lr: 2.25e-02, grad_scale: 32.0
2024-09-14 01:25:19,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=54235.666666666664, ans=0.0
2024-09-14 01:25:49,866 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.78 vs. limit=15.0
2024-09-14 01:25:58,284 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.91 vs. limit=6.0
2024-09-14 01:26:47,831 INFO [train.py:1198] (1/2) Epoch 4, batch 0, loss[loss=0.3431, ctc_loss=0.2642, cr_loss=0.3944, over 20767.00 frames. ], tot_loss[loss=0.3431, ctc_loss=0.2642, cr_loss=0.3944, over 20767.00 frames. ], batch size: 56, lr: 2.10e-02, grad_scale: 32.0
2024-09-14 01:26:47,831 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-14 01:27:06,295 INFO [train.py:1230] (1/2) Epoch 4, validation: loss=0.08861, ctc_loss=0.08861, cr_loss=9.595e-15, over 944034.00 frames.
2024-09-14 01:27:06,296 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB
2024-09-14 01:27:08,895 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.12 vs. limit=10.0
2024-09-14 01:28:02,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=54408.5, ans=0.025
2024-09-14 01:28:14,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=54436.833333333336, ans=0.04949747468305833
2024-09-14 01:28:19,818 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.935e+02 2.466e+02 2.778e+02 3.325e+02 4.890e+02, threshold=5.557e+02, percent-clipped=0.0
2024-09-14 01:28:21,411 INFO [train.py:1198] (1/2) Epoch 4, batch 50, loss[loss=0.3753, ctc_loss=0.285, cr_loss=0.4511, over 20844.00 frames. ], tot_loss[loss=0.3311, ctc_loss=0.2468, cr_loss=0.4215, over 927129.39 frames. ], batch size: 65, lr: 2.10e-02, grad_scale: 32.0
2024-09-14 01:28:44,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=54493.5, ans=0.2
2024-09-14 01:28:56,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=54521.833333333336, ans=0.125
2024-09-14 01:28:58,207 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.18 vs. limit=6.0
2024-09-14 01:29:08,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=54550.166666666664, ans=0.125
2024-09-14 01:29:18,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=54550.166666666664, ans=0.025
2024-09-14 01:29:39,508 INFO [train.py:1198] (1/2) Epoch 4, batch 100, loss[loss=0.2817, ctc_loss=0.2075, cr_loss=0.3713, over 19562.00 frames. ], tot_loss[loss=0.3338, ctc_loss=0.2494, cr_loss=0.4221, over 1626669.50 frames. ], batch size: 43, lr: 2.10e-02, grad_scale: 32.0
2024-09-14 01:29:49,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=54606.833333333336, ans=0.125
2024-09-14 01:29:50,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=54606.833333333336, ans=0.2
2024-09-14 01:30:04,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=54635.166666666664, ans=0.1
2024-09-14 01:30:10,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=54635.166666666664, ans=0.125
2024-09-14 01:30:14,968 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.09 vs. limit=22.5
2024-09-14 01:30:26,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=54691.833333333336, ans=0.125
2024-09-14 01:30:33,258 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.56 vs. limit=15.0
2024-09-14 01:30:48,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=54720.166666666664, ans=0.0
2024-09-14 01:30:56,044 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.941e+02 2.292e+02 2.583e+02 3.089e+02 4.761e+02, threshold=5.167e+02, percent-clipped=0.0
2024-09-14 01:30:57,525 INFO [train.py:1198] (1/2) Epoch 4, batch 150, loss[loss=0.3851, ctc_loss=0.2926, cr_loss=0.4624, over 20694.00 frames. ], tot_loss[loss=0.335, ctc_loss=0.2503, cr_loss=0.4236, over 2185233.35 frames. ], batch size: 66, lr: 2.10e-02, grad_scale: 32.0
2024-09-14 01:31:32,545 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.56 vs. limit=12.0
2024-09-14 01:31:36,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=54805.166666666664, ans=0.125
2024-09-14 01:31:43,275 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.36 vs. limit=15.0
2024-09-14 01:31:47,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=54833.5, ans=0.125
2024-09-14 01:32:12,447 INFO [train.py:1198] (1/2) Epoch 4, batch 200, loss[loss=0.3437, ctc_loss=0.2595, cr_loss=0.4211, over 20875.00 frames. ], tot_loss[loss=0.3362, ctc_loss=0.2512, cr_loss=0.425, over 2607240.20 frames. ], batch size: 57, lr: 2.09e-02, grad_scale: 32.0
2024-09-14 01:32:21,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=54890.166666666664, ans=0.125
2024-09-14 01:32:27,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=54918.5, ans=0.0
2024-09-14 01:32:33,487 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-14 01:32:52,049 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-14 01:33:02,203 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=54975.166666666664, ans=0.2
2024-09-14 01:33:25,640 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.915e+02 2.290e+02 2.532e+02 2.846e+02 4.959e+02, threshold=5.064e+02, percent-clipped=0.0
2024-09-14 01:33:27,162 INFO [train.py:1198] (1/2) Epoch 4, batch 250, loss[loss=0.3148, ctc_loss=0.2384, cr_loss=0.3819, over 20982.00 frames. ], tot_loss[loss=0.3371, ctc_loss=0.252, cr_loss=0.4251, over 2931755.05 frames. ], batch size: 55, lr: 2.09e-02, grad_scale: 32.0
2024-09-14 01:34:02,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=55088.5, ans=0.0
2024-09-14 01:34:42,635 INFO [train.py:1198] (1/2) Epoch 4, batch 300, loss[loss=0.3023, ctc_loss=0.2226, cr_loss=0.3984, over 20994.00 frames. ], tot_loss[loss=0.3368, ctc_loss=0.2519, cr_loss=0.4245, over 3180934.51 frames. ], batch size: 52, lr: 2.09e-02, grad_scale: 16.0
2024-09-14 01:35:01,438 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.55 vs. limit=15.0
2024-09-14 01:35:02,963 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.55 vs. limit=22.5
2024-09-14 01:35:48,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=55286.833333333336, ans=0.125
2024-09-14 01:36:02,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=55286.833333333336, ans=0.125
2024-09-14 01:36:04,695 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.912e+02 2.239e+02 2.447e+02 2.806e+02 5.533e+02, threshold=4.895e+02, percent-clipped=1.0
2024-09-14 01:36:04,713 INFO [train.py:1198] (1/2) Epoch 4, batch 350, loss[loss=0.3499, ctc_loss=0.2587, cr_loss=0.456, over 20970.00 frames. ], tot_loss[loss=0.3344, ctc_loss=0.2499, cr_loss=0.4225, over 3394125.62 frames. ], batch size: 64, lr: 2.09e-02, grad_scale: 16.0
2024-09-14 01:36:17,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=55315.166666666664, ans=22.5
2024-09-14 01:36:32,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=55343.5, ans=0.125
2024-09-14 01:36:47,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=55371.833333333336, ans=0.125
2024-09-14 01:36:51,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=55400.166666666664, ans=0.125
2024-09-14 01:37:19,885 INFO [train.py:1198] (1/2) Epoch 4, batch 400, loss[loss=0.3071, ctc_loss=0.2285, cr_loss=0.3931, over 20773.00 frames. ], tot_loss[loss=0.3339, ctc_loss=0.2494, cr_loss=0.4223, over 3553518.27 frames. ], batch size: 56, lr: 2.08e-02, grad_scale: 32.0
2024-09-14 01:37:20,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=55456.833333333336, ans=0.125
2024-09-14 01:38:08,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=55541.833333333336, ans=0.125
2024-09-14 01:38:17,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=55541.833333333336, ans=0.0
2024-09-14 01:38:35,103 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.846e+02 2.264e+02 2.550e+02 3.085e+02 5.220e+02, threshold=5.100e+02, percent-clipped=1.0
2024-09-14 01:38:35,122 INFO [train.py:1198] (1/2) Epoch 4, batch 450, loss[loss=0.3418, ctc_loss=0.2609, cr_loss=0.4045, over 20069.00 frames. ], tot_loss[loss=0.3328, ctc_loss=0.2484, cr_loss=0.4222, over 3684894.11 frames. ], batch size: 80, lr: 2.08e-02, grad_scale: 32.0
2024-09-14 01:38:44,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=55598.5, ans=0.2
2024-09-14 01:39:01,621 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.75 vs. limit=12.0
2024-09-14 01:39:37,719 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.82 vs. limit=12.0
2024-09-14 01:39:50,575 INFO [train.py:1198] (1/2) Epoch 4, batch 500, loss[loss=0.2993, ctc_loss=0.2188, cr_loss=0.4022, over 20891.00 frames. ], tot_loss[loss=0.3326, ctc_loss=0.248, cr_loss=0.4226, over 3778598.94 frames. ], batch size: 54, lr: 2.08e-02, grad_scale: 32.0
2024-09-14 01:40:14,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=55768.5, ans=0.125
2024-09-14 01:41:09,328 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.931e+02 2.255e+02 2.488e+02 2.846e+02 4.809e+02, threshold=4.975e+02, percent-clipped=0.0
2024-09-14 01:41:09,349 INFO [train.py:1198] (1/2) Epoch 4, batch 550, loss[loss=0.3725, ctc_loss=0.2821, cr_loss=0.4516, over 20625.00 frames. ], tot_loss[loss=0.3325, ctc_loss=0.2479, cr_loss=0.4231, over 3853734.45 frames. ], batch size: 71, lr: 2.08e-02, grad_scale: 32.0
2024-09-14 01:41:41,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=55938.5, ans=0.125
2024-09-14 01:42:01,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=55966.833333333336, ans=0.0
2024-09-14 01:42:07,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=55966.833333333336, ans=0.04949747468305833
2024-09-14 01:42:27,946 INFO [train.py:1198] (1/2) Epoch 4, batch 600, loss[loss=0.3633, ctc_loss=0.2774, cr_loss=0.4296, over 19409.00 frames. ], tot_loss[loss=0.3348, ctc_loss=0.2498, cr_loss=0.4252, over 3899517.25 frames. ], batch size: 90, lr: 2.08e-02, grad_scale: 32.0
2024-09-14 01:42:38,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=56023.5, ans=0.025
2024-09-14 01:43:04,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=56080.166666666664, ans=0.0
2024-09-14 01:43:43,446 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.878e+02 2.280e+02 2.697e+02 3.155e+02 5.118e+02, threshold=5.393e+02, percent-clipped=2.0
2024-09-14 01:43:43,465 INFO [train.py:1198] (1/2) Epoch 4, batch 650, loss[loss=0.3354, ctc_loss=0.2458, cr_loss=0.448, over 21045.00 frames. ], tot_loss[loss=0.3333, ctc_loss=0.2485, cr_loss=0.4243, over 3949629.60 frames. ], batch size: 56, lr: 2.07e-02, grad_scale: 32.0
2024-09-14 01:44:15,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=56221.833333333336, ans=0.0
2024-09-14 01:44:31,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=56250.166666666664, ans=0.125
2024-09-14 01:44:45,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=56278.5, ans=0.2
2024-09-14 01:44:51,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=56278.5, ans=0.125
2024-09-14 01:44:58,535 INFO [train.py:1198] (1/2) Epoch 4, batch 700, loss[loss=0.2746, ctc_loss=0.2, cr_loss=0.3728, over 20958.00 frames. ], tot_loss[loss=0.3335, ctc_loss=0.2486, cr_loss=0.4243, over 3982709.67 frames. ], batch size: 50, lr: 2.07e-02, grad_scale: 32.0
2024-09-14 01:45:41,448 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.16 vs. limit=15.0
2024-09-14 01:46:00,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=56420.166666666664, ans=0.125
2024-09-14 01:46:16,742 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.821e+02 2.228e+02 2.527e+02 3.007e+02 4.463e+02, threshold=5.055e+02, percent-clipped=0.0
2024-09-14 01:46:16,760 INFO [train.py:1198] (1/2) Epoch 4, batch 750, loss[loss=0.319, ctc_loss=0.2358, cr_loss=0.4163, over 21063.00 frames. ], tot_loss[loss=0.3327, ctc_loss=0.2481, cr_loss=0.4228, over 4010399.54 frames. ], batch size: 53, lr: 2.07e-02, grad_scale: 32.0
2024-09-14 01:46:32,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=56476.833333333336, ans=0.125
2024-09-14 01:46:55,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=56505.166666666664, ans=0.04949747468305833
2024-09-14 01:46:57,095 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=56505.166666666664, ans=0.0
2024-09-14 01:47:04,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=56533.5, ans=0.125
2024-09-14 01:47:05,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=56533.5, ans=0.0
2024-09-14 01:47:28,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=56561.833333333336, ans=0.2
2024-09-14 01:47:35,713 INFO [train.py:1198] (1/2) Epoch 4, batch 800, loss[loss=0.3137, ctc_loss=0.2325, cr_loss=0.406, over 20340.00 frames. ], tot_loss[loss=0.333, ctc_loss=0.2485, cr_loss=0.4227, over 4024776.45 frames. ], batch size: 45, lr: 2.07e-02, grad_scale: 32.0
2024-09-14 01:48:06,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=56646.833333333336, ans=0.1
2024-09-14 01:48:20,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=56675.166666666664, ans=0.125
2024-09-14 01:48:26,862 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.41 vs. limit=10.0
2024-09-14 01:48:51,659 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.954e+02 2.334e+02 2.553e+02 2.998e+02 6.483e+02, threshold=5.106e+02, percent-clipped=2.0
2024-09-14 01:48:51,678 INFO [train.py:1198] (1/2) Epoch 4, batch 850, loss[loss=0.2776, ctc_loss=0.2024, cr_loss=0.3762, over 20957.00 frames. ], tot_loss[loss=0.3324, ctc_loss=0.248, cr_loss=0.4224, over 4039458.34 frames. ], batch size: 48, lr: 2.06e-02, grad_scale: 32.0
2024-09-14 01:48:55,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=56731.833333333336, ans=0.2
2024-09-14 01:49:08,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=56760.166666666664, ans=0.04949747468305833
2024-09-14 01:50:06,727 INFO [train.py:1198] (1/2) Epoch 4, batch 900, loss[loss=0.3352, ctc_loss=0.2478, cr_loss=0.4371, over 20739.00 frames. ], tot_loss[loss=0.3327, ctc_loss=0.2482, cr_loss=0.4225, over 4051836.85 frames. ], batch size: 71, lr: 2.06e-02, grad_scale: 32.0
2024-09-14 01:50:46,035 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.90 vs. limit=15.0
2024-09-14 01:50:54,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=56958.5, ans=0.0
2024-09-14 01:51:11,309 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.45 vs. limit=15.0
2024-09-14 01:51:12,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=56986.833333333336, ans=0.125
2024-09-14 01:51:21,094 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.937e+02 2.187e+02 2.451e+02 2.751e+02 5.212e+02, threshold=4.902e+02, percent-clipped=1.0
2024-09-14 01:51:21,116 INFO [train.py:1198] (1/2) Epoch 4, batch 950, loss[loss=0.3424, ctc_loss=0.2534, cr_loss=0.445, over 20660.00 frames. ], tot_loss[loss=0.3327, ctc_loss=0.2482, cr_loss=0.4225, over 4064330.65 frames. ], batch size: 68, lr: 2.06e-02, grad_scale: 32.0
2024-09-14 01:51:21,977 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.54 vs. limit=22.5
2024-09-14 01:51:27,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=57015.166666666664, ans=0.0
2024-09-14 01:51:29,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=57015.166666666664, ans=0.125
2024-09-14 01:51:41,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=57043.5, ans=0.0
2024-09-14 01:52:02,333 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.61 vs. limit=15.0
2024-09-14 01:52:42,040 INFO [train.py:1198] (1/2) Epoch 4, batch 1000, loss[loss=0.3602, ctc_loss=0.271, cr_loss=0.4464, over 19373.00 frames. ], tot_loss[loss=0.3324, ctc_loss=0.248, cr_loss=0.4221, over 4065873.56 frames. ], batch size: 90, lr: 2.06e-02, grad_scale: 32.0
2024-09-14 01:53:12,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=57213.5, ans=0.2
2024-09-14 01:53:30,421 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.61 vs. limit=15.0
2024-09-14 01:53:43,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=57270.166666666664, ans=0.125
2024-09-14 01:53:56,667 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.985e+02 2.263e+02 2.491e+02 2.829e+02 5.827e+02, threshold=4.982e+02, percent-clipped=1.0
2024-09-14 01:53:56,693 INFO [train.py:1198] (1/2) Epoch 4, batch 1050, loss[loss=0.3038, ctc_loss=0.2274, cr_loss=0.3819, over 20952.00 frames. ], tot_loss[loss=0.3327, ctc_loss=0.2481, cr_loss=0.423, over 4076485.60 frames. ], batch size: 48, lr: 2.06e-02, grad_scale: 32.0
2024-09-14 01:54:16,886 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.07 vs. limit=22.5
2024-09-14 01:54:34,644 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.56 vs. limit=6.0
2024-09-14 01:54:47,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=57383.5, ans=0.2
2024-09-14 01:55:08,021 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.39 vs. limit=22.5
2024-09-14 01:55:11,802 INFO [train.py:1198] (1/2) Epoch 4, batch 1100, loss[loss=0.3493, ctc_loss=0.2575, cr_loss=0.459, over 20681.00 frames. ], tot_loss[loss=0.3313, ctc_loss=0.2469, cr_loss=0.4224, over 4087196.96 frames. ], batch size: 66, lr: 2.05e-02, grad_scale: 32.0
2024-09-14 01:55:36,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=57468.5, ans=0.125
2024-09-14 01:56:26,549 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.828e+02 2.230e+02 2.445e+02 2.927e+02 4.253e+02, threshold=4.890e+02, percent-clipped=0.0
2024-09-14 01:56:26,568 INFO [train.py:1198] (1/2) Epoch 4, batch 1150, loss[loss=0.3468, ctc_loss=0.2604, cr_loss=0.4324, over 19414.00 frames. ], tot_loss[loss=0.3315, ctc_loss=0.2469, cr_loss=0.423, over 4100397.91 frames. ], batch size: 90, lr: 2.05e-02, grad_scale: 32.0
2024-09-14 01:56:28,879 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.50 vs. limit=6.0
2024-09-14 01:56:34,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=57581.833333333336, ans=0.0
2024-09-14 01:56:50,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=57610.166666666664, ans=0.1
2024-09-14 01:57:07,872 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.75 vs. limit=12.0
2024-09-14 01:57:16,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=57666.833333333336, ans=0.125
2024-09-14 01:57:30,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=57695.166666666664, ans=0.025
2024-09-14 01:57:44,770 INFO [train.py:1198] (1/2) Epoch 4, batch 1200, loss[loss=0.3466, ctc_loss=0.2598, cr_loss=0.4343, over 20839.00 frames. ], tot_loss[loss=0.3334, ctc_loss=0.2485, cr_loss=0.4245, over 4106443.08 frames. ], batch size: 65, lr: 2.05e-02, grad_scale: 32.0
2024-09-14 01:58:03,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=57751.833333333336, ans=0.035
2024-09-14 01:59:03,649 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.892e+02 2.239e+02 2.446e+02 2.774e+02 4.788e+02, threshold=4.893e+02, percent-clipped=0.0
2024-09-14 01:59:03,668 INFO [train.py:1198] (1/2) Epoch 4, batch 1250, loss[loss=0.296, ctc_loss=0.217, cr_loss=0.395, over 21048.00 frames. ], tot_loss[loss=0.3335, ctc_loss=0.2485, cr_loss=0.4246, over 4104163.97 frames. ], batch size: 53, lr: 2.05e-02, grad_scale: 32.0
2024-09-14 01:59:24,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=57893.5, ans=0.0
2024-09-14 01:59:30,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=57893.5, ans=0.2
2024-09-14 01:59:36,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=57921.833333333336, ans=0.0
2024-09-14 01:59:50,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=57950.166666666664, ans=0.125
2024-09-14 02:00:04,797 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.98 vs. limit=15.0
2024-09-14 02:00:18,917 INFO [train.py:1198] (1/2) Epoch 4, batch 1300, loss[loss=0.2923, ctc_loss=0.2151, cr_loss=0.386, over 20779.00 frames. ], tot_loss[loss=0.3359, ctc_loss=0.2506, cr_loss=0.4266, over 4082734.35 frames. ], batch size: 56, lr: 2.04e-02, grad_scale: 32.0
2024-09-14 02:00:25,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=58006.833333333336, ans=0.125
2024-09-14 02:00:46,654 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.29 vs. limit=15.0
2024-09-14 02:00:46,706 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.59 vs. limit=15.0
2024-09-14 02:00:59,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=58063.5, ans=0.0
2024-09-14 02:01:00,107 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.79 vs. limit=12.0
2024-09-14 02:01:05,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=58091.833333333336, ans=0.125
2024-09-14 02:01:08,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=58091.833333333336, ans=10.0
2024-09-14 02:01:33,792 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.020e+02 2.422e+02 2.848e+02 3.328e+02 7.197e+02, threshold=5.697e+02, percent-clipped=3.0
2024-09-14 02:01:33,811 INFO [train.py:1198] (1/2) Epoch 4, batch 1350, loss[loss=0.3578, ctc_loss=0.2699, cr_loss=0.4397, over 19529.00 frames. ], tot_loss[loss=0.3367, ctc_loss=0.2513, cr_loss=0.4271, over 4078754.86 frames. ], batch size: 90, lr: 2.04e-02, grad_scale: 32.0
2024-09-14 02:01:41,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=58148.5, ans=0.125
2024-09-14 02:01:46,512 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.50 vs. limit=10.0
2024-09-14 02:02:41,972 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.61 vs. limit=15.0
2024-09-14 02:02:45,088 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.98 vs. limit=15.0
2024-09-14 02:02:48,639 INFO [train.py:1198] (1/2) Epoch 4, batch 1400, loss[loss=0.3383, ctc_loss=0.2515, cr_loss=0.4342, over 20631.00 frames. ], tot_loss[loss=0.3363, ctc_loss=0.2508, cr_loss=0.4275, over 4093505.96 frames. ], batch size: 68, lr: 2.04e-02, grad_scale: 32.0
2024-09-14 02:02:56,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=58290.166666666664, ans=0.09899494936611666
2024-09-14 02:02:59,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=58290.166666666664, ans=0.2
2024-09-14 02:03:11,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=58318.5, ans=0.1
2024-09-14 02:03:27,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=58346.833333333336, ans=0.0
2024-09-14 02:04:09,993 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.885e+02 2.266e+02 2.530e+02 3.055e+02 6.181e+02, threshold=5.060e+02, percent-clipped=1.0
2024-09-14 02:04:10,011 INFO [train.py:1198] (1/2) Epoch 4, batch 1450, loss[loss=0.2971, ctc_loss=0.2122, cr_loss=0.4246, over 20972.00 frames. ], tot_loss[loss=0.3345, ctc_loss=0.2493, cr_loss=0.4259, over 4092806.16 frames. ], batch size: 51, lr: 2.04e-02, grad_scale: 32.0
2024-09-14 02:04:43,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=58488.5, ans=0.125
2024-09-14 02:05:11,726 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-14 02:05:25,165 INFO [train.py:1198] (1/2) Epoch 4, batch 1500, loss[loss=0.3233, ctc_loss=0.2361, cr_loss=0.4364, over 21025.00 frames. ], tot_loss[loss=0.3336, ctc_loss=0.2485, cr_loss=0.4254, over 4097198.47 frames. ], batch size: 62, lr: 2.04e-02, grad_scale: 32.0
2024-09-14 02:06:03,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=58630.166666666664, ans=0.125
2024-09-14 02:06:40,825 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.963e+02 2.302e+02 2.552e+02 2.878e+02 4.776e+02, threshold=5.103e+02, percent-clipped=0.0
2024-09-14 02:06:40,853 INFO [train.py:1198] (1/2) Epoch 4, batch 1550, loss[loss=0.2918, ctc_loss=0.2072, cr_loss=0.423, over 20799.00 frames. ], tot_loss[loss=0.3315, ctc_loss=0.2468, cr_loss=0.4233, over 4103067.68 frames. ], batch size: 53, lr: 2.03e-02, grad_scale: 32.0
2024-09-14 02:07:35,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=58800.166666666664, ans=0.2
2024-09-14 02:07:51,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn2.whiten.whitening_limit, batch_count=58828.5, ans=22.5
2024-09-14 02:07:56,503 INFO [train.py:1198] (1/2) Epoch 4, batch 1600, loss[loss=0.379, ctc_loss=0.2878, cr_loss=0.4558, over 19488.00 frames. ], tot_loss[loss=0.3296, ctc_loss=0.2452, cr_loss=0.4215, over 4114340.39 frames.
], batch size: 90, lr: 2.03e-02, grad_scale: 32.0 2024-09-14 02:08:24,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=58885.166666666664, ans=0.0 2024-09-14 02:08:25,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=58913.5, ans=10.0 2024-09-14 02:08:27,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=58913.5, ans=0.0 2024-09-14 02:08:44,577 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.43 vs. limit=22.5 2024-09-14 02:09:05,452 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.95 vs. limit=15.0 2024-09-14 02:09:14,886 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.852e+02 2.305e+02 2.521e+02 2.941e+02 4.741e+02, threshold=5.041e+02, percent-clipped=0.0 2024-09-14 02:09:14,904 INFO [train.py:1198] (1/2) Epoch 4, batch 1650, loss[loss=0.3133, ctc_loss=0.2322, cr_loss=0.4056, over 21034.00 frames. ], tot_loss[loss=0.3307, ctc_loss=0.2462, cr_loss=0.4226, over 4111738.70 frames. ], batch size: 62, lr: 2.03e-02, grad_scale: 32.0 2024-09-14 02:09:24,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=58998.5, ans=0.125 2024-09-14 02:09:27,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=58998.5, ans=0.1 2024-09-14 02:09:44,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=59026.833333333336, ans=0.1 2024-09-14 02:09:59,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=59055.166666666664, ans=0.0 2024-09-14 02:10:08,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=59083.5, ans=0.0 2024-09-14 02:10:29,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=59111.833333333336, ans=0.0 2024-09-14 02:10:33,906 INFO [train.py:1198] (1/2) Epoch 4, batch 1700, loss[loss=0.2685, ctc_loss=0.1979, cr_loss=0.353, over 20969.00 frames. ], tot_loss[loss=0.331, ctc_loss=0.2462, cr_loss=0.4236, over 4109108.34 frames. ], batch size: 48, lr: 2.03e-02, grad_scale: 32.0 2024-09-14 02:10:40,542 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.57 vs. 
limit=15.0 2024-09-14 02:11:05,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=59196.833333333336, ans=0.1 2024-09-14 02:11:22,440 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 02:11:31,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=59225.166666666664, ans=0.0 2024-09-14 02:11:35,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=59253.5, ans=0.0 2024-09-14 02:11:48,875 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.966e+02 2.403e+02 2.734e+02 3.429e+02 5.395e+02, threshold=5.469e+02, percent-clipped=3.0 2024-09-14 02:11:48,905 INFO [train.py:1198] (1/2) Epoch 4, batch 1750, loss[loss=0.3075, ctc_loss=0.2261, cr_loss=0.4067, over 19914.00 frames. ], tot_loss[loss=0.3288, ctc_loss=0.2444, cr_loss=0.422, over 4119009.65 frames. ], batch size: 44, lr: 2.02e-02, grad_scale: 32.0 2024-09-14 02:11:58,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=59281.833333333336, ans=0.125 2024-09-14 02:12:52,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=59395.166666666664, ans=0.125 2024-09-14 02:13:04,079 INFO [train.py:1198] (1/2) Epoch 4, batch 1800, loss[loss=0.3021, ctc_loss=0.2265, cr_loss=0.3778, over 20954.00 frames. ], tot_loss[loss=0.3282, ctc_loss=0.244, cr_loss=0.4212, over 4119509.78 frames. ], batch size: 50, lr: 2.02e-02, grad_scale: 32.0 2024-09-14 02:13:13,753 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.48 vs. limit=15.0 2024-09-14 02:13:26,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=59451.833333333336, ans=0.0 2024-09-14 02:13:34,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=59480.166666666664, ans=0.2 2024-09-14 02:14:05,426 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.94 vs. limit=22.5 2024-09-14 02:14:19,628 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.971e+02 2.241e+02 2.476e+02 2.738e+02 4.960e+02, threshold=4.953e+02, percent-clipped=0.0 2024-09-14 02:14:19,647 INFO [train.py:1198] (1/2) Epoch 4, batch 1850, loss[loss=0.3366, ctc_loss=0.2497, cr_loss=0.4344, over 20966.00 frames. ], tot_loss[loss=0.328, ctc_loss=0.2438, cr_loss=0.4209, over 4118578.01 frames. ], batch size: 58, lr: 2.02e-02, grad_scale: 32.0 2024-09-14 02:14:44,131 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.26 vs. 
limit=22.5 2024-09-14 02:15:00,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=59621.833333333336, ans=0.2 2024-09-14 02:15:03,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=59621.833333333336, ans=0.125 2024-09-14 02:15:13,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=59650.166666666664, ans=0.0 2024-09-14 02:15:15,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=59650.166666666664, ans=0.0 2024-09-14 02:15:24,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=59678.5, ans=0.2 2024-09-14 02:15:41,189 INFO [train.py:1198] (1/2) Epoch 4, batch 1900, loss[loss=0.3758, ctc_loss=0.2846, cr_loss=0.4561, over 20712.00 frames. ], tot_loss[loss=0.3288, ctc_loss=0.2443, cr_loss=0.4221, over 4115855.78 frames. ], batch size: 68, lr: 2.02e-02, grad_scale: 32.0 2024-09-14 02:16:22,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=59763.5, ans=0.125 2024-09-14 02:16:45,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=59820.166666666664, ans=0.1 2024-09-14 02:16:57,007 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.857e+02 2.325e+02 2.598e+02 2.989e+02 4.886e+02, threshold=5.196e+02, percent-clipped=0.0 2024-09-14 02:16:57,025 INFO [train.py:1198] (1/2) Epoch 4, batch 1950, loss[loss=0.2885, ctc_loss=0.2149, cr_loss=0.3682, over 21054.00 frames. ], tot_loss[loss=0.3283, ctc_loss=0.2439, cr_loss=0.4217, over 4120630.48 frames. ], batch size: 56, lr: 2.02e-02, grad_scale: 32.0 2024-09-14 02:17:21,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=59876.833333333336, ans=0.025 2024-09-14 02:17:39,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=59905.166666666664, ans=0.125 2024-09-14 02:18:12,644 INFO [train.py:1198] (1/2) Epoch 4, batch 2000, loss[loss=0.3674, ctc_loss=0.2822, cr_loss=0.4263, over 20034.00 frames. ], tot_loss[loss=0.3282, ctc_loss=0.2439, cr_loss=0.4219, over 4109340.62 frames. ], batch size: 80, lr: 2.01e-02, grad_scale: 32.0 2024-09-14 02:18:37,145 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 02:19:27,596 INFO [train.py:1198] (1/2) Epoch 4, batch 2050, loss[loss=0.2779, ctc_loss=0.2053, cr_loss=0.3629, over 20933.00 frames. ], tot_loss[loss=0.3286, ctc_loss=0.2442, cr_loss=0.4221, over 4114021.52 frames. 
], batch size: 49, lr: 2.01e-02, grad_scale: 32.0 2024-09-14 02:19:29,092 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.929e+02 2.164e+02 2.429e+02 2.730e+02 4.906e+02, threshold=4.857e+02, percent-clipped=0.0 2024-09-14 02:19:34,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=60131.833333333336, ans=0.0 2024-09-14 02:19:37,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=60131.833333333336, ans=0.0 2024-09-14 02:20:13,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=60216.833333333336, ans=0.1 2024-09-14 02:20:46,043 INFO [train.py:1198] (1/2) Epoch 4, batch 2100, loss[loss=0.3367, ctc_loss=0.2497, cr_loss=0.435, over 20976.00 frames. ], tot_loss[loss=0.3279, ctc_loss=0.2435, cr_loss=0.422, over 4119561.82 frames. ], batch size: 58, lr: 2.01e-02, grad_scale: 32.0 2024-09-14 02:20:56,092 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.44 vs. limit=10.0 2024-09-14 02:21:01,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=60301.833333333336, ans=0.0 2024-09-14 02:22:04,915 INFO [train.py:1198] (1/2) Epoch 4, batch 2150, loss[loss=0.3411, ctc_loss=0.2527, cr_loss=0.4419, over 20810.00 frames. ], tot_loss[loss=0.3279, ctc_loss=0.2434, cr_loss=0.4225, over 4118199.37 frames. ], batch size: 59, lr: 2.01e-02, grad_scale: 16.0 2024-09-14 02:22:07,891 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.003e+02 2.354e+02 2.656e+02 3.093e+02 5.783e+02, threshold=5.313e+02, percent-clipped=1.0 2024-09-14 02:22:17,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=60415.166666666664, ans=0.125 2024-09-14 02:22:21,995 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.14 vs. limit=15.0 2024-09-14 02:22:32,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=60443.5, ans=0.2 2024-09-14 02:22:36,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=60471.833333333336, ans=0.1 2024-09-14 02:22:47,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=60471.833333333336, ans=0.125 2024-09-14 02:22:56,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=60500.166666666664, ans=0.2 2024-09-14 02:23:20,086 INFO [train.py:1198] (1/2) Epoch 4, batch 2200, loss[loss=0.3266, ctc_loss=0.2456, cr_loss=0.4051, over 20951.00 frames. ], tot_loss[loss=0.3295, ctc_loss=0.2448, cr_loss=0.4236, over 4112110.18 frames. ], batch size: 58, lr: 2.01e-02, grad_scale: 16.0 2024-09-14 02:23:29,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=60556.833333333336, ans=0.125 2024-09-14 02:24:18,361 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.60 vs. 
limit=22.5 2024-09-14 02:24:33,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=60670.166666666664, ans=0.1 2024-09-14 02:24:34,515 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 02:24:35,702 INFO [train.py:1198] (1/2) Epoch 4, batch 2250, loss[loss=0.3751, ctc_loss=0.2789, cr_loss=0.4809, over 20106.00 frames. ], tot_loss[loss=0.3279, ctc_loss=0.2433, cr_loss=0.4228, over 4122525.66 frames. ], batch size: 80, lr: 2.00e-02, grad_scale: 16.0 2024-09-14 02:24:38,562 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.811e+02 2.228e+02 2.504e+02 2.906e+02 4.727e+02, threshold=5.007e+02, percent-clipped=0.0 2024-09-14 02:24:40,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=60698.5, ans=0.125 2024-09-14 02:25:51,392 INFO [train.py:1198] (1/2) Epoch 4, batch 2300, loss[loss=0.2955, ctc_loss=0.214, cr_loss=0.4075, over 19890.00 frames. ], tot_loss[loss=0.3271, ctc_loss=0.2427, cr_loss=0.4219, over 4117343.92 frames. ], batch size: 44, lr: 2.00e-02, grad_scale: 16.0 2024-09-14 02:25:51,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=60840.166666666664, ans=0.2 2024-09-14 02:25:53,469 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.75 vs. limit=22.5 2024-09-14 02:26:08,491 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.62 vs. limit=15.0 2024-09-14 02:27:12,086 INFO [train.py:1198] (1/2) Epoch 4, batch 2350, loss[loss=0.373, ctc_loss=0.2858, cr_loss=0.4358, over 18136.00 frames. ], tot_loss[loss=0.3287, ctc_loss=0.2441, cr_loss=0.4229, over 4111687.74 frames. ], batch size: 108, lr: 2.00e-02, grad_scale: 16.0 2024-09-14 02:27:15,058 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.000e+02 2.371e+02 2.641e+02 3.214e+02 5.050e+02, threshold=5.283e+02, percent-clipped=1.0 2024-09-14 02:27:39,923 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 02:27:52,319 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.20 vs. limit=15.0 2024-09-14 02:28:27,477 INFO [train.py:1198] (1/2) Epoch 4, batch 2400, loss[loss=0.2903, ctc_loss=0.2114, cr_loss=0.3945, over 20976.00 frames. ], tot_loss[loss=0.3277, ctc_loss=0.2432, cr_loss=0.4225, over 4124742.65 frames. ], batch size: 51, lr: 2.00e-02, grad_scale: 32.0 2024-09-14 02:28:29,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=61123.5, ans=0.0 2024-09-14 02:28:29,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=61123.5, ans=0.0 2024-09-14 02:28:55,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=61151.833333333336, ans=0.0 2024-09-14 02:29:07,575 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.34 vs. 
limit=15.0 2024-09-14 02:29:15,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=61208.5, ans=0.07 2024-09-14 02:29:22,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=61208.5, ans=0.2 2024-09-14 02:29:43,076 INFO [train.py:1198] (1/2) Epoch 4, batch 2450, loss[loss=0.4026, ctc_loss=0.311, cr_loss=0.4581, over 18140.00 frames. ], tot_loss[loss=0.3263, ctc_loss=0.2422, cr_loss=0.4208, over 4116521.26 frames. ], batch size: 109, lr: 2.00e-02, grad_scale: 32.0 2024-09-14 02:29:46,086 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.897e+02 2.320e+02 2.688e+02 3.117e+02 5.115e+02, threshold=5.375e+02, percent-clipped=0.0 2024-09-14 02:29:46,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=61265.166666666664, ans=0.125 2024-09-14 02:29:56,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=61293.5, ans=0.0 2024-09-14 02:30:43,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=61378.5, ans=0.0 2024-09-14 02:30:58,636 INFO [train.py:1198] (1/2) Epoch 4, batch 2500, loss[loss=0.3749, ctc_loss=0.2796, cr_loss=0.4764, over 21024.00 frames. ], tot_loss[loss=0.3283, ctc_loss=0.2437, cr_loss=0.423, over 4115083.26 frames. ], batch size: 63, lr: 1.99e-02, grad_scale: 32.0 2024-09-14 02:31:03,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=61406.833333333336, ans=0.125 2024-09-14 02:31:33,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=61463.5, ans=0.1 2024-09-14 02:32:16,367 INFO [train.py:1198] (1/2) Epoch 4, batch 2550, loss[loss=0.3131, ctc_loss=0.2334, cr_loss=0.3986, over 20976.00 frames. ], tot_loss[loss=0.3275, ctc_loss=0.2429, cr_loss=0.4227, over 4115176.40 frames. 
], batch size: 55, lr: 1.99e-02, grad_scale: 32.0 2024-09-14 02:32:19,317 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.752e+02 2.249e+02 2.600e+02 3.277e+02 5.290e+02, threshold=5.199e+02, percent-clipped=0.0 2024-09-14 02:32:29,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=61548.5, ans=0.2 2024-09-14 02:32:32,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=61576.833333333336, ans=0.1 2024-09-14 02:32:38,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=61576.833333333336, ans=0.2 2024-09-14 02:32:47,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=61605.166666666664, ans=0.125 2024-09-14 02:32:51,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=61605.166666666664, ans=0.125 2024-09-14 02:33:03,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=61633.5, ans=0.2 2024-09-14 02:33:05,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=61633.5, ans=0.0 2024-09-14 02:33:35,245 INFO [train.py:1198] (1/2) Epoch 4, batch 2600, loss[loss=0.3157, ctc_loss=0.2388, cr_loss=0.3847, over 20981.00 frames. ], tot_loss[loss=0.3277, ctc_loss=0.2432, cr_loss=0.4225, over 4120470.25 frames. ], batch size: 55, lr: 1.99e-02, grad_scale: 32.0 2024-09-14 02:33:48,081 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.34 vs. limit=10.0 2024-09-14 02:33:50,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=61718.5, ans=0.5 2024-09-14 02:34:08,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=61746.833333333336, ans=0.125 2024-09-14 02:34:43,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=61803.5, ans=0.125 2024-09-14 02:34:50,824 INFO [train.py:1198] (1/2) Epoch 4, batch 2650, loss[loss=0.3002, ctc_loss=0.2202, cr_loss=0.3997, over 20995.00 frames. ], tot_loss[loss=0.327, ctc_loss=0.2427, cr_loss=0.4212, over 4123323.24 frames. ], batch size: 51, lr: 1.99e-02, grad_scale: 32.0 2024-09-14 02:34:53,748 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.982e+02 2.382e+02 2.697e+02 3.200e+02 4.789e+02, threshold=5.395e+02, percent-clipped=0.0 2024-09-14 02:35:30,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=61888.5, ans=0.125 2024-09-14 02:35:35,706 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.03 vs. limit=6.0 2024-09-14 02:36:00,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=61945.166666666664, ans=0.025 2024-09-14 02:36:06,008 INFO [train.py:1198] (1/2) Epoch 4, batch 2700, loss[loss=0.3176, ctc_loss=0.2355, cr_loss=0.4109, over 20354.00 frames. ], tot_loss[loss=0.3278, ctc_loss=0.2434, cr_loss=0.4218, over 4119323.20 frames. 
], batch size: 74, lr: 1.98e-02, grad_scale: 32.0 2024-09-14 02:36:07,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=61973.5, ans=0.0 2024-09-14 02:36:09,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=61973.5, ans=0.125 2024-09-14 02:36:16,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=61973.5, ans=0.1 2024-09-14 02:37:20,830 INFO [train.py:1198] (1/2) Epoch 4, batch 2750, loss[loss=0.361, ctc_loss=0.2686, cr_loss=0.4617, over 20123.00 frames. ], tot_loss[loss=0.3292, ctc_loss=0.2444, cr_loss=0.4235, over 4107159.93 frames. ], batch size: 80, lr: 1.98e-02, grad_scale: 32.0 2024-09-14 02:37:21,686 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.07 vs. limit=15.0 2024-09-14 02:37:23,808 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.887e+02 2.307e+02 2.523e+02 3.152e+02 4.526e+02, threshold=5.045e+02, percent-clipped=0.0 2024-09-14 02:37:24,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=62115.166666666664, ans=0.125 2024-09-14 02:37:45,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=62143.5, ans=0.125 2024-09-14 02:37:54,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=62171.833333333336, ans=0.04949747468305833 2024-09-14 02:37:54,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=62171.833333333336, ans=0.025 2024-09-14 02:38:10,158 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.55 vs. limit=10.0 2024-09-14 02:38:32,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=62228.5, ans=0.125 2024-09-14 02:38:42,513 INFO [train.py:1198] (1/2) Epoch 4, batch 2800, loss[loss=0.2779, ctc_loss=0.2041, cr_loss=0.3688, over 20944.00 frames. ], tot_loss[loss=0.3281, ctc_loss=0.2437, cr_loss=0.4219, over 4104640.98 frames. ], batch size: 49, lr: 1.98e-02, grad_scale: 32.0 2024-09-14 02:39:25,532 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.93 vs. limit=15.0 2024-09-14 02:39:53,734 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.38 vs. limit=15.0 2024-09-14 02:39:57,670 INFO [train.py:1198] (1/2) Epoch 4, batch 2850, loss[loss=0.3393, ctc_loss=0.2539, cr_loss=0.4274, over 19610.00 frames. ], tot_loss[loss=0.329, ctc_loss=0.2445, cr_loss=0.4225, over 4094678.61 frames. 
], batch size: 90, lr: 1.98e-02, grad_scale: 32.0 2024-09-14 02:40:00,592 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.775e+02 2.333e+02 2.637e+02 3.139e+02 4.944e+02, threshold=5.275e+02, percent-clipped=0.0 2024-09-14 02:40:05,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=62398.5, ans=0.025 2024-09-14 02:40:08,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=62398.5, ans=0.125 2024-09-14 02:41:01,891 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.69 vs. limit=22.5 2024-09-14 02:41:07,930 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.03 vs. limit=22.5 2024-09-14 02:41:13,075 INFO [train.py:1198] (1/2) Epoch 4, batch 2900, loss[loss=0.301, ctc_loss=0.2194, cr_loss=0.4082, over 21068.00 frames. ], tot_loss[loss=0.3294, ctc_loss=0.2448, cr_loss=0.4227, over 4098559.99 frames. ], batch size: 53, lr: 1.98e-02, grad_scale: 32.0 2024-09-14 02:41:16,564 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 02:41:36,269 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.60 vs. limit=12.0 2024-09-14 02:41:37,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=62568.5, ans=0.025 2024-09-14 02:41:43,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=62596.833333333336, ans=0.125 2024-09-14 02:42:17,015 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.11 vs. limit=10.0 2024-09-14 02:42:22,539 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.37 vs. limit=12.0 2024-09-14 02:42:27,693 INFO [train.py:1198] (1/2) Epoch 4, batch 2950, loss[loss=0.3292, ctc_loss=0.2406, cr_loss=0.4431, over 20973.00 frames. ], tot_loss[loss=0.33, ctc_loss=0.2455, cr_loss=0.4229, over 4096269.63 frames. ], batch size: 64, lr: 1.97e-02, grad_scale: 32.0 2024-09-14 02:42:30,681 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.003e+02 2.239e+02 2.461e+02 2.747e+02 6.552e+02, threshold=4.921e+02, percent-clipped=1.0 2024-09-14 02:43:02,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=62738.5, ans=0.2 2024-09-14 02:43:16,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=62766.833333333336, ans=0.1 2024-09-14 02:43:17,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=62766.833333333336, ans=0.0 2024-09-14 02:43:45,884 INFO [train.py:1198] (1/2) Epoch 4, batch 3000, loss[loss=0.3467, ctc_loss=0.2623, cr_loss=0.422, over 20857.00 frames. ], tot_loss[loss=0.3311, ctc_loss=0.2463, cr_loss=0.4239, over 4084723.67 frames. 
], batch size: 65, lr: 1.97e-02, grad_scale: 16.0 2024-09-14 02:43:45,885 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-14 02:44:05,796 INFO [train.py:1230] (1/2) Epoch 4, validation: loss=0.08117, ctc_loss=0.08117, cr_loss=9.426e-15, over 944034.00 frames. 2024-09-14 02:44:05,797 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-14 02:44:08,055 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.18 vs. limit=15.0 2024-09-14 02:44:54,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=62908.5, ans=10.0 2024-09-14 02:45:19,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=62965.166666666664, ans=0.1 2024-09-14 02:45:21,063 INFO [train.py:1198] (1/2) Epoch 4, batch 3050, loss[loss=0.3879, ctc_loss=0.2936, cr_loss=0.4712, over 19933.00 frames. ], tot_loss[loss=0.331, ctc_loss=0.2463, cr_loss=0.4237, over 4083663.11 frames. ], batch size: 80, lr: 1.97e-02, grad_scale: 16.0 2024-09-14 02:45:25,558 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.830e+02 2.274e+02 2.525e+02 2.965e+02 5.292e+02, threshold=5.050e+02, percent-clipped=2.0 2024-09-14 02:45:33,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=62965.166666666664, ans=0.125 2024-09-14 02:46:28,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=63078.5, ans=0.1 2024-09-14 02:46:36,839 INFO [train.py:1198] (1/2) Epoch 4, batch 3100, loss[loss=0.3385, ctc_loss=0.2545, cr_loss=0.4198, over 21027.00 frames. ], tot_loss[loss=0.3302, ctc_loss=0.2456, cr_loss=0.4232, over 4075711.52 frames. ], batch size: 62, lr: 1.97e-02, grad_scale: 16.0 2024-09-14 02:46:38,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=63106.833333333336, ans=0.2 2024-09-14 02:47:13,700 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.36 vs. limit=15.0 2024-09-14 02:47:28,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=63191.833333333336, ans=0.125 2024-09-14 02:47:29,832 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.16 vs. limit=15.0 2024-09-14 02:47:34,657 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.78 vs. limit=15.0 2024-09-14 02:47:51,400 INFO [train.py:1198] (1/2) Epoch 4, batch 3150, loss[loss=0.3061, ctc_loss=0.2261, cr_loss=0.4003, over 20962.00 frames. ], tot_loss[loss=0.3282, ctc_loss=0.2439, cr_loss=0.4213, over 4083179.23 frames. ], batch size: 49, lr: 1.97e-02, grad_scale: 16.0 2024-09-14 02:47:55,990 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.977e+02 2.359e+02 2.728e+02 3.358e+02 9.340e+02, threshold=5.456e+02, percent-clipped=4.0 2024-09-14 02:49:10,234 INFO [train.py:1198] (1/2) Epoch 4, batch 3200, loss[loss=0.3283, ctc_loss=0.242, cr_loss=0.4313, over 20789.00 frames. 
], tot_loss[loss=0.3275, ctc_loss=0.2432, cr_loss=0.4213, over 4089060.87 frames. ], batch size: 56, lr: 1.96e-02, grad_scale: 32.0 2024-09-14 02:49:28,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=63418.5, ans=0.125 2024-09-14 02:49:57,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=63475.166666666664, ans=0.125 2024-09-14 02:50:11,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=63475.166666666664, ans=0.0 2024-09-14 02:50:28,860 INFO [train.py:1198] (1/2) Epoch 4, batch 3250, loss[loss=0.294, ctc_loss=0.2126, cr_loss=0.4067, over 19956.00 frames. ], tot_loss[loss=0.3263, ctc_loss=0.2421, cr_loss=0.4208, over 4102680.31 frames. ], batch size: 44, lr: 1.96e-02, grad_scale: 32.0 2024-09-14 02:50:33,526 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.897e+02 2.173e+02 2.516e+02 3.106e+02 4.227e+02, threshold=5.031e+02, percent-clipped=0.0 2024-09-14 02:50:43,152 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.01 vs. limit=15.0 2024-09-14 02:50:46,124 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.93 vs. limit=15.0 2024-09-14 02:50:54,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=63560.166666666664, ans=0.125 2024-09-14 02:51:26,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=63616.833333333336, ans=0.0 2024-09-14 02:51:33,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=63645.166666666664, ans=0.0 2024-09-14 02:51:43,752 INFO [train.py:1198] (1/2) Epoch 4, batch 3300, loss[loss=0.3375, ctc_loss=0.2494, cr_loss=0.44, over 20827.00 frames. ], tot_loss[loss=0.3285, ctc_loss=0.2438, cr_loss=0.4236, over 4098971.15 frames. ], batch size: 59, lr: 1.96e-02, grad_scale: 32.0 2024-09-14 02:52:00,949 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.00 vs. limit=22.5 2024-09-14 02:52:05,051 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 02:52:28,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=63758.5, ans=0.125 2024-09-14 02:52:33,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=63758.5, ans=0.2 2024-09-14 02:52:43,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=63786.833333333336, ans=0.125 2024-09-14 02:52:58,359 INFO [train.py:1198] (1/2) Epoch 4, batch 3350, loss[loss=0.3251, ctc_loss=0.2448, cr_loss=0.4013, over 20883.00 frames. ], tot_loss[loss=0.3291, ctc_loss=0.2444, cr_loss=0.4236, over 4093216.75 frames. 
], batch size: 65, lr: 1.96e-02, grad_scale: 32.0 2024-09-14 02:53:02,891 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.860e+02 2.280e+02 2.587e+02 3.253e+02 5.558e+02, threshold=5.175e+02, percent-clipped=1.0 2024-09-14 02:53:14,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=63843.5, ans=0.0 2024-09-14 02:53:34,695 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.13 vs. limit=12.0 2024-09-14 02:53:51,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=63900.166666666664, ans=0.0 2024-09-14 02:53:54,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=63900.166666666664, ans=0.025 2024-09-14 02:54:07,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=63928.5, ans=0.125 2024-09-14 02:54:14,598 INFO [train.py:1198] (1/2) Epoch 4, batch 3400, loss[loss=0.3412, ctc_loss=0.2534, cr_loss=0.4388, over 20970.00 frames. ], tot_loss[loss=0.328, ctc_loss=0.2435, cr_loss=0.4228, over 4101740.63 frames. ], batch size: 64, lr: 1.96e-02, grad_scale: 32.0 2024-09-14 02:54:37,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=63985.166666666664, ans=0.025 2024-09-14 02:55:26,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=64070.166666666664, ans=0.125 2024-09-14 02:55:37,016 INFO [train.py:1198] (1/2) Epoch 4, batch 3450, loss[loss=0.3107, ctc_loss=0.2289, cr_loss=0.4086, over 20994.00 frames. ], tot_loss[loss=0.327, ctc_loss=0.2428, cr_loss=0.4213, over 4098378.96 frames. ], batch size: 55, lr: 1.96e-02, grad_scale: 32.0 2024-09-14 02:55:37,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=64098.5, ans=0.0 2024-09-14 02:55:41,569 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.885e+02 2.364e+02 2.636e+02 3.041e+02 4.515e+02, threshold=5.273e+02, percent-clipped=1.0 2024-09-14 02:56:37,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=64211.833333333336, ans=0.0 2024-09-14 02:56:47,993 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.08 vs. limit=15.0 2024-09-14 02:56:53,132 INFO [train.py:1198] (1/2) Epoch 4, batch 3500, loss[loss=0.3241, ctc_loss=0.2413, cr_loss=0.414, over 21037.00 frames. ], tot_loss[loss=0.3285, ctc_loss=0.2441, cr_loss=0.4222, over 4089297.91 frames. 
], batch size: 56, lr: 1.95e-02, grad_scale: 32.0 2024-09-14 02:56:57,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=64240.166666666664, ans=0.125 2024-09-14 02:57:02,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=64240.166666666664, ans=0.0 2024-09-14 02:57:49,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=64325.166666666664, ans=0.025 2024-09-14 02:58:08,587 INFO [train.py:1198] (1/2) Epoch 4, batch 3550, loss[loss=0.2973, ctc_loss=0.2239, cr_loss=0.3671, over 19922.00 frames. ], tot_loss[loss=0.3276, ctc_loss=0.2433, cr_loss=0.4217, over 4085011.18 frames. ], batch size: 44, lr: 1.95e-02, grad_scale: 32.0 2024-09-14 02:58:08,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=64381.833333333336, ans=0.125 2024-09-14 02:58:13,124 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.934e+02 2.378e+02 2.692e+02 3.412e+02 6.201e+02, threshold=5.385e+02, percent-clipped=4.0 2024-09-14 02:58:26,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=64410.166666666664, ans=0.0 2024-09-14 02:59:18,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=64495.166666666664, ans=0.125 2024-09-14 02:59:23,723 INFO [train.py:1198] (1/2) Epoch 4, batch 3600, loss[loss=0.2661, ctc_loss=0.1907, cr_loss=0.3771, over 19948.00 frames. ], tot_loss[loss=0.3299, ctc_loss=0.2451, cr_loss=0.4238, over 4071462.31 frames. ], batch size: 44, lr: 1.95e-02, grad_scale: 32.0 2024-09-14 02:59:30,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=64523.5, ans=0.0 2024-09-14 02:59:42,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=64551.833333333336, ans=0.125 2024-09-14 02:59:51,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=64551.833333333336, ans=0.1 2024-09-14 03:00:01,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=64580.166666666664, ans=0.125 2024-09-14 03:00:10,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=64608.5, ans=0.125 2024-09-14 03:00:42,208 INFO [train.py:1198] (1/2) Epoch 4, batch 3650, loss[loss=0.3507, ctc_loss=0.2618, cr_loss=0.4446, over 20922.00 frames. ], tot_loss[loss=0.3286, ctc_loss=0.244, cr_loss=0.4231, over 4076180.23 frames. 
], batch size: 60, lr: 1.95e-02, grad_scale: 16.0 2024-09-14 03:00:48,415 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.909e+02 2.333e+02 2.690e+02 3.120e+02 5.506e+02, threshold=5.380e+02, percent-clipped=1.0 2024-09-14 03:01:02,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=64693.5, ans=0.125 2024-09-14 03:01:06,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=64693.5, ans=0.0 2024-09-14 03:01:18,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=64721.833333333336, ans=0.0 2024-09-14 03:01:28,256 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.87 vs. limit=15.0 2024-09-14 03:01:33,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=64750.166666666664, ans=0.1 2024-09-14 03:01:56,597 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=64778.5, ans=0.2 2024-09-14 03:02:00,513 INFO [train.py:1198] (1/2) Epoch 4, batch 3700, loss[loss=0.3011, ctc_loss=0.218, cr_loss=0.4155, over 21067.00 frames. ], tot_loss[loss=0.3288, ctc_loss=0.244, cr_loss=0.4242, over 4091298.17 frames. ], batch size: 56, lr: 1.95e-02, grad_scale: 16.0 2024-09-14 03:02:08,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=64806.833333333336, ans=0.2 2024-09-14 03:02:31,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=64863.5, ans=0.1 2024-09-14 03:02:40,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=64863.5, ans=0.125 2024-09-14 03:03:02,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=64920.166666666664, ans=0.0 2024-09-14 03:03:16,173 INFO [train.py:1198] (1/2) Epoch 4, batch 3750, loss[loss=0.3521, ctc_loss=0.2642, cr_loss=0.4397, over 19504.00 frames. ], tot_loss[loss=0.3291, ctc_loss=0.2441, cr_loss=0.4248, over 4101204.11 frames. ], batch size: 90, lr: 1.94e-02, grad_scale: 16.0 2024-09-14 03:03:20,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=64948.5, ans=0.125 2024-09-14 03:03:22,047 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.886e+02 2.387e+02 2.649e+02 3.255e+02 5.060e+02, threshold=5.297e+02, percent-clipped=0.0 2024-09-14 03:03:31,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=64976.833333333336, ans=0.0 2024-09-14 03:03:44,038 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.27 vs. limit=15.0 2024-09-14 03:04:23,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=65061.833333333336, ans=0.125 2024-09-14 03:04:32,218 INFO [train.py:1198] (1/2) Epoch 4, batch 3800, loss[loss=0.3295, ctc_loss=0.2426, cr_loss=0.4347, over 20991.00 frames. 
], tot_loss[loss=0.3279, ctc_loss=0.2432, cr_loss=0.4234, over 4103392.84 frames. ], batch size: 55, lr: 1.94e-02, grad_scale: 16.0 2024-09-14 03:04:40,203 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=65090.166666666664, ans=0.0 2024-09-14 03:04:44,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=65090.166666666664, ans=0.1 2024-09-14 03:04:47,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=65118.5, ans=0.125 2024-09-14 03:04:52,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=65118.5, ans=0.125 2024-09-14 03:05:08,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=65146.833333333336, ans=0.025 2024-09-14 03:05:35,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=65203.5, ans=0.025 2024-09-14 03:05:47,570 INFO [train.py:1198] (1/2) Epoch 4, batch 3850, loss[loss=0.3475, ctc_loss=0.2609, cr_loss=0.4334, over 20962.00 frames. ], tot_loss[loss=0.3263, ctc_loss=0.2419, cr_loss=0.4222, over 4106790.45 frames. ], batch size: 64, lr: 1.94e-02, grad_scale: 16.0 2024-09-14 03:05:56,567 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.932e+02 2.256e+02 2.420e+02 2.840e+02 5.630e+02, threshold=4.839e+02, percent-clipped=1.0 2024-09-14 03:06:32,122 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.20 vs. limit=15.0 2024-09-14 03:06:42,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=65316.833333333336, ans=0.0 2024-09-14 03:07:09,415 INFO [train.py:1198] (1/2) Epoch 4, batch 3900, loss[loss=0.3237, ctc_loss=0.2368, cr_loss=0.4346, over 20873.00 frames. ], tot_loss[loss=0.3264, ctc_loss=0.2421, cr_loss=0.4217, over 4093430.51 frames. ], batch size: 54, lr: 1.94e-02, grad_scale: 16.0 2024-09-14 03:07:12,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=65373.5, ans=0.125 2024-09-14 03:07:36,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=65401.833333333336, ans=0.125 2024-09-14 03:08:01,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=65458.5, ans=10.0 2024-09-14 03:08:12,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=65486.833333333336, ans=0.125 2024-09-14 03:08:24,127 INFO [train.py:1198] (1/2) Epoch 4, batch 3950, loss[loss=0.3162, ctc_loss=0.2329, cr_loss=0.4167, over 21067.00 frames. ], tot_loss[loss=0.3276, ctc_loss=0.2429, cr_loss=0.4233, over 4087889.57 frames. 
], batch size: 59, lr: 1.94e-02, grad_scale: 16.0 2024-09-14 03:08:30,034 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.935e+02 2.393e+02 2.693e+02 3.193e+02 4.940e+02, threshold=5.386e+02, percent-clipped=2.0 2024-09-14 03:08:37,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=65543.5, ans=0.0 2024-09-14 03:09:20,123 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.51 vs. limit=15.0 2024-09-14 03:09:39,025 INFO [train.py:1198] (1/2) Epoch 4, batch 4000, loss[loss=0.306, ctc_loss=0.2254, cr_loss=0.4034, over 21078.00 frames. ], tot_loss[loss=0.328, ctc_loss=0.2432, cr_loss=0.4241, over 4087814.03 frames. ], batch size: 59, lr: 1.93e-02, grad_scale: 32.0 2024-09-14 03:10:13,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=65713.5, ans=0.125 2024-09-14 03:10:21,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=65713.5, ans=0.125 2024-09-14 03:10:27,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=65741.83333333333, ans=0.0 2024-09-14 03:10:52,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=65798.5, ans=0.125 2024-09-14 03:10:53,975 INFO [train.py:1198] (1/2) Epoch 4, batch 4050, loss[loss=0.3508, ctc_loss=0.2572, cr_loss=0.4677, over 20633.00 frames. ], tot_loss[loss=0.3271, ctc_loss=0.2424, cr_loss=0.4236, over 4103917.49 frames. ], batch size: 71, lr: 1.93e-02, grad_scale: 32.0 2024-09-14 03:10:59,957 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.925e+02 2.255e+02 2.573e+02 3.023e+02 5.009e+02, threshold=5.147e+02, percent-clipped=0.0 2024-09-14 03:11:28,046 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.16 vs. limit=15.0 2024-09-14 03:11:31,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=65855.16666666667, ans=0.2 2024-09-14 03:11:46,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=65883.5, ans=0.0 2024-09-14 03:12:09,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=65911.83333333333, ans=0.1 2024-09-14 03:12:10,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=65940.16666666667, ans=0.125 2024-09-14 03:12:11,781 INFO [train.py:1198] (1/2) Epoch 4, batch 4100, loss[loss=0.3398, ctc_loss=0.2534, cr_loss=0.4318, over 20298.00 frames. ], tot_loss[loss=0.3274, ctc_loss=0.2427, cr_loss=0.4239, over 4105159.20 frames. ], batch size: 74, lr: 1.93e-02, grad_scale: 32.0 2024-09-14 03:12:42,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=65968.5, ans=0.125 2024-09-14 03:13:30,343 INFO [train.py:1198] (1/2) Epoch 4, batch 4150, loss[loss=0.3232, ctc_loss=0.24, cr_loss=0.4162, over 20889.00 frames. 
], tot_loss[loss=0.3278, ctc_loss=0.2429, cr_loss=0.4242, over 4098147.95 frames. ], batch size: 54, lr: 1.93e-02, grad_scale: 32.0 2024-09-14 03:13:36,287 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.908e+02 2.323e+02 2.608e+02 3.200e+02 5.078e+02, threshold=5.216e+02, percent-clipped=0.0 2024-09-14 03:13:54,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=66110.16666666667, ans=0.125 2024-09-14 03:14:45,287 INFO [train.py:1198] (1/2) Epoch 4, batch 4200, loss[loss=0.3737, ctc_loss=0.2777, cr_loss=0.4799, over 20708.00 frames. ], tot_loss[loss=0.327, ctc_loss=0.2424, cr_loss=0.423, over 4103882.67 frames. ], batch size: 71, lr: 1.93e-02, grad_scale: 32.0 2024-09-14 03:15:00,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=66251.83333333333, ans=0.125 2024-09-14 03:15:05,579 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.95 vs. limit=15.0 2024-09-14 03:15:41,815 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.68 vs. limit=6.0 2024-09-14 03:15:44,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=66336.83333333333, ans=0.125 2024-09-14 03:15:48,872 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=66336.83333333333, ans=0.125 2024-09-14 03:16:00,378 INFO [train.py:1198] (1/2) Epoch 4, batch 4250, loss[loss=0.3303, ctc_loss=0.2469, cr_loss=0.4172, over 20768.00 frames. ], tot_loss[loss=0.3277, ctc_loss=0.2431, cr_loss=0.423, over 4090401.90 frames. ], batch size: 56, lr: 1.92e-02, grad_scale: 32.0 2024-09-14 03:16:06,443 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.014e+02 2.390e+02 2.675e+02 3.034e+02 4.916e+02, threshold=5.351e+02, percent-clipped=0.0 2024-09-14 03:16:08,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=66365.16666666667, ans=0.125 2024-09-14 03:16:20,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=66393.5, ans=0.0 2024-09-14 03:16:29,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=66421.83333333333, ans=0.2 2024-09-14 03:16:38,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=66421.83333333333, ans=0.1 2024-09-14 03:16:53,655 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.63 vs. limit=12.0 2024-09-14 03:17:07,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=66478.5, ans=0.2 2024-09-14 03:17:16,199 INFO [train.py:1198] (1/2) Epoch 4, batch 4300, loss[loss=0.3292, ctc_loss=0.2466, cr_loss=0.413, over 20884.00 frames. ], tot_loss[loss=0.3276, ctc_loss=0.243, cr_loss=0.4229, over 4078940.20 frames. 
], batch size: 57, lr: 1.92e-02, grad_scale: 16.0 2024-09-14 03:17:37,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=66535.16666666667, ans=0.125 2024-09-14 03:17:40,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=66535.16666666667, ans=0.025 2024-09-14 03:17:59,359 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.86 vs. limit=22.5 2024-09-14 03:18:09,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=66591.83333333333, ans=0.07 2024-09-14 03:18:28,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=66620.16666666667, ans=0.025 2024-09-14 03:18:30,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=66620.16666666667, ans=0.125 2024-09-14 03:18:30,411 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.36 vs. limit=22.5 2024-09-14 03:18:33,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=66620.16666666667, ans=0.125 2024-09-14 03:18:37,294 INFO [train.py:1198] (1/2) Epoch 4, batch 4350, loss[loss=0.3016, ctc_loss=0.2183, cr_loss=0.4162, over 20949.00 frames. ], tot_loss[loss=0.3261, ctc_loss=0.2416, cr_loss=0.4224, over 4080495.31 frames. ], batch size: 55, lr: 1.92e-02, grad_scale: 16.0 2024-09-14 03:18:44,908 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.844e+02 2.281e+02 2.581e+02 3.059e+02 5.942e+02, threshold=5.161e+02, percent-clipped=2.0 2024-09-14 03:18:54,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=66676.83333333333, ans=0.125 2024-09-14 03:19:18,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=66705.16666666667, ans=0.125 2024-09-14 03:19:25,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=66733.5, ans=0.025 2024-09-14 03:19:28,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=66733.5, ans=0.125 2024-09-14 03:19:50,520 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.60 vs. limit=15.0 2024-09-14 03:19:51,779 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.75 vs. limit=15.0 2024-09-14 03:19:52,474 INFO [train.py:1198] (1/2) Epoch 4, batch 4400, loss[loss=0.31, ctc_loss=0.2262, cr_loss=0.419, over 20926.00 frames. ], tot_loss[loss=0.3245, ctc_loss=0.2404, cr_loss=0.4209, over 4084063.04 frames. 
], batch size: 60, lr: 1.92e-02, grad_scale: 32.0 2024-09-14 03:20:26,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=66846.83333333333, ans=0.125 2024-09-14 03:20:29,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=66846.83333333333, ans=0.125 2024-09-14 03:21:08,159 INFO [train.py:1198] (1/2) Epoch 4, batch 4450, loss[loss=0.3195, ctc_loss=0.2389, cr_loss=0.4029, over 20985.00 frames. ], tot_loss[loss=0.3233, ctc_loss=0.2393, cr_loss=0.4196, over 4078157.12 frames. ], batch size: 55, lr: 1.92e-02, grad_scale: 32.0 2024-09-14 03:21:15,636 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.991e+02 2.305e+02 2.603e+02 3.007e+02 4.983e+02, threshold=5.207e+02, percent-clipped=0.0 2024-09-14 03:21:27,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=66960.16666666667, ans=0.025 2024-09-14 03:21:35,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=66960.16666666667, ans=0.0 2024-09-14 03:21:46,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=66988.5, ans=0.125 2024-09-14 03:22:23,545 INFO [train.py:1198] (1/2) Epoch 4, batch 4500, loss[loss=0.2979, ctc_loss=0.2156, cr_loss=0.4119, over 20890.00 frames. ], tot_loss[loss=0.3247, ctc_loss=0.2406, cr_loss=0.4204, over 4073209.89 frames. ], batch size: 54, lr: 1.92e-02, grad_scale: 32.0 2024-09-14 03:22:54,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=67130.16666666667, ans=0.125 2024-09-14 03:23:23,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=67158.5, ans=0.025 2024-09-14 03:23:25,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=67186.83333333333, ans=0.04949747468305833 2024-09-14 03:23:40,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=67215.16666666667, ans=0.1 2024-09-14 03:23:41,602 INFO [train.py:1198] (1/2) Epoch 4, batch 4550, loss[loss=0.3135, ctc_loss=0.2295, cr_loss=0.4203, over 20663.00 frames. ], tot_loss[loss=0.3264, ctc_loss=0.242, cr_loss=0.422, over 4078170.85 frames. ], batch size: 68, lr: 1.91e-02, grad_scale: 32.0 2024-09-14 03:23:44,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=67215.16666666667, ans=0.015 2024-09-14 03:23:49,204 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.875e+02 2.333e+02 2.572e+02 3.102e+02 4.754e+02, threshold=5.145e+02, percent-clipped=0.0 2024-09-14 03:24:47,189 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.85 vs. limit=15.0 2024-09-14 03:24:53,165 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.13 vs. limit=22.5 2024-09-14 03:24:59,182 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.77 vs. 
limit=10.0 2024-09-14 03:24:59,854 INFO [train.py:1198] (1/2) Epoch 4, batch 4600, loss[loss=0.2943, ctc_loss=0.2207, cr_loss=0.368, over 20909.00 frames. ], tot_loss[loss=0.3239, ctc_loss=0.2399, cr_loss=0.4202, over 4087750.62 frames. ], batch size: 54, lr: 1.91e-02, grad_scale: 32.0 2024-09-14 03:25:21,612 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 03:25:25,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=67385.16666666667, ans=0.015 2024-09-14 03:25:30,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=67413.5, ans=0.125 2024-09-14 03:25:32,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=67413.5, ans=0.125 2024-09-14 03:25:35,451 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.92 vs. limit=12.0 2024-09-14 03:25:42,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=67413.5, ans=0.1 2024-09-14 03:25:59,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=67470.16666666667, ans=0.125 2024-09-14 03:26:16,004 INFO [train.py:1198] (1/2) Epoch 4, batch 4650, loss[loss=0.3426, ctc_loss=0.2524, cr_loss=0.4511, over 20071.00 frames. ], tot_loss[loss=0.3225, ctc_loss=0.2388, cr_loss=0.4184, over 4090918.10 frames. ], batch size: 80, lr: 1.91e-02, grad_scale: 32.0 2024-09-14 03:26:23,629 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.834e+02 2.466e+02 2.933e+02 3.344e+02 5.732e+02, threshold=5.866e+02, percent-clipped=1.0 2024-09-14 03:26:34,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=67526.83333333333, ans=0.125 2024-09-14 03:26:56,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=67555.16666666667, ans=0.1 2024-09-14 03:27:30,427 INFO [train.py:1198] (1/2) Epoch 4, batch 4700, loss[loss=0.3674, ctc_loss=0.2784, cr_loss=0.4452, over 20149.00 frames. ], tot_loss[loss=0.3244, ctc_loss=0.2403, cr_loss=0.4205, over 4083003.37 frames. ], batch size: 80, lr: 1.91e-02, grad_scale: 32.0 2024-09-14 03:27:37,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=67640.16666666667, ans=0.95 2024-09-14 03:27:44,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=67668.5, ans=0.0 2024-09-14 03:28:32,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=67753.5, ans=0.1 2024-09-14 03:28:47,852 INFO [train.py:1198] (1/2) Epoch 4, batch 4750, loss[loss=0.3055, ctc_loss=0.2232, cr_loss=0.4111, over 20805.00 frames. ], tot_loss[loss=0.3262, ctc_loss=0.2418, cr_loss=0.4223, over 4084805.81 frames. 
], batch size: 53, lr: 1.91e-02, grad_scale: 32.0 2024-09-14 03:28:55,497 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.853e+02 2.253e+02 2.459e+02 2.716e+02 5.028e+02, threshold=4.917e+02, percent-clipped=0.0 2024-09-14 03:29:36,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=67866.83333333333, ans=0.1 2024-09-14 03:29:50,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=67895.16666666667, ans=0.0 2024-09-14 03:30:06,651 INFO [train.py:1198] (1/2) Epoch 4, batch 4800, loss[loss=0.3454, ctc_loss=0.2554, cr_loss=0.4498, over 20849.00 frames. ], tot_loss[loss=0.3234, ctc_loss=0.2393, cr_loss=0.4202, over 4104193.20 frames. ], batch size: 65, lr: 1.90e-02, grad_scale: 32.0 2024-09-14 03:30:12,212 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.86 vs. limit=15.0 2024-09-14 03:30:44,574 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.13 vs. limit=15.0 2024-09-14 03:30:56,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=68008.5, ans=0.1 2024-09-14 03:31:22,578 INFO [train.py:1198] (1/2) Epoch 4, batch 4850, loss[loss=0.2796, ctc_loss=0.2053, cr_loss=0.3712, over 20409.00 frames. ], tot_loss[loss=0.3233, ctc_loss=0.2392, cr_loss=0.4203, over 4106993.28 frames. ], batch size: 45, lr: 1.90e-02, grad_scale: 32.0 2024-09-14 03:31:22,927 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 03:31:30,316 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.876e+02 2.309e+02 2.649e+02 3.134e+02 7.070e+02, threshold=5.298e+02, percent-clipped=3.0 2024-09-14 03:31:49,091 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.01 vs. limit=15.0 2024-09-14 03:32:02,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=68121.83333333333, ans=0.2 2024-09-14 03:32:09,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=68150.16666666667, ans=0.125 2024-09-14 03:32:12,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=68150.16666666667, ans=0.125 2024-09-14 03:32:27,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=68178.5, ans=0.125 2024-09-14 03:32:37,457 INFO [train.py:1198] (1/2) Epoch 4, batch 4900, loss[loss=0.345, ctc_loss=0.2556, cr_loss=0.4469, over 20876.00 frames. ], tot_loss[loss=0.3242, ctc_loss=0.24, cr_loss=0.4213, over 4101631.27 frames. ], batch size: 57, lr: 1.90e-02, grad_scale: 32.0 2024-09-14 03:32:51,833 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.40 vs. 
limit=10.0 2024-09-14 03:33:03,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=68235.16666666667, ans=0.0 2024-09-14 03:33:51,984 INFO [train.py:1198] (1/2) Epoch 4, batch 4950, loss[loss=0.3472, ctc_loss=0.2526, cr_loss=0.4728, over 20606.00 frames. ], tot_loss[loss=0.3233, ctc_loss=0.2391, cr_loss=0.4213, over 4110191.78 frames. ], batch size: 66, lr: 1.90e-02, grad_scale: 32.0 2024-09-14 03:33:54,012 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.24 vs. limit=12.0 2024-09-14 03:33:59,335 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.844e+02 2.275e+02 2.605e+02 3.048e+02 5.653e+02, threshold=5.210e+02, percent-clipped=2.0 2024-09-14 03:33:59,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=68348.5, ans=0.025 2024-09-14 03:34:31,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=68405.16666666667, ans=0.04949747468305833 2024-09-14 03:34:33,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=68405.16666666667, ans=0.04949747468305833 2024-09-14 03:34:51,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=68461.83333333333, ans=0.0 2024-09-14 03:35:04,120 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.82 vs. limit=6.0 2024-09-14 03:35:06,079 INFO [train.py:1198] (1/2) Epoch 4, batch 5000, loss[loss=0.3339, ctc_loss=0.2448, cr_loss=0.4457, over 20889.00 frames. ], tot_loss[loss=0.3238, ctc_loss=0.2394, cr_loss=0.4217, over 4106725.58 frames. ], batch size: 54, lr: 1.90e-02, grad_scale: 32.0 2024-09-14 03:35:08,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=68490.16666666667, ans=0.95 2024-09-14 03:35:18,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=68490.16666666667, ans=0.125 2024-09-14 03:35:29,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=68518.5, ans=0.1 2024-09-14 03:35:33,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=68518.5, ans=0.2 2024-09-14 03:35:41,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=68546.83333333333, ans=0.125 2024-09-14 03:35:59,349 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.66 vs. limit=22.5 2024-09-14 03:36:09,598 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.81 vs. limit=6.0 2024-09-14 03:36:16,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=68603.5, ans=0.2 2024-09-14 03:36:23,735 INFO [train.py:1198] (1/2) Epoch 4, batch 5050, loss[loss=0.4066, ctc_loss=0.3095, cr_loss=0.4852, over 18268.00 frames. 
], tot_loss[loss=0.3233, ctc_loss=0.2392, cr_loss=0.4207, over 4098875.53 frames. ], batch size: 108, lr: 1.90e-02, grad_scale: 32.0 2024-09-14 03:36:31,186 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.839e+02 2.248e+02 2.535e+02 2.857e+02 5.181e+02, threshold=5.071e+02, percent-clipped=0.0 2024-09-14 03:36:33,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=68631.83333333333, ans=0.125 2024-09-14 03:37:28,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=68745.16666666667, ans=0.0 2024-09-14 03:37:40,018 INFO [train.py:1198] (1/2) Epoch 4, batch 5100, loss[loss=0.3531, ctc_loss=0.2707, cr_loss=0.4118, over 21075.00 frames. ], tot_loss[loss=0.324, ctc_loss=0.2398, cr_loss=0.4208, over 4084851.48 frames. ], batch size: 62, lr: 1.89e-02, grad_scale: 32.0 2024-09-14 03:37:47,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=68773.5, ans=0.125 2024-09-14 03:37:53,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=68801.83333333333, ans=0.125 2024-09-14 03:37:58,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=68801.83333333333, ans=0.1 2024-09-14 03:38:44,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=68886.83333333333, ans=0.0 2024-09-14 03:38:54,926 INFO [train.py:1198] (1/2) Epoch 4, batch 5150, loss[loss=0.3147, ctc_loss=0.2298, cr_loss=0.4245, over 20980.00 frames. ], tot_loss[loss=0.3241, ctc_loss=0.2399, cr_loss=0.4208, over 4075117.29 frames. ], batch size: 55, lr: 1.89e-02, grad_scale: 32.0 2024-09-14 03:39:00,185 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.32 vs. limit=22.5 2024-09-14 03:39:02,218 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.814e+02 2.227e+02 2.530e+02 2.864e+02 5.813e+02, threshold=5.059e+02, percent-clipped=5.0 2024-09-14 03:39:11,499 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 03:39:31,208 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.98 vs. limit=12.0 2024-09-14 03:39:38,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=69000.16666666667, ans=0.125 2024-09-14 03:40:09,581 INFO [train.py:1198] (1/2) Epoch 4, batch 5200, loss[loss=0.3544, ctc_loss=0.2622, cr_loss=0.4608, over 20688.00 frames. ], tot_loss[loss=0.3233, ctc_loss=0.2392, cr_loss=0.4203, over 4085784.60 frames. ], batch size: 66, lr: 1.89e-02, grad_scale: 32.0 2024-09-14 03:40:20,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=69056.83333333333, ans=0.125 2024-09-14 03:40:30,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=69085.16666666667, ans=0.015 2024-09-14 03:40:41,186 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. 
limit=6.0 2024-09-14 03:40:57,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=69141.83333333333, ans=0.2 2024-09-14 03:41:00,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=69141.83333333333, ans=0.125 2024-09-14 03:41:13,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=69170.16666666667, ans=0.125 2024-09-14 03:41:23,611 INFO [train.py:1198] (1/2) Epoch 4, batch 5250, loss[loss=0.3136, ctc_loss=0.2318, cr_loss=0.409, over 20840.00 frames. ], tot_loss[loss=0.3222, ctc_loss=0.2384, cr_loss=0.4192, over 4078008.82 frames. ], batch size: 65, lr: 1.89e-02, grad_scale: 32.0 2024-09-14 03:41:30,986 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.845e+02 2.230e+02 2.655e+02 3.088e+02 4.121e+02, threshold=5.310e+02, percent-clipped=0.0 2024-09-14 03:41:44,813 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.62 vs. limit=12.0 2024-09-14 03:41:46,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=69226.83333333333, ans=0.125 2024-09-14 03:42:13,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=69283.5, ans=0.125 2024-09-14 03:42:37,936 INFO [train.py:1198] (1/2) Epoch 4, batch 5300, loss[loss=0.3471, ctc_loss=0.2499, cr_loss=0.4862, over 20276.00 frames. ], tot_loss[loss=0.3236, ctc_loss=0.2393, cr_loss=0.4214, over 4088066.86 frames. ], batch size: 74, lr: 1.89e-02, grad_scale: 32.0 2024-09-14 03:42:46,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=69340.16666666667, ans=0.0 2024-09-14 03:43:31,921 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 03:43:52,046 INFO [train.py:1198] (1/2) Epoch 4, batch 5350, loss[loss=0.293, ctc_loss=0.2146, cr_loss=0.3921, over 21041.00 frames. ], tot_loss[loss=0.3219, ctc_loss=0.2379, cr_loss=0.42, over 4091148.18 frames. ], batch size: 56, lr: 1.88e-02, grad_scale: 32.0 2024-09-14 03:43:59,485 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.873e+02 2.320e+02 2.675e+02 3.518e+02 6.585e+02, threshold=5.349e+02, percent-clipped=4.0 2024-09-14 03:44:13,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=69510.16666666667, ans=0.125 2024-09-14 03:45:06,747 INFO [train.py:1198] (1/2) Epoch 4, batch 5400, loss[loss=0.3162, ctc_loss=0.2313, cr_loss=0.4245, over 20885.00 frames. ], tot_loss[loss=0.3219, ctc_loss=0.2377, cr_loss=0.4207, over 4098034.07 frames. 
], batch size: 54, lr: 1.88e-02, grad_scale: 32.0 2024-09-14 03:46:02,674 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 03:46:08,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=69736.83333333333, ans=0.1 2024-09-14 03:46:10,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=69736.83333333333, ans=0.125 2024-09-14 03:46:19,391 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.90 vs. limit=15.0 2024-09-14 03:46:23,200 INFO [train.py:1198] (1/2) Epoch 4, batch 5450, loss[loss=0.3162, ctc_loss=0.2359, cr_loss=0.4014, over 21033.00 frames. ], tot_loss[loss=0.3214, ctc_loss=0.2374, cr_loss=0.4199, over 4100653.28 frames. ], batch size: 63, lr: 1.88e-02, grad_scale: 32.0 2024-09-14 03:46:30,427 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.895e+02 2.222e+02 2.469e+02 3.026e+02 4.253e+02, threshold=4.938e+02, percent-clipped=0.0 2024-09-14 03:46:30,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=69765.16666666667, ans=0.015 2024-09-14 03:46:49,387 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.59 vs. limit=22.5 2024-09-14 03:47:15,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=69850.16666666667, ans=0.0 2024-09-14 03:47:19,991 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.94 vs. limit=15.0 2024-09-14 03:47:31,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=69878.5, ans=0.1 2024-09-14 03:47:32,186 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.84 vs. limit=15.0 2024-09-14 03:47:39,831 INFO [train.py:1198] (1/2) Epoch 4, batch 5500, loss[loss=0.2828, ctc_loss=0.2062, cr_loss=0.3831, over 20997.00 frames. ], tot_loss[loss=0.3214, ctc_loss=0.2374, cr_loss=0.42, over 4094866.00 frames. ], batch size: 52, lr: 1.88e-02, grad_scale: 32.0 2024-09-14 03:47:41,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=69906.83333333333, ans=0.035 2024-09-14 03:48:19,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=69963.5, ans=0.125 2024-09-14 03:48:25,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=69991.83333333333, ans=0.125 2024-09-14 03:48:39,590 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.86 vs. limit=15.0 2024-09-14 03:48:48,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=70020.16666666667, ans=0.2 2024-09-14 03:48:53,826 INFO [train.py:1198] (1/2) Epoch 4, batch 5550, loss[loss=0.3734, ctc_loss=0.283, cr_loss=0.4518, over 21008.00 frames. 
], tot_loss[loss=0.323, ctc_loss=0.2387, cr_loss=0.4213, over 4081503.21 frames. ], batch size: 63, lr: 1.88e-02, grad_scale: 32.0 2024-09-14 03:48:55,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=70048.5, ans=0.125 2024-09-14 03:49:01,160 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.866e+02 2.248e+02 2.611e+02 2.984e+02 5.188e+02, threshold=5.222e+02, percent-clipped=1.0 2024-09-14 03:50:06,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=70190.16666666667, ans=0.1 2024-09-14 03:50:07,827 INFO [train.py:1198] (1/2) Epoch 4, batch 5600, loss[loss=0.3403, ctc_loss=0.2509, cr_loss=0.4472, over 21092.00 frames. ], tot_loss[loss=0.3227, ctc_loss=0.2384, cr_loss=0.4212, over 4097194.22 frames. ], batch size: 59, lr: 1.88e-02, grad_scale: 32.0 2024-09-14 03:50:20,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=70218.5, ans=0.125 2024-09-14 03:50:26,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=70218.5, ans=0.125 2024-09-14 03:50:41,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=70246.83333333333, ans=10.0 2024-09-14 03:50:51,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=70275.16666666667, ans=0.125 2024-09-14 03:51:21,163 INFO [train.py:1198] (1/2) Epoch 4, batch 5650, loss[loss=0.3424, ctc_loss=0.2563, cr_loss=0.4306, over 20840.00 frames. ], tot_loss[loss=0.3231, ctc_loss=0.2389, cr_loss=0.4212, over 4090874.03 frames. ], batch size: 65, lr: 1.87e-02, grad_scale: 32.0 2024-09-14 03:51:28,571 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.830e+02 2.235e+02 2.459e+02 3.074e+02 5.356e+02, threshold=4.919e+02, percent-clipped=2.0 2024-09-14 03:51:37,055 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.60 vs. limit=15.0 2024-09-14 03:52:15,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=70416.83333333333, ans=0.0 2024-09-14 03:52:34,659 INFO [train.py:1198] (1/2) Epoch 4, batch 5700, loss[loss=0.3725, ctc_loss=0.2768, cr_loss=0.4787, over 21041.00 frames. ], tot_loss[loss=0.3229, ctc_loss=0.2387, cr_loss=0.4211, over 4086806.50 frames. ], batch size: 62, lr: 1.87e-02, grad_scale: 32.0 2024-09-14 03:52:36,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=70473.5, ans=0.125 2024-09-14 03:52:47,140 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.41 vs. limit=15.0 2024-09-14 03:52:50,006 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.56 vs. limit=22.5 2024-09-14 03:53:00,462 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.69 vs. 
limit=15.0 2024-09-14 03:53:19,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=70558.5, ans=0.125 2024-09-14 03:53:49,036 INFO [train.py:1198] (1/2) Epoch 4, batch 5750, loss[loss=0.3132, ctc_loss=0.2271, cr_loss=0.4304, over 20886.00 frames. ], tot_loss[loss=0.3223, ctc_loss=0.238, cr_loss=0.4214, over 4098312.62 frames. ], batch size: 54, lr: 1.87e-02, grad_scale: 32.0 2024-09-14 03:53:56,510 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.899e+02 2.235e+02 2.475e+02 2.905e+02 4.217e+02, threshold=4.950e+02, percent-clipped=0.0 2024-09-14 03:54:14,597 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=70643.5, ans=0.0 2024-09-14 03:54:46,752 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.41 vs. limit=15.0 2024-09-14 03:54:55,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=70728.5, ans=0.0 2024-09-14 03:55:05,364 INFO [train.py:1198] (1/2) Epoch 4, batch 5800, loss[loss=0.3666, ctc_loss=0.2751, cr_loss=0.458, over 20330.00 frames. ], tot_loss[loss=0.3211, ctc_loss=0.237, cr_loss=0.4202, over 4104405.81 frames. ], batch size: 74, lr: 1.87e-02, grad_scale: 32.0 2024-09-14 03:55:37,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=70813.5, ans=0.1 2024-09-14 03:56:10,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=70870.16666666667, ans=0.035 2024-09-14 03:56:16,014 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=16.78 vs. limit=15.0 2024-09-14 03:56:20,722 INFO [train.py:1198] (1/2) Epoch 4, batch 5850, loss[loss=0.3269, ctc_loss=0.2399, cr_loss=0.4354, over 21044.00 frames. ], tot_loss[loss=0.3222, ctc_loss=0.2379, cr_loss=0.4212, over 4099554.22 frames. ], batch size: 56, lr: 1.87e-02, grad_scale: 32.0 2024-09-14 03:56:28,204 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.983e+02 2.290e+02 2.515e+02 2.921e+02 4.898e+02, threshold=5.030e+02, percent-clipped=0.0 2024-09-14 03:56:31,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=70898.5, ans=0.0 2024-09-14 03:56:46,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=70926.83333333333, ans=0.05 2024-09-14 03:56:55,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=70955.16666666667, ans=0.1 2024-09-14 03:57:12,250 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.90 vs. 
limit=10.0 2024-09-14 03:57:28,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=71011.83333333333, ans=0.125 2024-09-14 03:57:31,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=71011.83333333333, ans=0.0 2024-09-14 03:57:32,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=71011.83333333333, ans=0.125 2024-09-14 03:57:35,162 INFO [train.py:1198] (1/2) Epoch 4, batch 5900, loss[loss=0.3603, ctc_loss=0.2683, cr_loss=0.4603, over 19417.00 frames. ], tot_loss[loss=0.3212, ctc_loss=0.2372, cr_loss=0.4198, over 4103442.50 frames. ], batch size: 90, lr: 1.87e-02, grad_scale: 32.0 2024-09-14 03:58:06,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=71096.83333333333, ans=0.0 2024-09-14 03:58:21,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=71125.16666666667, ans=0.2 2024-09-14 03:58:29,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=71125.16666666667, ans=0.0 2024-09-14 03:58:32,005 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 03:58:32,731 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.72 vs. limit=15.0 2024-09-14 03:58:45,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=71153.5, ans=0.125 2024-09-14 03:58:49,228 INFO [train.py:1198] (1/2) Epoch 4, batch 5950, loss[loss=0.3285, ctc_loss=0.2385, cr_loss=0.4498, over 21044.00 frames. ], tot_loss[loss=0.3215, ctc_loss=0.2375, cr_loss=0.4201, over 4094892.50 frames. ], batch size: 62, lr: 1.86e-02, grad_scale: 32.0 2024-09-14 03:58:56,540 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.964e+02 2.272e+02 2.604e+02 3.252e+02 5.278e+02, threshold=5.208e+02, percent-clipped=1.0 2024-09-14 03:59:30,204 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.90 vs. limit=15.0 2024-09-14 04:00:00,907 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.19 vs. limit=15.0 2024-09-14 04:00:03,190 INFO [train.py:1198] (1/2) Epoch 4, batch 6000, loss[loss=0.3895, ctc_loss=0.2981, cr_loss=0.4569, over 13948.00 frames. ], tot_loss[loss=0.3221, ctc_loss=0.2379, cr_loss=0.4205, over 4095237.81 frames. ], batch size: 149, lr: 1.86e-02, grad_scale: 32.0 2024-09-14 04:00:03,191 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-14 04:00:29,492 INFO [train.py:1230] (1/2) Epoch 4, validation: loss=0.07445, ctc_loss=0.07445, cr_loss=9.516e-15, over 944034.00 frames. 
2024-09-14 04:00:29,493 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-14 04:01:20,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=71408.5, ans=0.0 2024-09-14 04:01:38,911 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=71436.83333333333, ans=0.05 2024-09-14 04:01:44,544 INFO [train.py:1198] (1/2) Epoch 4, batch 6050, loss[loss=0.369, ctc_loss=0.2759, cr_loss=0.4653, over 20065.00 frames. ], tot_loss[loss=0.3227, ctc_loss=0.2384, cr_loss=0.4212, over 4105285.69 frames. ], batch size: 80, lr: 1.86e-02, grad_scale: 32.0 2024-09-14 04:01:46,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=71465.16666666667, ans=0.0 2024-09-14 04:01:52,065 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.960e+02 2.211e+02 2.724e+02 3.340e+02 6.745e+02, threshold=5.449e+02, percent-clipped=2.0 2024-09-14 04:02:26,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=71521.83333333333, ans=0.2 2024-09-14 04:02:54,926 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.97 vs. limit=15.0 2024-09-14 04:03:00,128 INFO [train.py:1198] (1/2) Epoch 4, batch 6100, loss[loss=0.3017, ctc_loss=0.2189, cr_loss=0.4138, over 21070.00 frames. ], tot_loss[loss=0.3232, ctc_loss=0.2388, cr_loss=0.422, over 4092298.09 frames. ], batch size: 53, lr: 1.86e-02, grad_scale: 32.0 2024-09-14 04:03:03,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=71606.83333333333, ans=0.125 2024-09-14 04:03:12,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=71606.83333333333, ans=0.125 2024-09-14 04:03:25,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=71635.16666666667, ans=0.125 2024-09-14 04:03:27,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=71635.16666666667, ans=0.125 2024-09-14 04:03:31,959 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.77 vs. limit=15.0 2024-09-14 04:03:45,143 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.80 vs. limit=15.0 2024-09-14 04:03:51,959 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 04:04:15,611 INFO [train.py:1198] (1/2) Epoch 4, batch 6150, loss[loss=0.3304, ctc_loss=0.244, cr_loss=0.4318, over 20902.00 frames. ], tot_loss[loss=0.3222, ctc_loss=0.2381, cr_loss=0.4204, over 4084590.86 frames. 
], batch size: 54, lr: 1.86e-02, grad_scale: 32.0 2024-09-14 04:04:22,987 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.893e+02 2.314e+02 2.579e+02 3.111e+02 7.091e+02, threshold=5.159e+02, percent-clipped=1.0 2024-09-14 04:04:51,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=71805.16666666667, ans=0.125 2024-09-14 04:04:52,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=71805.16666666667, ans=0.125 2024-09-14 04:05:23,182 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.40 vs. limit=22.5 2024-09-14 04:05:25,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=71861.83333333333, ans=0.125 2024-09-14 04:05:29,382 INFO [train.py:1198] (1/2) Epoch 4, batch 6200, loss[loss=0.3619, ctc_loss=0.2712, cr_loss=0.4532, over 20362.00 frames. ], tot_loss[loss=0.3226, ctc_loss=0.2387, cr_loss=0.4194, over 4066671.66 frames. ], batch size: 74, lr: 1.86e-02, grad_scale: 32.0 2024-09-14 04:05:38,488 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=71890.16666666667, ans=0.2 2024-09-14 04:05:44,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=71918.5, ans=0.0 2024-09-14 04:05:51,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=71918.5, ans=0.0 2024-09-14 04:05:54,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=71918.5, ans=6.0 2024-09-14 04:06:00,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=71946.83333333333, ans=15.0 2024-09-14 04:06:42,362 INFO [train.py:1198] (1/2) Epoch 4, batch 6250, loss[loss=0.3722, ctc_loss=0.2837, cr_loss=0.4424, over 18384.00 frames. ], tot_loss[loss=0.3223, ctc_loss=0.2386, cr_loss=0.4183, over 4041628.75 frames. ], batch size: 108, lr: 1.85e-02, grad_scale: 32.0 2024-09-14 04:06:49,808 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.974e+02 2.237e+02 2.467e+02 3.015e+02 4.414e+02, threshold=4.934e+02, percent-clipped=0.0 2024-09-14 04:07:02,926 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.56 vs. limit=12.0 2024-09-14 04:07:13,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=72088.5, ans=0.125 2024-09-14 04:07:18,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=72088.5, ans=0.2 2024-09-14 04:07:27,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=72116.83333333333, ans=0.125 2024-09-14 04:07:52,099 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.69 vs. limit=15.0 2024-09-14 04:07:56,922 INFO [train.py:1198] (1/2) Epoch 4, batch 6300, loss[loss=0.2768, ctc_loss=0.2037, cr_loss=0.3654, over 20985.00 frames. 
], tot_loss[loss=0.3258, ctc_loss=0.2418, cr_loss=0.4202, over 3992967.45 frames. ], batch size: 52, lr: 1.85e-02, grad_scale: 32.0 2024-09-14 04:08:07,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=72173.5, ans=0.125 2024-09-14 04:08:25,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=72230.16666666667, ans=0.125 2024-09-14 04:08:34,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=72230.16666666667, ans=0.125 2024-09-14 04:08:34,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=72230.16666666667, ans=0.125 2024-09-14 04:08:58,358 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.89 vs. limit=12.0 2024-09-14 04:09:05,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=72286.83333333333, ans=0.125 2024-09-14 04:09:08,329 INFO [train.py:1198] (1/2) Epoch 4, batch 6350, loss[loss=0.409, ctc_loss=0.3193, cr_loss=0.4487, over 14577.00 frames. ], tot_loss[loss=0.3276, ctc_loss=0.2439, cr_loss=0.4186, over 3889819.84 frames. ], batch size: 150, lr: 1.85e-02, grad_scale: 32.0 2024-09-14 04:09:17,048 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.926e+02 2.398e+02 2.903e+02 3.475e+02 4.595e+02, threshold=5.806e+02, percent-clipped=0.0 2024-09-14 04:09:22,095 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.51 vs. limit=12.0 2024-09-14 04:09:33,215 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=72343.5, ans=0.125 2024-09-14 04:09:36,923 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.05 vs. limit=10.0 2024-09-14 04:10:54,883 INFO [train.py:1198] (1/2) Epoch 5, batch 0, loss[loss=0.3531, ctc_loss=0.2654, cr_loss=0.4385, over 20958.00 frames. ], tot_loss[loss=0.3531, ctc_loss=0.2654, cr_loss=0.4385, over 20958.00 frames. ], batch size: 64, lr: 1.72e-02, grad_scale: 32.0 2024-09-14 04:10:54,883 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-14 04:11:13,200 INFO [train.py:1230] (1/2) Epoch 5, validation: loss=0.07931, ctc_loss=0.07931, cr_loss=9.897e-15, over 944034.00 frames. 2024-09-14 04:11:13,201 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-14 04:11:57,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=72513.5, ans=0.2 2024-09-14 04:12:03,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=72513.5, ans=0.5 2024-09-14 04:12:29,244 INFO [train.py:1198] (1/2) Epoch 5, batch 50, loss[loss=0.3127, ctc_loss=0.2326, cr_loss=0.4002, over 20376.00 frames. ], tot_loss[loss=0.3219, ctc_loss=0.2377, cr_loss=0.421, over 921627.23 frames. 
], batch size: 74, lr: 1.72e-02, grad_scale: 32.0 2024-09-14 04:12:30,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=72570.16666666667, ans=0.025 2024-09-14 04:12:53,487 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.850e+02 2.284e+02 2.523e+02 2.865e+02 3.597e+02, threshold=5.047e+02, percent-clipped=0.0 2024-09-14 04:12:58,981 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.37 vs. limit=15.0 2024-09-14 04:13:03,464 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.88 vs. limit=15.0 2024-09-14 04:13:22,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=72655.16666666667, ans=0.1 2024-09-14 04:13:27,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=72655.16666666667, ans=0.125 2024-09-14 04:13:36,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=72683.5, ans=0.1 2024-09-14 04:13:45,085 INFO [train.py:1198] (1/2) Epoch 5, batch 100, loss[loss=0.3345, ctc_loss=0.247, cr_loss=0.4375, over 20874.00 frames. ], tot_loss[loss=0.3262, ctc_loss=0.241, cr_loss=0.426, over 1623990.38 frames. ], batch size: 54, lr: 1.72e-02, grad_scale: 32.0 2024-09-14 04:13:46,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=72711.83333333333, ans=0.035 2024-09-14 04:14:07,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=72740.16666666667, ans=0.2 2024-09-14 04:14:18,437 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.42 vs. limit=10.0 2024-09-14 04:14:30,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=72796.83333333333, ans=0.0 2024-09-14 04:15:00,152 INFO [train.py:1198] (1/2) Epoch 5, batch 150, loss[loss=0.3853, ctc_loss=0.2898, cr_loss=0.478, over 20863.00 frames. ], tot_loss[loss=0.3234, ctc_loss=0.2385, cr_loss=0.4241, over 2179479.94 frames. ], batch size: 65, lr: 1.72e-02, grad_scale: 32.0 2024-09-14 04:15:07,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=72853.5, ans=0.2 2024-09-14 04:15:27,262 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.890e+02 2.192e+02 2.483e+02 2.953e+02 5.023e+02, threshold=4.965e+02, percent-clipped=0.0 2024-09-14 04:15:56,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=72938.5, ans=0.125 2024-09-14 04:16:21,965 INFO [train.py:1198] (1/2) Epoch 5, batch 200, loss[loss=0.3423, ctc_loss=0.2528, cr_loss=0.4474, over 19527.00 frames. ], tot_loss[loss=0.3227, ctc_loss=0.2379, cr_loss=0.4237, over 2600988.80 frames. ], batch size: 90, lr: 1.72e-02, grad_scale: 32.0 2024-09-14 04:16:33,878 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.11 vs. 
limit=15.0 2024-09-14 04:16:46,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=73023.5, ans=0.0 2024-09-14 04:16:50,134 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.22 vs. limit=15.0 2024-09-14 04:17:37,729 INFO [train.py:1198] (1/2) Epoch 5, batch 250, loss[loss=0.3349, ctc_loss=0.2514, cr_loss=0.4174, over 19399.00 frames. ], tot_loss[loss=0.3196, ctc_loss=0.2355, cr_loss=0.4205, over 2929950.18 frames. ], batch size: 90, lr: 1.71e-02, grad_scale: 32.0 2024-09-14 04:18:01,528 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.887e+02 2.225e+02 2.506e+02 3.171e+02 4.742e+02, threshold=5.012e+02, percent-clipped=0.0 2024-09-14 04:18:24,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=73221.83333333333, ans=0.04949747468305833 2024-09-14 04:18:43,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=73250.16666666667, ans=0.125 2024-09-14 04:18:52,773 INFO [train.py:1198] (1/2) Epoch 5, batch 300, loss[loss=0.3036, ctc_loss=0.2231, cr_loss=0.4026, over 20894.00 frames. ], tot_loss[loss=0.3169, ctc_loss=0.2333, cr_loss=0.4182, over 3198273.83 frames. ], batch size: 57, lr: 1.71e-02, grad_scale: 32.0 2024-09-14 04:19:21,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=73335.16666666667, ans=0.1 2024-09-14 04:19:32,887 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.00 vs. limit=15.0 2024-09-14 04:19:57,237 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.29 vs. limit=22.5 2024-09-14 04:20:08,538 INFO [train.py:1198] (1/2) Epoch 5, batch 350, loss[loss=0.3689, ctc_loss=0.2748, cr_loss=0.4706, over 20612.00 frames. ], tot_loss[loss=0.3186, ctc_loss=0.2347, cr_loss=0.4198, over 3391656.83 frames. ], batch size: 71, lr: 1.71e-02, grad_scale: 32.0 2024-09-14 04:20:11,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=73420.16666666667, ans=0.0 2024-09-14 04:20:27,500 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.35 vs. 
limit=15.0 2024-09-14 04:20:32,596 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.833e+02 2.196e+02 2.581e+02 3.294e+02 5.338e+02, threshold=5.162e+02, percent-clipped=1.0 2024-09-14 04:20:36,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=73448.5, ans=0.0 2024-09-14 04:20:46,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=73476.83333333333, ans=0.04949747468305833 2024-09-14 04:20:59,095 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=73505.16666666667, ans=0.0 2024-09-14 04:21:00,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=73505.16666666667, ans=0.125 2024-09-14 04:21:27,643 INFO [train.py:1198] (1/2) Epoch 5, batch 400, loss[loss=0.3341, ctc_loss=0.254, cr_loss=0.4004, over 18067.00 frames. ], tot_loss[loss=0.3167, ctc_loss=0.2332, cr_loss=0.4173, over 3533257.25 frames. ], batch size: 108, lr: 1.71e-02, grad_scale: 32.0 2024-09-14 04:22:41,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=73675.16666666667, ans=0.0 2024-09-14 04:22:44,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=73675.16666666667, ans=0.07 2024-09-14 04:22:47,213 INFO [train.py:1198] (1/2) Epoch 5, batch 450, loss[loss=0.3325, ctc_loss=0.2458, cr_loss=0.4332, over 21012.00 frames. ], tot_loss[loss=0.315, ctc_loss=0.2319, cr_loss=0.4155, over 3656153.58 frames. ], batch size: 63, lr: 1.71e-02, grad_scale: 32.0 2024-09-14 04:22:53,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=73703.5, ans=0.125 2024-09-14 04:23:03,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=73731.83333333333, ans=0.125 2024-09-14 04:23:11,098 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.837e+02 2.264e+02 2.692e+02 3.227e+02 5.067e+02, threshold=5.384e+02, percent-clipped=0.0 2024-09-14 04:23:24,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=73760.16666666667, ans=0.125 2024-09-14 04:23:32,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=73788.5, ans=0.125 2024-09-14 04:23:35,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=73788.5, ans=0.125 2024-09-14 04:23:39,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=73788.5, ans=0.125 2024-09-14 04:23:47,300 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 04:24:02,016 INFO [train.py:1198] (1/2) Epoch 5, batch 500, loss[loss=0.2735, ctc_loss=0.201, cr_loss=0.3624, over 19938.00 frames. ], tot_loss[loss=0.3155, ctc_loss=0.2323, cr_loss=0.4158, over 3743315.48 frames. 
2024-09-14 04:24:20,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=73873.5, ans=0.125
2024-09-14 04:24:32,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=73901.83333333333, ans=0.1
2024-09-14 04:25:07,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=73958.5, ans=0.2
2024-09-14 04:25:17,153 INFO [train.py:1198] (1/2) Epoch 5, batch 550, loss[loss=0.32, ctc_loss=0.2357, cr_loss=0.4211, over 20654.00 frames. ], tot_loss[loss=0.317, ctc_loss=0.2334, cr_loss=0.4178, over 3822092.99 frames. ], batch size: 71, lr: 1.71e-02, grad_scale: 32.0
2024-09-14 04:25:26,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=73986.83333333333, ans=0.125
2024-09-14 04:25:41,080 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.913e+02 2.265e+02 2.549e+02 3.068e+02 5.622e+02, threshold=5.099e+02, percent-clipped=1.0
2024-09-14 04:25:49,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=74043.5, ans=0.125
2024-09-14 04:26:32,313 INFO [train.py:1198] (1/2) Epoch 5, batch 600, loss[loss=0.2824, ctc_loss=0.2026, cr_loss=0.3988, over 20980.00 frames. ], tot_loss[loss=0.3167, ctc_loss=0.2332, cr_loss=0.4176, over 3889164.23 frames. ], batch size: 48, lr: 1.70e-02, grad_scale: 32.0
2024-09-14 04:26:38,928 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.58 vs. limit=15.0
2024-09-14 04:26:55,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=74156.83333333333, ans=0.125
2024-09-14 04:26:59,872 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=74156.83333333333, ans=0.0
2024-09-14 04:27:22,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=74213.5, ans=0.2
2024-09-14 04:27:48,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=74241.83333333333, ans=0.1
2024-09-14 04:27:53,401 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.28 vs. limit=15.0
2024-09-14 04:27:54,117 INFO [train.py:1198] (1/2) Epoch 5, batch 650, loss[loss=0.3, ctc_loss=0.2227, cr_loss=0.3863, over 20823.00 frames. ], tot_loss[loss=0.3172, ctc_loss=0.2337, cr_loss=0.4177, over 3923843.79 frames. ], batch size: 59, lr: 1.70e-02, grad_scale: 32.0
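In the [optim.py:487] warnings above, the logged threshold appears to be Clipping_scale times the median of recent gradient norms (e.g. 2.0 * 2.506e+02 = 5.012e+02), and percent-clipped reports how often a batch exceeded it. A toy sketch of that bookkeeping (assumed logic for illustration, not the actual optim.py; the window size and exact update are assumptions):

import torch
from collections import deque

class GradNormMonitor:
    """Track recent grad norms; report their five-number summary and a
    threshold of clipping_scale * median, mirroring the warnings above."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 1000):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)  # assumed window length
        self.clipped = 0

    def step(self, model: torch.nn.Module) -> None:
        grads = [p.grad.flatten() for p in model.parameters()
                 if p.grad is not None]
        norm = torch.cat(grads).norm().item()
        self.norms.append(norm)
        q = torch.quantile(torch.tensor(list(self.norms)),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()
        if norm > threshold:
            self.clipped += 1
        print(f"grad-norm quartiles {q.tolist()}, threshold={threshold:.3e}, "
              f"percent-clipped={100.0 * self.clipped / len(self.norms):.1f}")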
2024-09-14 04:28:17,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=74298.5, ans=0.95
2024-09-14 04:28:18,513 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.834e+02 2.273e+02 2.558e+02 3.138e+02 5.126e+02, threshold=5.115e+02, percent-clipped=1.0
2024-09-14 04:28:32,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=74326.83333333333, ans=0.125
2024-09-14 04:28:43,032 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-14 04:28:58,267 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.88 vs. limit=15.0
2024-09-14 04:29:03,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=74383.5, ans=0.2
2024-09-14 04:29:09,696 INFO [train.py:1198] (1/2) Epoch 5, batch 700, loss[loss=0.3375, ctc_loss=0.247, cr_loss=0.4526, over 21045.00 frames. ], tot_loss[loss=0.3181, ctc_loss=0.2343, cr_loss=0.4193, over 3957957.21 frames. ], batch size: 62, lr: 1.70e-02, grad_scale: 32.0
2024-09-14 04:29:59,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=74496.83333333333, ans=0.0
2024-09-14 04:30:05,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=74496.83333333333, ans=0.05
2024-09-14 04:30:17,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=74525.16666666667, ans=0.07
2024-09-14 04:30:24,272 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.19 vs. limit=15.0
2024-09-14 04:30:25,101 INFO [train.py:1198] (1/2) Epoch 5, batch 750, loss[loss=0.2735, ctc_loss=0.1998, cr_loss=0.3683, over 20927.00 frames. ], tot_loss[loss=0.3157, ctc_loss=0.2324, cr_loss=0.4169, over 3989303.88 frames. ], batch size: 49, lr: 1.70e-02, grad_scale: 32.0
2024-09-14 04:30:46,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=74581.83333333333, ans=0.125
2024-09-14 04:30:49,519 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.911e+02 2.270e+02 2.626e+02 3.050e+02 4.773e+02, threshold=5.253e+02, percent-clipped=0.0
2024-09-14 04:30:59,284 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.21 vs. limit=15.0
2024-09-14 04:31:41,073 INFO [train.py:1198] (1/2) Epoch 5, batch 800, loss[loss=0.3256, ctc_loss=0.2365, cr_loss=0.4453, over 20683.00 frames. ], tot_loss[loss=0.314, ctc_loss=0.2308, cr_loss=0.416, over 4020302.39 frames. ], batch size: 71, lr: 1.70e-02, grad_scale: 32.0
2024-09-14 04:32:12,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=74751.83333333333, ans=0.125
2024-09-14 04:32:59,682 INFO [train.py:1198] (1/2) Epoch 5, batch 850, loss[loss=0.2671, ctc_loss=0.1934, cr_loss=0.3684, over 20945.00 frames. ], tot_loss[loss=0.3144, ctc_loss=0.2311, cr_loss=0.4166, over 4033465.19 frames. ], batch size: 48, lr: 1.70e-02, grad_scale: 32.0
], tot_loss[loss=0.3144, ctc_loss=0.2311, cr_loss=0.4166, over 4033465.19 frames. ], batch size: 48, lr: 1.70e-02, grad_scale: 32.0 2024-09-14 04:33:04,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=74836.83333333333, ans=0.125 2024-09-14 04:33:08,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=74836.83333333333, ans=15.0 2024-09-14 04:33:24,076 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.921e+02 2.232e+02 2.575e+02 3.092e+02 5.064e+02, threshold=5.150e+02, percent-clipped=0.0 2024-09-14 04:33:35,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=74893.5, ans=0.125 2024-09-14 04:33:58,031 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 04:34:18,946 INFO [train.py:1198] (1/2) Epoch 5, batch 900, loss[loss=0.292, ctc_loss=0.2136, cr_loss=0.3919, over 20922.00 frames. ], tot_loss[loss=0.3149, ctc_loss=0.2315, cr_loss=0.4167, over 4049186.24 frames. ], batch size: 50, lr: 1.69e-02, grad_scale: 32.0 2024-09-14 04:34:51,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=75035.16666666667, ans=0.0 2024-09-14 04:35:00,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=75035.16666666667, ans=10.0 2024-09-14 04:35:03,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=75063.5, ans=0.1 2024-09-14 04:35:34,864 INFO [train.py:1198] (1/2) Epoch 5, batch 950, loss[loss=0.3374, ctc_loss=0.2512, cr_loss=0.431, over 20657.00 frames. ], tot_loss[loss=0.3146, ctc_loss=0.2311, cr_loss=0.4172, over 4066894.69 frames. ], batch size: 66, lr: 1.69e-02, grad_scale: 32.0 2024-09-14 04:35:45,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=75120.16666666667, ans=0.125 2024-09-14 04:35:59,000 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.886e+02 2.259e+02 2.505e+02 2.950e+02 5.700e+02, threshold=5.011e+02, percent-clipped=1.0 2024-09-14 04:36:24,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=75205.16666666667, ans=0.2 2024-09-14 04:36:31,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=75205.16666666667, ans=0.2 2024-09-14 04:36:35,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=75233.5, ans=0.0 2024-09-14 04:36:50,493 INFO [train.py:1198] (1/2) Epoch 5, batch 1000, loss[loss=0.321, ctc_loss=0.2327, cr_loss=0.4416, over 20935.00 frames. ], tot_loss[loss=0.3151, ctc_loss=0.2317, cr_loss=0.4173, over 4061481.69 frames. 
2024-09-14 04:36:59,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=75261.83333333333, ans=0.125
2024-09-14 04:37:11,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=75290.16666666667, ans=0.125
2024-09-14 04:37:31,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=75318.5, ans=0.125
2024-09-14 04:37:42,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=75346.83333333333, ans=0.025
2024-09-14 04:38:06,095 INFO [train.py:1198] (1/2) Epoch 5, batch 1050, loss[loss=0.3344, ctc_loss=0.2476, cr_loss=0.434, over 20996.00 frames. ], tot_loss[loss=0.3155, ctc_loss=0.2317, cr_loss=0.419, over 4081937.98 frames. ], batch size: 63, lr: 1.69e-02, grad_scale: 32.0
2024-09-14 04:38:33,306 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.877e+02 2.273e+02 2.577e+02 3.329e+02 5.088e+02, threshold=5.154e+02, percent-clipped=1.0
2024-09-14 04:39:27,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=75545.16666666667, ans=0.0
2024-09-14 04:39:28,346 INFO [train.py:1198] (1/2) Epoch 5, batch 1100, loss[loss=0.2949, ctc_loss=0.2151, cr_loss=0.3989, over 20801.00 frames. ], tot_loss[loss=0.3163, ctc_loss=0.2323, cr_loss=0.42, over 4087084.32 frames. ], batch size: 53, lr: 1.69e-02, grad_scale: 32.0
2024-09-14 04:39:39,566 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.20 vs. limit=15.0
2024-09-14 04:40:00,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=75601.83333333333, ans=10.0
2024-09-14 04:40:06,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=75601.83333333333, ans=0.125
2024-09-14 04:40:14,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=75630.16666666667, ans=0.2
2024-09-14 04:40:18,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=75630.16666666667, ans=0.95
2024-09-14 04:40:43,848 INFO [train.py:1198] (1/2) Epoch 5, batch 1150, loss[loss=0.2737, ctc_loss=0.1965, cr_loss=0.3859, over 21057.00 frames. ], tot_loss[loss=0.3152, ctc_loss=0.2315, cr_loss=0.4185, over 4081562.99 frames. ], batch size: 53, lr: 1.69e-02, grad_scale: 16.0
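The tot_loss[...] figures are an aggregate over many recent batches rather than the single-batch loss[...] values: the frame count climbs from about 2.93M at batch 250 toward roughly 4.1M and then plateaus, which suggests a decaying, frame-weighted window. One plausible scheme (an assumption for illustration; the actual bookkeeping in train.py is not shown in this log, and the decay value here is made up):

def update_tot(tot_loss: float, tot_frames: float,
               batch_loss: float, batch_frames: float,
               decay: float = 0.995) -> tuple[float, float]:
    # Exponentially decayed, frame-weighted running aggregate: old batches
    # fade out, so tot_frames levels off instead of growing without bound.
    tot_frames = decay * tot_frames + batch_frames
    tot_loss = tot_loss + (batch_loss - tot_loss) * (batch_frames / tot_frames)
    return tot_loss, tot_frames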
2024-09-14 04:41:00,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=75715.16666666667, ans=0.125
2024-09-14 04:41:04,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=75715.16666666667, ans=0.2
2024-09-14 04:41:09,141 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.836e+02 2.141e+02 2.363e+02 2.618e+02 3.853e+02, threshold=4.727e+02, percent-clipped=0.0
2024-09-14 04:41:41,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=75771.83333333333, ans=0.125
2024-09-14 04:41:49,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=75800.16666666667, ans=0.1
2024-09-14 04:41:59,495 INFO [train.py:1198] (1/2) Epoch 5, batch 1200, loss[loss=0.3409, ctc_loss=0.2517, cr_loss=0.4461, over 20836.00 frames. ], tot_loss[loss=0.3144, ctc_loss=0.2309, cr_loss=0.4175, over 4091271.18 frames. ], batch size: 59, lr: 1.69e-02, grad_scale: 32.0
2024-09-14 04:42:13,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=75856.83333333333, ans=0.0
2024-09-14 04:42:28,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=75885.16666666667, ans=0.04949747468305833
2024-09-14 04:42:39,742 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=5.81 vs. limit=15.0
2024-09-14 04:42:52,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=75913.5, ans=0.125
2024-09-14 04:43:02,215 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=75941.83333333333, ans=0.0
2024-09-14 04:43:13,482 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.32 vs. limit=22.5
2024-09-14 04:43:15,888 INFO [train.py:1198] (1/2) Epoch 5, batch 1250, loss[loss=0.3262, ctc_loss=0.2422, cr_loss=0.4199, over 20246.00 frames. ], tot_loss[loss=0.3134, ctc_loss=0.2301, cr_loss=0.4166, over 4109961.27 frames. ], batch size: 74, lr: 1.68e-02, grad_scale: 32.0
2024-09-14 04:43:32,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=75998.5, ans=0.1
2024-09-14 04:43:41,665 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.963e+02 2.260e+02 2.462e+02 3.135e+02 5.192e+02, threshold=4.924e+02, percent-clipped=1.0
2024-09-14 04:43:48,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=76026.83333333333, ans=0.125
2024-09-14 04:43:48,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.whiten.whitening_limit, batch_count=76026.83333333333, ans=12.0
2024-09-14 04:43:55,810 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.82 vs. limit=10.0
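The [scaling.py:214] lines record hyperparameters (dropout rates, skip rates, balancer probabilities, whitening limits) whose value is a function of batch_count. A toy stand-in for the general mechanism, a piecewise-linear schedule over batch_count (the function name and breakpoints below are made up for illustration; the real schedules are defined in the model code):

def scheduled_float(batch_count: float, schedule) -> float:
    """Piecewise-linear interpolation between (batch_count, value)
    breakpoints; constant before the first and after the last one."""
    (x0, y0) = schedule[0]
    if batch_count <= x0:
        return y0
    for (x1, y1) in schedule[1:]:
        if batch_count <= x1:
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        (x0, y0) = (x1, y1)
    return y0

# e.g. a skip rate that starts at 0.5 and decays to 0.05 by batch 4000:
print(scheduled_float(75715.2, [(0.0, 0.5), (4000.0, 0.05)]))  # -> 0.05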
2024-09-14 04:44:28,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=76083.5, ans=0.125
2024-09-14 04:44:30,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=76083.5, ans=0.125
2024-09-14 04:44:34,585 INFO [train.py:1198] (1/2) Epoch 5, batch 1300, loss[loss=0.2988, ctc_loss=0.2178, cr_loss=0.4047, over 21067.00 frames. ], tot_loss[loss=0.3133, ctc_loss=0.2301, cr_loss=0.4163, over 4111544.21 frames. ], batch size: 53, lr: 1.68e-02, grad_scale: 32.0
2024-09-14 04:44:54,702 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.34 vs. limit=15.0
2024-09-14 04:45:53,306 INFO [train.py:1198] (1/2) Epoch 5, batch 1350, loss[loss=0.3033, ctc_loss=0.2181, cr_loss=0.426, over 20903.00 frames. ], tot_loss[loss=0.3139, ctc_loss=0.2305, cr_loss=0.4171, over 4105601.93 frames. ], batch size: 54, lr: 1.68e-02, grad_scale: 32.0
2024-09-14 04:45:56,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=76253.5, ans=10.0
2024-09-14 04:46:02,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=76253.5, ans=0.0
2024-09-14 04:46:19,191 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.838e+02 2.260e+02 2.463e+02 3.053e+02 4.334e+02, threshold=4.926e+02, percent-clipped=0.0
2024-09-14 04:46:22,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=76310.16666666667, ans=0.125
2024-09-14 04:46:25,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=76310.16666666667, ans=0.125
2024-09-14 04:46:52,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=76366.83333333333, ans=0.07
2024-09-14 04:47:09,337 INFO [train.py:1198] (1/2) Epoch 5, batch 1400, loss[loss=0.2573, ctc_loss=0.1814, cr_loss=0.3791, over 19923.00 frames. ], tot_loss[loss=0.3122, ctc_loss=0.2291, cr_loss=0.4152, over 4104533.92 frames. ], batch size: 44, lr: 1.68e-02, grad_scale: 32.0
2024-09-14 04:47:15,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=76395.16666666667, ans=0.125
2024-09-14 04:47:30,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=76423.5, ans=0.0
2024-09-14 04:47:30,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=76423.5, ans=0.0
2024-09-14 04:47:33,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=76423.5, ans=0.2
2024-09-14 04:47:35,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=76423.5, ans=0.125
2024-09-14 04:47:44,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=76451.83333333333, ans=0.035
2024-09-14 04:48:24,978 INFO [train.py:1198] (1/2) Epoch 5, batch 1450, loss[loss=0.2568, ctc_loss=0.1846, cr_loss=0.3613, over 20941.00 frames. ], tot_loss[loss=0.3127, ctc_loss=0.2295, cr_loss=0.4162, over 4105122.35 frames. ], batch size: 50, lr: 1.68e-02, grad_scale: 32.0
2024-09-14 04:48:35,924 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-14 04:48:50,796 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.838e+02 2.187e+02 2.432e+02 2.673e+02 4.140e+02, threshold=4.864e+02, percent-clipped=0.0
2024-09-14 04:49:00,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=76593.5, ans=0.2
2024-09-14 04:49:32,316 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.24 vs. limit=15.0
2024-09-14 04:49:37,215 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.89 vs. limit=15.0
2024-09-14 04:49:40,528 INFO [train.py:1198] (1/2) Epoch 5, batch 1500, loss[loss=0.2402, ctc_loss=0.175, cr_loss=0.3257, over 20988.00 frames. ], tot_loss[loss=0.3111, ctc_loss=0.2282, cr_loss=0.4147, over 4109554.45 frames. ], batch size: 49, lr: 1.68e-02, grad_scale: 32.0
2024-09-14 04:50:32,113 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.07 vs. limit=15.0
2024-09-14 04:51:01,550 INFO [train.py:1198] (1/2) Epoch 5, batch 1550, loss[loss=0.3131, ctc_loss=0.2243, cr_loss=0.4439, over 21014.00 frames. ], tot_loss[loss=0.3119, ctc_loss=0.2287, cr_loss=0.4161, over 4104069.00 frames. ], batch size: 62, lr: 1.68e-02, grad_scale: 32.0
2024-09-14 04:51:17,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=76848.5, ans=0.125
2024-09-14 04:51:27,437 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.807e+02 2.199e+02 2.388e+02 2.846e+02 4.682e+02, threshold=4.776e+02, percent-clipped=0.0
2024-09-14 04:52:17,483 INFO [train.py:1198] (1/2) Epoch 5, batch 1600, loss[loss=0.3137, ctc_loss=0.2277, cr_loss=0.4299, over 21017.00 frames. ], tot_loss[loss=0.3115, ctc_loss=0.2283, cr_loss=0.4161, over 4106585.88 frames. ], batch size: 63, lr: 1.67e-02, grad_scale: 32.0
2024-09-14 04:52:17,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=76961.83333333333, ans=0.125
2024-09-14 04:52:37,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=76990.16666666667, ans=0.0
2024-09-14 04:52:58,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=77018.5, ans=0.2
2024-09-14 04:53:33,700 INFO [train.py:1198] (1/2) Epoch 5, batch 1650, loss[loss=0.2762, ctc_loss=0.1986, cr_loss=0.388, over 20959.00 frames. ], tot_loss[loss=0.3116, ctc_loss=0.2285, cr_loss=0.4157, over 4102902.65 frames. ], batch size: 49, lr: 1.67e-02, grad_scale: 32.0
2024-09-14 04:53:59,053 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.938e+02 2.205e+02 2.413e+02 2.920e+02 5.498e+02, threshold=4.827e+02, percent-clipped=2.0
2024-09-14 04:54:02,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=77160.16666666667, ans=0.1
2024-09-14 04:54:29,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=77188.5, ans=0.125
2024-09-14 04:54:47,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=77245.16666666667, ans=0.2
2024-09-14 04:54:48,690 INFO [train.py:1198] (1/2) Epoch 5, batch 1700, loss[loss=0.297, ctc_loss=0.2183, cr_loss=0.3934, over 21065.00 frames. ], tot_loss[loss=0.3118, ctc_loss=0.2286, cr_loss=0.416, over 4100541.12 frames. ], batch size: 53, lr: 1.67e-02, grad_scale: 32.0
2024-09-14 04:54:55,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=77245.16666666667, ans=0.125
2024-09-14 04:54:59,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=77245.16666666667, ans=0.2
2024-09-14 04:55:05,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=77273.5, ans=0.2
2024-09-14 04:55:06,058 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.36 vs. limit=22.5
2024-09-14 04:55:18,088 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.67 vs. limit=15.0
2024-09-14 04:55:22,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=77301.83333333333, ans=0.0
2024-09-14 04:55:43,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=77330.16666666667, ans=0.125
2024-09-14 04:56:06,781 INFO [train.py:1198] (1/2) Epoch 5, batch 1750, loss[loss=0.3034, ctc_loss=0.221, cr_loss=0.4117, over 20932.00 frames. ], tot_loss[loss=0.312, ctc_loss=0.2287, cr_loss=0.4164, over 4102122.03 frames. ], batch size: 60, lr: 1.67e-02, grad_scale: 32.0
2024-09-14 04:56:10,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=77386.83333333333, ans=0.1
2024-09-14 04:56:35,007 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.78 vs. limit=15.0
2024-09-14 04:56:35,764 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.924e+02 2.241e+02 2.627e+02 3.126e+02 7.012e+02, threshold=5.255e+02, percent-clipped=4.0
2024-09-14 04:56:59,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=77471.83333333333, ans=0.025
2024-09-14 04:57:24,961 INFO [train.py:1198] (1/2) Epoch 5, batch 1800, loss[loss=0.3082, ctc_loss=0.2279, cr_loss=0.4012, over 20981.00 frames. ], tot_loss[loss=0.3125, ctc_loss=0.2293, cr_loss=0.4162, over 4085843.68 frames. ], batch size: 55, lr: 1.67e-02, grad_scale: 32.0
2024-09-14 04:57:28,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=77528.5, ans=0.2
2024-09-14 04:57:43,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=77556.83333333333, ans=10.0
2024-09-14 04:57:50,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=77556.83333333333, ans=0.125
2024-09-14 04:58:16,535 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=77613.5, ans=0.0
2024-09-14 04:58:40,618 INFO [train.py:1198] (1/2) Epoch 5, batch 1850, loss[loss=0.3182, ctc_loss=0.2312, cr_loss=0.4349, over 20949.00 frames. ], tot_loss[loss=0.3126, ctc_loss=0.2293, cr_loss=0.4164, over 4078626.65 frames. ], batch size: 55, lr: 1.67e-02, grad_scale: 32.0
2024-09-14 04:58:47,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=77670.16666666667, ans=0.0
2024-09-14 04:59:06,058 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.879e+02 2.206e+02 2.459e+02 2.688e+02 5.866e+02, threshold=4.917e+02, percent-clipped=1.0
2024-09-14 04:59:14,298 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.64 vs. limit=15.0
2024-09-14 04:59:27,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=77755.16666666667, ans=0.2
2024-09-14 04:59:48,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=77783.5, ans=0.1
2024-09-14 04:59:55,213 INFO [train.py:1198] (1/2) Epoch 5, batch 1900, loss[loss=0.345, ctc_loss=0.2568, cr_loss=0.4409, over 21021.00 frames. ], tot_loss[loss=0.3133, ctc_loss=0.2298, cr_loss=0.4179, over 4091255.58 frames. ], batch size: 61, lr: 1.67e-02, grad_scale: 32.0
2024-09-14 05:00:10,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=77840.16666666667, ans=0.0
2024-09-14 05:00:18,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=77840.16666666667, ans=0.2
2024-09-14 05:01:09,905 INFO [train.py:1198] (1/2) Epoch 5, batch 1950, loss[loss=0.2562, ctc_loss=0.1876, cr_loss=0.343, over 20937.00 frames. ], tot_loss[loss=0.3131, ctc_loss=0.2296, cr_loss=0.4172, over 4091699.56 frames. ], batch size: 49, lr: 1.66e-02, grad_scale: 32.0
2024-09-14 05:01:21,132 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.14 vs. limit=22.5
2024-09-14 05:01:35,773 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.882e+02 2.272e+02 2.658e+02 3.216e+02 5.160e+02, threshold=5.317e+02, percent-clipped=2.0
2024-09-14 05:01:43,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=78010.16666666667, ans=0.0
2024-09-14 05:01:43,697 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.57 vs. limit=12.0
2024-09-14 05:02:00,145 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.293e-02
2024-09-14 05:02:14,817 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.09 vs. limit=6.0
2024-09-14 05:02:15,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=78066.83333333333, ans=0.125
2024-09-14 05:02:26,536 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.29 vs. limit=15.0
2024-09-14 05:02:31,854 INFO [train.py:1198] (1/2) Epoch 5, batch 2000, loss[loss=0.3478, ctc_loss=0.2673, cr_loss=0.4025, over 18296.00 frames. ], tot_loss[loss=0.3132, ctc_loss=0.2299, cr_loss=0.4165, over 4084090.68 frames. ], batch size: 108, lr: 1.66e-02, grad_scale: 32.0
2024-09-14 05:02:36,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=78095.16666666667, ans=0.125
2024-09-14 05:02:37,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=78095.16666666667, ans=0.035
2024-09-14 05:03:13,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=78151.83333333333, ans=0.025
2024-09-14 05:03:47,023 INFO [train.py:1198] (1/2) Epoch 5, batch 2050, loss[loss=0.2581, ctc_loss=0.1855, cr_loss=0.3629, over 20977.00 frames. ], tot_loss[loss=0.3132, ctc_loss=0.2297, cr_loss=0.4173, over 4089323.96 frames. ], batch size: 51, lr: 1.66e-02, grad_scale: 32.0
2024-09-14 05:03:53,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=78236.83333333333, ans=0.2
2024-09-14 05:04:12,350 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.834e+02 2.217e+02 2.439e+02 3.083e+02 6.067e+02, threshold=4.878e+02, percent-clipped=1.0
2024-09-14 05:04:14,695 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.73 vs. limit=22.5
2024-09-14 05:04:21,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=78293.5, ans=0.025
2024-09-14 05:04:41,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=78321.83333333333, ans=0.025
2024-09-14 05:04:51,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=78350.16666666667, ans=0.09899494936611666
2024-09-14 05:05:02,149 INFO [train.py:1198] (1/2) Epoch 5, batch 2100, loss[loss=0.2785, ctc_loss=0.1994, cr_loss=0.3954, over 20944.00 frames. ], tot_loss[loss=0.3133, ctc_loss=0.2298, cr_loss=0.4173, over 4082057.12 frames. ], batch size: 50, lr: 1.66e-02, grad_scale: 32.0
2024-09-14 05:05:05,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=78378.5, ans=0.0
2024-09-14 05:05:33,294 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.93 vs. limit=15.0
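The [scaling.py:1024] Whitening lines compare a per-module "whiteness" statistic of the activations against a limit, split over num_groups channel groups. One common way to define such a metric, shown here as an assumed proxy for what scaling.py computes (not a copy of it), is the ratio of the mean squared eigenvalue of the feature covariance to the squared mean eigenvalue: it equals 1.0 for perfectly white (isotropic) features and grows as variance concentrates in a few directions.

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (num_frames, num_channels); channels are split into num_groups
    # groups, as in "num_groups=4, num_channels=128" above.
    (num_frames, num_channels) = x.shape
    x = x.reshape(num_frames, num_groups, num_channels // num_groups)
    x = x - x.mean(dim=0, keepdim=True)
    worst = 0.0
    for g in range(num_groups):
        cov = x[:, g, :].T @ x[:, g, :] / num_frames
        eigs = torch.linalg.eigvalsh(cov)
        metric = (eigs ** 2).mean() / (eigs.mean() ** 2 + 1e-20)
        worst = max(worst, metric.item())
    return worst  # compared against the logged "limit"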
2024-09-14 05:06:06,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=78491.83333333333, ans=0.125
2024-09-14 05:06:12,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=78491.83333333333, ans=0.125
2024-09-14 05:06:17,904 INFO [train.py:1198] (1/2) Epoch 5, batch 2150, loss[loss=0.2698, ctc_loss=0.188, cr_loss=0.4089, over 20946.00 frames. ], tot_loss[loss=0.3112, ctc_loss=0.2282, cr_loss=0.4152, over 4080593.12 frames. ], batch size: 50, lr: 1.66e-02, grad_scale: 32.0
2024-09-14 05:06:44,135 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.750e+02 2.182e+02 2.428e+02 2.938e+02 5.310e+02, threshold=4.856e+02, percent-clipped=1.0
2024-09-14 05:06:53,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=78576.83333333333, ans=0.2
2024-09-14 05:06:57,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=78576.83333333333, ans=0.1
2024-09-14 05:07:06,093 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.34 vs. limit=22.5
2024-09-14 05:07:07,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=78605.16666666667, ans=0.125
2024-09-14 05:07:26,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=78633.5, ans=0.025
2024-09-14 05:07:36,827 INFO [train.py:1198] (1/2) Epoch 5, batch 2200, loss[loss=0.3043, ctc_loss=0.2215, cr_loss=0.4138, over 20871.00 frames. ], tot_loss[loss=0.3105, ctc_loss=0.2274, cr_loss=0.4152, over 4090253.65 frames. ], batch size: 54, lr: 1.66e-02, grad_scale: 32.0
2024-09-14 05:08:00,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=78690.16666666667, ans=0.0
2024-09-14 05:08:01,628 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.94 vs. limit=10.0
2024-09-14 05:08:26,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=78746.83333333333, ans=0.0
2024-09-14 05:08:29,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=78746.83333333333, ans=0.125
2024-09-14 05:08:40,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=78775.16666666667, ans=0.125
2024-09-14 05:08:45,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=78775.16666666667, ans=0.125
2024-09-14 05:08:55,430 INFO [train.py:1198] (1/2) Epoch 5, batch 2250, loss[loss=0.3601, ctc_loss=0.2717, cr_loss=0.4422, over 21068.00 frames. ], tot_loss[loss=0.3115, ctc_loss=0.2281, cr_loss=0.4168, over 4104888.18 frames. ], batch size: 59, lr: 1.66e-02, grad_scale: 32.0
2024-09-14 05:09:20,909 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.938e+02 2.383e+02 2.840e+02 3.304e+02 5.979e+02, threshold=5.680e+02, percent-clipped=3.0
2024-09-14 05:09:34,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=78860.16666666667, ans=0.125
2024-09-14 05:10:10,773 INFO [train.py:1198] (1/2) Epoch 5, batch 2300, loss[loss=0.2579, ctc_loss=0.1854, cr_loss=0.3626, over 20380.00 frames. ], tot_loss[loss=0.3125, ctc_loss=0.229, cr_loss=0.4176, over 4097774.63 frames. ], batch size: 45, lr: 1.65e-02, grad_scale: 32.0
2024-09-14 05:10:40,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=79001.83333333333, ans=0.125
2024-09-14 05:10:46,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=79001.83333333333, ans=0.125
2024-09-14 05:10:53,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=79001.83333333333, ans=0.125
2024-09-14 05:10:54,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=79030.16666666667, ans=0.125
2024-09-14 05:11:05,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=79030.16666666667, ans=0.125
2024-09-14 05:11:26,261 INFO [train.py:1198] (1/2) Epoch 5, batch 2350, loss[loss=0.3405, ctc_loss=0.2484, cr_loss=0.4604, over 21012.00 frames. ], tot_loss[loss=0.3122, ctc_loss=0.2287, cr_loss=0.4171, over 4105405.39 frames. ], batch size: 61, lr: 1.65e-02, grad_scale: 16.0
2024-09-14 05:11:41,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=79115.16666666667, ans=0.125
2024-09-14 05:11:46,725 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.82 vs. limit=15.0
2024-09-14 05:11:53,409 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.882e+02 2.309e+02 2.622e+02 3.073e+02 5.142e+02, threshold=5.243e+02, percent-clipped=0.0
2024-09-14 05:12:41,566 INFO [train.py:1198] (1/2) Epoch 5, batch 2400, loss[loss=0.3166, ctc_loss=0.2309, cr_loss=0.4287, over 20679.00 frames. ], tot_loss[loss=0.3127, ctc_loss=0.2292, cr_loss=0.4178, over 4113647.43 frames. ], batch size: 66, lr: 1.65e-02, grad_scale: 32.0
2024-09-14 05:12:45,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=79228.5, ans=0.125
2024-09-14 05:13:10,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=79285.16666666667, ans=0.0
2024-09-14 05:13:13,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=79285.16666666667, ans=0.125
2024-09-14 05:13:14,228 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.80 vs. limit=15.0
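The grad_scale column mostly sits at 32.0 but drops to 16.0 (batch 2350) and recovers by batch 2400; that is the signature of dynamic loss scaling in fp16 training, where the scaler halves its scale after an overflowing step and grows it back after a run of clean steps. A generic PyTorch sketch of that loop (a toy model on an assumed CUDA device, not the training script itself):

import torch

model = torch.nn.Linear(8, 8).cuda()  # toy stand-in; assumes a GPU
opt = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler(init_scale=32.0)
for _ in range(10):
    opt.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(torch.randn(4, 8, device="cuda")).square().mean()
    scaler.scale(loss).backward()  # scale loss so fp16 grads stay finite
    scaler.step(opt)               # skipped internally on inf/NaN grads
    scaler.update()                # halve on overflow, grow after clean steps
    print("grad_scale:", scaler.get_scale())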
2024-09-14 05:13:35,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=79313.5, ans=0.125
2024-09-14 05:14:04,349 INFO [train.py:1198] (1/2) Epoch 5, batch 2450, loss[loss=0.3423, ctc_loss=0.2525, cr_loss=0.449, over 21030.00 frames. ], tot_loss[loss=0.3124, ctc_loss=0.2289, cr_loss=0.4178, over 4113126.25 frames. ], batch size: 63, lr: 1.65e-02, grad_scale: 32.0
2024-09-14 05:14:22,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=79398.5, ans=0.0
2024-09-14 05:14:25,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=79398.5, ans=0.0
2024-09-14 05:14:27,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=79398.5, ans=0.125
2024-09-14 05:14:31,541 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.797e+02 2.141e+02 2.417e+02 2.858e+02 5.130e+02, threshold=4.835e+02, percent-clipped=0.0
2024-09-14 05:15:05,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=79483.5, ans=0.125
2024-09-14 05:15:19,164 INFO [train.py:1198] (1/2) Epoch 5, batch 2500, loss[loss=0.2854, ctc_loss=0.2095, cr_loss=0.3799, over 20990.00 frames. ], tot_loss[loss=0.3143, ctc_loss=0.2305, cr_loss=0.419, over 4107968.90 frames. ], batch size: 51, lr: 1.65e-02, grad_scale: 32.0
2024-09-14 05:15:27,201 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.32 vs. limit=15.0
2024-09-14 05:15:56,264 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.36 vs. limit=15.0
2024-09-14 05:16:31,885 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.55 vs. limit=10.0
2024-09-14 05:16:31,999 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.45 vs. limit=15.0
2024-09-14 05:16:34,229 INFO [train.py:1198] (1/2) Epoch 5, batch 2550, loss[loss=0.2748, ctc_loss=0.1982, cr_loss=0.383, over 20962.00 frames. ], tot_loss[loss=0.3142, ctc_loss=0.2305, cr_loss=0.4187, over 4104386.85 frames. ], batch size: 51, lr: 1.65e-02, grad_scale: 32.0
2024-09-14 05:16:40,834 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.43 vs. limit=15.0
2024-09-14 05:16:55,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=79681.83333333333, ans=0.125
2024-09-14 05:17:00,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=79681.83333333333, ans=0.5
2024-09-14 05:17:01,153 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.877e+02 2.242e+02 2.471e+02 2.801e+02 5.372e+02, threshold=4.942e+02, percent-clipped=0.0
2024-09-14 05:17:07,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=79710.16666666667, ans=0.1
2024-09-14 05:17:45,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=79766.83333333333, ans=0.2
2024-09-14 05:17:49,552 INFO [train.py:1198] (1/2) Epoch 5, batch 2600, loss[loss=0.328, ctc_loss=0.2405, cr_loss=0.4374, over 20939.00 frames. ], tot_loss[loss=0.3147, ctc_loss=0.231, cr_loss=0.4182, over 4089178.64 frames. ], batch size: 60, lr: 1.65e-02, grad_scale: 32.0
2024-09-14 05:17:57,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=79795.16666666667, ans=0.1
2024-09-14 05:17:59,326 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.51 vs. limit=15.0
2024-09-14 05:18:03,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=79823.5, ans=0.0
2024-09-14 05:18:47,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=79880.16666666667, ans=0.0
2024-09-14 05:19:08,384 INFO [train.py:1198] (1/2) Epoch 5, batch 2650, loss[loss=0.288, ctc_loss=0.2094, cr_loss=0.3929, over 20827.00 frames. ], tot_loss[loss=0.3142, ctc_loss=0.2306, cr_loss=0.418, over 4091349.96 frames. ], batch size: 59, lr: 1.65e-02, grad_scale: 32.0
2024-09-14 05:19:38,607 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.778e+02 2.167e+02 2.414e+02 2.922e+02 4.679e+02, threshold=4.827e+02, percent-clipped=1.0
2024-09-14 05:20:21,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=80050.16666666667, ans=0.125
2024-09-14 05:20:22,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=80050.16666666667, ans=0.125
2024-09-14 05:20:27,058 INFO [train.py:1198] (1/2) Epoch 5, batch 2700, loss[loss=0.327, ctc_loss=0.2409, cr_loss=0.4308, over 20333.00 frames. ], tot_loss[loss=0.3123, ctc_loss=0.2288, cr_loss=0.4171, over 4091002.74 frames. ], batch size: 74, lr: 1.64e-02, grad_scale: 32.0
2024-09-14 05:20:31,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=80078.5, ans=0.125
2024-09-14 05:20:46,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=80106.83333333333, ans=0.125
2024-09-14 05:21:42,653 INFO [train.py:1198] (1/2) Epoch 5, batch 2750, loss[loss=0.3289, ctc_loss=0.2393, cr_loss=0.4478, over 20827.00 frames. ], tot_loss[loss=0.3125, ctc_loss=0.229, cr_loss=0.4173, over 4074717.71 frames. ], batch size: 59, lr: 1.64e-02, grad_scale: 32.0
2024-09-14 05:21:58,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=80248.5, ans=0.025
2024-09-14 05:22:09,744 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.826e+02 2.195e+02 2.460e+02 2.775e+02 5.354e+02, threshold=4.921e+02, percent-clipped=1.0
2024-09-14 05:22:17,531 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-14 05:22:23,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=80276.83333333333, ans=0.125
2024-09-14 05:22:47,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=80333.5, ans=0.2
2024-09-14 05:22:48,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=80333.5, ans=0.125
2024-09-14 05:22:57,544 INFO [train.py:1198] (1/2) Epoch 5, batch 2800, loss[loss=0.264, ctc_loss=0.1873, cr_loss=0.3838, over 20968.00 frames. ], tot_loss[loss=0.3134, ctc_loss=0.2298, cr_loss=0.4183, over 4080009.38 frames. ], batch size: 49, lr: 1.64e-02, grad_scale: 32.0
2024-09-14 05:22:57,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=80361.83333333333, ans=0.1
2024-09-14 05:23:13,443 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.05 vs. limit=22.5
2024-09-14 05:23:20,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=80390.16666666667, ans=0.0
2024-09-14 05:23:25,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=80390.16666666667, ans=0.0
2024-09-14 05:23:31,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=80418.5, ans=0.0
2024-09-14 05:24:03,301 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.43 vs. limit=15.0
2024-09-14 05:24:04,778 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.51 vs. limit=22.5
2024-09-14 05:24:13,297 INFO [train.py:1198] (1/2) Epoch 5, batch 2850, loss[loss=0.3305, ctc_loss=0.2416, cr_loss=0.4442, over 20680.00 frames. ], tot_loss[loss=0.3134, ctc_loss=0.2297, cr_loss=0.4188, over 4092388.23 frames. ], batch size: 68, lr: 1.64e-02, grad_scale: 32.0
2024-09-14 05:24:33,503 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.56 vs. limit=22.5
2024-09-14 05:24:43,174 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.931e+02 2.215e+02 2.474e+02 2.810e+02 4.187e+02, threshold=4.948e+02, percent-clipped=0.0
2024-09-14 05:24:52,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=80560.16666666667, ans=0.125
2024-09-14 05:25:02,112 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.44 vs. limit=22.5
2024-09-14 05:25:06,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=80588.5, ans=0.1
2024-09-14 05:25:09,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=80588.5, ans=0.125
2024-09-14 05:25:30,711 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.30 vs. limit=22.5
2024-09-14 05:25:34,325 INFO [train.py:1198] (1/2) Epoch 5, batch 2900, loss[loss=0.351, ctc_loss=0.2569, cr_loss=0.4706, over 20998.00 frames. ], tot_loss[loss=0.3128, ctc_loss=0.2291, cr_loss=0.4184, over 4084658.41 frames. ], batch size: 63, lr: 1.64e-02, grad_scale: 32.0
2024-09-14 05:25:37,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=80645.16666666667, ans=0.0
2024-09-14 05:25:37,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=80645.16666666667, ans=0.125
2024-09-14 05:25:48,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=80673.5, ans=0.125
2024-09-14 05:25:51,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=80673.5, ans=0.2
2024-09-14 05:25:55,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=80673.5, ans=0.125
2024-09-14 05:26:05,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=80701.83333333333, ans=0.1
2024-09-14 05:26:06,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=80701.83333333333, ans=0.125
2024-09-14 05:26:29,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=80730.16666666667, ans=0.2
2024-09-14 05:26:41,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=80758.5, ans=0.125
2024-09-14 05:26:50,357 INFO [train.py:1198] (1/2) Epoch 5, batch 2950, loss[loss=0.2883, ctc_loss=0.2102, cr_loss=0.3903, over 21000.00 frames. ], tot_loss[loss=0.3125, ctc_loss=0.2289, cr_loss=0.4185, over 4097351.37 frames. ], batch size: 52, lr: 1.64e-02, grad_scale: 32.0
2024-09-14 05:27:07,739 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.92 vs. limit=12.0
2024-09-14 05:27:17,109 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.878e+02 2.189e+02 2.408e+02 2.676e+02 5.297e+02, threshold=4.816e+02, percent-clipped=2.0
2024-09-14 05:27:43,139 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.37 vs. limit=15.0
2024-09-14 05:27:47,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=80871.83333333333, ans=0.0
2024-09-14 05:28:04,791 INFO [train.py:1198] (1/2) Epoch 5, batch 3000, loss[loss=0.3137, ctc_loss=0.2316, cr_loss=0.4106, over 21021.00 frames. ], tot_loss[loss=0.3134, ctc_loss=0.2296, cr_loss=0.4192, over 4104043.68 frames. ], batch size: 63, lr: 1.64e-02, grad_scale: 32.0
2024-09-14 05:28:04,791 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-14 05:28:17,952 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.8423, 2.2027, 2.2185, 2.7517, 2.7157, 2.7348, 2.4330, 2.9424], device='cuda:1')
2024-09-14 05:28:25,121 INFO [train.py:1230] (1/2) Epoch 5, validation: loss=0.07203, ctc_loss=0.07203, cr_loss=9.208e-15, over 944034.00 frames.
2024-09-14 05:28:25,122 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB
2024-09-14 05:28:54,788 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.05 vs. limit=12.0
2024-09-14 05:29:18,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=81013.5, ans=0.125
2024-09-14 05:29:31,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=81041.83333333333, ans=0.125
2024-09-14 05:29:36,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=81041.83333333333, ans=0.0
2024-09-14 05:29:40,500 INFO [train.py:1198] (1/2) Epoch 5, batch 3050, loss[loss=0.3139, ctc_loss=0.2302, cr_loss=0.4185, over 21050.00 frames. ], tot_loss[loss=0.3154, ctc_loss=0.2312, cr_loss=0.4214, over 4097039.45 frames. ], batch size: 63, lr: 1.63e-02, grad_scale: 32.0
2024-09-14 05:29:52,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=81070.16666666667, ans=0.125
2024-09-14 05:29:54,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=81098.5, ans=0.1
2024-09-14 05:30:10,518 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.859e+02 2.173e+02 2.418e+02 2.867e+02 4.391e+02, threshold=4.837e+02, percent-clipped=0.0
2024-09-14 05:31:02,194 INFO [train.py:1198] (1/2) Epoch 5, batch 3100, loss[loss=0.3153, ctc_loss=0.2325, cr_loss=0.4142, over 20714.00 frames. ], tot_loss[loss=0.3148, ctc_loss=0.2306, cr_loss=0.4208, over 4099019.60 frames. ], batch size: 71, lr: 1.63e-02, grad_scale: 32.0
2024-09-14 05:31:06,950 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-14 05:31:33,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=81268.5, ans=0.035
2024-09-14 05:32:08,215 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=81325.16666666667, ans=0.125
2024-09-14 05:32:16,924 INFO [train.py:1198] (1/2) Epoch 5, batch 3150, loss[loss=0.3379, ctc_loss=0.2472, cr_loss=0.4537, over 20673.00 frames. ], tot_loss[loss=0.3137, ctc_loss=0.2298, cr_loss=0.4199, over 4105594.78 frames. ], batch size: 68, lr: 1.63e-02, grad_scale: 32.0
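At validation the CR term is numerically zero (cr_loss=9.208e-15 above), which is consistent with a consistency-regularization loss that compares the outputs of two differently augmented copies of each utterance: with augmentation disabled for validation, the two views coincide. A hedged sketch of such a term, written here as a symmetric KL divergence between the two views' frame-level posteriors (the exact form used by the training script is not shown in this log):

import torch
import torch.nn.functional as F

def consistency_loss(log_probs_a: torch.Tensor,
                     log_probs_b: torch.Tensor) -> torch.Tensor:
    # log_probs_*: (N, T, V) log-posteriors from two augmented views of
    # the same batch. Identical inputs give exactly zero, matching the
    # cr_loss ~ 1e-15 seen at validation above.
    kl_ab = F.kl_div(log_probs_b, log_probs_a,
                     log_target=True, reduction="batchmean")
    kl_ba = F.kl_div(log_probs_a, log_probs_b,
                     log_target=True, reduction="batchmean")
    return 0.5 * (kl_ab + kl_ba)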
2024-09-14 05:32:44,309 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.826e+02 2.174e+02 2.414e+02 2.681e+02 3.799e+02, threshold=4.829e+02, percent-clipped=0.0
2024-09-14 05:32:47,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=81410.16666666667, ans=0.0
2024-09-14 05:33:04,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=81438.5, ans=10.0
2024-09-14 05:33:21,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=81466.83333333333, ans=0.2
2024-09-14 05:33:32,901 INFO [train.py:1198] (1/2) Epoch 5, batch 3200, loss[loss=0.3259, ctc_loss=0.2423, cr_loss=0.4182, over 20330.00 frames. ], tot_loss[loss=0.3148, ctc_loss=0.2307, cr_loss=0.4206, over 4101772.84 frames. ], batch size: 74, lr: 1.63e-02, grad_scale: 32.0
2024-09-14 05:33:36,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=81495.16666666667, ans=0.125
2024-09-14 05:34:03,656 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.71 vs. limit=22.5
2024-09-14 05:34:48,588 INFO [train.py:1198] (1/2) Epoch 5, batch 3250, loss[loss=0.2879, ctc_loss=0.2118, cr_loss=0.3809, over 19016.00 frames. ], tot_loss[loss=0.3147, ctc_loss=0.2306, cr_loss=0.4206, over 4107703.70 frames. ], batch size: 42, lr: 1.63e-02, grad_scale: 16.0
2024-09-14 05:35:03,785 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=81665.16666666667, ans=0.1
2024-09-14 05:35:17,074 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.860e+02 2.197e+02 2.501e+02 2.945e+02 4.319e+02, threshold=5.002e+02, percent-clipped=0.0
2024-09-14 05:35:20,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=81693.5, ans=0.0
2024-09-14 05:35:29,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=81693.5, ans=0.125
2024-09-14 05:36:09,427 INFO [train.py:1198] (1/2) Epoch 5, batch 3300, loss[loss=0.3364, ctc_loss=0.2461, cr_loss=0.4515, over 20069.00 frames. ], tot_loss[loss=0.3133, ctc_loss=0.2295, cr_loss=0.4192, over 4113549.07 frames. ], batch size: 80, lr: 1.63e-02, grad_scale: 16.0
2024-09-14 05:36:18,244 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.79 vs. limit=15.0
2024-09-14 05:37:17,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=81891.83333333333, ans=0.125
2024-09-14 05:37:25,087 INFO [train.py:1198] (1/2) Epoch 5, batch 3350, loss[loss=0.3266, ctc_loss=0.2378, cr_loss=0.4442, over 20829.00 frames. ], tot_loss[loss=0.3123, ctc_loss=0.2286, cr_loss=0.4182, over 4104914.29 frames. ], batch size: 65, lr: 1.63e-02, grad_scale: 16.0
2024-09-14 05:37:46,990 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.29 vs. limit=6.0
limit=6.0 2024-09-14 05:37:49,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=81948.5, ans=0.0 2024-09-14 05:37:53,741 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.961e+02 2.230e+02 2.442e+02 3.002e+02 5.612e+02, threshold=4.884e+02, percent-clipped=2.0 2024-09-14 05:37:58,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=81976.83333333333, ans=0.0 2024-09-14 05:38:04,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=81976.83333333333, ans=0.95 2024-09-14 05:38:04,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=81976.83333333333, ans=0.07 2024-09-14 05:38:12,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=82005.16666666667, ans=0.125 2024-09-14 05:38:16,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=82005.16666666667, ans=0.125 2024-09-14 05:38:24,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=82033.5, ans=0.025 2024-09-14 05:38:38,669 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.21 vs. limit=22.5 2024-09-14 05:38:40,820 INFO [train.py:1198] (1/2) Epoch 5, batch 3400, loss[loss=0.3427, ctc_loss=0.2526, cr_loss=0.4505, over 19941.00 frames. ], tot_loss[loss=0.3106, ctc_loss=0.2271, cr_loss=0.4174, over 4119656.97 frames. ], batch size: 80, lr: 1.63e-02, grad_scale: 16.0 2024-09-14 05:38:51,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=82061.83333333333, ans=0.0 2024-09-14 05:39:05,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=82090.16666666667, ans=0.025 2024-09-14 05:39:44,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=82175.16666666667, ans=0.125 2024-09-14 05:39:44,813 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.37 vs. limit=15.0 2024-09-14 05:39:56,156 INFO [train.py:1198] (1/2) Epoch 5, batch 3450, loss[loss=0.3296, ctc_loss=0.239, cr_loss=0.4527, over 21033.00 frames. ], tot_loss[loss=0.3104, ctc_loss=0.2267, cr_loss=0.4184, over 4127272.08 frames. ], batch size: 62, lr: 1.62e-02, grad_scale: 16.0 2024-09-14 05:40:07,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=82203.5, ans=0.0 2024-09-14 05:40:08,154 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.16 vs. 
limit=15.0 2024-09-14 05:40:22,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=82231.83333333333, ans=0.0 2024-09-14 05:40:25,178 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.924e+02 2.180e+02 2.416e+02 2.788e+02 3.622e+02, threshold=4.831e+02, percent-clipped=0.0 2024-09-14 05:40:39,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=82260.16666666667, ans=0.025 2024-09-14 05:40:47,267 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.55 vs. limit=12.0 2024-09-14 05:40:53,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=82288.5, ans=0.05 2024-09-14 05:41:12,090 INFO [train.py:1198] (1/2) Epoch 5, batch 3500, loss[loss=0.2889, ctc_loss=0.203, cr_loss=0.4294, over 20970.00 frames. ], tot_loss[loss=0.3092, ctc_loss=0.2258, cr_loss=0.417, over 4124386.89 frames. ], batch size: 58, lr: 1.62e-02, grad_scale: 16.0 2024-09-14 05:41:21,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=82345.16666666667, ans=0.125 2024-09-14 05:41:47,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=82401.83333333333, ans=0.0 2024-09-14 05:41:55,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=82401.83333333333, ans=0.2 2024-09-14 05:42:26,345 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=82458.5, ans=0.0 2024-09-14 05:42:31,923 INFO [train.py:1198] (1/2) Epoch 5, batch 3550, loss[loss=0.3288, ctc_loss=0.2362, cr_loss=0.4631, over 20856.00 frames. ], tot_loss[loss=0.3112, ctc_loss=0.2275, cr_loss=0.4184, over 4117412.25 frames. ], batch size: 57, lr: 1.62e-02, grad_scale: 16.0 2024-09-14 05:42:38,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=82486.83333333333, ans=0.125 2024-09-14 05:42:38,938 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.47 vs. limit=6.0 2024-09-14 05:43:00,042 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.849e+02 2.177e+02 2.468e+02 2.916e+02 4.942e+02, threshold=4.936e+02, percent-clipped=1.0 2024-09-14 05:43:12,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=82543.5, ans=0.125 2024-09-14 05:43:31,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=82600.16666666667, ans=0.1 2024-09-14 05:43:46,759 INFO [train.py:1198] (1/2) Epoch 5, batch 3600, loss[loss=0.3167, ctc_loss=0.2354, cr_loss=0.4067, over 20786.00 frames. ], tot_loss[loss=0.3141, ctc_loss=0.2302, cr_loss=0.4197, over 4091886.06 frames. 
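Each WARNING from optim.py summarizes the recent distribution of gradient norms as five quantiles (min, 25%, median, 75%, max), followed by the clipping threshold and the percentage of batches clipped. In these entries the threshold tracks twice the median, e.g. 2.0 × 2.416e+02 ≈ 4.831e+02 just above, which matches the logged Clipping_scale=2.0. A hypothetical sketch of that bookkeeping (illustrative names, not the actual optim.py implementation):

```python
from collections import deque
import torch

class MedianGradClipper:
    """Clip to clipping_scale x median of recent grad norms (a sketch)."""
    def __init__(self, clipping_scale: float = 2.0, history: int = 128):
        self.scale = clipping_scale
        self.norms = deque(maxlen=history)

    def quartiles(self) -> torch.Tensor:
        probs = torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0])
        return torch.quantile(torch.tensor(list(self.norms)), probs)

    def clip_(self, params, grad_norm: float) -> bool:
        self.norms.append(grad_norm)
        q = self.quartiles()            # logged as "grad-norm quartiles ..."
        threshold = self.scale * q[2].item()
        if grad_norm > threshold:       # counted toward "percent-clipped"
            for p in params:
                if p.grad is not None:
                    p.grad.mul_(threshold / grad_norm)
            return True
        return False
```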
], batch size: 53, lr: 1.62e-02, grad_scale: 32.0 2024-09-14 05:43:55,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=82628.5, ans=0.125 2024-09-14 05:44:03,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=82656.83333333333, ans=0.125 2024-09-14 05:44:08,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=82656.83333333333, ans=0.125 2024-09-14 05:45:01,577 INFO [train.py:1198] (1/2) Epoch 5, batch 3650, loss[loss=0.3122, ctc_loss=0.223, cr_loss=0.4461, over 20771.00 frames. ], tot_loss[loss=0.3132, ctc_loss=0.2294, cr_loss=0.4193, over 4087548.46 frames. ], batch size: 53, lr: 1.62e-02, grad_scale: 32.0 2024-09-14 05:45:04,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=82770.16666666667, ans=0.0 2024-09-14 05:45:24,095 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=82798.5, ans=0.1 2024-09-14 05:45:29,910 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.981e+02 2.234e+02 2.506e+02 3.067e+02 8.156e+02, threshold=5.011e+02, percent-clipped=2.0 2024-09-14 05:45:34,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=82826.83333333333, ans=0.125 2024-09-14 05:46:16,948 INFO [train.py:1198] (1/2) Epoch 5, batch 3700, loss[loss=0.2987, ctc_loss=0.2151, cr_loss=0.4179, over 20802.00 frames. ], tot_loss[loss=0.3136, ctc_loss=0.2296, cr_loss=0.4199, over 4083464.80 frames. ], batch size: 53, lr: 1.62e-02, grad_scale: 32.0 2024-09-14 05:46:34,354 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.76 vs. limit=15.0 2024-09-14 05:46:59,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=82968.5, ans=0.0 2024-09-14 05:47:14,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=82996.83333333333, ans=0.0 2024-09-14 05:47:38,164 INFO [train.py:1198] (1/2) Epoch 5, batch 3750, loss[loss=0.312, ctc_loss=0.2272, cr_loss=0.4238, over 21075.00 frames. ], tot_loss[loss=0.3143, ctc_loss=0.2303, cr_loss=0.42, over 4075337.18 frames. ], batch size: 59, lr: 1.62e-02, grad_scale: 32.0 2024-09-14 05:47:53,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=83081.83333333333, ans=0.125 2024-09-14 05:48:06,910 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.729e+02 2.195e+02 2.435e+02 2.822e+02 4.241e+02, threshold=4.870e+02, percent-clipped=0.0 2024-09-14 05:48:48,761 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.11 vs. limit=15.0 2024-09-14 05:48:54,028 INFO [train.py:1198] (1/2) Epoch 5, batch 3800, loss[loss=0.3581, ctc_loss=0.2632, cr_loss=0.4743, over 18337.00 frames. ], tot_loss[loss=0.3132, ctc_loss=0.2294, cr_loss=0.4188, over 4080352.88 frames. 
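The grad_scale field follows the standard dynamic loss-scaling pattern of mixed-precision training: it reads 32.0 through batch 3200, halves to 16.0 by batch 3250 (the usual trigger is a step whose scaled gradients overflowed), and is back at 32.0 by batch 3600 after a run of clean steps; the exact halving/doubling cadence depends on the scaler's configuration. A minimal sketch of the usual torch.cuda.amp wiring (illustrative, not the recipe's exact training loop):

```python
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0,        # matches the grad_scale values in the log
    growth_factor=2.0,      # doubled after enough overflow-free steps
    backoff_factor=0.5,     # halved when an overflow is detected
)

def train_step(model, optimizer, batch, compute_loss):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = compute_loss(model, batch)
    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(optimizer)          # skips the update if grads overflowed
    scaler.update()                 # adjusts the scale (the logged grad_scale)
```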
], batch size: 109, lr: 1.61e-02, grad_scale: 32.0 2024-09-14 05:49:18,981 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.11 vs. limit=15.0 2024-09-14 05:49:33,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=83251.83333333333, ans=0.0 2024-09-14 05:49:43,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=83280.16666666667, ans=0.2 2024-09-14 05:49:54,942 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.72 vs. limit=22.5 2024-09-14 05:50:09,170 INFO [train.py:1198] (1/2) Epoch 5, batch 3850, loss[loss=0.3528, ctc_loss=0.2612, cr_loss=0.4582, over 20977.00 frames. ], tot_loss[loss=0.312, ctc_loss=0.2284, cr_loss=0.4181, over 4088467.35 frames. ], batch size: 64, lr: 1.61e-02, grad_scale: 32.0 2024-09-14 05:50:20,227 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.24 vs. limit=6.0 2024-09-14 05:50:37,724 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.889e+02 2.200e+02 2.463e+02 2.934e+02 5.500e+02, threshold=4.926e+02, percent-clipped=1.0 2024-09-14 05:50:47,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=83393.5, ans=0.0 2024-09-14 05:51:07,174 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.75 vs. limit=15.0 2024-09-14 05:51:24,401 INFO [train.py:1198] (1/2) Epoch 5, batch 3900, loss[loss=0.3466, ctc_loss=0.2582, cr_loss=0.4415, over 20973.00 frames. ], tot_loss[loss=0.3115, ctc_loss=0.2279, cr_loss=0.4182, over 4094194.37 frames. ], batch size: 64, lr: 1.61e-02, grad_scale: 32.0 2024-09-14 05:51:30,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=83478.5, ans=0.125 2024-09-14 05:51:44,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=83506.83333333333, ans=0.0 2024-09-14 05:52:10,550 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.18 vs. limit=6.0 2024-09-14 05:52:20,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=83563.5, ans=0.1 2024-09-14 05:52:32,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=83591.83333333333, ans=0.125 2024-09-14 05:52:33,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=83591.83333333333, ans=15.0 2024-09-14 05:52:39,683 INFO [train.py:1198] (1/2) Epoch 5, batch 3950, loss[loss=0.3185, ctc_loss=0.2389, cr_loss=0.398, over 20702.00 frames. ], tot_loss[loss=0.3108, ctc_loss=0.2273, cr_loss=0.4173, over 4105440.39 frames. ], batch size: 66, lr: 1.61e-02, grad_scale: 32.0 2024-09-14 05:52:40,434 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.69 vs. 
limit=22.5 2024-09-14 05:52:41,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=83620.16666666667, ans=0.0 2024-09-14 05:53:07,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=83648.5, ans=0.1 2024-09-14 05:53:14,450 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.920e+02 2.188e+02 2.319e+02 2.628e+02 3.864e+02, threshold=4.637e+02, percent-clipped=0.0 2024-09-14 05:53:36,154 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 05:53:36,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=83705.16666666667, ans=0.125 2024-09-14 05:53:46,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=83733.5, ans=0.125 2024-09-14 05:53:57,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=83733.5, ans=0.0 2024-09-14 05:54:01,333 INFO [train.py:1198] (1/2) Epoch 5, batch 4000, loss[loss=0.2627, ctc_loss=0.1881, cr_loss=0.3726, over 20959.00 frames. ], tot_loss[loss=0.3098, ctc_loss=0.2266, cr_loss=0.4162, over 4102058.52 frames. ], batch size: 51, lr: 1.61e-02, grad_scale: 32.0 2024-09-14 05:54:25,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=83790.16666666667, ans=0.05 2024-09-14 05:54:28,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=83790.16666666667, ans=0.1 2024-09-14 05:54:34,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=83818.5, ans=0.025 2024-09-14 05:54:38,336 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.03 vs. limit=15.0 2024-09-14 05:54:46,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=83846.83333333333, ans=0.125 2024-09-14 05:55:16,342 INFO [train.py:1198] (1/2) Epoch 5, batch 4050, loss[loss=0.2822, ctc_loss=0.2038, cr_loss=0.392, over 20796.00 frames. ], tot_loss[loss=0.3099, ctc_loss=0.2267, cr_loss=0.4159, over 4092269.97 frames. ], batch size: 53, lr: 1.61e-02, grad_scale: 32.0 2024-09-14 05:55:19,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=83903.5, ans=0.05 2024-09-14 05:55:44,935 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.851e+02 2.169e+02 2.337e+02 2.655e+02 4.904e+02, threshold=4.675e+02, percent-clipped=1.0 2024-09-14 05:56:04,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=83988.5, ans=0.5 2024-09-14 05:56:09,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=83988.5, ans=0.1 2024-09-14 05:56:12,824 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.29 vs. 
limit=15.0 2024-09-14 05:56:16,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=84016.83333333333, ans=0.125 2024-09-14 05:56:31,692 INFO [train.py:1198] (1/2) Epoch 5, batch 4100, loss[loss=0.3269, ctc_loss=0.239, cr_loss=0.4394, over 20841.00 frames. ], tot_loss[loss=0.3083, ctc_loss=0.2253, cr_loss=0.4152, over 4096419.51 frames. ], batch size: 65, lr: 1.61e-02, grad_scale: 32.0 2024-09-14 05:56:46,296 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.06 vs. limit=6.0 2024-09-14 05:56:49,408 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.92 vs. limit=15.0 2024-09-14 05:56:51,952 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.87 vs. limit=12.0 2024-09-14 05:56:56,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=84073.5, ans=0.0 2024-09-14 05:56:56,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=84073.5, ans=0.125 2024-09-14 05:57:08,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=84101.83333333333, ans=0.0 2024-09-14 05:57:09,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=84101.83333333333, ans=0.0 2024-09-14 05:57:46,725 INFO [train.py:1198] (1/2) Epoch 5, batch 4150, loss[loss=0.3105, ctc_loss=0.2293, cr_loss=0.4059, over 20965.00 frames. ], tot_loss[loss=0.3087, ctc_loss=0.2255, cr_loss=0.4161, over 4096536.53 frames. ], batch size: 58, lr: 1.61e-02, grad_scale: 32.0 2024-09-14 05:57:59,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=84186.83333333333, ans=0.125 2024-09-14 05:58:15,445 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.896e+02 2.189e+02 2.356e+02 2.642e+02 4.278e+02, threshold=4.713e+02, percent-clipped=0.0 2024-09-14 05:58:21,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=84243.5, ans=0.0 2024-09-14 05:58:33,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=84271.83333333333, ans=0.125 2024-09-14 05:58:47,453 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.45 vs. limit=10.0 2024-09-14 05:58:57,927 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.52 vs. limit=15.0 2024-09-14 05:59:07,765 INFO [train.py:1198] (1/2) Epoch 5, batch 4200, loss[loss=0.2877, ctc_loss=0.2085, cr_loss=0.3963, over 21054.00 frames. ], tot_loss[loss=0.309, ctc_loss=0.2258, cr_loss=0.416, over 4095763.23 frames. 
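The ScheduledFloat lines record hyperparameters that are functions of training progress: each named quantity (skip rates, dropout probabilities, balancer limits, and so on) is resolved against the current batch_count, and the resolved value is logged as ans. A plausible minimal reimplementation of such a piecewise-linear schedule (breakpoints below are illustrative; the real scaling.py class carries more machinery):

```python
class ScheduledFloat:
    """Piecewise-linear value of batch_count, e.g. (0, 0.5), (4000, 0.02)."""
    def __init__(self, *points: tuple):
        self.points = sorted(points)  # (batch_count, value) pairs

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

skip_rate = ScheduledFloat((0.0, 0.5), (4000.0, 0.02))
print(skip_rate(84073.5))  # this far into training the schedule has flattened
```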
], batch size: 53, lr: 1.60e-02, grad_scale: 32.0 2024-09-14 05:59:41,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=84385.16666666667, ans=0.0 2024-09-14 05:59:53,014 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.41 vs. limit=22.5 2024-09-14 06:00:07,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=84441.83333333333, ans=0.125 2024-09-14 06:00:23,580 INFO [train.py:1198] (1/2) Epoch 5, batch 4250, loss[loss=0.3356, ctc_loss=0.2475, cr_loss=0.4404, over 20965.00 frames. ], tot_loss[loss=0.3095, ctc_loss=0.2262, cr_loss=0.4164, over 4100549.94 frames. ], batch size: 64, lr: 1.60e-02, grad_scale: 32.0 2024-09-14 06:00:24,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=84470.16666666667, ans=0.0 2024-09-14 06:00:34,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=84470.16666666667, ans=0.125 2024-09-14 06:00:38,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=84498.5, ans=0.04949747468305833 2024-09-14 06:00:43,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=84498.5, ans=0.0 2024-09-14 06:00:49,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=84498.5, ans=0.0 2024-09-14 06:00:52,095 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.826e+02 2.222e+02 2.393e+02 2.847e+02 4.628e+02, threshold=4.785e+02, percent-clipped=0.0 2024-09-14 06:01:11,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=84555.16666666667, ans=0.125 2024-09-14 06:01:38,930 INFO [train.py:1198] (1/2) Epoch 5, batch 4300, loss[loss=0.2818, ctc_loss=0.2053, cr_loss=0.3829, over 20927.00 frames. ], tot_loss[loss=0.3095, ctc_loss=0.2262, cr_loss=0.4167, over 4098772.59 frames. 
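The Whitening lines compare a statistic of a module's activations against a limit: within each of num_groups groups of channels, the metric is 1.0 when the feature covariance is proportional to the identity and grows as channels become correlated or unevenly scaled, and a corrective gradient only engages as the metric approaches or exceeds the limit (in this stretch the logged metrics stay below their limits). One way to define such a scale-invariant metric, as an illustration rather than the exact scaling.py formula:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
    """x: (num_frames, num_channels); returns ~1.0 for white features."""
    n, c = x.shape
    d = c // num_groups
    x = x.reshape(n, num_groups, d).transpose(0, 1)   # (groups, frames, d)
    cov = torch.matmul(x.transpose(1, 2), x) / n      # per-group covariance
    num = (cov ** 2).mean()                           # penalizes off-diagonal mass
    den = (torch.diagonal(cov, dim1=1, dim2=2).mean() ** 2) / d
    return (num / den).item()                         # 1.0 (white) up to d

x = torch.randn(10000, 128)                 # roughly white features
print(whitening_metric(x, num_groups=4))    # ~1.0; correlated inputs score higher
```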
], batch size: 50, lr: 1.60e-02, grad_scale: 32.0 2024-09-14 06:01:40,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=84611.83333333333, ans=0.0 2024-09-14 06:01:48,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=84611.83333333333, ans=0.1 2024-09-14 06:01:53,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=84640.16666666667, ans=0.09899494936611666 2024-09-14 06:02:00,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=84640.16666666667, ans=0.125 2024-09-14 06:02:09,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=84668.5, ans=0.025 2024-09-14 06:02:41,478 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=84725.16666666667, ans=0.2 2024-09-14 06:02:44,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=84725.16666666667, ans=0.0 2024-09-14 06:02:54,676 INFO [train.py:1198] (1/2) Epoch 5, batch 4350, loss[loss=0.3406, ctc_loss=0.243, cr_loss=0.4882, over 20865.00 frames. ], tot_loss[loss=0.3098, ctc_loss=0.2262, cr_loss=0.4177, over 4100508.87 frames. ], batch size: 65, lr: 1.60e-02, grad_scale: 32.0 2024-09-14 06:02:57,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=84753.5, ans=0.1 2024-09-14 06:03:05,847 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.66 vs. limit=12.0 2024-09-14 06:03:23,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=84810.16666666667, ans=0.125 2024-09-14 06:03:24,551 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.861e+02 2.307e+02 2.722e+02 3.123e+02 5.218e+02, threshold=5.444e+02, percent-clipped=3.0 2024-09-14 06:03:55,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=84866.83333333333, ans=0.2 2024-09-14 06:03:55,318 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.47 vs. limit=15.0 2024-09-14 06:04:09,519 INFO [train.py:1198] (1/2) Epoch 5, batch 4400, loss[loss=0.3397, ctc_loss=0.2492, cr_loss=0.4529, over 19352.00 frames. ], tot_loss[loss=0.3112, ctc_loss=0.2274, cr_loss=0.4192, over 4097264.64 frames. ], batch size: 90, lr: 1.60e-02, grad_scale: 32.0 2024-09-14 06:04:15,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=84895.16666666667, ans=0.07 2024-09-14 06:04:37,197 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.25 vs. 
limit=15.0 2024-09-14 06:04:45,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=84951.83333333333, ans=0.05 2024-09-14 06:05:25,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=85008.5, ans=0.025 2024-09-14 06:05:29,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=85036.83333333333, ans=0.125 2024-09-14 06:05:30,856 INFO [train.py:1198] (1/2) Epoch 5, batch 4450, loss[loss=0.3114, ctc_loss=0.2264, cr_loss=0.4252, over 20947.00 frames. ], tot_loss[loss=0.3097, ctc_loss=0.2263, cr_loss=0.4172, over 4101433.89 frames. ], batch size: 60, lr: 1.60e-02, grad_scale: 32.0 2024-09-14 06:05:37,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=85036.83333333333, ans=0.05 2024-09-14 06:06:01,012 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.806e+02 2.162e+02 2.389e+02 2.769e+02 5.314e+02, threshold=4.777e+02, percent-clipped=0.0 2024-09-14 06:06:07,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=85093.5, ans=0.2 2024-09-14 06:06:25,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=85121.83333333333, ans=0.125 2024-09-14 06:06:26,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=85121.83333333333, ans=0.1 2024-09-14 06:06:31,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=85150.16666666667, ans=0.0 2024-09-14 06:06:34,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=85150.16666666667, ans=0.0 2024-09-14 06:06:35,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=85150.16666666667, ans=0.1 2024-09-14 06:06:43,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=85150.16666666667, ans=0.0 2024-09-14 06:06:46,369 INFO [train.py:1198] (1/2) Epoch 5, batch 4500, loss[loss=0.2827, ctc_loss=0.2, cr_loss=0.4137, over 20921.00 frames. ], tot_loss[loss=0.3094, ctc_loss=0.226, cr_loss=0.4172, over 4102564.26 frames. ], batch size: 54, lr: 1.60e-02, grad_scale: 32.0 2024-09-14 06:06:54,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=85178.5, ans=0.2 2024-09-14 06:07:23,352 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.20 vs. limit=15.0 2024-09-14 06:07:41,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=85263.5, ans=0.125 2024-09-14 06:07:48,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=85291.83333333333, ans=0.1 2024-09-14 06:08:01,889 INFO [train.py:1198] (1/2) Epoch 5, batch 4550, loss[loss=0.2542, ctc_loss=0.1826, cr_loss=0.3579, over 20971.00 frames. 
], tot_loss[loss=0.3081, ctc_loss=0.2248, cr_loss=0.4162, over 4111818.79 frames. ], batch size: 51, lr: 1.60e-02, grad_scale: 32.0 2024-09-14 06:08:05,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=85320.16666666667, ans=0.125 2024-09-14 06:08:08,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=85320.16666666667, ans=0.1 2024-09-14 06:08:18,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=85348.5, ans=0.1 2024-09-14 06:08:32,228 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.734e+02 2.227e+02 2.442e+02 3.193e+02 6.739e+02, threshold=4.885e+02, percent-clipped=2.0 2024-09-14 06:09:12,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=85433.5, ans=0.07 2024-09-14 06:09:17,097 INFO [train.py:1198] (1/2) Epoch 5, batch 4600, loss[loss=0.275, ctc_loss=0.1966, cr_loss=0.3922, over 20952.00 frames. ], tot_loss[loss=0.3078, ctc_loss=0.2246, cr_loss=0.4163, over 4115828.11 frames. ], batch size: 49, lr: 1.59e-02, grad_scale: 32.0 2024-09-14 06:09:34,605 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.47 vs. limit=12.0 2024-09-14 06:09:41,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=85490.16666666667, ans=0.0 2024-09-14 06:10:01,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=85546.83333333333, ans=0.0 2024-09-14 06:10:32,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=85575.16666666667, ans=0.125 2024-09-14 06:10:34,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=85575.16666666667, ans=10.0 2024-09-14 06:10:38,395 INFO [train.py:1198] (1/2) Epoch 5, batch 4650, loss[loss=0.343, ctc_loss=0.2612, cr_loss=0.4092, over 18275.00 frames. ], tot_loss[loss=0.3066, ctc_loss=0.2239, cr_loss=0.4137, over 4104198.89 frames. ], batch size: 108, lr: 1.59e-02, grad_scale: 32.0 2024-09-14 06:10:49,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=85603.5, ans=0.0 2024-09-14 06:11:04,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=85631.83333333333, ans=0.2 2024-09-14 06:11:08,034 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.751e+02 2.172e+02 2.419e+02 2.724e+02 6.227e+02, threshold=4.838e+02, percent-clipped=1.0 2024-09-14 06:11:23,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=85688.5, ans=0.2 2024-09-14 06:11:37,535 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=85716.83333333333, ans=0.2 2024-09-14 06:11:51,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=85716.83333333333, ans=0.125 2024-09-14 06:11:54,004 INFO [train.py:1198] (1/2) Epoch 5, batch 4700, loss[loss=0.2962, ctc_loss=0.2142, cr_loss=0.4098, over 20986.00 frames. 
], tot_loss[loss=0.308, ctc_loss=0.225, cr_loss=0.415, over 4099359.94 frames. ], batch size: 55, lr: 1.59e-02, grad_scale: 32.0 2024-09-14 06:12:37,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=85830.16666666667, ans=10.0 2024-09-14 06:12:45,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=85830.16666666667, ans=0.025 2024-09-14 06:12:45,465 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.26 vs. limit=12.0 2024-09-14 06:13:08,884 INFO [train.py:1198] (1/2) Epoch 5, batch 4750, loss[loss=0.3166, ctc_loss=0.2273, cr_loss=0.4462, over 21067.00 frames. ], tot_loss[loss=0.3091, ctc_loss=0.2259, cr_loss=0.4163, over 4096714.54 frames. ], batch size: 56, lr: 1.59e-02, grad_scale: 32.0 2024-09-14 06:13:38,722 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.827e+02 2.258e+02 2.644e+02 2.905e+02 5.447e+02, threshold=5.288e+02, percent-clipped=1.0 2024-09-14 06:14:23,314 INFO [train.py:1198] (1/2) Epoch 5, batch 4800, loss[loss=0.2925, ctc_loss=0.2129, cr_loss=0.3979, over 20961.00 frames. ], tot_loss[loss=0.3093, ctc_loss=0.226, cr_loss=0.4163, over 4091926.10 frames. ], batch size: 50, lr: 1.59e-02, grad_scale: 32.0 2024-09-14 06:14:31,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=86028.5, ans=0.0 2024-09-14 06:14:50,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=86056.83333333333, ans=0.0 2024-09-14 06:14:54,525 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.21 vs. limit=15.0 2024-09-14 06:14:56,058 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.98 vs. limit=5.0 2024-09-14 06:15:14,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=86113.5, ans=0.1 2024-09-14 06:15:24,564 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.19 vs. limit=15.0 2024-09-14 06:15:35,211 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.53 vs. limit=15.0 2024-09-14 06:15:39,069 INFO [train.py:1198] (1/2) Epoch 5, batch 4850, loss[loss=0.2674, ctc_loss=0.194, cr_loss=0.3674, over 20961.00 frames. ], tot_loss[loss=0.3098, ctc_loss=0.2266, cr_loss=0.4164, over 4086844.26 frames. ], batch size: 50, lr: 1.59e-02, grad_scale: 32.0 2024-09-14 06:15:49,083 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.56 vs. 
limit=15.0 2024-09-14 06:16:10,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=86226.83333333333, ans=0.05 2024-09-14 06:16:12,134 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.836e+02 2.150e+02 2.348e+02 2.725e+02 4.388e+02, threshold=4.695e+02, percent-clipped=0.0 2024-09-14 06:16:23,434 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.94 vs. limit=22.5 2024-09-14 06:16:36,660 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 06:16:57,467 INFO [train.py:1198] (1/2) Epoch 5, batch 4900, loss[loss=0.2518, ctc_loss=0.1794, cr_loss=0.3617, over 20973.00 frames. ], tot_loss[loss=0.3114, ctc_loss=0.2279, cr_loss=0.4175, over 4083912.23 frames. ], batch size: 51, lr: 1.59e-02, grad_scale: 32.0 2024-09-14 06:17:03,057 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.50 vs. limit=15.0 2024-09-14 06:17:26,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=86368.5, ans=0.1 2024-09-14 06:17:28,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=86368.5, ans=0.125 2024-09-14 06:17:29,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=86368.5, ans=0.1 2024-09-14 06:17:40,265 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 06:18:12,478 INFO [train.py:1198] (1/2) Epoch 5, batch 4950, loss[loss=0.3335, ctc_loss=0.2412, cr_loss=0.4618, over 20671.00 frames. ], tot_loss[loss=0.3117, ctc_loss=0.2281, cr_loss=0.4181, over 4076949.61 frames. ], batch size: 68, lr: 1.59e-02, grad_scale: 32.0 2024-09-14 06:18:17,351 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=86453.5, ans=0.125 2024-09-14 06:18:20,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=86453.5, ans=0.1 2024-09-14 06:18:42,440 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.886e+02 2.235e+02 2.425e+02 2.804e+02 5.355e+02, threshold=4.850e+02, percent-clipped=1.0 2024-09-14 06:19:26,480 INFO [train.py:1198] (1/2) Epoch 5, batch 5000, loss[loss=0.294, ctc_loss=0.2136, cr_loss=0.4019, over 21075.00 frames. ], tot_loss[loss=0.3099, ctc_loss=0.2267, cr_loss=0.4161, over 4084889.29 frames. ], batch size: 59, lr: 1.58e-02, grad_scale: 32.0 2024-09-14 06:19:47,808 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.83 vs. limit=22.5 2024-09-14 06:20:32,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=86708.5, ans=22.5 2024-09-14 06:20:40,415 INFO [train.py:1198] (1/2) Epoch 5, batch 5050, loss[loss=0.3125, ctc_loss=0.2283, cr_loss=0.421, over 20981.00 frames. ], tot_loss[loss=0.3113, ctc_loss=0.2276, cr_loss=0.4182, over 4080167.07 frames. 
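Throughout these entries, loss[..., over ~21000.00 frames.] describes the current batch (50 to 70 cuts totalling roughly 21k subsampled frames), while tot_loss[..., over ~4.1e6 frames.] is a frame-weighted running average: the logged frame count hovers near 200 batches' worth, which is the steady state of an exponentially decayed accumulator with a window of about 200. A sketch under that assumption (names are illustrative):

```python
class RunningLoss:
    """Exponentially decayed, frame-weighted loss sums (a sketch)."""
    def __init__(self, window: int = 200):
        self.decay = 1.0 - 1.0 / window
        self.frames = 0.0
        self.sums = {"loss": 0.0, "ctc_loss": 0.0, "cr_loss": 0.0}

    def update(self, batch_frames: float, **batch_sums: float):
        self.frames = self.frames * self.decay + batch_frames
        for k in self.sums:
            self.sums[k] = self.sums[k] * self.decay + batch_sums[k]

    def averages(self) -> dict:
        # Reported as tot_loss[..., over <frames> frames.]
        return {k: v / self.frames for k, v in self.sums.items()}

tracker = RunningLoss()
for _ in range(1000):          # steady state after a few hundred batches
    tracker.update(21000, loss=0.31 * 21000,
                   ctc_loss=0.23 * 21000, cr_loss=0.42 * 21000)
print(round(tracker.frames))   # ~4.2e6, like the logged "over 4.08e6 frames"
```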
], batch size: 64, lr: 1.58e-02, grad_scale: 32.0 2024-09-14 06:21:07,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=86765.16666666667, ans=0.0 2024-09-14 06:21:10,542 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.727e+02 2.212e+02 2.472e+02 3.120e+02 5.773e+02, threshold=4.944e+02, percent-clipped=3.0 2024-09-14 06:21:55,368 INFO [train.py:1198] (1/2) Epoch 5, batch 5100, loss[loss=0.2887, ctc_loss=0.2082, cr_loss=0.4026, over 21058.00 frames. ], tot_loss[loss=0.31, ctc_loss=0.2267, cr_loss=0.4164, over 4080435.97 frames. ], batch size: 56, lr: 1.58e-02, grad_scale: 32.0 2024-09-14 06:22:18,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=86906.83333333333, ans=0.2 2024-09-14 06:22:19,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=86906.83333333333, ans=0.125 2024-09-14 06:22:40,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=86963.5, ans=0.125 2024-09-14 06:22:45,304 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 06:23:02,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=86991.83333333333, ans=0.2 2024-09-14 06:23:10,038 INFO [train.py:1198] (1/2) Epoch 5, batch 5150, loss[loss=0.294, ctc_loss=0.2156, cr_loss=0.3922, over 21064.00 frames. ], tot_loss[loss=0.3094, ctc_loss=0.226, cr_loss=0.4166, over 4091577.18 frames. ], batch size: 59, lr: 1.58e-02, grad_scale: 32.0 2024-09-14 06:23:27,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=87048.5, ans=0.125 2024-09-14 06:23:39,515 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.903e+02 2.173e+02 2.609e+02 3.097e+02 4.855e+02, threshold=5.218e+02, percent-clipped=0.0 2024-09-14 06:23:55,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=87105.16666666667, ans=0.05 2024-09-14 06:24:09,630 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.30 vs. limit=15.0 2024-09-14 06:24:25,801 INFO [train.py:1198] (1/2) Epoch 5, batch 5200, loss[loss=0.2722, ctc_loss=0.1937, cr_loss=0.3928, over 21009.00 frames. ], tot_loss[loss=0.3089, ctc_loss=0.2257, cr_loss=0.4159, over 4092372.55 frames. 
], batch size: 52, lr: 1.58e-02, grad_scale: 32.0 2024-09-14 06:24:47,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=87190.16666666667, ans=0.025 2024-09-14 06:24:58,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=87218.5, ans=0.125 2024-09-14 06:25:02,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=87218.5, ans=0.125 2024-09-14 06:25:13,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=87246.83333333333, ans=0.1 2024-09-14 06:25:42,441 INFO [train.py:1198] (1/2) Epoch 5, batch 5250, loss[loss=0.2769, ctc_loss=0.1985, cr_loss=0.392, over 21053.00 frames. ], tot_loss[loss=0.3085, ctc_loss=0.2254, cr_loss=0.4154, over 4091999.50 frames. ], batch size: 53, lr: 1.58e-02, grad_scale: 32.0 2024-09-14 06:25:51,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=87303.5, ans=0.025 2024-09-14 06:25:56,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=87331.83333333333, ans=0.125 2024-09-14 06:26:11,840 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.848e+02 2.240e+02 2.492e+02 2.833e+02 8.780e+02, threshold=4.984e+02, percent-clipped=1.0 2024-09-14 06:26:29,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=87388.5, ans=0.125 2024-09-14 06:26:47,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=87416.83333333333, ans=0.2 2024-09-14 06:26:50,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=87416.83333333333, ans=0.125 2024-09-14 06:26:55,477 INFO [train.py:1198] (1/2) Epoch 5, batch 5300, loss[loss=0.3011, ctc_loss=0.2135, cr_loss=0.4383, over 20965.00 frames. ], tot_loss[loss=0.3078, ctc_loss=0.2248, cr_loss=0.415, over 4092082.35 frames. ], batch size: 58, lr: 1.58e-02, grad_scale: 32.0 2024-09-14 06:27:56,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=87558.5, ans=0.125 2024-09-14 06:28:04,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=87558.5, ans=0.07 2024-09-14 06:28:07,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=87558.5, ans=0.125 2024-09-14 06:28:09,881 INFO [train.py:1198] (1/2) Epoch 5, batch 5350, loss[loss=0.3414, ctc_loss=0.2515, cr_loss=0.4496, over 21032.00 frames. ], tot_loss[loss=0.3076, ctc_loss=0.2246, cr_loss=0.4151, over 4090176.74 frames. 
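The lr column decays smoothly from 1.64e-02 at the top of this section to 1.56e-02 by epoch 5, batch ~6000. Zipformer recipes typically drive this with the Eden schedule, which multiplies the base rate by inverse fourth-root factors in both the global batch index and the completed-epoch count; under that assumption (the base rate and constants below are the usual recipe defaults, and the global batch index near the top of this section is estimated at about 28600), the logged values are reproduced closely:

```python
def eden_lr(base_lr: float, batch: float, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    """Eden-style schedule (a sketch, omitting any warmup factor)."""
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# Fifth epoch (4 completed), estimated global batch ~28600:
print(f"{eden_lr(0.04, batch=28600, epoch=4.0):.2e}")  # 1.63e-02, as logged
```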
], batch size: 63, lr: 1.58e-02, grad_scale: 32.0 2024-09-14 06:28:20,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=87586.83333333333, ans=0.05 2024-09-14 06:28:33,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=87615.16666666667, ans=0.1 2024-09-14 06:28:39,468 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.918e+02 2.198e+02 2.381e+02 2.732e+02 3.900e+02, threshold=4.761e+02, percent-clipped=0.0 2024-09-14 06:28:45,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=87643.5, ans=0.125 2024-09-14 06:28:53,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=87671.83333333333, ans=0.04949747468305833 2024-09-14 06:28:56,874 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.49 vs. limit=22.5 2024-09-14 06:29:08,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=87700.16666666667, ans=0.0 2024-09-14 06:29:24,283 INFO [train.py:1198] (1/2) Epoch 5, batch 5400, loss[loss=0.305, ctc_loss=0.216, cr_loss=0.4449, over 21004.00 frames. ], tot_loss[loss=0.3073, ctc_loss=0.2241, cr_loss=0.416, over 4104465.21 frames. ], batch size: 63, lr: 1.57e-02, grad_scale: 32.0 2024-09-14 06:29:36,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=87728.5, ans=0.125 2024-09-14 06:29:47,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=87756.83333333333, ans=0.125 2024-09-14 06:29:51,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=87756.83333333333, ans=0.1 2024-09-14 06:30:07,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=87813.5, ans=0.0 2024-09-14 06:30:12,812 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.34 vs. limit=15.0 2024-09-14 06:30:20,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=87813.5, ans=0.0 2024-09-14 06:30:38,565 INFO [train.py:1198] (1/2) Epoch 5, batch 5450, loss[loss=0.291, ctc_loss=0.2148, cr_loss=0.381, over 20949.00 frames. ], tot_loss[loss=0.3076, ctc_loss=0.2244, cr_loss=0.4161, over 4099364.31 frames. ], batch size: 58, lr: 1.57e-02, grad_scale: 32.0 2024-09-14 06:30:39,582 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.96 vs. 
limit=15.0 2024-09-14 06:30:49,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=87870.16666666667, ans=0.0 2024-09-14 06:31:04,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=87898.5, ans=0.125 2024-09-14 06:31:08,446 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.901e+02 2.336e+02 2.698e+02 3.124e+02 5.401e+02, threshold=5.396e+02, percent-clipped=3.0 2024-09-14 06:31:32,605 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.92 vs. limit=10.0 2024-09-14 06:31:59,278 INFO [train.py:1198] (1/2) Epoch 5, batch 5500, loss[loss=0.2632, ctc_loss=0.1905, cr_loss=0.3635, over 20923.00 frames. ], tot_loss[loss=0.3071, ctc_loss=0.2241, cr_loss=0.415, over 4089842.68 frames. ], batch size: 50, lr: 1.57e-02, grad_scale: 32.0 2024-09-14 06:32:36,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=88068.5, ans=0.125 2024-09-14 06:33:13,313 INFO [train.py:1198] (1/2) Epoch 5, batch 5550, loss[loss=0.2804, ctc_loss=0.2016, cr_loss=0.3937, over 19998.00 frames. ], tot_loss[loss=0.3064, ctc_loss=0.2235, cr_loss=0.4145, over 4086783.91 frames. ], batch size: 44, lr: 1.57e-02, grad_scale: 32.0 2024-09-14 06:33:45,582 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.844e+02 2.172e+02 2.399e+02 2.777e+02 7.183e+02, threshold=4.797e+02, percent-clipped=1.0 2024-09-14 06:34:06,371 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.91 vs. limit=10.0 2024-09-14 06:34:12,015 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 06:34:26,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=88266.83333333333, ans=0.125 2024-09-14 06:34:32,453 INFO [train.py:1198] (1/2) Epoch 5, batch 5600, loss[loss=0.2749, ctc_loss=0.2, cr_loss=0.375, over 20947.00 frames. ], tot_loss[loss=0.3066, ctc_loss=0.2236, cr_loss=0.4152, over 4085700.37 frames. ], batch size: 49, lr: 1.57e-02, grad_scale: 32.0 2024-09-14 06:34:32,928 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 06:34:33,344 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.86 vs. limit=15.0 2024-09-14 06:34:50,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=88323.5, ans=0.125 2024-09-14 06:35:22,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=88380.16666666667, ans=0.1 2024-09-14 06:35:43,808 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.80 vs. limit=12.0 2024-09-14 06:35:45,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=88436.83333333333, ans=0.125 2024-09-14 06:35:46,071 INFO [train.py:1198] (1/2) Epoch 5, batch 5650, loss[loss=0.2969, ctc_loss=0.2135, cr_loss=0.4171, over 21005.00 frames. 
], tot_loss[loss=0.3069, ctc_loss=0.2238, cr_loss=0.4155, over 4074003.19 frames. ], batch size: 52, lr: 1.57e-02, grad_scale: 32.0 2024-09-14 06:35:55,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=88436.83333333333, ans=0.125 2024-09-14 06:36:15,851 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.769e+02 2.310e+02 2.734e+02 3.176e+02 4.614e+02, threshold=5.469e+02, percent-clipped=0.0 2024-09-14 06:36:32,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=88521.83333333333, ans=0.125 2024-09-14 06:36:34,215 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=88521.83333333333, ans=0.2 2024-09-14 06:36:36,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=88521.83333333333, ans=0.125 2024-09-14 06:36:39,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=88521.83333333333, ans=0.2 2024-09-14 06:37:00,095 INFO [train.py:1198] (1/2) Epoch 5, batch 5700, loss[loss=0.376, ctc_loss=0.2819, cr_loss=0.4706, over 18366.00 frames. ], tot_loss[loss=0.3066, ctc_loss=0.2237, cr_loss=0.4144, over 4078020.22 frames. ], batch size: 108, lr: 1.57e-02, grad_scale: 32.0 2024-09-14 06:37:09,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=88578.5, ans=15.0 2024-09-14 06:37:22,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=88606.83333333333, ans=0.0 2024-09-14 06:37:23,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=88606.83333333333, ans=0.1 2024-09-14 06:37:38,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=88635.16666666667, ans=0.125 2024-09-14 06:37:55,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=88663.5, ans=0.125 2024-09-14 06:38:13,769 INFO [train.py:1198] (1/2) Epoch 5, batch 5750, loss[loss=0.318, ctc_loss=0.2288, cr_loss=0.446, over 20848.00 frames. ], tot_loss[loss=0.3074, ctc_loss=0.2242, cr_loss=0.4157, over 4080088.69 frames. ], batch size: 65, lr: 1.57e-02, grad_scale: 32.0 2024-09-14 06:38:43,456 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.811e+02 2.222e+02 2.411e+02 2.798e+02 5.182e+02, threshold=4.821e+02, percent-clipped=0.0 2024-09-14 06:39:04,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=88805.16666666667, ans=0.125 2024-09-14 06:39:27,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=88861.83333333333, ans=0.125 2024-09-14 06:39:28,269 INFO [train.py:1198] (1/2) Epoch 5, batch 5800, loss[loss=0.3429, ctc_loss=0.252, cr_loss=0.4543, over 21034.00 frames. ], tot_loss[loss=0.3071, ctc_loss=0.2241, cr_loss=0.4153, over 4087649.39 frames. 
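The WithLoss lines scattered through these entries report an auxiliary penalty attached to the attention-weight tensors; loss-sum=0.000e+00 means the attached penalty is currently zero. The general trick, passing a tensor through unchanged in the forward pass while an extra loss term participates in the backward pass, fits in a few lines; a hypothetical sketch of the pattern (the actual scaling.py implementation differs):

```python
import torch

class WithLoss(torch.autograd.Function):
    """Pass x through; give the attached loss a gradient of 1 in backward."""
    @staticmethod
    def forward(ctx, x: torch.Tensor, aux_loss: torch.Tensor):
        ctx.aux_shape = aux_loss.shape
        return x

    @staticmethod
    def backward(ctx, x_grad: torch.Tensor):
        # aux_loss behaves as if it had been added to the final loss.
        return x_grad, torch.ones(ctx.aux_shape,
                                  device=x_grad.device, dtype=x_grad.dtype)

attn = torch.rand(4, 100, 100, requires_grad=True)
p = attn.softmax(dim=-1)
penalty = (p - p.clamp(max=0.99)).sum()   # e.g. discourage weights above 0.99
p = WithLoss.apply(p, penalty)            # logged as "loss-sum=..."
p.sum().backward()                        # penalty's gradient reaches attn
```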
], batch size: 61, lr: 1.57e-02, grad_scale: 32.0 2024-09-14 06:39:51,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=88890.16666666667, ans=0.0 2024-09-14 06:40:28,811 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.79 vs. limit=15.0 2024-09-14 06:40:33,054 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.40 vs. limit=10.0 2024-09-14 06:40:40,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=88975.16666666667, ans=0.025 2024-09-14 06:40:42,760 INFO [train.py:1198] (1/2) Epoch 5, batch 5850, loss[loss=0.239, ctc_loss=0.1686, cr_loss=0.3522, over 20986.00 frames. ], tot_loss[loss=0.306, ctc_loss=0.2232, cr_loss=0.4139, over 4084669.09 frames. ], batch size: 52, lr: 1.56e-02, grad_scale: 32.0 2024-09-14 06:41:06,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=89031.83333333333, ans=0.1 2024-09-14 06:41:12,557 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.796e+02 2.198e+02 2.446e+02 2.758e+02 4.223e+02, threshold=4.892e+02, percent-clipped=0.0 2024-09-14 06:41:29,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=89088.5, ans=0.035 2024-09-14 06:41:57,344 INFO [train.py:1198] (1/2) Epoch 5, batch 5900, loss[loss=0.3329, ctc_loss=0.2473, cr_loss=0.4281, over 20725.00 frames. ], tot_loss[loss=0.3068, ctc_loss=0.2239, cr_loss=0.4144, over 4082014.15 frames. ], batch size: 71, lr: 1.56e-02, grad_scale: 32.0 2024-09-14 06:42:09,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=89145.16666666667, ans=0.025 2024-09-14 06:42:18,361 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.78 vs. limit=15.0 2024-09-14 06:43:14,129 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.99 vs. limit=15.0 2024-09-14 06:43:16,224 INFO [train.py:1198] (1/2) Epoch 5, batch 5950, loss[loss=0.3082, ctc_loss=0.2272, cr_loss=0.4049, over 21020.00 frames. ], tot_loss[loss=0.3068, ctc_loss=0.224, cr_loss=0.4141, over 4070083.31 frames. 
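The validation passes bracketing this section (loss=0.07203, cr_loss=9.208e-15 at the top; loss=0.06916, cr_loss=9.296e-15 just below) report a cr_loss that is zero up to float noise, while training batches show cr_loss around 0.42. That is the expected signature of consistency regularization: during training the CTC posteriors of two differently time-masked copies of each utterance are pulled together with a symmetric KL term, and at validation, with no masking, the two copies coincide and the KL vanishes. A sketch of such a symmetric KL, assuming the bidirectional form described for CR-CTC (detaching and normalization details may differ in the recipe):

```python
import torch
import torch.nn.functional as F

def consistency_loss(log_p1: torch.Tensor, log_p2: torch.Tensor) -> torch.Tensor:
    """Symmetric KL between frame-level CTC posteriors of two views."""
    # Each branch is pulled toward the other's (detached) distribution.
    kl12 = F.kl_div(log_p1, log_p2.detach(), reduction="sum", log_target=True)
    kl21 = F.kl_div(log_p2, log_p1.detach(), reduction="sum", log_target=True)
    return 0.5 * (kl12 + kl21)

lp = torch.randn(100, 500).log_softmax(-1)
print(consistency_loss(lp, lp.clone()))  # identical views -> ~0, as at validation
```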
], batch size: 63, lr: 1.56e-02, grad_scale: 32.0 2024-09-14 06:43:20,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=89286.83333333333, ans=0.0 2024-09-14 06:43:43,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=89343.5, ans=0.0 2024-09-14 06:43:45,057 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.779e+02 2.145e+02 2.332e+02 2.572e+02 3.096e+02, threshold=4.663e+02, percent-clipped=0.0 2024-09-14 06:43:58,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=89371.83333333333, ans=0.05 2024-09-14 06:44:26,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=89400.16666666667, ans=0.2 2024-09-14 06:44:28,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=89428.5, ans=0.125 2024-09-14 06:44:29,462 INFO [train.py:1198] (1/2) Epoch 5, batch 6000, loss[loss=0.2753, ctc_loss=0.1956, cr_loss=0.3984, over 20963.00 frames. ], tot_loss[loss=0.3069, ctc_loss=0.224, cr_loss=0.4148, over 4075067.11 frames. ], batch size: 52, lr: 1.56e-02, grad_scale: 32.0 2024-09-14 06:44:29,462 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-14 06:44:49,647 INFO [train.py:1230] (1/2) Epoch 5, validation: loss=0.06916, ctc_loss=0.06916, cr_loss=9.296e-15, over 944034.00 frames. 2024-09-14 06:44:49,648 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-14 06:45:21,419 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.92 vs. limit=15.0 2024-09-14 06:45:51,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=89541.83333333333, ans=0.125 2024-09-14 06:45:55,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=89541.83333333333, ans=0.1 2024-09-14 06:46:02,988 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.89 vs. limit=15.0 2024-09-14 06:46:03,637 INFO [train.py:1198] (1/2) Epoch 5, batch 6050, loss[loss=0.3207, ctc_loss=0.2341, cr_loss=0.433, over 21074.00 frames. ], tot_loss[loss=0.3078, ctc_loss=0.2246, cr_loss=0.4161, over 4079957.75 frames. ], batch size: 59, lr: 1.56e-02, grad_scale: 32.0 2024-09-14 06:46:22,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=89598.5, ans=0.125 2024-09-14 06:46:34,339 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.858e+02 2.140e+02 2.421e+02 2.825e+02 4.309e+02, threshold=4.843e+02, percent-clipped=0.0 2024-09-14 06:46:49,686 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 06:46:56,450 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.18 vs. 
limit=6.0 2024-09-14 06:47:07,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=89683.5, ans=0.1 2024-09-14 06:47:13,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=89683.5, ans=0.025 2024-09-14 06:47:14,263 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.63 vs. limit=15.0 2024-09-14 06:47:19,452 INFO [train.py:1198] (1/2) Epoch 5, batch 6100, loss[loss=0.3234, ctc_loss=0.2343, cr_loss=0.4453, over 20862.00 frames. ], tot_loss[loss=0.3068, ctc_loss=0.2236, cr_loss=0.4161, over 4088545.35 frames. ], batch size: 65, lr: 1.56e-02, grad_scale: 32.0 2024-09-14 06:47:22,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=89711.83333333333, ans=0.0 2024-09-14 06:47:24,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=89711.83333333333, ans=0.5 2024-09-14 06:47:28,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=89711.83333333333, ans=0.125 2024-09-14 06:48:00,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=89768.5, ans=0.025 2024-09-14 06:48:07,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=89796.83333333333, ans=0.0 2024-09-14 06:48:11,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=89796.83333333333, ans=0.2 2024-09-14 06:48:33,627 INFO [train.py:1198] (1/2) Epoch 5, batch 6150, loss[loss=0.2705, ctc_loss=0.1952, cr_loss=0.3767, over 19926.00 frames. ], tot_loss[loss=0.3067, ctc_loss=0.2235, cr_loss=0.4162, over 4094755.36 frames. ], batch size: 44, lr: 1.56e-02, grad_scale: 32.0 2024-09-14 06:48:45,064 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.87 vs. limit=22.5 2024-09-14 06:49:02,762 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.868e+02 2.185e+02 2.420e+02 2.689e+02 5.052e+02, threshold=4.840e+02, percent-clipped=1.0 2024-09-14 06:49:26,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=89938.5, ans=0.0 2024-09-14 06:49:46,940 INFO [train.py:1198] (1/2) Epoch 5, batch 6200, loss[loss=0.3246, ctc_loss=0.2371, cr_loss=0.4376, over 20997.00 frames. ], tot_loss[loss=0.3087, ctc_loss=0.2253, cr_loss=0.4168, over 4071203.23 frames. ], batch size: 55, lr: 1.56e-02, grad_scale: 32.0 2024-09-14 06:50:00,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=90023.5, ans=0.025 2024-09-14 06:50:21,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=90051.83333333333, ans=0.0 2024-09-14 06:50:56,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=90108.5, ans=0.125 2024-09-14 06:51:02,367 INFO [train.py:1198] (1/2) Epoch 5, batch 6250, loss[loss=0.3554, ctc_loss=0.2791, cr_loss=0.3814, over 14254.00 frames. 
], tot_loss[loss=0.3076, ctc_loss=0.2244, cr_loss=0.416, over 4073832.27 frames. ], batch size: 150, lr: 1.55e-02, grad_scale: 32.0 2024-09-14 06:51:31,337 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.917e+02 2.186e+02 2.467e+02 2.736e+02 5.742e+02, threshold=4.934e+02, percent-clipped=1.0 2024-09-14 06:51:40,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=90193.5, ans=0.2 2024-09-14 06:52:01,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=90250.16666666667, ans=0.125 2024-09-14 06:52:10,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=90250.16666666667, ans=0.04949747468305833 2024-09-14 06:52:14,314 INFO [train.py:1198] (1/2) Epoch 5, batch 6300, loss[loss=0.3234, ctc_loss=0.2347, cr_loss=0.4434, over 21038.00 frames. ], tot_loss[loss=0.3131, ctc_loss=0.2295, cr_loss=0.4184, over 3980751.33 frames. ], batch size: 62, lr: 1.55e-02, grad_scale: 32.0 2024-09-14 06:52:36,188 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.50 vs. limit=10.0 2024-09-14 06:52:40,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=90306.83333333333, ans=0.025 2024-09-14 06:52:55,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=90335.16666666667, ans=0.125 2024-09-14 06:53:15,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=90391.83333333333, ans=0.2 2024-09-14 06:53:20,030 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.49 vs. limit=12.0 2024-09-14 06:53:26,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=90420.16666666667, ans=0.025 2024-09-14 06:53:27,422 INFO [train.py:1198] (1/2) Epoch 5, batch 6350, loss[loss=0.3752, ctc_loss=0.2888, cr_loss=0.4321, over 14607.00 frames. ], tot_loss[loss=0.3217, ctc_loss=0.2375, cr_loss=0.4213, over 3819655.98 frames. ], batch size: 151, lr: 1.55e-02, grad_scale: 32.0 2024-09-14 06:53:44,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=90448.5, ans=0.0 2024-09-14 06:53:55,022 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.044e+02 2.453e+02 2.732e+02 3.097e+02 5.994e+02, threshold=5.463e+02, percent-clipped=2.0 2024-09-14 06:54:05,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=90476.83333333333, ans=0.0 2024-09-14 06:54:06,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=90476.83333333333, ans=0.5 2024-09-14 06:54:10,488 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=90505.16666666667, ans=0.1 2024-09-14 06:55:10,264 INFO [train.py:1198] (1/2) Epoch 6, batch 0, loss[loss=0.347, ctc_loss=0.2574, cr_loss=0.4478, over 20832.00 frames. ], tot_loss[loss=0.347, ctc_loss=0.2574, cr_loss=0.4478, over 20832.00 frames. 
], batch size: 65, lr: 1.45e-02, grad_scale: 64.0 2024-09-14 06:55:10,264 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-14 06:55:29,609 INFO [train.py:1230] (1/2) Epoch 6, validation: loss=0.07128, ctc_loss=0.07128, cr_loss=9.659e-15, over 944034.00 frames. 2024-09-14 06:55:29,610 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-14 06:55:34,644 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 06:56:30,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=90649.66666666667, ans=0.2 2024-09-14 06:56:52,293 INFO [train.py:1198] (1/2) Epoch 6, batch 50, loss[loss=0.273, ctc_loss=0.1968, cr_loss=0.3811, over 21060.00 frames. ], tot_loss[loss=0.3131, ctc_loss=0.229, cr_loss=0.4202, over 923632.89 frames. ], batch size: 59, lr: 1.45e-02, grad_scale: 64.0 2024-09-14 06:56:55,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=90678.0, ans=0.125 2024-09-14 06:57:11,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=90706.33333333333, ans=0.2 2024-09-14 06:57:35,971 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.786e+02 2.192e+02 2.577e+02 3.091e+02 5.422e+02, threshold=5.154e+02, percent-clipped=0.0 2024-09-14 06:58:07,869 INFO [train.py:1198] (1/2) Epoch 6, batch 100, loss[loss=0.3156, ctc_loss=0.2242, cr_loss=0.457, over 21079.00 frames. ], tot_loss[loss=0.309, ctc_loss=0.2254, cr_loss=0.4181, over 1620387.74 frames. ], batch size: 59, lr: 1.45e-02, grad_scale: 64.0 2024-09-14 06:59:07,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=90933.0, ans=0.125 2024-09-14 06:59:23,671 INFO [train.py:1198] (1/2) Epoch 6, batch 150, loss[loss=0.3107, ctc_loss=0.2226, cr_loss=0.4402, over 20962.00 frames. ], tot_loss[loss=0.3056, ctc_loss=0.2225, cr_loss=0.4155, over 2175111.72 frames. ], batch size: 55, lr: 1.44e-02, grad_scale: 64.0 2024-09-14 06:59:33,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=90961.33333333333, ans=0.125 2024-09-14 07:00:07,704 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.855e+02 2.128e+02 2.301e+02 2.592e+02 4.019e+02, threshold=4.603e+02, percent-clipped=0.0 2024-09-14 07:00:08,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=91046.33333333333, ans=0.125 2024-09-14 07:00:20,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=91046.33333333333, ans=0.0 2024-09-14 07:00:39,212 INFO [train.py:1198] (1/2) Epoch 6, batch 200, loss[loss=0.259, ctc_loss=0.1899, cr_loss=0.3454, over 20997.00 frames. ], tot_loss[loss=0.3047, ctc_loss=0.2216, cr_loss=0.4155, over 2606692.63 frames. 
], batch size: 49, lr: 1.44e-02, grad_scale: 64.0 2024-09-14 07:00:41,203 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=91103.0, ans=0.125 2024-09-14 07:00:42,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=91103.0, ans=0.2 2024-09-14 07:00:58,242 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.17 vs. limit=8.0 2024-09-14 07:01:19,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=91159.66666666667, ans=0.125 2024-09-14 07:01:21,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=91159.66666666667, ans=0.0 2024-09-14 07:01:21,418 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.35 vs. limit=12.0 2024-09-14 07:01:31,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=91188.0, ans=0.02 2024-09-14 07:01:53,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=91244.66666666667, ans=0.2 2024-09-14 07:01:54,191 INFO [train.py:1198] (1/2) Epoch 6, batch 250, loss[loss=0.3366, ctc_loss=0.2432, cr_loss=0.4672, over 21064.00 frames. ], tot_loss[loss=0.3047, ctc_loss=0.2215, cr_loss=0.4158, over 2943514.22 frames. ], batch size: 59, lr: 1.44e-02, grad_scale: 64.0 2024-09-14 07:02:11,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=91273.0, ans=0.125 2024-09-14 07:02:18,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=91273.0, ans=0.2 2024-09-14 07:02:45,209 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.898e+02 2.162e+02 2.423e+02 2.706e+02 4.760e+02, threshold=4.847e+02, percent-clipped=1.0 2024-09-14 07:02:50,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=91329.66666666667, ans=0.2 2024-09-14 07:02:53,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=91329.66666666667, ans=0.1 2024-09-14 07:02:57,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=91329.66666666667, ans=0.5 2024-09-14 07:02:59,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=91329.66666666667, ans=0.0 2024-09-14 07:03:16,848 INFO [train.py:1198] (1/2) Epoch 6, batch 300, loss[loss=0.2883, ctc_loss=0.2068, cr_loss=0.4076, over 19931.00 frames. ], tot_loss[loss=0.3035, ctc_loss=0.2206, cr_loss=0.4144, over 3195776.58 frames. ], batch size: 44, lr: 1.44e-02, grad_scale: 64.0 2024-09-14 07:03:29,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=91386.33333333333, ans=0.5 2024-09-14 07:04:00,333 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. 
limit=6.0 2024-09-14 07:04:23,985 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.15 vs. limit=15.0 2024-09-14 07:04:32,417 INFO [train.py:1198] (1/2) Epoch 6, batch 350, loss[loss=0.3087, ctc_loss=0.2234, cr_loss=0.4266, over 20869.00 frames. ], tot_loss[loss=0.3047, ctc_loss=0.2216, cr_loss=0.4158, over 3390269.54 frames. ], batch size: 57, lr: 1.44e-02, grad_scale: 64.0 2024-09-14 07:04:34,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=91528.0, ans=0.125 2024-09-14 07:05:00,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=91556.33333333333, ans=0.125 2024-09-14 07:05:03,492 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.32 vs. limit=15.0 2024-09-14 07:05:16,673 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.980e+02 2.322e+02 2.660e+02 3.181e+02 4.521e+02, threshold=5.320e+02, percent-clipped=0.0 2024-09-14 07:05:38,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=91641.33333333333, ans=0.125 2024-09-14 07:05:48,391 INFO [train.py:1198] (1/2) Epoch 6, batch 400, loss[loss=0.3092, ctc_loss=0.2255, cr_loss=0.4185, over 19542.00 frames. ], tot_loss[loss=0.3056, ctc_loss=0.2223, cr_loss=0.4165, over 3544899.17 frames. ], batch size: 90, lr: 1.44e-02, grad_scale: 64.0 2024-09-14 07:05:50,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=91669.66666666667, ans=0.025 2024-09-14 07:05:59,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=91669.66666666667, ans=0.125 2024-09-14 07:06:06,782 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.74 vs. limit=15.0 2024-09-14 07:06:17,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=91726.33333333333, ans=10.0 2024-09-14 07:06:37,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=91754.66666666667, ans=0.125 2024-09-14 07:06:43,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=91754.66666666667, ans=0.125 2024-09-14 07:07:03,014 INFO [train.py:1198] (1/2) Epoch 6, batch 450, loss[loss=0.3334, ctc_loss=0.2421, cr_loss=0.4569, over 20637.00 frames. ], tot_loss[loss=0.3054, ctc_loss=0.2222, cr_loss=0.4158, over 3666756.03 frames. ], batch size: 71, lr: 1.44e-02, grad_scale: 64.0 2024-09-14 07:07:30,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=91839.66666666667, ans=0.1 2024-09-14 07:07:46,350 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.782e+02 2.092e+02 2.348e+02 2.671e+02 3.553e+02, threshold=4.696e+02, percent-clipped=0.0 2024-09-14 07:08:23,977 INFO [train.py:1198] (1/2) Epoch 6, batch 500, loss[loss=0.2513, ctc_loss=0.1767, cr_loss=0.3728, over 20931.00 frames. 
], tot_loss[loss=0.3067, ctc_loss=0.2233, cr_loss=0.417, over 3744559.73 frames. ], batch size: 48, lr: 1.44e-02, grad_scale: 32.0 2024-09-14 07:08:26,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=91953.0, ans=15.0 2024-09-14 07:09:04,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=92009.66666666667, ans=0.125 2024-09-14 07:09:33,351 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=92066.33333333333, ans=0.125 2024-09-14 07:09:39,079 INFO [train.py:1198] (1/2) Epoch 6, batch 550, loss[loss=0.2464, ctc_loss=0.1778, cr_loss=0.3433, over 20977.00 frames. ], tot_loss[loss=0.3075, ctc_loss=0.224, cr_loss=0.4175, over 3810456.86 frames. ], batch size: 49, lr: 1.44e-02, grad_scale: 32.0 2024-09-14 07:09:43,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=92094.66666666667, ans=0.2 2024-09-14 07:09:51,767 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.87 vs. limit=15.0 2024-09-14 07:10:24,066 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.659e+02 2.206e+02 2.463e+02 2.967e+02 4.675e+02, threshold=4.927e+02, percent-clipped=0.0 2024-09-14 07:10:48,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=92208.0, ans=0.125 2024-09-14 07:10:54,442 INFO [train.py:1198] (1/2) Epoch 6, batch 600, loss[loss=0.3139, ctc_loss=0.2326, cr_loss=0.4069, over 20289.00 frames. ], tot_loss[loss=0.3069, ctc_loss=0.2235, cr_loss=0.4169, over 3871833.66 frames. ], batch size: 74, lr: 1.44e-02, grad_scale: 32.0 2024-09-14 07:11:08,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=92264.66666666667, ans=0.125 2024-09-14 07:11:18,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=92264.66666666667, ans=0.2 2024-09-14 07:11:20,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=92264.66666666667, ans=0.125 2024-09-14 07:11:23,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=92293.0, ans=0.2 2024-09-14 07:11:26,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=92293.0, ans=0.07 2024-09-14 07:11:32,667 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.52 vs. limit=12.0 2024-09-14 07:12:02,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=92349.66666666667, ans=0.125 2024-09-14 07:12:02,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=92349.66666666667, ans=0.2 2024-09-14 07:12:09,874 INFO [train.py:1198] (1/2) Epoch 6, batch 650, loss[loss=0.3478, ctc_loss=0.2582, cr_loss=0.448, over 20958.00 frames. ], tot_loss[loss=0.3058, ctc_loss=0.2228, cr_loss=0.4152, over 3906906.41 frames. 
], batch size: 64, lr: 1.43e-02, grad_scale: 32.0 2024-09-14 07:12:13,162 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 07:12:32,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=92406.33333333333, ans=0.0 2024-09-14 07:12:54,797 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.849e+02 2.230e+02 2.474e+02 3.000e+02 5.178e+02, threshold=4.949e+02, percent-clipped=1.0 2024-09-14 07:13:24,600 INFO [train.py:1198] (1/2) Epoch 6, batch 700, loss[loss=0.2848, ctc_loss=0.206, cr_loss=0.3941, over 21099.00 frames. ], tot_loss[loss=0.3055, ctc_loss=0.2224, cr_loss=0.4155, over 3956070.53 frames. ], batch size: 56, lr: 1.43e-02, grad_scale: 32.0 2024-09-14 07:13:38,443 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=92548.0, ans=10.0 2024-09-14 07:14:21,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=92604.66666666667, ans=0.05 2024-09-14 07:14:25,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=92604.66666666667, ans=0.125 2024-09-14 07:14:46,084 INFO [train.py:1198] (1/2) Epoch 6, batch 750, loss[loss=0.2718, ctc_loss=0.1941, cr_loss=0.3883, over 20976.00 frames. ], tot_loss[loss=0.3039, ctc_loss=0.221, cr_loss=0.4147, over 3993667.35 frames. ], batch size: 52, lr: 1.43e-02, grad_scale: 32.0 2024-09-14 07:14:52,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=92661.33333333333, ans=0.125 2024-09-14 07:15:31,202 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.819e+02 2.110e+02 2.298e+02 2.553e+02 4.101e+02, threshold=4.596e+02, percent-clipped=0.0 2024-09-14 07:16:01,317 INFO [train.py:1198] (1/2) Epoch 6, batch 800, loss[loss=0.2736, ctc_loss=0.1965, cr_loss=0.3856, over 20963.00 frames. ], tot_loss[loss=0.3043, ctc_loss=0.2215, cr_loss=0.4143, over 4000702.46 frames. ], batch size: 48, lr: 1.43e-02, grad_scale: 32.0 2024-09-14 07:16:25,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=92831.33333333333, ans=0.0 2024-09-14 07:16:31,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=92859.66666666667, ans=0.2 2024-09-14 07:16:41,199 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.09 vs. limit=12.0 2024-09-14 07:17:01,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=92916.33333333333, ans=0.125 2024-09-14 07:17:15,999 INFO [train.py:1198] (1/2) Epoch 6, batch 850, loss[loss=0.3323, ctc_loss=0.2456, cr_loss=0.4338, over 20016.00 frames. ], tot_loss[loss=0.3045, ctc_loss=0.2215, cr_loss=0.415, over 4011888.83 frames. 
], batch size: 80, lr: 1.43e-02, grad_scale: 32.0 2024-09-14 07:17:37,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=92973.0, ans=0.1 2024-09-14 07:17:48,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=93001.33333333333, ans=0.125 2024-09-14 07:18:01,094 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.783e+02 2.159e+02 2.447e+02 2.716e+02 5.128e+02, threshold=4.894e+02, percent-clipped=1.0 2024-09-14 07:18:12,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=93029.66666666667, ans=0.125 2024-09-14 07:18:30,996 INFO [train.py:1198] (1/2) Epoch 6, batch 900, loss[loss=0.2798, ctc_loss=0.1992, cr_loss=0.403, over 20777.00 frames. ], tot_loss[loss=0.3044, ctc_loss=0.2215, cr_loss=0.4146, over 4028841.89 frames. ], batch size: 56, lr: 1.43e-02, grad_scale: 32.0 2024-09-14 07:18:31,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.whiten.whitening_limit, batch_count=93086.33333333333, ans=12.0 2024-09-14 07:18:35,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=93086.33333333333, ans=0.0 2024-09-14 07:19:33,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=93171.33333333333, ans=0.125 2024-09-14 07:19:39,329 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.36 vs. limit=15.0 2024-09-14 07:19:51,857 INFO [train.py:1198] (1/2) Epoch 6, batch 950, loss[loss=0.3299, ctc_loss=0.2395, cr_loss=0.4521, over 20836.00 frames. ], tot_loss[loss=0.3047, ctc_loss=0.2218, cr_loss=0.4146, over 4038181.44 frames. ], batch size: 65, lr: 1.43e-02, grad_scale: 32.0 2024-09-14 07:19:58,618 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.34 vs. limit=12.0 2024-09-14 07:20:05,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=93256.33333333333, ans=0.125 2024-09-14 07:20:07,583 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.38 vs. limit=15.0 2024-09-14 07:20:25,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=93284.66666666667, ans=0.125 2024-09-14 07:20:37,142 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.953e+02 2.452e+02 2.996e+02 3.444e+02 5.252e+02, threshold=5.991e+02, percent-clipped=4.0 2024-09-14 07:20:40,932 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.07 vs. 
limit=15.0 2024-09-14 07:20:49,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=93313.0, ans=0.1 2024-09-14 07:20:54,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=93341.33333333333, ans=0.125 2024-09-14 07:21:07,337 INFO [train.py:1198] (1/2) Epoch 6, batch 1000, loss[loss=0.302, ctc_loss=0.2181, cr_loss=0.4192, over 20782.00 frames. ], tot_loss[loss=0.3044, ctc_loss=0.2213, cr_loss=0.4152, over 4063204.49 frames. ], batch size: 56, lr: 1.43e-02, grad_scale: 32.0 2024-09-14 07:21:15,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=93369.66666666667, ans=0.0 2024-09-14 07:21:43,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=93426.33333333333, ans=0.0 2024-09-14 07:22:16,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=93483.0, ans=0.025 2024-09-14 07:22:20,000 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.54 vs. limit=15.0 2024-09-14 07:22:22,009 INFO [train.py:1198] (1/2) Epoch 6, batch 1050, loss[loss=0.2491, ctc_loss=0.1828, cr_loss=0.3314, over 20988.00 frames. ], tot_loss[loss=0.3044, ctc_loss=0.2213, cr_loss=0.4151, over 4072157.91 frames. ], batch size: 49, lr: 1.43e-02, grad_scale: 32.0 2024-09-14 07:22:23,162 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.39 vs. limit=15.0 2024-09-14 07:22:28,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=93511.33333333333, ans=0.0 2024-09-14 07:22:47,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=93539.66666666667, ans=0.125 2024-09-14 07:22:48,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=93539.66666666667, ans=0.125 2024-09-14 07:23:07,302 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.868e+02 2.152e+02 2.467e+02 2.932e+02 4.425e+02, threshold=4.934e+02, percent-clipped=0.0 2024-09-14 07:23:37,061 INFO [train.py:1198] (1/2) Epoch 6, batch 1100, loss[loss=0.325, ctc_loss=0.2337, cr_loss=0.4568, over 20655.00 frames. ], tot_loss[loss=0.3036, ctc_loss=0.2207, cr_loss=0.4149, over 4082375.66 frames. ], batch size: 71, lr: 1.42e-02, grad_scale: 32.0 2024-09-14 07:23:56,278 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.80 vs. limit=15.0 2024-09-14 07:24:11,547 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.04 vs. limit=12.0 2024-09-14 07:24:20,345 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=93709.66666666667, ans=0.025 2024-09-14 07:24:52,505 INFO [train.py:1198] (1/2) Epoch 6, batch 1150, loss[loss=0.2756, ctc_loss=0.195, cr_loss=0.4033, over 21000.00 frames. ], tot_loss[loss=0.3024, ctc_loss=0.2197, cr_loss=0.4135, over 4078684.59 frames. 
], batch size: 52, lr: 1.42e-02, grad_scale: 32.0 2024-09-14 07:25:00,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=93794.66666666667, ans=0.0 2024-09-14 07:25:09,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=93823.0, ans=0.1 2024-09-14 07:25:09,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=93823.0, ans=0.1 2024-09-14 07:25:14,784 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.51 vs. limit=10.0 2024-09-14 07:25:20,577 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.44 vs. limit=15.0 2024-09-14 07:25:43,842 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.837e+02 2.333e+02 2.657e+02 3.154e+02 5.433e+02, threshold=5.314e+02, percent-clipped=1.0 2024-09-14 07:26:14,090 INFO [train.py:1198] (1/2) Epoch 6, batch 1200, loss[loss=0.2598, ctc_loss=0.1846, cr_loss=0.3758, over 20982.00 frames. ], tot_loss[loss=0.3018, ctc_loss=0.2193, cr_loss=0.4125, over 4074293.85 frames. ], batch size: 48, lr: 1.42e-02, grad_scale: 32.0 2024-09-14 07:26:40,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=93964.66666666667, ans=0.0 2024-09-14 07:27:13,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=94049.66666666667, ans=0.125 2024-09-14 07:27:26,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=94049.66666666667, ans=0.125 2024-09-14 07:27:29,642 INFO [train.py:1198] (1/2) Epoch 6, batch 1250, loss[loss=0.3023, ctc_loss=0.2172, cr_loss=0.4257, over 20922.00 frames. ], tot_loss[loss=0.3007, ctc_loss=0.2183, cr_loss=0.4122, over 4085798.47 frames. ], batch size: 60, lr: 1.42e-02, grad_scale: 32.0 2024-09-14 07:27:50,038 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.46 vs. limit=15.0 2024-09-14 07:27:53,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=94106.33333333333, ans=0.1 2024-09-14 07:28:14,109 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.834e+02 2.128e+02 2.280e+02 2.536e+02 3.412e+02, threshold=4.559e+02, percent-clipped=0.0 2024-09-14 07:28:35,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=94191.33333333333, ans=0.0 2024-09-14 07:28:35,852 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.69 vs. limit=12.0 2024-09-14 07:28:36,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=94191.33333333333, ans=0.125 2024-09-14 07:28:43,932 INFO [train.py:1198] (1/2) Epoch 6, batch 1300, loss[loss=0.2618, ctc_loss=0.1841, cr_loss=0.3884, over 20886.00 frames. ], tot_loss[loss=0.3018, ctc_loss=0.2191, cr_loss=0.4135, over 4093391.54 frames. 
], batch size: 54, lr: 1.42e-02, grad_scale: 32.0 2024-09-14 07:28:45,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=94219.66666666667, ans=0.07 2024-09-14 07:28:48,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=94219.66666666667, ans=0.125 2024-09-14 07:29:33,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=94304.66666666667, ans=0.125 2024-09-14 07:29:42,574 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.53 vs. limit=15.0 2024-09-14 07:29:58,826 INFO [train.py:1198] (1/2) Epoch 6, batch 1350, loss[loss=0.3334, ctc_loss=0.2463, cr_loss=0.4357, over 21018.00 frames. ], tot_loss[loss=0.3025, ctc_loss=0.2196, cr_loss=0.4145, over 4090819.97 frames. ], batch size: 61, lr: 1.42e-02, grad_scale: 32.0 2024-09-14 07:30:11,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=94361.33333333333, ans=0.1 2024-09-14 07:30:15,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=94389.66666666667, ans=0.125 2024-09-14 07:30:37,681 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.92 vs. limit=22.5 2024-09-14 07:30:47,091 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.772e+02 2.106e+02 2.252e+02 2.526e+02 4.986e+02, threshold=4.503e+02, percent-clipped=1.0 2024-09-14 07:30:48,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=94446.33333333333, ans=0.05 2024-09-14 07:31:12,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=94474.66666666667, ans=0.125 2024-09-14 07:31:20,094 INFO [train.py:1198] (1/2) Epoch 6, batch 1400, loss[loss=0.31, ctc_loss=0.222, cr_loss=0.4399, over 20657.00 frames. ], tot_loss[loss=0.3025, ctc_loss=0.2196, cr_loss=0.4146, over 4090708.27 frames. ], batch size: 66, lr: 1.42e-02, grad_scale: 32.0 2024-09-14 07:31:55,573 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.64 vs. limit=22.5 2024-09-14 07:32:12,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=94588.0, ans=0.0 2024-09-14 07:32:35,383 INFO [train.py:1198] (1/2) Epoch 6, batch 1450, loss[loss=0.259, ctc_loss=0.1875, cr_loss=0.3577, over 20922.00 frames. ], tot_loss[loss=0.302, ctc_loss=0.2192, cr_loss=0.4139, over 4098729.01 frames. 
], batch size: 50, lr: 1.42e-02, grad_scale: 32.0 2024-09-14 07:33:08,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=94701.33333333333, ans=0.125 2024-09-14 07:33:16,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=94701.33333333333, ans=0.05 2024-09-14 07:33:16,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=94701.33333333333, ans=0.125 2024-09-14 07:33:20,417 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.900e+02 2.216e+02 2.482e+02 2.897e+02 5.742e+02, threshold=4.965e+02, percent-clipped=1.0 2024-09-14 07:33:50,309 INFO [train.py:1198] (1/2) Epoch 6, batch 1500, loss[loss=0.3041, ctc_loss=0.2216, cr_loss=0.4123, over 20699.00 frames. ], tot_loss[loss=0.3024, ctc_loss=0.2195, cr_loss=0.4142, over 4096157.67 frames. ], batch size: 71, lr: 1.42e-02, grad_scale: 32.0 2024-09-14 07:34:07,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=94814.66666666667, ans=0.025 2024-09-14 07:34:13,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=94814.66666666667, ans=0.2 2024-09-14 07:34:38,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=94871.33333333333, ans=0.125 2024-09-14 07:34:40,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=94871.33333333333, ans=0.1 2024-09-14 07:35:05,996 INFO [train.py:1198] (1/2) Epoch 6, batch 1550, loss[loss=0.2466, ctc_loss=0.1743, cr_loss=0.3616, over 20936.00 frames. ], tot_loss[loss=0.3013, ctc_loss=0.2187, cr_loss=0.4129, over 4093028.99 frames. ], batch size: 49, lr: 1.42e-02, grad_scale: 32.0 2024-09-14 07:35:24,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=94956.33333333333, ans=0.125 2024-09-14 07:35:24,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=94956.33333333333, ans=0.1 2024-09-14 07:35:26,436 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.96 vs. limit=22.5 2024-09-14 07:35:50,546 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.927e+02 2.334e+02 2.720e+02 3.342e+02 4.593e+02, threshold=5.439e+02, percent-clipped=0.0 2024-09-14 07:36:23,704 INFO [train.py:1198] (1/2) Epoch 6, batch 1600, loss[loss=0.3273, ctc_loss=0.2366, cr_loss=0.4534, over 20981.00 frames. ], tot_loss[loss=0.3008, ctc_loss=0.2183, cr_loss=0.4129, over 4096548.35 frames. ], batch size: 64, lr: 1.41e-02, grad_scale: 32.0 2024-09-14 07:36:34,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=95069.66666666667, ans=0.0 2024-09-14 07:36:55,767 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.90 vs. 
limit=15.0 2024-09-14 07:37:03,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=95126.33333333333, ans=0.125 2024-09-14 07:37:17,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=95154.66666666667, ans=0.125 2024-09-14 07:37:22,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=95154.66666666667, ans=0.025 2024-09-14 07:37:41,693 INFO [train.py:1198] (1/2) Epoch 6, batch 1650, loss[loss=0.2878, ctc_loss=0.2076, cr_loss=0.401, over 21032.00 frames. ], tot_loss[loss=0.3022, ctc_loss=0.2194, cr_loss=0.4138, over 4088366.71 frames. ], batch size: 62, lr: 1.41e-02, grad_scale: 32.0 2024-09-14 07:37:57,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=95239.66666666667, ans=0.125 2024-09-14 07:38:12,536 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.79 vs. limit=15.0 2024-09-14 07:38:13,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=95268.0, ans=0.1 2024-09-14 07:38:16,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=95268.0, ans=0.125 2024-09-14 07:38:26,989 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.877e+02 2.254e+02 2.468e+02 2.996e+02 5.862e+02, threshold=4.937e+02, percent-clipped=1.0 2024-09-14 07:38:50,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=95324.66666666667, ans=0.1 2024-09-14 07:38:57,354 INFO [train.py:1198] (1/2) Epoch 6, batch 1700, loss[loss=0.3188, ctc_loss=0.2323, cr_loss=0.4323, over 21025.00 frames. ], tot_loss[loss=0.3014, ctc_loss=0.2186, cr_loss=0.4139, over 4095668.27 frames. ], batch size: 62, lr: 1.41e-02, grad_scale: 16.0 2024-09-14 07:39:02,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=95353.0, ans=0.0 2024-09-14 07:39:59,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=95466.33333333333, ans=0.2 2024-09-14 07:40:05,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=95466.33333333333, ans=0.1 2024-09-14 07:40:13,010 INFO [train.py:1198] (1/2) Epoch 6, batch 1750, loss[loss=0.2746, ctc_loss=0.1967, cr_loss=0.3895, over 20990.00 frames. ], tot_loss[loss=0.3001, ctc_loss=0.2174, cr_loss=0.4133, over 4104973.77 frames. 
], batch size: 55, lr: 1.41e-02, grad_scale: 16.0 2024-09-14 07:40:14,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=95494.66666666667, ans=0.0 2024-09-14 07:40:35,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=95523.0, ans=0.0 2024-09-14 07:40:41,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=95551.33333333333, ans=0.1 2024-09-14 07:40:52,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=95551.33333333333, ans=0.125 2024-09-14 07:40:55,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=95551.33333333333, ans=0.125 2024-09-14 07:40:59,645 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.895e+02 2.226e+02 2.397e+02 2.692e+02 3.705e+02, threshold=4.795e+02, percent-clipped=0.0 2024-09-14 07:41:05,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=95579.66666666667, ans=0.5 2024-09-14 07:41:15,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=95608.0, ans=0.1 2024-09-14 07:41:28,517 INFO [train.py:1198] (1/2) Epoch 6, batch 1800, loss[loss=0.2874, ctc_loss=0.2065, cr_loss=0.4047, over 20972.00 frames. ], tot_loss[loss=0.2993, ctc_loss=0.2168, cr_loss=0.4125, over 4107068.93 frames. ], batch size: 48, lr: 1.41e-02, grad_scale: 16.0 2024-09-14 07:41:51,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=95664.66666666667, ans=0.07 2024-09-14 07:41:57,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=95664.66666666667, ans=0.0 2024-09-14 07:42:42,418 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.25 vs. limit=22.5 2024-09-14 07:42:49,614 INFO [train.py:1198] (1/2) Epoch 6, batch 1850, loss[loss=0.3379, ctc_loss=0.2473, cr_loss=0.4531, over 21045.00 frames. ], tot_loss[loss=0.2988, ctc_loss=0.2164, cr_loss=0.4119, over 4099399.66 frames. ], batch size: 62, lr: 1.41e-02, grad_scale: 16.0 2024-09-14 07:42:53,417 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.24 vs. limit=12.0 2024-09-14 07:42:56,348 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.57 vs. limit=15.0 2024-09-14 07:43:11,182 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.12 vs. 
limit=12.0 2024-09-14 07:43:23,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=95834.66666666667, ans=0.1 2024-09-14 07:43:36,121 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.732e+02 2.132e+02 2.311e+02 2.496e+02 4.751e+02, threshold=4.622e+02, percent-clipped=0.0 2024-09-14 07:44:00,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=95891.33333333333, ans=0.1 2024-09-14 07:44:04,456 INFO [train.py:1198] (1/2) Epoch 6, batch 1900, loss[loss=0.3386, ctc_loss=0.2551, cr_loss=0.4172, over 20650.00 frames. ], tot_loss[loss=0.3005, ctc_loss=0.2179, cr_loss=0.4129, over 4094572.07 frames. ], batch size: 68, lr: 1.41e-02, grad_scale: 16.0 2024-09-14 07:44:19,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=95948.0, ans=0.2 2024-09-14 07:44:31,651 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=95948.0, ans=0.0 2024-09-14 07:44:39,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=95976.33333333333, ans=0.0 2024-09-14 07:44:58,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=96004.66666666667, ans=0.0 2024-09-14 07:45:17,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=96033.0, ans=0.0 2024-09-14 07:45:19,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=96061.33333333333, ans=0.2 2024-09-14 07:45:20,105 INFO [train.py:1198] (1/2) Epoch 6, batch 1950, loss[loss=0.3253, ctc_loss=0.2391, cr_loss=0.4313, over 20945.00 frames. ], tot_loss[loss=0.2993, ctc_loss=0.2168, cr_loss=0.4123, over 4096631.91 frames. ], batch size: 60, lr: 1.41e-02, grad_scale: 16.0 2024-09-14 07:46:06,845 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.783e+02 2.169e+02 2.396e+02 2.758e+02 4.232e+02, threshold=4.791e+02, percent-clipped=0.0 2024-09-14 07:46:22,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=96174.66666666667, ans=0.125 2024-09-14 07:46:35,456 INFO [train.py:1198] (1/2) Epoch 6, batch 2000, loss[loss=0.2814, ctc_loss=0.2042, cr_loss=0.3865, over 20771.00 frames. ], tot_loss[loss=0.2992, ctc_loss=0.2167, cr_loss=0.4125, over 4100797.29 frames. 
], batch size: 56, lr: 1.41e-02, grad_scale: 32.0 2024-09-14 07:46:43,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=96203.0, ans=0.125 2024-09-14 07:46:46,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=96203.0, ans=0.05 2024-09-14 07:46:48,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=96203.0, ans=0.025 2024-09-14 07:46:48,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=96203.0, ans=0.2 2024-09-14 07:46:48,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=96203.0, ans=0.0 2024-09-14 07:47:00,560 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.18 vs. limit=15.0 2024-09-14 07:47:13,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=96259.66666666667, ans=0.0 2024-09-14 07:47:18,127 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 07:47:53,803 INFO [train.py:1198] (1/2) Epoch 6, batch 2050, loss[loss=0.2965, ctc_loss=0.2186, cr_loss=0.3897, over 20742.00 frames. ], tot_loss[loss=0.2987, ctc_loss=0.2164, cr_loss=0.4116, over 4094392.38 frames. ], batch size: 71, lr: 1.41e-02, grad_scale: 32.0 2024-09-14 07:48:43,611 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.856e+02 2.134e+02 2.316e+02 2.679e+02 4.546e+02, threshold=4.632e+02, percent-clipped=0.0 2024-09-14 07:49:03,689 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 07:49:06,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=96458.0, ans=0.0 2024-09-14 07:49:12,167 INFO [train.py:1198] (1/2) Epoch 6, batch 2100, loss[loss=0.3947, ctc_loss=0.3011, cr_loss=0.4684, over 14362.00 frames. ], tot_loss[loss=0.2984, ctc_loss=0.2163, cr_loss=0.4107, over 4083042.95 frames. ], batch size: 150, lr: 1.40e-02, grad_scale: 32.0 2024-09-14 07:50:04,421 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.25 vs. limit=15.0 2024-09-14 07:50:27,259 INFO [train.py:1198] (1/2) Epoch 6, batch 2150, loss[loss=0.2938, ctc_loss=0.2116, cr_loss=0.4112, over 21034.00 frames. ], tot_loss[loss=0.298, ctc_loss=0.2157, cr_loss=0.4115, over 4099869.04 frames. ], batch size: 61, lr: 1.40e-02, grad_scale: 32.0 2024-09-14 07:50:47,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=96656.33333333333, ans=0.09899494936611666 2024-09-14 07:50:56,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=96684.66666666667, ans=0.2 2024-09-14 07:51:01,653 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.62 vs. 
limit=15.0 2024-09-14 07:51:14,513 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.817e+02 2.129e+02 2.323e+02 2.669e+02 3.552e+02, threshold=4.647e+02, percent-clipped=0.0 2024-09-14 07:51:16,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=96713.0, ans=0.125 2024-09-14 07:51:43,461 INFO [train.py:1198] (1/2) Epoch 6, batch 2200, loss[loss=0.3699, ctc_loss=0.2751, cr_loss=0.4744, over 17971.00 frames. ], tot_loss[loss=0.2987, ctc_loss=0.2163, cr_loss=0.4121, over 4090565.92 frames. ], batch size: 108, lr: 1.40e-02, grad_scale: 32.0 2024-09-14 07:51:45,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=96769.66666666667, ans=0.125 2024-09-14 07:52:29,433 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.99 vs. limit=15.0 2024-09-14 07:52:57,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=96911.33333333333, ans=0.125 2024-09-14 07:52:58,964 INFO [train.py:1198] (1/2) Epoch 6, batch 2250, loss[loss=0.308, ctc_loss=0.2254, cr_loss=0.4128, over 21032.00 frames. ], tot_loss[loss=0.297, ctc_loss=0.2149, cr_loss=0.4106, over 4107740.14 frames. ], batch size: 62, lr: 1.40e-02, grad_scale: 32.0 2024-09-14 07:53:19,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=96939.66666666667, ans=0.0 2024-09-14 07:53:20,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=96939.66666666667, ans=0.05 2024-09-14 07:53:20,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=96939.66666666667, ans=0.0 2024-09-14 07:53:30,496 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.56 vs. limit=22.5 2024-09-14 07:53:37,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=96968.0, ans=0.0 2024-09-14 07:53:49,048 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.860e+02 2.122e+02 2.479e+02 2.958e+02 5.399e+02, threshold=4.959e+02, percent-clipped=1.0 2024-09-14 07:54:19,933 INFO [train.py:1198] (1/2) Epoch 6, batch 2300, loss[loss=0.3658, ctc_loss=0.2707, cr_loss=0.4756, over 17983.00 frames. ], tot_loss[loss=0.2992, ctc_loss=0.2168, cr_loss=0.4121, over 4093311.10 frames. ], batch size: 108, lr: 1.40e-02, grad_scale: 32.0 2024-09-14 07:54:21,792 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=97053.0, ans=0.1 2024-09-14 07:54:48,941 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 07:54:59,756 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2024-09-14 07:55:35,206 INFO [train.py:1198] (1/2) Epoch 6, batch 2350, loss[loss=0.2852, ctc_loss=0.208, cr_loss=0.3862, over 19868.00 frames. ], tot_loss[loss=0.2989, ctc_loss=0.2166, cr_loss=0.4117, over 4103096.10 frames. 
], batch size: 44, lr: 1.40e-02, grad_scale: 32.0 2024-09-14 07:55:35,928 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.57 vs. limit=22.5 2024-09-14 07:56:21,675 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.833e+02 2.220e+02 2.402e+02 2.900e+02 4.265e+02, threshold=4.805e+02, percent-clipped=0.0 2024-09-14 07:56:23,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=97279.66666666667, ans=0.0 2024-09-14 07:56:37,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=97308.0, ans=0.125 2024-09-14 07:56:50,316 INFO [train.py:1198] (1/2) Epoch 6, batch 2400, loss[loss=0.2803, ctc_loss=0.2034, cr_loss=0.3847, over 21067.00 frames. ], tot_loss[loss=0.2978, ctc_loss=0.2156, cr_loss=0.4111, over 4114144.39 frames. ], batch size: 53, lr: 1.40e-02, grad_scale: 32.0 2024-09-14 07:56:53,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=97336.33333333333, ans=0.125 2024-09-14 07:57:02,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=97336.33333333333, ans=0.0 2024-09-14 07:57:17,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=97364.66666666667, ans=0.125 2024-09-14 07:57:50,826 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.70 vs. limit=10.0 2024-09-14 07:58:04,865 INFO [train.py:1198] (1/2) Epoch 6, batch 2450, loss[loss=0.3054, ctc_loss=0.2267, cr_loss=0.3934, over 20942.00 frames. ], tot_loss[loss=0.2985, ctc_loss=0.2162, cr_loss=0.4114, over 4117394.08 frames. ], batch size: 60, lr: 1.40e-02, grad_scale: 32.0 2024-09-14 07:58:24,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=97506.33333333333, ans=0.0 2024-09-14 07:58:30,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=97506.33333333333, ans=0.125 2024-09-14 07:58:51,443 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.808e+02 2.158e+02 2.367e+02 2.694e+02 5.623e+02, threshold=4.734e+02, percent-clipped=1.0 2024-09-14 07:58:54,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=97563.0, ans=0.125 2024-09-14 07:59:22,803 INFO [train.py:1198] (1/2) Epoch 6, batch 2500, loss[loss=0.3231, ctc_loss=0.2391, cr_loss=0.4201, over 19332.00 frames. ], tot_loss[loss=0.2982, ctc_loss=0.216, cr_loss=0.4108, over 4110518.60 frames. ], batch size: 90, lr: 1.40e-02, grad_scale: 32.0 2024-09-14 07:59:35,324 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.71 vs. limit=15.0 2024-09-14 08:00:40,794 INFO [train.py:1198] (1/2) Epoch 6, batch 2550, loss[loss=0.3507, ctc_loss=0.2623, cr_loss=0.4419, over 18468.00 frames. ], tot_loss[loss=0.2995, ctc_loss=0.2172, cr_loss=0.4118, over 4103616.88 frames. 
], batch size: 108, lr: 1.40e-02, grad_scale: 32.0 2024-09-14 08:01:09,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=97818.0, ans=0.125 2024-09-14 08:01:16,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=97818.0, ans=0.0 2024-09-14 08:01:26,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=97846.33333333333, ans=0.125 2024-09-14 08:01:28,058 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.890e+02 2.179e+02 2.442e+02 2.743e+02 4.136e+02, threshold=4.884e+02, percent-clipped=0.0 2024-09-14 08:01:41,079 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.31 vs. limit=22.5 2024-09-14 08:01:56,858 INFO [train.py:1198] (1/2) Epoch 6, batch 2600, loss[loss=0.2709, ctc_loss=0.194, cr_loss=0.3849, over 20965.00 frames. ], tot_loss[loss=0.2978, ctc_loss=0.2157, cr_loss=0.4104, over 4109116.43 frames. ], batch size: 48, lr: 1.40e-02, grad_scale: 32.0 2024-09-14 08:02:06,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=97903.0, ans=0.0 2024-09-14 08:02:22,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=97931.33333333333, ans=0.0 2024-09-14 08:02:31,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=97959.66666666667, ans=0.125 2024-09-14 08:02:51,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=97988.0, ans=0.0 2024-09-14 08:03:10,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=98044.66666666667, ans=0.0 2024-09-14 08:03:11,943 INFO [train.py:1198] (1/2) Epoch 6, batch 2650, loss[loss=0.3021, ctc_loss=0.2215, cr_loss=0.4031, over 20978.00 frames. ], tot_loss[loss=0.2989, ctc_loss=0.2168, cr_loss=0.4109, over 4090630.51 frames. ], batch size: 64, lr: 1.39e-02, grad_scale: 32.0 2024-09-14 08:03:59,016 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.852e+02 2.239e+02 2.576e+02 3.103e+02 5.234e+02, threshold=5.152e+02, percent-clipped=2.0 2024-09-14 08:04:27,470 INFO [train.py:1198] (1/2) Epoch 6, batch 2700, loss[loss=0.3079, ctc_loss=0.2237, cr_loss=0.421, over 20341.00 frames. ], tot_loss[loss=0.298, ctc_loss=0.2159, cr_loss=0.4104, over 4097014.48 frames. ], batch size: 74, lr: 1.39e-02, grad_scale: 32.0 2024-09-14 08:05:00,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=98243.0, ans=0.0 2024-09-14 08:05:08,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=98243.0, ans=0.0 2024-09-14 08:05:48,138 INFO [train.py:1198] (1/2) Epoch 6, batch 2750, loss[loss=0.3094, ctc_loss=0.2242, cr_loss=0.4262, over 20758.00 frames. ], tot_loss[loss=0.2967, ctc_loss=0.2148, cr_loss=0.4095, over 4099702.29 frames. 
], batch size: 56, lr: 1.39e-02, grad_scale: 16.0 2024-09-14 08:05:49,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=98328.0, ans=0.2 2024-09-14 08:06:35,389 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.774e+02 2.141e+02 2.401e+02 2.853e+02 4.229e+02, threshold=4.801e+02, percent-clipped=0.0 2024-09-14 08:06:48,471 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.55 vs. limit=15.0 2024-09-14 08:07:02,679 INFO [train.py:1198] (1/2) Epoch 6, batch 2800, loss[loss=0.2651, ctc_loss=0.1896, cr_loss=0.3775, over 20941.00 frames. ], tot_loss[loss=0.2983, ctc_loss=0.2161, cr_loss=0.4113, over 4086478.31 frames. ], batch size: 48, lr: 1.39e-02, grad_scale: 32.0 2024-09-14 08:07:20,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=98498.0, ans=10.0 2024-09-14 08:07:45,808 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.05 vs. limit=12.0 2024-09-14 08:08:18,254 INFO [train.py:1198] (1/2) Epoch 6, batch 2850, loss[loss=0.2368, ctc_loss=0.1691, cr_loss=0.3385, over 19960.00 frames. ], tot_loss[loss=0.2984, ctc_loss=0.2162, cr_loss=0.4109, over 4084044.89 frames. ], batch size: 44, lr: 1.39e-02, grad_scale: 32.0 2024-09-14 08:08:21,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=98611.33333333333, ans=0.025 2024-09-14 08:08:24,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=98611.33333333333, ans=0.0 2024-09-14 08:09:06,282 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.970e+02 2.291e+02 2.592e+02 3.023e+02 9.557e+02, threshold=5.183e+02, percent-clipped=2.0 2024-09-14 08:09:21,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=98724.66666666667, ans=0.0 2024-09-14 08:09:33,592 INFO [train.py:1198] (1/2) Epoch 6, batch 2900, loss[loss=0.27, ctc_loss=0.1935, cr_loss=0.3823, over 20794.00 frames. ], tot_loss[loss=0.2986, ctc_loss=0.2164, cr_loss=0.4108, over 4078702.67 frames. ], batch size: 53, lr: 1.39e-02, grad_scale: 32.0 2024-09-14 08:09:37,156 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.46 vs. limit=15.0 2024-09-14 08:09:38,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=98753.0, ans=0.1 2024-09-14 08:09:59,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=98781.33333333333, ans=0.0 2024-09-14 08:10:22,458 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.27 vs. limit=15.0 2024-09-14 08:10:51,772 INFO [train.py:1198] (1/2) Epoch 6, batch 2950, loss[loss=0.2683, ctc_loss=0.1933, cr_loss=0.3748, over 20956.00 frames. ], tot_loss[loss=0.2991, ctc_loss=0.2169, cr_loss=0.4111, over 4086836.55 frames. 
], batch size: 50, lr: 1.39e-02, grad_scale: 32.0 2024-09-14 08:11:42,700 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.915e+02 2.133e+02 2.361e+02 2.750e+02 4.226e+02, threshold=4.722e+02, percent-clipped=0.0 2024-09-14 08:11:56,874 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.40 vs. limit=15.0 2024-09-14 08:12:10,165 INFO [train.py:1198] (1/2) Epoch 6, batch 3000, loss[loss=0.2889, ctc_loss=0.2103, cr_loss=0.393, over 21074.00 frames. ], tot_loss[loss=0.2996, ctc_loss=0.2172, cr_loss=0.4121, over 4094721.51 frames. ], batch size: 56, lr: 1.39e-02, grad_scale: 32.0 2024-09-14 08:12:10,165 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-14 08:12:31,009 INFO [train.py:1230] (1/2) Epoch 6, validation: loss=0.06701, ctc_loss=0.06701, cr_loss=9.557e-15, over 944034.00 frames. 2024-09-14 08:12:31,010 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-14 08:12:50,352 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.69 vs. limit=22.5 2024-09-14 08:13:15,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=99093.0, ans=0.025 2024-09-14 08:13:25,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=99121.33333333333, ans=0.125 2024-09-14 08:13:47,804 INFO [train.py:1198] (1/2) Epoch 6, batch 3050, loss[loss=0.2415, ctc_loss=0.1689, cr_loss=0.3632, over 19873.00 frames. ], tot_loss[loss=0.2991, ctc_loss=0.2167, cr_loss=0.4119, over 4083117.17 frames. ], batch size: 44, lr: 1.39e-02, grad_scale: 32.0 2024-09-14 08:14:09,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=99206.33333333333, ans=0.125 2024-09-14 08:14:12,863 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.43 vs. limit=22.5 2024-09-14 08:14:15,918 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. limit=6.0 2024-09-14 08:14:33,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=99263.0, ans=0.125 2024-09-14 08:14:33,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=99263.0, ans=0.1 2024-09-14 08:14:36,178 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.827e+02 2.214e+02 2.500e+02 2.844e+02 3.726e+02, threshold=4.999e+02, percent-clipped=0.0 2024-09-14 08:15:00,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=99291.33333333333, ans=0.125 2024-09-14 08:15:03,412 INFO [train.py:1198] (1/2) Epoch 6, batch 3100, loss[loss=0.2497, ctc_loss=0.1777, cr_loss=0.3598, over 20968.00 frames. ], tot_loss[loss=0.3, ctc_loss=0.2174, cr_loss=0.4129, over 4088429.93 frames. 
], batch size: 48, lr: 1.39e-02, grad_scale: 32.0 2024-09-14 08:15:14,427 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 08:15:27,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=99348.0, ans=0.0 2024-09-14 08:15:34,341 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.64 vs. limit=22.5 2024-09-14 08:15:53,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=99404.66666666667, ans=0.5 2024-09-14 08:16:21,503 INFO [train.py:1198] (1/2) Epoch 6, batch 3150, loss[loss=0.3251, ctc_loss=0.235, cr_loss=0.4506, over 20708.00 frames. ], tot_loss[loss=0.2989, ctc_loss=0.2165, cr_loss=0.412, over 4093879.33 frames. ], batch size: 68, lr: 1.38e-02, grad_scale: 32.0 2024-09-14 08:16:49,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=99489.66666666667, ans=0.125 2024-09-14 08:16:53,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=99518.0, ans=0.5 2024-09-14 08:17:12,896 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.844e+02 2.248e+02 2.527e+02 3.281e+02 4.804e+02, threshold=5.055e+02, percent-clipped=0.0 2024-09-14 08:17:28,837 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.17 vs. limit=15.0 2024-09-14 08:17:39,931 INFO [train.py:1198] (1/2) Epoch 6, batch 3200, loss[loss=0.2512, ctc_loss=0.1813, cr_loss=0.3495, over 20976.00 frames. ], tot_loss[loss=0.2998, ctc_loss=0.2172, cr_loss=0.4131, over 4086659.80 frames. ], batch size: 50, lr: 1.38e-02, grad_scale: 32.0 2024-09-14 08:17:59,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=99631.33333333333, ans=0.0 2024-09-14 08:18:04,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=99631.33333333333, ans=0.0 2024-09-14 08:18:11,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=99659.66666666667, ans=0.0 2024-09-14 08:18:22,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=99659.66666666667, ans=0.125 2024-09-14 08:18:54,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=99744.66666666667, ans=0.05 2024-09-14 08:18:55,449 INFO [train.py:1198] (1/2) Epoch 6, batch 3250, loss[loss=0.3441, ctc_loss=0.2524, cr_loss=0.4588, over 20968.00 frames. ], tot_loss[loss=0.3001, ctc_loss=0.2173, cr_loss=0.4138, over 4088410.67 frames. 
], batch size: 64, lr: 1.38e-02, grad_scale: 32.0 2024-09-14 08:19:43,886 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.883e+02 2.121e+02 2.272e+02 2.581e+02 3.879e+02, threshold=4.544e+02, percent-clipped=0.0 2024-09-14 08:19:45,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=99829.66666666667, ans=0.125 2024-09-14 08:19:47,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=99829.66666666667, ans=0.125 2024-09-14 08:20:08,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=99858.0, ans=0.0 2024-09-14 08:20:10,851 INFO [train.py:1198] (1/2) Epoch 6, batch 3300, loss[loss=0.3493, ctc_loss=0.2634, cr_loss=0.4295, over 19853.00 frames. ], tot_loss[loss=0.2997, ctc_loss=0.217, cr_loss=0.4135, over 4092319.38 frames. ], batch size: 80, lr: 1.38e-02, grad_scale: 32.0 2024-09-14 08:20:12,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=99886.33333333333, ans=0.125 2024-09-14 08:20:26,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=99914.66666666667, ans=0.025 2024-09-14 08:21:22,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=99999.66666666667, ans=0.0 2024-09-14 08:21:26,436 INFO [train.py:1198] (1/2) Epoch 6, batch 3350, loss[loss=0.2679, ctc_loss=0.1906, cr_loss=0.3865, over 21056.00 frames. ], tot_loss[loss=0.3004, ctc_loss=0.2177, cr_loss=0.4136, over 4087161.81 frames. ], batch size: 53, lr: 1.38e-02, grad_scale: 32.0 2024-09-14 08:21:55,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=100056.33333333333, ans=0.125 2024-09-14 08:22:17,434 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.853e+02 2.300e+02 2.722e+02 3.531e+02 5.777e+02, threshold=5.444e+02, percent-clipped=5.0 2024-09-14 08:22:20,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=100113.0, ans=0.0 2024-09-14 08:22:47,516 INFO [train.py:1198] (1/2) Epoch 6, batch 3400, loss[loss=0.3234, ctc_loss=0.2366, cr_loss=0.4339, over 20979.00 frames. ], tot_loss[loss=0.2998, ctc_loss=0.2173, cr_loss=0.4126, over 4089855.75 frames. ], batch size: 58, lr: 1.38e-02, grad_scale: 32.0 2024-09-14 08:23:38,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=100254.66666666667, ans=0.0 2024-09-14 08:24:03,107 INFO [train.py:1198] (1/2) Epoch 6, batch 3450, loss[loss=0.3182, ctc_loss=0.2302, cr_loss=0.4402, over 20983.00 frames. ], tot_loss[loss=0.2991, ctc_loss=0.2167, cr_loss=0.412, over 4097350.48 frames. ], batch size: 55, lr: 1.38e-02, grad_scale: 32.0 2024-09-14 08:24:06,910 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.42 vs. 
limit=22.5 2024-09-14 08:24:22,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=100339.66666666667, ans=0.125 2024-09-14 08:24:44,434 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.85 vs. limit=15.0 2024-09-14 08:24:51,170 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.837e+02 2.270e+02 2.590e+02 2.910e+02 4.055e+02, threshold=5.180e+02, percent-clipped=0.0 2024-09-14 08:24:59,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=100396.33333333333, ans=0.125 2024-09-14 08:25:18,684 INFO [train.py:1198] (1/2) Epoch 6, batch 3500, loss[loss=0.2685, ctc_loss=0.1882, cr_loss=0.4019, over 20979.00 frames. ], tot_loss[loss=0.2978, ctc_loss=0.2156, cr_loss=0.4108, over 4091157.29 frames. ], batch size: 51, lr: 1.38e-02, grad_scale: 32.0 2024-09-14 08:25:38,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=100481.33333333333, ans=0.125 2024-09-14 08:25:55,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=100509.66666666667, ans=0.0 2024-09-14 08:25:59,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=100509.66666666667, ans=0.2 2024-09-14 08:25:59,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=100509.66666666667, ans=0.1 2024-09-14 08:26:06,232 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 08:26:11,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=100538.0, ans=0.125 2024-09-14 08:26:33,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=100594.66666666667, ans=0.125 2024-09-14 08:26:33,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=100594.66666666667, ans=0.2 2024-09-14 08:26:34,715 INFO [train.py:1198] (1/2) Epoch 6, batch 3550, loss[loss=0.3106, ctc_loss=0.2267, cr_loss=0.4197, over 20642.00 frames. ], tot_loss[loss=0.2969, ctc_loss=0.2148, cr_loss=0.4106, over 4103372.59 frames. ], batch size: 71, lr: 1.38e-02, grad_scale: 32.0 2024-09-14 08:26:39,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=100594.66666666667, ans=0.125 2024-09-14 08:26:42,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=100594.66666666667, ans=0.04949747468305833 2024-09-14 08:27:11,052 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.28 vs. 
limit=15.0 2024-09-14 08:27:24,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=100679.66666666667, ans=0.025 2024-09-14 08:27:25,414 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.787e+02 2.086e+02 2.280e+02 2.546e+02 4.280e+02, threshold=4.560e+02, percent-clipped=0.0 2024-09-14 08:27:52,580 INFO [train.py:1198] (1/2) Epoch 6, batch 3600, loss[loss=0.3855, ctc_loss=0.2949, cr_loss=0.4527, over 14000.00 frames. ], tot_loss[loss=0.297, ctc_loss=0.2149, cr_loss=0.4102, over 4089015.27 frames. ], batch size: 149, lr: 1.38e-02, grad_scale: 32.0 2024-09-14 08:28:19,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=100764.66666666667, ans=0.125 2024-09-14 08:28:35,036 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.22 vs. limit=10.0 2024-09-14 08:28:42,491 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.73 vs. limit=15.0 2024-09-14 08:28:52,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=100821.33333333333, ans=0.125 2024-09-14 08:29:10,929 INFO [train.py:1198] (1/2) Epoch 6, batch 3650, loss[loss=0.2644, ctc_loss=0.1877, cr_loss=0.3837, over 20012.00 frames. ], tot_loss[loss=0.2949, ctc_loss=0.2132, cr_loss=0.4086, over 4100154.77 frames. ], batch size: 44, lr: 1.38e-02, grad_scale: 32.0 2024-09-14 08:29:26,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=100906.33333333333, ans=0.0 2024-09-14 08:29:39,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=100934.66666666667, ans=0.0 2024-09-14 08:29:55,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=100963.0, ans=0.04949747468305833 2024-09-14 08:29:58,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=100963.0, ans=0.0 2024-09-14 08:29:59,354 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.810e+02 2.126e+02 2.389e+02 2.658e+02 5.184e+02, threshold=4.778e+02, percent-clipped=1.0 2024-09-14 08:30:07,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=100963.0, ans=0.0 2024-09-14 08:30:26,133 INFO [train.py:1198] (1/2) Epoch 6, batch 3700, loss[loss=0.2488, ctc_loss=0.1758, cr_loss=0.3649, over 21001.00 frames. ], tot_loss[loss=0.2958, ctc_loss=0.2138, cr_loss=0.4101, over 4093804.51 frames. ], batch size: 48, lr: 1.37e-02, grad_scale: 32.0 2024-09-14 08:30:27,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=101019.66666666667, ans=0.125 2024-09-14 08:30:32,761 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 08:30:33,074 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.35 vs. 
limit=22.5 2024-09-14 08:30:35,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=101019.66666666667, ans=0.2 2024-09-14 08:30:44,974 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.02 vs. limit=22.5 2024-09-14 08:30:49,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=101048.0, ans=0.125 2024-09-14 08:31:10,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=101104.66666666667, ans=0.125 2024-09-14 08:31:13,517 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 08:31:24,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=101104.66666666667, ans=0.125 2024-09-14 08:31:41,905 INFO [train.py:1198] (1/2) Epoch 6, batch 3750, loss[loss=0.3346, ctc_loss=0.2471, cr_loss=0.4373, over 18298.00 frames. ], tot_loss[loss=0.2969, ctc_loss=0.2147, cr_loss=0.4106, over 4080573.84 frames. ], batch size: 108, lr: 1.37e-02, grad_scale: 32.0 2024-09-14 08:32:30,645 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.812e+02 2.249e+02 2.659e+02 3.159e+02 4.836e+02, threshold=5.318e+02, percent-clipped=1.0 2024-09-14 08:32:38,391 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=101246.33333333333, ans=0.125 2024-09-14 08:32:47,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=101274.66666666667, ans=0.0 2024-09-14 08:32:56,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=101303.0, ans=0.025 2024-09-14 08:32:57,783 INFO [train.py:1198] (1/2) Epoch 6, batch 3800, loss[loss=0.3008, ctc_loss=0.2219, cr_loss=0.3947, over 20289.00 frames. ], tot_loss[loss=0.2964, ctc_loss=0.2145, cr_loss=0.4099, over 4083124.81 frames. ], batch size: 74, lr: 1.37e-02, grad_scale: 32.0 2024-09-14 08:33:10,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=101303.0, ans=0.125 2024-09-14 08:33:36,113 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.36 vs. limit=22.5 2024-09-14 08:34:18,823 INFO [train.py:1198] (1/2) Epoch 6, batch 3850, loss[loss=0.2555, ctc_loss=0.1785, cr_loss=0.3848, over 19907.00 frames. ], tot_loss[loss=0.2977, ctc_loss=0.2156, cr_loss=0.4107, over 4077317.27 frames. 
], batch size: 44, lr: 1.37e-02, grad_scale: 32.0 2024-09-14 08:34:57,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=101501.33333333333, ans=0.2 2024-09-14 08:35:01,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=101501.33333333333, ans=0.04949747468305833 2024-09-14 08:35:07,114 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.772e+02 2.191e+02 2.419e+02 2.889e+02 4.083e+02, threshold=4.839e+02, percent-clipped=0.0 2024-09-14 08:35:16,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=101529.66666666667, ans=0.0 2024-09-14 08:35:34,198 INFO [train.py:1198] (1/2) Epoch 6, batch 3900, loss[loss=0.3139, ctc_loss=0.2265, cr_loss=0.4368, over 20734.00 frames. ], tot_loss[loss=0.298, ctc_loss=0.2158, cr_loss=0.4114, over 4069554.01 frames. ], batch size: 71, lr: 1.37e-02, grad_scale: 32.0 2024-09-14 08:35:41,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=101586.33333333333, ans=0.125 2024-09-14 08:35:57,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=101614.66666666667, ans=0.0 2024-09-14 08:36:04,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=101643.0, ans=0.0 2024-09-14 08:36:38,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=101699.66666666667, ans=0.1 2024-09-14 08:36:40,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=101699.66666666667, ans=0.125 2024-09-14 08:36:41,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=101699.66666666667, ans=0.125 2024-09-14 08:36:48,623 INFO [train.py:1198] (1/2) Epoch 6, batch 3950, loss[loss=0.2943, ctc_loss=0.2143, cr_loss=0.4001, over 20696.00 frames. ], tot_loss[loss=0.2984, ctc_loss=0.216, cr_loss=0.4119, over 4076359.34 frames. ], batch size: 68, lr: 1.37e-02, grad_scale: 32.0 2024-09-14 08:37:22,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=101784.66666666667, ans=0.125 2024-09-14 08:37:29,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=101784.66666666667, ans=0.0 2024-09-14 08:37:33,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=101813.0, ans=15.0 2024-09-14 08:37:37,129 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.905e+02 2.217e+02 2.436e+02 2.828e+02 5.297e+02, threshold=4.873e+02, percent-clipped=1.0 2024-09-14 08:37:45,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=101813.0, ans=0.2 2024-09-14 08:37:49,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=101841.33333333333, ans=0.1 2024-09-14 08:38:04,428 INFO [train.py:1198] (1/2) Epoch 6, batch 4000, loss[loss=0.3508, ctc_loss=0.2575, cr_loss=0.4664, over 20847.00 frames. 
], tot_loss[loss=0.2984, ctc_loss=0.2159, cr_loss=0.4124, over 4083740.64 frames. ], batch size: 65, lr: 1.37e-02, grad_scale: 32.0 2024-09-14 08:39:08,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=101983.0, ans=0.0 2024-09-14 08:39:10,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=101983.0, ans=0.1 2024-09-14 08:39:24,153 INFO [train.py:1198] (1/2) Epoch 6, batch 4050, loss[loss=0.3037, ctc_loss=0.2179, cr_loss=0.4285, over 21058.00 frames. ], tot_loss[loss=0.299, ctc_loss=0.2163, cr_loss=0.4136, over 4086109.18 frames. ], batch size: 56, lr: 1.37e-02, grad_scale: 32.0 2024-09-14 08:39:24,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=102011.33333333333, ans=0.125 2024-09-14 08:39:53,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=102039.66666666667, ans=0.0 2024-09-14 08:40:16,002 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.870e+02 2.240e+02 2.572e+02 3.256e+02 5.234e+02, threshold=5.143e+02, percent-clipped=2.0 2024-09-14 08:40:43,493 INFO [train.py:1198] (1/2) Epoch 6, batch 4100, loss[loss=0.2759, ctc_loss=0.1983, cr_loss=0.3883, over 20936.00 frames. ], tot_loss[loss=0.2995, ctc_loss=0.2167, cr_loss=0.4141, over 4087114.51 frames. ], batch size: 50, lr: 1.37e-02, grad_scale: 32.0 2024-09-14 08:41:17,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=102209.66666666667, ans=0.125 2024-09-14 08:41:20,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=102209.66666666667, ans=0.125 2024-09-14 08:41:56,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=102266.33333333333, ans=0.0 2024-09-14 08:41:59,455 INFO [train.py:1198] (1/2) Epoch 6, batch 4150, loss[loss=0.3028, ctc_loss=0.2224, cr_loss=0.4018, over 20801.00 frames. ], tot_loss[loss=0.2989, ctc_loss=0.2161, cr_loss=0.4139, over 4091499.12 frames. ], batch size: 53, lr: 1.37e-02, grad_scale: 32.0 2024-09-14 08:42:28,478 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.17 vs. limit=22.5 2024-09-14 08:42:34,531 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.38 vs. limit=10.0 2024-09-14 08:42:47,311 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.756e+02 2.196e+02 2.505e+02 2.718e+02 6.723e+02, threshold=5.010e+02, percent-clipped=1.0 2024-09-14 08:42:55,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=102379.66666666667, ans=0.0 2024-09-14 08:43:14,194 INFO [train.py:1198] (1/2) Epoch 6, batch 4200, loss[loss=0.2915, ctc_loss=0.2066, cr_loss=0.4243, over 20795.00 frames. ], tot_loss[loss=0.2979, ctc_loss=0.2152, cr_loss=0.4135, over 4105578.74 frames. 
], batch size: 53, lr: 1.37e-02, grad_scale: 32.0 2024-09-14 08:43:35,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=102464.66666666667, ans=0.0 2024-09-14 08:44:33,373 INFO [train.py:1198] (1/2) Epoch 6, batch 4250, loss[loss=0.2911, ctc_loss=0.21, cr_loss=0.4055, over 20428.00 frames. ], tot_loss[loss=0.2974, ctc_loss=0.2148, cr_loss=0.4134, over 4118631.96 frames. ], batch size: 74, lr: 1.36e-02, grad_scale: 32.0 2024-09-14 08:44:48,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=102606.33333333333, ans=0.0 2024-09-14 08:44:54,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=102606.33333333333, ans=0.125 2024-09-14 08:45:11,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=102634.66666666667, ans=0.1 2024-09-14 08:45:21,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=102663.0, ans=0.0 2024-09-14 08:45:24,568 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.802e+02 2.179e+02 2.361e+02 2.812e+02 4.445e+02, threshold=4.722e+02, percent-clipped=0.0 2024-09-14 08:45:51,928 INFO [train.py:1198] (1/2) Epoch 6, batch 4300, loss[loss=0.2915, ctc_loss=0.2085, cr_loss=0.4154, over 21029.00 frames. ], tot_loss[loss=0.2996, ctc_loss=0.2165, cr_loss=0.4154, over 4116387.74 frames. ], batch size: 61, lr: 1.36e-02, grad_scale: 32.0 2024-09-14 08:45:52,817 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.31 vs. limit=22.5 2024-09-14 08:46:07,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=102748.0, ans=0.125 2024-09-14 08:46:19,236 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.65 vs. limit=15.0 2024-09-14 08:47:06,398 INFO [train.py:1198] (1/2) Epoch 6, batch 4350, loss[loss=0.344, ctc_loss=0.2521, cr_loss=0.459, over 18051.00 frames. ], tot_loss[loss=0.2995, ctc_loss=0.2164, cr_loss=0.4155, over 4117790.97 frames. ], batch size: 108, lr: 1.36e-02, grad_scale: 32.0 2024-09-14 08:47:16,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=102861.33333333333, ans=0.125 2024-09-14 08:47:30,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=102889.66666666667, ans=0.125 2024-09-14 08:47:54,418 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.816e+02 2.283e+02 2.599e+02 3.001e+02 4.794e+02, threshold=5.197e+02, percent-clipped=1.0 2024-09-14 08:48:20,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=103003.0, ans=0.125 2024-09-14 08:48:21,768 INFO [train.py:1198] (1/2) Epoch 6, batch 4400, loss[loss=0.251, ctc_loss=0.1774, cr_loss=0.368, over 20981.00 frames. ], tot_loss[loss=0.2995, ctc_loss=0.2165, cr_loss=0.4146, over 4107955.21 frames. 
], batch size: 51, lr: 1.36e-02, grad_scale: 32.0 2024-09-14 08:48:54,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=103059.66666666667, ans=0.125 2024-09-14 08:49:23,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=103116.33333333333, ans=0.0 2024-09-14 08:49:37,290 INFO [train.py:1198] (1/2) Epoch 6, batch 4450, loss[loss=0.3311, ctc_loss=0.2428, cr_loss=0.4413, over 20932.00 frames. ], tot_loss[loss=0.2996, ctc_loss=0.2168, cr_loss=0.4141, over 4100146.21 frames. ], batch size: 67, lr: 1.36e-02, grad_scale: 32.0 2024-09-14 08:49:42,143 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.41 vs. limit=22.5 2024-09-14 08:50:19,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=103201.33333333333, ans=0.125 2024-09-14 08:50:28,454 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.823e+02 2.123e+02 2.377e+02 2.745e+02 5.864e+02, threshold=4.754e+02, percent-clipped=1.0 2024-09-14 08:50:31,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=103229.66666666667, ans=0.125 2024-09-14 08:50:53,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=103258.0, ans=0.2 2024-09-14 08:50:55,988 INFO [train.py:1198] (1/2) Epoch 6, batch 4500, loss[loss=0.2666, ctc_loss=0.1908, cr_loss=0.3789, over 21052.00 frames. ], tot_loss[loss=0.2987, ctc_loss=0.216, cr_loss=0.4131, over 4089752.37 frames. ], batch size: 53, lr: 1.36e-02, grad_scale: 32.0 2024-09-14 08:52:13,907 INFO [train.py:1198] (1/2) Epoch 6, batch 4550, loss[loss=0.2771, ctc_loss=0.1988, cr_loss=0.3918, over 21065.00 frames. ], tot_loss[loss=0.2987, ctc_loss=0.2162, cr_loss=0.4128, over 4080085.65 frames. ], batch size: 56, lr: 1.36e-02, grad_scale: 32.0 2024-09-14 08:52:42,961 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 08:53:02,002 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.847e+02 2.247e+02 2.416e+02 2.805e+02 4.163e+02, threshold=4.832e+02, percent-clipped=0.0 2024-09-14 08:53:04,147 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.13 vs. limit=15.0 2024-09-14 08:53:13,535 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.06 vs. limit=6.0 2024-09-14 08:53:26,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=103541.33333333333, ans=0.2 2024-09-14 08:53:28,887 INFO [train.py:1198] (1/2) Epoch 6, batch 4600, loss[loss=0.2478, ctc_loss=0.1786, cr_loss=0.346, over 20972.00 frames. ], tot_loss[loss=0.2982, ctc_loss=0.2157, cr_loss=0.4123, over 4086737.23 frames. 
], batch size: 48, lr: 1.36e-02, grad_scale: 32.0 2024-09-14 08:53:38,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=103569.66666666667, ans=0.05 2024-09-14 08:53:44,812 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.37 vs. limit=15.0 2024-09-14 08:53:48,146 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.68 vs. limit=15.0 2024-09-14 08:53:52,796 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.06 vs. limit=6.0 2024-09-14 08:53:55,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=103598.0, ans=0.0 2024-09-14 08:53:55,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=103598.0, ans=0.125 2024-09-14 08:54:05,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=103626.33333333333, ans=0.125 2024-09-14 08:54:06,237 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.17 vs. limit=6.0 2024-09-14 08:54:31,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=103683.0, ans=0.125 2024-09-14 08:54:31,982 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.04 vs. limit=10.0 2024-09-14 08:54:45,031 INFO [train.py:1198] (1/2) Epoch 6, batch 4650, loss[loss=0.234, ctc_loss=0.1646, cr_loss=0.3472, over 19903.00 frames. ], tot_loss[loss=0.2964, ctc_loss=0.2142, cr_loss=0.4106, over 4092756.36 frames. ], batch size: 44, lr: 1.36e-02, grad_scale: 32.0 2024-09-14 08:55:03,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=103739.66666666667, ans=0.125 2024-09-14 08:55:19,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=103768.0, ans=0.1 2024-09-14 08:55:30,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=103796.33333333333, ans=0.1 2024-09-14 08:55:32,919 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.879e+02 2.179e+02 2.337e+02 2.741e+02 4.133e+02, threshold=4.674e+02, percent-clipped=0.0 2024-09-14 08:55:50,100 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.64 vs. limit=22.5 2024-09-14 08:56:02,830 INFO [train.py:1198] (1/2) Epoch 6, batch 4700, loss[loss=0.2686, ctc_loss=0.1934, cr_loss=0.376, over 20961.00 frames. ], tot_loss[loss=0.2952, ctc_loss=0.2133, cr_loss=0.4094, over 4094453.35 frames. ], batch size: 51, lr: 1.36e-02, grad_scale: 32.0 2024-09-14 08:56:18,701 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.34 vs. 
limit=22.5 2024-09-14 08:56:28,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=103881.33333333333, ans=0.0 2024-09-14 08:57:21,063 INFO [train.py:1198] (1/2) Epoch 6, batch 4750, loss[loss=0.2862, ctc_loss=0.2026, cr_loss=0.418, over 21030.00 frames. ], tot_loss[loss=0.2949, ctc_loss=0.213, cr_loss=0.4094, over 4104219.53 frames. ], batch size: 61, lr: 1.36e-02, grad_scale: 32.0 2024-09-14 08:57:27,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=103994.66666666667, ans=0.0 2024-09-14 08:57:30,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=103994.66666666667, ans=0.1 2024-09-14 08:57:46,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=104023.0, ans=0.125 2024-09-14 08:57:47,225 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.25 vs. limit=15.0 2024-09-14 08:58:04,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=104079.66666666667, ans=0.2 2024-09-14 08:58:06,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=104079.66666666667, ans=0.0 2024-09-14 08:58:10,127 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.922e+02 2.181e+02 2.355e+02 2.604e+02 3.883e+02, threshold=4.709e+02, percent-clipped=0.0 2024-09-14 08:58:21,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=104108.0, ans=0.125 2024-09-14 08:58:30,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=104108.0, ans=0.2 2024-09-14 08:58:35,971 INFO [train.py:1198] (1/2) Epoch 6, batch 4800, loss[loss=0.2819, ctc_loss=0.204, cr_loss=0.3897, over 21059.00 frames. ], tot_loss[loss=0.2948, ctc_loss=0.2129, cr_loss=0.4094, over 4099125.15 frames. ], batch size: 56, lr: 1.35e-02, grad_scale: 32.0 2024-09-14 08:58:42,811 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.01 vs. limit=15.0 2024-09-14 08:58:50,791 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.65 vs. 
limit=15.0 2024-09-14 08:59:30,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=104221.33333333333, ans=0.2 2024-09-14 08:59:32,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=104221.33333333333, ans=0.0 2024-09-14 08:59:35,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=104249.66666666667, ans=0.1 2024-09-14 08:59:36,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=104249.66666666667, ans=0.125 2024-09-14 08:59:45,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=104249.66666666667, ans=0.1 2024-09-14 08:59:51,761 INFO [train.py:1198] (1/2) Epoch 6, batch 4850, loss[loss=0.257, ctc_loss=0.1824, cr_loss=0.3726, over 20972.00 frames. ], tot_loss[loss=0.2934, ctc_loss=0.2118, cr_loss=0.4081, over 4104809.12 frames. ], batch size: 49, lr: 1.35e-02, grad_scale: 32.0 2024-09-14 08:59:58,382 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.84 vs. limit=15.0 2024-09-14 09:00:41,188 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.879e+02 2.121e+02 2.391e+02 2.822e+02 4.771e+02, threshold=4.781e+02, percent-clipped=1.0 2024-09-14 09:00:56,252 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 09:01:06,291 INFO [train.py:1198] (1/2) Epoch 6, batch 4900, loss[loss=0.3532, ctc_loss=0.2593, cr_loss=0.4694, over 20952.00 frames. ], tot_loss[loss=0.2926, ctc_loss=0.2112, cr_loss=0.4073, over 4107304.58 frames. ], batch size: 64, lr: 1.35e-02, grad_scale: 32.0 2024-09-14 09:01:18,712 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.71 vs. limit=15.0 2024-09-14 09:02:10,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=104533.0, ans=0.0 2024-09-14 09:02:23,128 INFO [train.py:1198] (1/2) Epoch 6, batch 4950, loss[loss=0.2646, ctc_loss=0.1903, cr_loss=0.3711, over 21046.00 frames. ], tot_loss[loss=0.2926, ctc_loss=0.2112, cr_loss=0.4068, over 4108966.61 frames. ], batch size: 53, lr: 1.35e-02, grad_scale: 32.0 2024-09-14 09:02:35,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=104561.33333333333, ans=0.1 2024-09-14 09:02:50,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=104589.66666666667, ans=0.125 2024-09-14 09:03:13,650 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.857e+02 2.200e+02 2.395e+02 2.783e+02 3.877e+02, threshold=4.790e+02, percent-clipped=0.0 2024-09-14 09:03:25,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=104674.66666666667, ans=0.0 2024-09-14 09:03:37,966 INFO [train.py:1198] (1/2) Epoch 6, batch 5000, loss[loss=0.2894, ctc_loss=0.2088, cr_loss=0.4034, over 20885.00 frames. ], tot_loss[loss=0.2947, ctc_loss=0.2129, cr_loss=0.4087, over 4093008.73 frames. 
], batch size: 54, lr: 1.35e-02, grad_scale: 16.0 2024-09-14 09:03:39,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=104703.0, ans=0.125 2024-09-14 09:03:58,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=104731.33333333333, ans=0.0 2024-09-14 09:04:07,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=104731.33333333333, ans=0.2 2024-09-14 09:04:10,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=104759.66666666667, ans=0.125 2024-09-14 09:04:54,854 INFO [train.py:1198] (1/2) Epoch 6, batch 5050, loss[loss=0.3031, ctc_loss=0.2222, cr_loss=0.4047, over 20941.00 frames. ], tot_loss[loss=0.2944, ctc_loss=0.2126, cr_loss=0.4089, over 4093263.99 frames. ], batch size: 60, lr: 1.35e-02, grad_scale: 16.0 2024-09-14 09:05:22,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=104873.0, ans=0.2 2024-09-14 09:05:45,488 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.768e+02 2.077e+02 2.241e+02 2.493e+02 4.745e+02, threshold=4.482e+02, percent-clipped=0.0 2024-09-14 09:06:09,260 INFO [train.py:1198] (1/2) Epoch 6, batch 5100, loss[loss=0.316, ctc_loss=0.2289, cr_loss=0.4354, over 20637.00 frames. ], tot_loss[loss=0.2957, ctc_loss=0.2137, cr_loss=0.41, over 4086681.36 frames. ], batch size: 68, lr: 1.35e-02, grad_scale: 16.0 2024-09-14 09:06:17,305 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.39 vs. limit=10.0 2024-09-14 09:06:21,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=104986.33333333333, ans=0.2 2024-09-14 09:07:06,196 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.54 vs. limit=15.0 2024-09-14 09:07:19,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=105099.66666666667, ans=0.04949747468305833 2024-09-14 09:07:23,154 INFO [train.py:1198] (1/2) Epoch 6, batch 5150, loss[loss=0.266, ctc_loss=0.1873, cr_loss=0.3934, over 20953.00 frames. ], tot_loss[loss=0.2947, ctc_loss=0.2128, cr_loss=0.4093, over 4096453.75 frames. ], batch size: 49, lr: 1.35e-02, grad_scale: 16.0 2024-09-14 09:07:23,715 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.24 vs. 
limit=15.0 2024-09-14 09:07:37,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=105156.33333333333, ans=0.125 2024-09-14 09:07:49,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=105156.33333333333, ans=0.0 2024-09-14 09:08:14,007 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.793e+02 2.184e+02 2.580e+02 3.007e+02 4.983e+02, threshold=5.160e+02, percent-clipped=2.0 2024-09-14 09:08:16,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=105213.0, ans=0.025 2024-09-14 09:08:17,802 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.80 vs. limit=15.0 2024-09-14 09:08:29,567 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.50 vs. limit=22.5 2024-09-14 09:08:30,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=105241.33333333333, ans=0.1 2024-09-14 09:08:37,412 INFO [train.py:1198] (1/2) Epoch 6, batch 5200, loss[loss=0.3261, ctc_loss=0.235, cr_loss=0.4553, over 20332.00 frames. ], tot_loss[loss=0.2947, ctc_loss=0.2128, cr_loss=0.4091, over 4097226.05 frames. ], batch size: 74, lr: 1.35e-02, grad_scale: 32.0 2024-09-14 09:08:52,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=105298.0, ans=0.2 2024-09-14 09:09:11,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=105326.33333333333, ans=0.125 2024-09-14 09:09:51,783 INFO [train.py:1198] (1/2) Epoch 6, batch 5250, loss[loss=0.3109, ctc_loss=0.2237, cr_loss=0.4356, over 20845.00 frames. ], tot_loss[loss=0.2955, ctc_loss=0.2134, cr_loss=0.4104, over 4104162.01 frames. ], batch size: 65, lr: 1.35e-02, grad_scale: 32.0 2024-09-14 09:10:19,742 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.66 vs. limit=6.0 2024-09-14 09:10:42,710 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.785e+02 2.226e+02 2.488e+02 3.042e+02 5.046e+02, threshold=4.977e+02, percent-clipped=0.0 2024-09-14 09:11:02,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=105524.66666666667, ans=0.0 2024-09-14 09:11:03,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=105524.66666666667, ans=0.5 2024-09-14 09:11:06,187 INFO [train.py:1198] (1/2) Epoch 6, batch 5300, loss[loss=0.2764, ctc_loss=0.1955, cr_loss=0.4042, over 21067.00 frames. ], tot_loss[loss=0.2941, ctc_loss=0.2123, cr_loss=0.4092, over 4102060.75 frames. ], batch size: 62, lr: 1.35e-02, grad_scale: 32.0 2024-09-14 09:11:11,011 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.02 vs. 
limit=15.0 2024-09-14 09:12:02,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=105638.0, ans=0.2 2024-09-14 09:12:05,943 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.07 vs. limit=12.0 2024-09-14 09:12:14,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=105666.33333333333, ans=0.125 2024-09-14 09:12:22,818 INFO [train.py:1198] (1/2) Epoch 6, batch 5350, loss[loss=0.2974, ctc_loss=0.2109, cr_loss=0.4326, over 20930.00 frames. ], tot_loss[loss=0.2948, ctc_loss=0.2129, cr_loss=0.4099, over 4099761.72 frames. ], batch size: 60, lr: 1.34e-02, grad_scale: 32.0 2024-09-14 09:12:29,303 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.60 vs. limit=15.0 2024-09-14 09:12:30,887 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.66 vs. limit=15.0 2024-09-14 09:12:40,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=105723.0, ans=0.025 2024-09-14 09:12:51,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=105751.33333333333, ans=0.0 2024-09-14 09:12:58,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=105751.33333333333, ans=0.125 2024-09-14 09:13:13,343 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.834e+02 2.121e+02 2.316e+02 2.792e+02 3.922e+02, threshold=4.631e+02, percent-clipped=0.0 2024-09-14 09:13:14,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=105779.66666666667, ans=0.125 2024-09-14 09:13:36,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=105808.0, ans=0.1 2024-09-14 09:13:39,568 INFO [train.py:1198] (1/2) Epoch 6, batch 5400, loss[loss=0.2975, ctc_loss=0.2165, cr_loss=0.4048, over 20647.00 frames. ], tot_loss[loss=0.2952, ctc_loss=0.2131, cr_loss=0.4104, over 4090025.83 frames. ], batch size: 68, lr: 1.34e-02, grad_scale: 32.0 2024-09-14 09:13:55,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=105864.66666666667, ans=0.125 2024-09-14 09:13:56,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=105864.66666666667, ans=0.125 2024-09-14 09:14:17,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=105893.0, ans=0.0 2024-09-14 09:14:28,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=105921.33333333333, ans=0.125 2024-09-14 09:14:42,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=105949.66666666667, ans=0.0 2024-09-14 09:14:54,010 INFO [train.py:1198] (1/2) Epoch 6, batch 5450, loss[loss=0.3021, ctc_loss=0.2195, cr_loss=0.413, over 21032.00 frames. ], tot_loss[loss=0.2948, ctc_loss=0.2127, cr_loss=0.4104, over 4097293.73 frames. 
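The per-batch loss entries report loss, ctc_loss and cr_loss side by side, and the totals are consistent with the total being the CTC loss plus 0.2 times the consistency-regularization loss (e.g. 0.2127 + 0.2 × 0.4104 ≈ 0.2948 in the tot_loss just above). A minimal sketch of that combination, with the 0.2 scale inferred from the logged numbers and all names illustrative rather than taken from train.py:

```python
# Sketch: how the logged totals appear to combine, assuming
# loss = ctc_loss + cr_loss_scale * cr_loss. The 0.2 scale is inferred
# from the numbers in this log; names are illustrative.
CR_LOSS_SCALE = 0.2

def combine_losses(ctc_loss: float, cr_loss: float,
                   cr_loss_scale: float = CR_LOSS_SCALE) -> float:
    """Combine CTC loss with the consistency-regularization term."""
    return ctc_loss + cr_loss_scale * cr_loss

# Check against the tot_loss entry above (Epoch 6, batch 5150):
assert abs(combine_losses(0.2127, 0.4104) - 0.2948) < 5e-4
```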
], batch size: 62, lr: 1.34e-02, grad_scale: 32.0 2024-09-14 09:15:28,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=106034.66666666667, ans=0.1 2024-09-14 09:15:35,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=106034.66666666667, ans=0.125 2024-09-14 09:15:38,694 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 09:15:41,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=106063.0, ans=0.125 2024-09-14 09:15:44,129 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.790e+02 2.253e+02 2.505e+02 2.912e+02 6.066e+02, threshold=5.009e+02, percent-clipped=2.0 2024-09-14 09:15:52,292 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.49 vs. limit=15.0 2024-09-14 09:15:58,150 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.64 vs. limit=15.0 2024-09-14 09:16:08,063 INFO [train.py:1198] (1/2) Epoch 6, batch 5500, loss[loss=0.2632, ctc_loss=0.191, cr_loss=0.3611, over 20955.00 frames. ], tot_loss[loss=0.2954, ctc_loss=0.2132, cr_loss=0.411, over 4099878.56 frames. ], batch size: 51, lr: 1.34e-02, grad_scale: 32.0 2024-09-14 09:16:30,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=106148.0, ans=0.1 2024-09-14 09:17:06,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=106233.0, ans=0.125 2024-09-14 09:17:22,267 INFO [train.py:1198] (1/2) Epoch 6, batch 5550, loss[loss=0.2806, ctc_loss=0.1982, cr_loss=0.412, over 20804.00 frames. ], tot_loss[loss=0.2956, ctc_loss=0.2134, cr_loss=0.411, over 4090821.90 frames. ], batch size: 53, lr: 1.34e-02, grad_scale: 32.0 2024-09-14 09:17:32,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=106261.33333333333, ans=0.1 2024-09-14 09:18:12,502 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.862e+02 2.128e+02 2.265e+02 2.495e+02 3.567e+02, threshold=4.531e+02, percent-clipped=0.0 2024-09-14 09:18:31,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=106374.66666666667, ans=0.125 2024-09-14 09:18:36,068 INFO [train.py:1198] (1/2) Epoch 6, batch 5600, loss[loss=0.2964, ctc_loss=0.2122, cr_loss=0.4211, over 21071.00 frames. ], tot_loss[loss=0.2961, ctc_loss=0.2138, cr_loss=0.4115, over 4074665.46 frames. 
], batch size: 59, lr: 1.34e-02, grad_scale: 32.0 2024-09-14 09:19:10,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=106459.66666666667, ans=0.1 2024-09-14 09:19:15,399 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=106459.66666666667, ans=0.2 2024-09-14 09:19:18,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=106459.66666666667, ans=0.0 2024-09-14 09:19:25,919 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.75 vs. limit=15.0 2024-09-14 09:19:42,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=106516.33333333333, ans=0.125 2024-09-14 09:19:50,765 INFO [train.py:1198] (1/2) Epoch 6, batch 5650, loss[loss=0.3393, ctc_loss=0.2499, cr_loss=0.447, over 19280.00 frames. ], tot_loss[loss=0.2963, ctc_loss=0.2139, cr_loss=0.4118, over 4081060.87 frames. ], batch size: 90, lr: 1.34e-02, grad_scale: 32.0 2024-09-14 09:19:52,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=106544.66666666667, ans=0.0 2024-09-14 09:20:01,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=106544.66666666667, ans=0.0 2024-09-14 09:20:41,649 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.827e+02 2.152e+02 2.410e+02 2.740e+02 3.893e+02, threshold=4.820e+02, percent-clipped=0.0 2024-09-14 09:20:51,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=106658.0, ans=0.125 2024-09-14 09:21:07,806 INFO [train.py:1198] (1/2) Epoch 6, batch 5700, loss[loss=0.2732, ctc_loss=0.1968, cr_loss=0.382, over 20771.00 frames. ], tot_loss[loss=0.2964, ctc_loss=0.2141, cr_loss=0.4115, over 4083084.35 frames. ], batch size: 53, lr: 1.34e-02, grad_scale: 32.0 2024-09-14 09:21:25,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=106714.66666666667, ans=0.025 2024-09-14 09:21:29,458 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.43 vs. limit=15.0 2024-09-14 09:22:09,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=106799.66666666667, ans=0.1 2024-09-14 09:22:18,312 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.60 vs. limit=22.5 2024-09-14 09:22:21,983 INFO [train.py:1198] (1/2) Epoch 6, batch 5750, loss[loss=0.2818, ctc_loss=0.2043, cr_loss=0.3874, over 20966.00 frames. ], tot_loss[loss=0.2953, ctc_loss=0.2132, cr_loss=0.4105, over 4094126.35 frames. 
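The optim.py warnings each print five grad-norm quantiles (min, 25%, median, 75%, max), a clipping threshold, and the fraction of recent steps clipped; in every entry above the threshold equals Clipping_scale times the logged median (2.0 × 2.410e+02 = 4.820e+02 in the warning just above). A sketch of that bookkeeping, under the assumption that the threshold really is the scale times a running median of observed gradient norms; the function name is made up:

```python
import torch

# Sketch of the quartile bookkeeping behind the optim.py warnings, assuming
# threshold = clipping_scale * median of recently observed grad norms.
# This is an illustrative reconstruction, not the actual optimizer code.
def clipping_stats(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]  # scale times the median norm
    percent_clipped = 100.0 * (grad_norms > threshold).float().mean()
    return q, threshold, percent_clipped

norms = torch.tensor([182.7, 215.2, 241.0, 274.0, 389.3])
quartiles, threshold, pct = clipping_stats(norms)
print(quartiles, threshold.item(), pct.item())  # threshold == 2.0 * 241.0 = 482.0
```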
], batch size: 49, lr: 1.34e-02, grad_scale: 32.0 2024-09-14 09:22:29,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=106828.0, ans=0.025 2024-09-14 09:23:14,879 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.829e+02 2.113e+02 2.306e+02 2.589e+02 3.888e+02, threshold=4.613e+02, percent-clipped=0.0 2024-09-14 09:23:19,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=106913.0, ans=0.0 2024-09-14 09:23:23,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=106941.33333333333, ans=0.2 2024-09-14 09:23:26,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=106941.33333333333, ans=0.125 2024-09-14 09:23:38,701 INFO [train.py:1198] (1/2) Epoch 6, batch 5800, loss[loss=0.2851, ctc_loss=0.2051, cr_loss=0.4, over 21072.00 frames. ], tot_loss[loss=0.2949, ctc_loss=0.2129, cr_loss=0.4103, over 4094000.48 frames. ], batch size: 59, lr: 1.34e-02, grad_scale: 32.0 2024-09-14 09:23:58,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=106998.0, ans=0.025 2024-09-14 09:24:01,443 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=106998.0, ans=0.1 2024-09-14 09:24:11,691 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=107026.33333333333, ans=0.2 2024-09-14 09:24:14,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=107026.33333333333, ans=0.125 2024-09-14 09:24:20,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=107026.33333333333, ans=0.2 2024-09-14 09:24:32,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=107054.66666666667, ans=0.0 2024-09-14 09:24:52,808 INFO [train.py:1198] (1/2) Epoch 6, batch 5850, loss[loss=0.3194, ctc_loss=0.2324, cr_loss=0.4347, over 20969.00 frames. ], tot_loss[loss=0.2954, ctc_loss=0.213, cr_loss=0.4117, over 4098083.82 frames. ], batch size: 64, lr: 1.34e-02, grad_scale: 32.0 2024-09-14 09:25:02,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=107111.33333333333, ans=0.125 2024-09-14 09:25:05,363 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.02 vs. limit=12.0 2024-09-14 09:25:43,372 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.852e+02 2.138e+02 2.348e+02 2.763e+02 4.155e+02, threshold=4.696e+02, percent-clipped=0.0 2024-09-14 09:25:56,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=107224.66666666667, ans=0.125 2024-09-14 09:26:06,872 INFO [train.py:1198] (1/2) Epoch 6, batch 5900, loss[loss=0.3631, ctc_loss=0.2683, cr_loss=0.474, over 18473.00 frames. ], tot_loss[loss=0.296, ctc_loss=0.2137, cr_loss=0.4118, over 4095774.66 frames. 
], batch size: 108, lr: 1.34e-02, grad_scale: 32.0 2024-09-14 09:26:10,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=107253.0, ans=0.1 2024-09-14 09:26:13,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=107253.0, ans=0.0 2024-09-14 09:26:46,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=107309.66666666667, ans=0.125 2024-09-14 09:26:57,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=107338.0, ans=0.2 2024-09-14 09:27:09,549 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 09:27:16,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=107366.33333333333, ans=10.0 2024-09-14 09:27:21,089 INFO [train.py:1198] (1/2) Epoch 6, batch 5950, loss[loss=0.3131, ctc_loss=0.2283, cr_loss=0.4241, over 20729.00 frames. ], tot_loss[loss=0.297, ctc_loss=0.2144, cr_loss=0.413, over 4092479.00 frames. ], batch size: 71, lr: 1.33e-02, grad_scale: 32.0 2024-09-14 09:27:34,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=107423.0, ans=0.125 2024-09-14 09:27:48,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=107423.0, ans=0.0 2024-09-14 09:28:12,598 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.785e+02 2.063e+02 2.261e+02 2.552e+02 4.834e+02, threshold=4.521e+02, percent-clipped=1.0 2024-09-14 09:28:23,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=107508.0, ans=0.0 2024-09-14 09:28:36,077 INFO [train.py:1198] (1/2) Epoch 6, batch 6000, loss[loss=0.322, ctc_loss=0.2331, cr_loss=0.4443, over 20974.00 frames. ], tot_loss[loss=0.2956, ctc_loss=0.2133, cr_loss=0.4113, over 4095428.80 frames. ], batch size: 58, lr: 1.33e-02, grad_scale: 32.0 2024-09-14 09:28:36,077 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-14 09:28:58,859 INFO [train.py:1230] (1/2) Epoch 6, validation: loss=0.06491, ctc_loss=0.06491, cr_loss=9.455e-15, over 944034.00 frames. 2024-09-14 09:28:58,860 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-14 09:29:30,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=107593.0, ans=0.0 2024-09-14 09:29:59,792 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=107649.66666666667, ans=0.125 2024-09-14 09:30:00,507 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.31 vs. limit=15.0 2024-09-14 09:30:03,039 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.21 vs. limit=15.0 2024-09-14 09:30:13,096 INFO [train.py:1198] (1/2) Epoch 6, batch 6050, loss[loss=0.244, ctc_loss=0.1713, cr_loss=0.3637, over 20958.00 frames. ], tot_loss[loss=0.2948, ctc_loss=0.2127, cr_loss=0.4108, over 4099074.97 frames. 
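At validation (Epoch 6, batch 6000 above) cr_loss collapses to ~1e-14 while loss equals ctc_loss, which is what a consistency term computed between two identical, unaugmented views of the input would give. A hedged sketch of one such term, a symmetric KL divergence between the frame-level log-posteriors of two views; this illustrates the vanishing behavior, and is not claimed to be the recipe's exact cr_loss:

```python
import torch
import torch.nn.functional as F

# Sketch of a consistency-regularization term: symmetric KL between the
# frame-level log-posteriors of two augmented views of the same batch.
# With identical views (as in validation, where augmentation is off) the
# term is zero up to floating-point noise, matching cr_loss ~ 1e-14.
def cr_loss(log_probs_a: torch.Tensor, log_probs_b: torch.Tensor) -> torch.Tensor:
    kl_ab = F.kl_div(log_probs_a, log_probs_b, log_target=True, reduction="batchmean")
    kl_ba = F.kl_div(log_probs_b, log_probs_a, log_target=True, reduction="batchmean")
    return 0.5 * (kl_ab + kl_ba)

x = torch.randn(4, 100, 500).log_softmax(dim=-1)  # (batch, frames, vocab)
print(cr_loss(x, x))  # ~0 for identical views
```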
], batch size: 51, lr: 1.33e-02, grad_scale: 32.0 2024-09-14 09:30:17,254 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.71 vs. limit=15.0 2024-09-14 09:30:57,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=107734.66666666667, ans=0.125 2024-09-14 09:31:05,996 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.863e+02 2.153e+02 2.341e+02 2.684e+02 6.077e+02, threshold=4.681e+02, percent-clipped=1.0 2024-09-14 09:31:07,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=107763.0, ans=0.025 2024-09-14 09:31:27,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=107791.33333333333, ans=0.125 2024-09-14 09:31:29,987 INFO [train.py:1198] (1/2) Epoch 6, batch 6100, loss[loss=0.251, ctc_loss=0.1792, cr_loss=0.359, over 20934.00 frames. ], tot_loss[loss=0.2946, ctc_loss=0.2126, cr_loss=0.4099, over 4091086.54 frames. ], batch size: 48, lr: 1.33e-02, grad_scale: 32.0 2024-09-14 09:31:51,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=107848.0, ans=0.2 2024-09-14 09:32:16,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=107904.66666666667, ans=0.1 2024-09-14 09:32:44,858 INFO [train.py:1198] (1/2) Epoch 6, batch 6150, loss[loss=0.3517, ctc_loss=0.257, cr_loss=0.4732, over 19495.00 frames. ], tot_loss[loss=0.2958, ctc_loss=0.2136, cr_loss=0.4112, over 4087948.74 frames. ], batch size: 90, lr: 1.33e-02, grad_scale: 32.0 2024-09-14 09:32:51,551 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.04 vs. limit=15.0 2024-09-14 09:33:32,120 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.78 vs. limit=15.0 2024-09-14 09:33:35,724 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.841e+02 2.158e+02 2.452e+02 2.758e+02 5.690e+02, threshold=4.905e+02, percent-clipped=1.0 2024-09-14 09:33:59,198 INFO [train.py:1198] (1/2) Epoch 6, batch 6200, loss[loss=0.3566, ctc_loss=0.2745, cr_loss=0.4101, over 14437.00 frames. ], tot_loss[loss=0.2962, ctc_loss=0.2141, cr_loss=0.4105, over 4066301.93 frames. ], batch size: 150, lr: 1.33e-02, grad_scale: 32.0 2024-09-14 09:34:37,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=108159.66666666667, ans=0.0 2024-09-14 09:34:40,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=108159.66666666667, ans=0.2 2024-09-14 09:35:10,739 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.33 vs. limit=22.5 2024-09-14 09:35:14,289 INFO [train.py:1198] (1/2) Epoch 6, batch 6250, loss[loss=0.2645, ctc_loss=0.1895, cr_loss=0.3748, over 20942.00 frames. ], tot_loss[loss=0.2957, ctc_loss=0.2137, cr_loss=0.4101, over 4065449.64 frames. 
], batch size: 49, lr: 1.33e-02, grad_scale: 32.0 2024-09-14 09:35:31,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=108273.0, ans=0.95 2024-09-14 09:36:06,162 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.864e+02 2.214e+02 2.419e+02 2.929e+02 6.352e+02, threshold=4.837e+02, percent-clipped=3.0 2024-09-14 09:36:09,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=108329.66666666667, ans=0.2 2024-09-14 09:36:29,619 INFO [train.py:1198] (1/2) Epoch 6, batch 6300, loss[loss=0.3026, ctc_loss=0.217, cr_loss=0.428, over 20751.00 frames. ], tot_loss[loss=0.2983, ctc_loss=0.2161, cr_loss=0.4112, over 4030233.72 frames. ], batch size: 71, lr: 1.33e-02, grad_scale: 32.0 2024-09-14 09:37:35,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=108499.66666666667, ans=0.125 2024-09-14 09:37:40,252 INFO [train.py:1198] (1/2) Epoch 6, batch 6350, loss[loss=0.375, ctc_loss=0.2833, cr_loss=0.4585, over 14943.00 frames. ], tot_loss[loss=0.3087, ctc_loss=0.2255, cr_loss=0.4161, over 3842272.01 frames. ], batch size: 149, lr: 1.33e-02, grad_scale: 32.0 2024-09-14 09:37:56,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=108556.33333333333, ans=0.125 2024-09-14 09:38:08,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=108584.66666666667, ans=0.125 2024-09-14 09:38:29,142 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.069e+02 2.339e+02 2.477e+02 2.708e+02 4.665e+02, threshold=4.953e+02, percent-clipped=0.0 2024-09-14 09:39:27,387 INFO [train.py:1198] (1/2) Epoch 7, batch 0, loss[loss=0.281, ctc_loss=0.202, cr_loss=0.3951, over 20775.00 frames. ], tot_loss[loss=0.281, ctc_loss=0.202, cr_loss=0.3951, over 20775.00 frames. ], batch size: 56, lr: 1.24e-02, grad_scale: 32.0 2024-09-14 09:39:27,387 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-14 09:39:49,200 INFO [train.py:1230] (1/2) Epoch 7, validation: loss=0.06594, ctc_loss=0.06594, cr_loss=9.26e-15, over 944034.00 frames. 2024-09-14 09:39:49,201 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-14 09:40:31,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=108700.83333333333, ans=0.125 2024-09-14 09:40:57,827 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.21 vs. limit=15.0 2024-09-14 09:41:04,375 INFO [train.py:1198] (1/2) Epoch 7, batch 50, loss[loss=0.2893, ctc_loss=0.2099, cr_loss=0.3971, over 20910.00 frames. ], tot_loss[loss=0.2908, ctc_loss=0.2095, cr_loss=0.4064, over 928450.13 frames. ], batch size: 54, lr: 1.24e-02, grad_scale: 32.0 2024-09-14 09:41:29,612 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.67 vs. 
limit=5.0 2024-09-14 09:42:00,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=108870.83333333333, ans=0.125 2024-09-14 09:42:09,359 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.759e+02 2.131e+02 2.316e+02 2.574e+02 5.024e+02, threshold=4.631e+02, percent-clipped=1.0 2024-09-14 09:42:20,095 INFO [train.py:1198] (1/2) Epoch 7, batch 100, loss[loss=0.2867, ctc_loss=0.2024, cr_loss=0.4213, over 21075.00 frames. ], tot_loss[loss=0.2925, ctc_loss=0.2108, cr_loss=0.4089, over 1630791.27 frames. ], batch size: 56, lr: 1.24e-02, grad_scale: 32.0 2024-09-14 09:42:47,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=108955.83333333333, ans=0.1 2024-09-14 09:43:07,399 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=109012.5, ans=0.125 2024-09-14 09:43:13,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=109012.5, ans=0.125 2024-09-14 09:43:36,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=109040.83333333333, ans=0.125 2024-09-14 09:43:38,926 INFO [train.py:1198] (1/2) Epoch 7, batch 150, loss[loss=0.3079, ctc_loss=0.2257, cr_loss=0.4109, over 19527.00 frames. ], tot_loss[loss=0.2919, ctc_loss=0.2102, cr_loss=0.4085, over 2172740.18 frames. ], batch size: 90, lr: 1.24e-02, grad_scale: 32.0 2024-09-14 09:43:48,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=109069.16666666667, ans=0.07 2024-09-14 09:44:43,983 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.76 vs. limit=15.0 2024-09-14 09:44:44,731 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.789e+02 2.118e+02 2.261e+02 2.574e+02 3.936e+02, threshold=4.523e+02, percent-clipped=0.0 2024-09-14 09:44:55,150 INFO [train.py:1198] (1/2) Epoch 7, batch 200, loss[loss=0.3152, ctc_loss=0.2284, cr_loss=0.4341, over 20307.00 frames. ], tot_loss[loss=0.2928, ctc_loss=0.2109, cr_loss=0.4094, over 2595074.74 frames. ], batch size: 74, lr: 1.24e-02, grad_scale: 32.0 2024-09-14 09:45:39,348 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.04 vs. limit=15.0 2024-09-14 09:46:13,796 INFO [train.py:1198] (1/2) Epoch 7, batch 250, loss[loss=0.2686, ctc_loss=0.1899, cr_loss=0.3935, over 20892.00 frames. ], tot_loss[loss=0.2932, ctc_loss=0.211, cr_loss=0.4108, over 2928412.87 frames. 
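The logged learning rate decays slowly with batch count within an epoch (1.35e-02 down to 1.33e-02 across epoch 6) and steps down at the epoch boundary (1.24e-02 from epoch 7, batch 0 onward), the signature of a schedule that discounts in both batch and epoch. A sketch of an Eden-style schedule of that shape, the family used by Zipformer recipes; the formula is quoted from memory and the constants below are placeholders, not this run's settings:

```python
# Sketch of an Eden-style schedule that decays in both batch and epoch count.
# base_lr, lr_batches and lr_epochs here are placeholder values chosen for
# illustration; they are not the settings that produced this log.
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 5000.0, lr_epochs: float = 4.0) -> float:
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(eden_lr(0.045, batch=100_000, epoch=6.0))  # decays in both arguments
```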
], batch size: 54, lr: 1.24e-02, grad_scale: 32.0 2024-09-14 09:46:31,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=109380.83333333333, ans=0.0 2024-09-14 09:46:35,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=109380.83333333333, ans=0.2 2024-09-14 09:47:07,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=109437.5, ans=0.125 2024-09-14 09:47:08,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=109437.5, ans=0.0 2024-09-14 09:47:17,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=109465.83333333333, ans=0.1 2024-09-14 09:47:18,785 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.817e+02 2.090e+02 2.265e+02 2.566e+02 4.003e+02, threshold=4.530e+02, percent-clipped=0.0 2024-09-14 09:47:29,353 INFO [train.py:1198] (1/2) Epoch 7, batch 300, loss[loss=0.2656, ctc_loss=0.1918, cr_loss=0.3691, over 20969.00 frames. ], tot_loss[loss=0.2936, ctc_loss=0.2115, cr_loss=0.4105, over 3175702.18 frames. ], batch size: 51, lr: 1.24e-02, grad_scale: 32.0 2024-09-14 09:48:02,030 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.44 vs. limit=15.0 2024-09-14 09:48:42,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=109607.5, ans=0.1 2024-09-14 09:48:47,810 INFO [train.py:1198] (1/2) Epoch 7, batch 350, loss[loss=0.3045, ctc_loss=0.2211, cr_loss=0.4172, over 20231.00 frames. ], tot_loss[loss=0.2925, ctc_loss=0.2106, cr_loss=0.4095, over 3379042.31 frames. ], batch size: 74, lr: 1.24e-02, grad_scale: 32.0 2024-09-14 09:49:35,904 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.82 vs. limit=15.0 2024-09-14 09:49:53,121 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.866e+02 2.071e+02 2.305e+02 2.810e+02 4.321e+02, threshold=4.611e+02, percent-clipped=0.0 2024-09-14 09:50:03,612 INFO [train.py:1198] (1/2) Epoch 7, batch 400, loss[loss=0.3119, ctc_loss=0.2215, cr_loss=0.4521, over 21086.00 frames. ], tot_loss[loss=0.2917, ctc_loss=0.2099, cr_loss=0.409, over 3533272.76 frames. ], batch size: 59, lr: 1.24e-02, grad_scale: 32.0 2024-09-14 09:50:14,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=109777.5, ans=0.2 2024-09-14 09:50:25,729 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.418e-03 2024-09-14 09:50:25,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=109805.83333333333, ans=0.2 2024-09-14 09:50:41,656 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.10 vs. 
limit=22.5 2024-09-14 09:51:07,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=109890.83333333333, ans=0.125 2024-09-14 09:51:23,718 INFO [train.py:1198] (1/2) Epoch 7, batch 450, loss[loss=0.2943, ctc_loss=0.213, cr_loss=0.4067, over 20980.00 frames. ], tot_loss[loss=0.2903, ctc_loss=0.2088, cr_loss=0.4076, over 3668635.84 frames. ], batch size: 67, lr: 1.24e-02, grad_scale: 32.0 2024-09-14 09:52:13,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=110004.16666666667, ans=0.0 2024-09-14 09:52:13,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=110004.16666666667, ans=0.025 2024-09-14 09:52:29,056 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.714e+02 2.067e+02 2.280e+02 2.580e+02 4.527e+02, threshold=4.559e+02, percent-clipped=0.0 2024-09-14 09:52:32,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=110032.5, ans=0.1 2024-09-14 09:52:39,899 INFO [train.py:1198] (1/2) Epoch 7, batch 500, loss[loss=0.29, ctc_loss=0.203, cr_loss=0.4347, over 21073.00 frames. ], tot_loss[loss=0.292, ctc_loss=0.2101, cr_loss=0.4095, over 3764466.74 frames. ], batch size: 59, lr: 1.24e-02, grad_scale: 32.0 2024-09-14 09:52:50,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=110060.83333333333, ans=0.1 2024-09-14 09:53:41,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=110174.16666666667, ans=0.125 2024-09-14 09:53:55,322 INFO [train.py:1198] (1/2) Epoch 7, batch 550, loss[loss=0.3142, ctc_loss=0.2274, cr_loss=0.4342, over 21036.00 frames. ], tot_loss[loss=0.293, ctc_loss=0.2109, cr_loss=0.4104, over 3838731.93 frames. ], batch size: 62, lr: 1.24e-02, grad_scale: 32.0 2024-09-14 09:54:03,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=110202.5, ans=10.0 2024-09-14 09:54:21,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=110230.83333333333, ans=0.025 2024-09-14 09:54:32,843 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.96 vs. limit=15.0 2024-09-14 09:54:33,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=110259.16666666667, ans=0.125 2024-09-14 09:54:47,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=110287.5, ans=0.09899494936611666 2024-09-14 09:55:04,105 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.869e+02 2.091e+02 2.360e+02 2.552e+02 4.215e+02, threshold=4.720e+02, percent-clipped=0.0 2024-09-14 09:55:10,875 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.33 vs. limit=15.0 2024-09-14 09:55:14,571 INFO [train.py:1198] (1/2) Epoch 7, batch 600, loss[loss=0.3242, ctc_loss=0.238, cr_loss=0.4313, over 20669.00 frames. ], tot_loss[loss=0.2938, ctc_loss=0.2116, cr_loss=0.4109, over 3887078.16 frames. 
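Each scaling.py ScheduledFloat line pairs a parameter name with the current batch_count and the value in effect (ans=...), so quantities like skip rates, balancer probabilities and min/max bounds drift as training progresses. A minimal sketch of such a batch-count-keyed value, implemented as piecewise-linear interpolation between breakpoints; both the class body and the breakpoints are guesses for illustration, not the scaling.py implementation:

```python
# Minimal sketch of a batch-count-keyed schedule like the ScheduledFloat
# values above: piecewise-linear interpolation between (batch_count, value)
# breakpoints, constant outside the given range. Breakpoints are made up.
class ScheduledFloatSketch:
    def __init__(self, *points: tuple[float, float]):
        self.points = sorted(points)

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        return pts[-1][1]

skip_rate = ScheduledFloatSketch((0.0, 0.5), (4000.0, 0.05), (16000.0, 0.0))
print(skip_rate(104_731))  # -> 0.0 once batch_count passes the last breakpoint
```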
], batch size: 66, lr: 1.23e-02, grad_scale: 64.0 2024-09-14 09:55:19,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=110344.16666666667, ans=0.0 2024-09-14 09:55:54,349 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 09:56:29,905 INFO [train.py:1198] (1/2) Epoch 7, batch 650, loss[loss=0.2987, ctc_loss=0.2159, cr_loss=0.4136, over 20894.00 frames. ], tot_loss[loss=0.2928, ctc_loss=0.2108, cr_loss=0.4096, over 3937096.96 frames. ], batch size: 57, lr: 1.23e-02, grad_scale: 64.0 2024-09-14 09:57:08,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=110542.5, ans=0.125 2024-09-14 09:57:10,134 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.15 vs. limit=22.5 2024-09-14 09:57:39,739 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.869e+02 2.076e+02 2.242e+02 2.482e+02 3.772e+02, threshold=4.484e+02, percent-clipped=0.0 2024-09-14 09:57:46,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=110599.16666666667, ans=0.125 2024-09-14 09:57:48,719 INFO [train.py:1198] (1/2) Epoch 7, batch 700, loss[loss=0.2524, ctc_loss=0.1791, cr_loss=0.3666, over 19867.00 frames. ], tot_loss[loss=0.2934, ctc_loss=0.2113, cr_loss=0.4108, over 3961652.86 frames. ], batch size: 44, lr: 1.23e-02, grad_scale: 32.0 2024-09-14 09:58:16,431 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.25 vs. limit=12.0 2024-09-14 09:58:23,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=110684.16666666667, ans=0.0 2024-09-14 09:58:32,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=110712.5, ans=0.2 2024-09-14 09:58:37,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=110712.5, ans=0.125 2024-09-14 09:59:04,100 INFO [train.py:1198] (1/2) Epoch 7, batch 750, loss[loss=0.2853, ctc_loss=0.2091, cr_loss=0.381, over 21014.00 frames. ], tot_loss[loss=0.2939, ctc_loss=0.2118, cr_loss=0.4106, over 3994089.55 frames. ], batch size: 63, lr: 1.23e-02, grad_scale: 32.0 2024-09-14 09:59:11,906 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 09:59:39,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=110825.83333333333, ans=0.05 2024-09-14 09:59:45,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=110825.83333333333, ans=0.1 2024-09-14 09:59:59,442 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.11 vs. 
limit=15.0 2024-09-14 10:00:13,803 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.760e+02 2.111e+02 2.325e+02 2.832e+02 4.896e+02, threshold=4.649e+02, percent-clipped=1.0 2024-09-14 10:00:23,088 INFO [train.py:1198] (1/2) Epoch 7, batch 800, loss[loss=0.3033, ctc_loss=0.2201, cr_loss=0.4164, over 20635.00 frames. ], tot_loss[loss=0.2945, ctc_loss=0.2121, cr_loss=0.4119, over 4019367.31 frames. ], batch size: 68, lr: 1.23e-02, grad_scale: 32.0 2024-09-14 10:00:26,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=110910.83333333333, ans=0.0 2024-09-14 10:00:56,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=110967.5, ans=0.125 2024-09-14 10:01:27,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=111024.16666666667, ans=0.025 2024-09-14 10:01:39,255 INFO [train.py:1198] (1/2) Epoch 7, batch 850, loss[loss=0.326, ctc_loss=0.2358, cr_loss=0.4514, over 19958.00 frames. ], tot_loss[loss=0.2926, ctc_loss=0.2106, cr_loss=0.4097, over 4029334.95 frames. ], batch size: 80, lr: 1.23e-02, grad_scale: 32.0 2024-09-14 10:01:50,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=111052.5, ans=0.2 2024-09-14 10:01:51,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=111052.5, ans=0.125 2024-09-14 10:01:54,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=111080.83333333333, ans=0.1 2024-09-14 10:02:04,152 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.07 vs. limit=12.0 2024-09-14 10:02:18,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=111109.16666666667, ans=0.125 2024-09-14 10:02:30,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=111137.5, ans=0.2 2024-09-14 10:02:36,181 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.97 vs. limit=6.0 2024-09-14 10:02:43,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=111165.83333333333, ans=0.1 2024-09-14 10:02:48,935 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.904e+02 2.181e+02 2.357e+02 2.694e+02 4.495e+02, threshold=4.713e+02, percent-clipped=0.0 2024-09-14 10:02:58,169 INFO [train.py:1198] (1/2) Epoch 7, batch 900, loss[loss=0.2793, ctc_loss=0.1998, cr_loss=0.3977, over 20835.00 frames. ], tot_loss[loss=0.2935, ctc_loss=0.2114, cr_loss=0.4109, over 4032754.74 frames. 
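grad_scale in the batch lines moves in powers of two (16 and 32 through epoch 6, briefly 64 and then back to 32 early in epoch 7), which is the behavior of dynamic loss scaling under float16 training: the scale doubles after a long run of overflow-free steps and is halved whenever a step overflows. A sketch using torch.cuda.amp.GradScaler, which implements exactly this doubling/halving policy; the model, learning rate and intervals are placeholders, and a CUDA device is required:

```python
import torch

# Sketch of the dynamic loss scaling behind the logged grad_scale values.
# GradScaler doubles the scale after growth_interval overflow-free steps and
# multiplies it by backoff_factor on overflow, hence the 16/32/64 pattern.
model = torch.nn.Linear(10, 10).cuda()
opt = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000)

for _ in range(3):
    opt.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(torch.randn(4, 10, device="cuda")).pow(2).mean()
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(opt)               # unscales grads; skips the step on overflow
    scaler.update()                # grows or backs off the running scale
    print(scaler.get_scale())
```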
], batch size: 59, lr: 1.23e-02, grad_scale: 32.0 2024-09-14 10:03:10,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=111194.16666666667, ans=0.125 2024-09-14 10:03:13,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=111222.5, ans=0.2 2024-09-14 10:03:21,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=111222.5, ans=0.125 2024-09-14 10:03:49,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=111279.16666666667, ans=0.125 2024-09-14 10:04:13,824 INFO [train.py:1198] (1/2) Epoch 7, batch 950, loss[loss=0.3239, ctc_loss=0.2367, cr_loss=0.4357, over 20078.00 frames. ], tot_loss[loss=0.2929, ctc_loss=0.2107, cr_loss=0.4113, over 4049747.54 frames. ], batch size: 80, lr: 1.23e-02, grad_scale: 32.0 2024-09-14 10:04:54,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=111392.5, ans=0.125 2024-09-14 10:05:20,383 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.870e+02 2.117e+02 2.264e+02 2.484e+02 5.104e+02, threshold=4.529e+02, percent-clipped=2.0 2024-09-14 10:05:29,491 INFO [train.py:1198] (1/2) Epoch 7, batch 1000, loss[loss=0.2453, ctc_loss=0.1741, cr_loss=0.3561, over 21002.00 frames. ], tot_loss[loss=0.2914, ctc_loss=0.2094, cr_loss=0.4101, over 4071388.05 frames. ], batch size: 51, lr: 1.23e-02, grad_scale: 32.0 2024-09-14 10:06:09,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=111534.16666666667, ans=0.0 2024-09-14 10:06:25,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=111562.5, ans=0.025 2024-09-14 10:06:37,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=111590.83333333333, ans=0.0 2024-09-14 10:06:47,981 INFO [train.py:1198] (1/2) Epoch 7, batch 1050, loss[loss=0.2614, ctc_loss=0.1824, cr_loss=0.3952, over 21055.00 frames. ], tot_loss[loss=0.2921, ctc_loss=0.2099, cr_loss=0.4107, over 4075483.79 frames. ], batch size: 56, lr: 1.23e-02, grad_scale: 32.0 2024-09-14 10:07:27,802 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.82 vs. limit=15.0 2024-09-14 10:07:33,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=111704.16666666667, ans=0.125 2024-09-14 10:07:37,032 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.89 vs. limit=15.0 2024-09-14 10:07:54,551 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.734e+02 2.102e+02 2.215e+02 2.394e+02 3.325e+02, threshold=4.431e+02, percent-clipped=0.0 2024-09-14 10:08:00,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=111732.5, ans=0.1 2024-09-14 10:08:03,632 INFO [train.py:1198] (1/2) Epoch 7, batch 1100, loss[loss=0.2957, ctc_loss=0.2149, cr_loss=0.4039, over 20770.00 frames. ], tot_loss[loss=0.292, ctc_loss=0.2099, cr_loss=0.4108, over 4078344.29 frames. 
], batch size: 56, lr: 1.23e-02, grad_scale: 32.0 2024-09-14 10:08:05,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=111760.83333333333, ans=0.2 2024-09-14 10:08:08,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=111760.83333333333, ans=0.125 2024-09-14 10:08:10,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=111760.83333333333, ans=0.125 2024-09-14 10:08:33,554 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 10:09:00,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=111845.83333333333, ans=0.0 2024-09-14 10:09:06,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=111874.16666666667, ans=0.0 2024-09-14 10:09:22,257 INFO [train.py:1198] (1/2) Epoch 7, batch 1150, loss[loss=0.3134, ctc_loss=0.2245, cr_loss=0.4445, over 20658.00 frames. ], tot_loss[loss=0.2932, ctc_loss=0.2109, cr_loss=0.4119, over 4073836.60 frames. ], batch size: 68, lr: 1.23e-02, grad_scale: 32.0 2024-09-14 10:09:38,171 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.47 vs. limit=10.0 2024-09-14 10:09:49,042 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.80 vs. limit=15.0 2024-09-14 10:10:15,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=111987.5, ans=0.07 2024-09-14 10:10:16,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=111987.5, ans=0.125 2024-09-14 10:10:28,444 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.802e+02 2.073e+02 2.271e+02 2.649e+02 5.867e+02, threshold=4.542e+02, percent-clipped=1.0 2024-09-14 10:10:37,500 INFO [train.py:1198] (1/2) Epoch 7, batch 1200, loss[loss=0.3088, ctc_loss=0.2265, cr_loss=0.4114, over 21024.00 frames. ], tot_loss[loss=0.2917, ctc_loss=0.2097, cr_loss=0.41, over 4073312.48 frames. ], batch size: 63, lr: 1.23e-02, grad_scale: 32.0 2024-09-14 10:10:39,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=112044.16666666667, ans=0.125 2024-09-14 10:10:43,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=112044.16666666667, ans=0.125 2024-09-14 10:10:48,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=112044.16666666667, ans=0.1 2024-09-14 10:10:48,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=112044.16666666667, ans=0.0 2024-09-14 10:10:54,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=112072.5, ans=0.1 2024-09-14 10:11:55,987 INFO [train.py:1198] (1/2) Epoch 7, batch 1250, loss[loss=0.352, ctc_loss=0.258, cr_loss=0.4702, over 19445.00 frames. ], tot_loss[loss=0.2929, ctc_loss=0.2107, cr_loss=0.4111, over 4072753.39 frames. 
], batch size: 90, lr: 1.22e-02, grad_scale: 32.0 2024-09-14 10:11:59,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=112185.83333333333, ans=0.125 2024-09-14 10:12:20,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=112214.16666666667, ans=0.95 2024-09-14 10:12:38,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=112242.5, ans=0.0 2024-09-14 10:13:02,633 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.794e+02 2.125e+02 2.290e+02 2.823e+02 4.648e+02, threshold=4.581e+02, percent-clipped=1.0 2024-09-14 10:13:11,840 INFO [train.py:1198] (1/2) Epoch 7, batch 1300, loss[loss=0.2722, ctc_loss=0.1965, cr_loss=0.3782, over 19872.00 frames. ], tot_loss[loss=0.2925, ctc_loss=0.2104, cr_loss=0.4105, over 4073142.13 frames. ], batch size: 44, lr: 1.22e-02, grad_scale: 32.0 2024-09-14 10:13:18,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=112327.5, ans=0.2 2024-09-14 10:13:38,847 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.26 vs. limit=15.0 2024-09-14 10:14:07,473 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.71 vs. limit=15.0 2024-09-14 10:14:31,171 INFO [train.py:1198] (1/2) Epoch 7, batch 1350, loss[loss=0.2684, ctc_loss=0.1892, cr_loss=0.3958, over 20791.00 frames. ], tot_loss[loss=0.2928, ctc_loss=0.2106, cr_loss=0.4106, over 4075744.59 frames. ], batch size: 53, lr: 1.22e-02, grad_scale: 32.0 2024-09-14 10:14:36,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=112469.16666666667, ans=0.0 2024-09-14 10:14:41,187 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.17 vs. limit=15.0 2024-09-14 10:15:20,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=112554.16666666667, ans=0.125 2024-09-14 10:15:32,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=112582.5, ans=0.0 2024-09-14 10:15:38,507 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.828e+02 2.151e+02 2.366e+02 2.823e+02 4.048e+02, threshold=4.732e+02, percent-clipped=0.0 2024-09-14 10:15:47,595 INFO [train.py:1198] (1/2) Epoch 7, batch 1400, loss[loss=0.238, ctc_loss=0.1682, cr_loss=0.349, over 20958.00 frames. ], tot_loss[loss=0.2923, ctc_loss=0.2103, cr_loss=0.41, over 4073321.19 frames. 
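The Whitening lines compare a per-module metric against a limit (e.g. metric=11.71 vs. limit=15.0 above). One measure with the right properties is the ratio of the mean squared eigenvalue of the channel covariance to the squared mean eigenvalue: it equals 1.0 for perfectly whitened activations and grows as the covariance becomes anisotropic. The sketch below is a plausible reconstruction of such a metric, not necessarily the scaling.py formula:

```python
import torch

# Plausible reconstruction of a whitening metric (not necessarily the exact
# scaling.py formula): for channel covariance C with eigenvalues l_i,
# metric = mean(l_i^2) / mean(l_i)^2. Equals 1.0 for white features and
# grows with anisotropy, so "metric vs. limit" measures how far a module's
# activations are from white.
def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    x = x.reshape(-1, x.shape[-1])         # (frames, channels)
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    d = cov.shape[0]
    mean_eig = torch.diagonal(cov).mean()  # trace(C)/d = mean eigenvalue
    mean_sq_eig = (cov * cov).sum() / d    # trace(C @ C)/d = mean squared eigenvalue
    return mean_sq_eig / (mean_eig * mean_eig)

white = torch.randn(10_000, 256)
print(whitening_metric(white))             # ~1.0 for white features
```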
], batch size: 49, lr: 1.22e-02, grad_scale: 32.0 2024-09-14 10:16:10,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=112639.16666666667, ans=0.0 2024-09-14 10:16:34,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=112695.83333333333, ans=0.125 2024-09-14 10:16:35,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=112695.83333333333, ans=0.125 2024-09-14 10:16:38,405 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.77 vs. limit=15.0 2024-09-14 10:16:46,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=112724.16666666667, ans=0.125 2024-09-14 10:17:03,106 INFO [train.py:1198] (1/2) Epoch 7, batch 1450, loss[loss=0.3263, ctc_loss=0.2365, cr_loss=0.4487, over 20966.00 frames. ], tot_loss[loss=0.2911, ctc_loss=0.2093, cr_loss=0.4094, over 4081939.14 frames. ], batch size: 64, lr: 1.22e-02, grad_scale: 32.0 2024-09-14 10:17:09,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=112752.5, ans=0.1 2024-09-14 10:17:35,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=112809.16666666667, ans=0.125 2024-09-14 10:18:13,203 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.798e+02 2.149e+02 2.360e+02 2.729e+02 4.113e+02, threshold=4.720e+02, percent-clipped=0.0 2024-09-14 10:18:14,039 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.90 vs. limit=15.0 2024-09-14 10:18:18,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=112865.83333333333, ans=0.1 2024-09-14 10:18:22,403 INFO [train.py:1198] (1/2) Epoch 7, batch 1500, loss[loss=0.2998, ctc_loss=0.2119, cr_loss=0.4398, over 20896.00 frames. ], tot_loss[loss=0.2924, ctc_loss=0.2101, cr_loss=0.4115, over 4083209.47 frames. ], batch size: 54, lr: 1.22e-02, grad_scale: 32.0 2024-09-14 10:18:46,007 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.88 vs. limit=15.0 2024-09-14 10:18:51,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=112950.83333333333, ans=0.125 2024-09-14 10:18:56,553 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.25 vs. limit=22.5 2024-09-14 10:19:15,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=112979.16666666667, ans=0.2 2024-09-14 10:19:37,620 INFO [train.py:1198] (1/2) Epoch 7, batch 1550, loss[loss=0.3118, ctc_loss=0.2272, cr_loss=0.4232, over 20851.00 frames. ], tot_loss[loss=0.2927, ctc_loss=0.2103, cr_loss=0.4116, over 4085165.35 frames. 
], batch size: 65, lr: 1.22e-02, grad_scale: 32.0 2024-09-14 10:19:45,872 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=113035.83333333333, ans=0.0 2024-09-14 10:20:11,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=113092.5, ans=0.125 2024-09-14 10:20:14,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=113092.5, ans=0.125 2024-09-14 10:20:45,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=113149.16666666667, ans=0.125 2024-09-14 10:20:45,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=113149.16666666667, ans=0.1 2024-09-14 10:20:47,926 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.823e+02 2.192e+02 2.442e+02 2.769e+02 5.210e+02, threshold=4.884e+02, percent-clipped=1.0 2024-09-14 10:20:56,683 INFO [train.py:1198] (1/2) Epoch 7, batch 1600, loss[loss=0.3238, ctc_loss=0.2415, cr_loss=0.4117, over 18085.00 frames. ], tot_loss[loss=0.2936, ctc_loss=0.2112, cr_loss=0.4119, over 4076915.42 frames. ], batch size: 108, lr: 1.22e-02, grad_scale: 32.0 2024-09-14 10:21:09,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=113177.5, ans=0.025 2024-09-14 10:21:36,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=113234.16666666667, ans=0.04949747468305833 2024-09-14 10:22:02,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=113290.83333333333, ans=0.07 2024-09-14 10:22:11,597 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=113319.16666666667, ans=0.07 2024-09-14 10:22:12,820 INFO [train.py:1198] (1/2) Epoch 7, batch 1650, loss[loss=0.2575, ctc_loss=0.1806, cr_loss=0.3845, over 21072.00 frames. ], tot_loss[loss=0.2935, ctc_loss=0.2111, cr_loss=0.412, over 4071915.53 frames. ], batch size: 53, lr: 1.22e-02, grad_scale: 32.0 2024-09-14 10:22:21,651 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=113319.16666666667, ans=0.0 2024-09-14 10:22:26,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=113319.16666666667, ans=0.95 2024-09-14 10:22:29,376 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.54 vs. limit=12.0 2024-09-14 10:23:02,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=113404.16666666667, ans=0.025 2024-09-14 10:23:08,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=113404.16666666667, ans=0.1 2024-09-14 10:23:22,564 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.808e+02 2.157e+02 2.384e+02 2.743e+02 4.246e+02, threshold=4.769e+02, percent-clipped=0.0 2024-09-14 10:23:31,837 INFO [train.py:1198] (1/2) Epoch 7, batch 1700, loss[loss=0.3055, ctc_loss=0.2192, cr_loss=0.4316, over 21070.00 frames. 
], tot_loss[loss=0.2938, ctc_loss=0.2113, cr_loss=0.4123, over 4073630.64 frames. ], batch size: 59, lr: 1.22e-02, grad_scale: 32.0 2024-09-14 10:23:54,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=113489.16666666667, ans=0.0 2024-09-14 10:24:09,443 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=113517.5, ans=0.1 2024-09-14 10:24:11,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=113517.5, ans=0.04949747468305833 2024-09-14 10:24:20,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=113545.83333333333, ans=0.025 2024-09-14 10:24:47,344 INFO [train.py:1198] (1/2) Epoch 7, batch 1750, loss[loss=0.3085, ctc_loss=0.2204, cr_loss=0.4403, over 20669.00 frames. ], tot_loss[loss=0.2938, ctc_loss=0.2114, cr_loss=0.412, over 4066136.35 frames. ], batch size: 68, lr: 1.22e-02, grad_scale: 32.0 2024-09-14 10:24:58,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=113602.5, ans=0.0 2024-09-14 10:25:05,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=113630.83333333333, ans=0.0 2024-09-14 10:25:34,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=113687.5, ans=0.2 2024-09-14 10:25:56,980 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.876e+02 2.134e+02 2.288e+02 2.521e+02 6.349e+02, threshold=4.577e+02, percent-clipped=1.0 2024-09-14 10:26:05,982 INFO [train.py:1198] (1/2) Epoch 7, batch 1800, loss[loss=0.2857, ctc_loss=0.2034, cr_loss=0.4114, over 20802.00 frames. ], tot_loss[loss=0.2928, ctc_loss=0.2105, cr_loss=0.4112, over 4073840.52 frames. ], batch size: 53, lr: 1.22e-02, grad_scale: 32.0 2024-09-14 10:26:16,223 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.16 vs. limit=12.0 2024-09-14 10:26:29,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=113772.5, ans=0.1 2024-09-14 10:27:02,644 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.74 vs. limit=15.0 2024-09-14 10:27:09,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=113857.5, ans=0.125 2024-09-14 10:27:11,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=113857.5, ans=0.125 2024-09-14 10:27:12,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=113857.5, ans=0.1 2024-09-14 10:27:15,351 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=113857.5, ans=0.125 2024-09-14 10:27:18,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=113857.5, ans=0.0 2024-09-14 10:27:21,553 INFO [train.py:1198] (1/2) Epoch 7, batch 1850, loss[loss=0.2691, ctc_loss=0.1943, cr_loss=0.3741, over 21072.00 frames. 
], tot_loss[loss=0.2917, ctc_loss=0.2097, cr_loss=0.4097, over 4075764.41 frames. ], batch size: 53, lr: 1.22e-02, grad_scale: 16.0 2024-09-14 10:27:21,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=113885.83333333333, ans=0.125 2024-09-14 10:27:38,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=113914.16666666667, ans=0.2 2024-09-14 10:27:55,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=113942.5, ans=0.0 2024-09-14 10:28:13,071 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.55 vs. limit=10.0 2024-09-14 10:28:21,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=113999.16666666667, ans=0.0 2024-09-14 10:28:30,308 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.880e+02 2.142e+02 2.441e+02 2.773e+02 5.535e+02, threshold=4.882e+02, percent-clipped=1.0 2024-09-14 10:28:30,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=113999.16666666667, ans=0.0 2024-09-14 10:28:36,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=114027.5, ans=0.025 2024-09-14 10:28:37,789 INFO [train.py:1198] (1/2) Epoch 7, batch 1900, loss[loss=0.323, ctc_loss=0.2295, cr_loss=0.4676, over 20879.00 frames. ], tot_loss[loss=0.2923, ctc_loss=0.2102, cr_loss=0.4102, over 4069414.28 frames. ], batch size: 57, lr: 1.22e-02, grad_scale: 16.0 2024-09-14 10:28:41,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=114027.5, ans=0.025 2024-09-14 10:29:56,223 INFO [train.py:1198] (1/2) Epoch 7, batch 1950, loss[loss=0.2861, ctc_loss=0.2043, cr_loss=0.4091, over 20968.00 frames. ], tot_loss[loss=0.2914, ctc_loss=0.2093, cr_loss=0.4105, over 4081925.68 frames. ], batch size: 58, lr: 1.21e-02, grad_scale: 16.0 2024-09-14 10:29:58,849 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.07 vs. limit=10.0 2024-09-14 10:30:07,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=114169.16666666667, ans=0.0 2024-09-14 10:30:16,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=114197.5, ans=0.0 2024-09-14 10:30:18,200 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.51 vs. limit=10.0 2024-09-14 10:31:07,340 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.878e+02 2.120e+02 2.285e+02 2.612e+02 4.419e+02, threshold=4.569e+02, percent-clipped=0.0 2024-09-14 10:31:15,021 INFO [train.py:1198] (1/2) Epoch 7, batch 2000, loss[loss=0.2881, ctc_loss=0.2047, cr_loss=0.417, over 21022.00 frames. ], tot_loss[loss=0.2911, ctc_loss=0.2091, cr_loss=0.4098, over 4088105.88 frames. ], batch size: 62, lr: 1.21e-02, grad_scale: 32.0 2024-09-14 10:31:22,068 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.95 vs. 
limit=22.5 2024-09-14 10:31:34,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=114339.16666666667, ans=0.0 2024-09-14 10:31:35,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=114339.16666666667, ans=0.125 2024-09-14 10:31:49,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=114367.5, ans=0.125 2024-09-14 10:32:18,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=114424.16666666667, ans=0.125 2024-09-14 10:32:22,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=114424.16666666667, ans=0.0 2024-09-14 10:32:22,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=114424.16666666667, ans=0.0 2024-09-14 10:32:28,903 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.81 vs. limit=15.0 2024-09-14 10:32:31,343 INFO [train.py:1198] (1/2) Epoch 7, batch 2050, loss[loss=0.2973, ctc_loss=0.211, cr_loss=0.4313, over 20972.00 frames. ], tot_loss[loss=0.2902, ctc_loss=0.2083, cr_loss=0.4093, over 4102383.10 frames. ], batch size: 64, lr: 1.21e-02, grad_scale: 32.0 2024-09-14 10:32:32,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=114452.5, ans=0.04949747468305833 2024-09-14 10:32:36,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=114452.5, ans=0.125 2024-09-14 10:32:54,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=114480.83333333333, ans=0.1 2024-09-14 10:32:57,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=114480.83333333333, ans=0.025 2024-09-14 10:33:18,182 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.02 vs. limit=10.0 2024-09-14 10:33:18,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=114537.5, ans=0.0 2024-09-14 10:33:25,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=114537.5, ans=0.1 2024-09-14 10:33:35,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=114565.83333333333, ans=0.125 2024-09-14 10:33:40,023 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.832e+02 2.101e+02 2.291e+02 2.639e+02 5.111e+02, threshold=4.582e+02, percent-clipped=1.0 2024-09-14 10:33:47,921 INFO [train.py:1198] (1/2) Epoch 7, batch 2100, loss[loss=0.3544, ctc_loss=0.2626, cr_loss=0.4591, over 20867.00 frames. ], tot_loss[loss=0.2902, ctc_loss=0.2082, cr_loss=0.4097, over 4108517.84 frames. 
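
The Whitening lines compare a per-module statistic of how anisotropic the output covariance is ("metric") against a limit. One standard way to quantify this, shown here as an assumption rather than the exact scaling.py computation: the ratio mean(eig^2) / mean(eig)^2 over the eigenvalues of the feature covariance, which equals 1.0 for perfectly white features and grows as variance concentrates in a few directions.

    # Sketch of a whitening metric: mean(eig^2) / mean(eig)^2 of the feature
    # covariance, computed per channel group; the "limit" values in the log
    # would be thresholds on a ratio of this kind.
    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
        # x: (..., num_channels); all leading dims are treated as samples.
        x = x.reshape(-1, x.shape[-1])
        n, c = x.shape
        x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)
        cov = torch.matmul(x.transpose(1, 2), x) / n   # (groups, d, d)
        eigs = torch.linalg.eigvalsh(cov)              # real, symmetric input
        metric = (eigs ** 2).mean(dim=1) / eigs.mean(dim=1).clamp(min=1e-20) ** 2
        return metric.mean()

    x = torch.randn(1000, 256)
    print(whitening_metric(x, num_groups=1))  # near 1.0 for random features
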
], batch size: 65, lr: 1.21e-02, grad_scale: 32.0 2024-09-14 10:34:24,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=114650.83333333333, ans=0.125 2024-09-14 10:35:07,000 INFO [train.py:1198] (1/2) Epoch 7, batch 2150, loss[loss=0.262, ctc_loss=0.1831, cr_loss=0.3948, over 20883.00 frames. ], tot_loss[loss=0.2891, ctc_loss=0.2074, cr_loss=0.4088, over 4110319.08 frames. ], batch size: 54, lr: 1.21e-02, grad_scale: 32.0 2024-09-14 10:35:24,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=114764.16666666667, ans=0.05 2024-09-14 10:35:27,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=114764.16666666667, ans=0.2 2024-09-14 10:35:41,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=114792.5, ans=0.125 2024-09-14 10:36:02,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=114820.83333333333, ans=0.2 2024-09-14 10:36:14,556 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.808e+02 2.123e+02 2.357e+02 2.620e+02 4.145e+02, threshold=4.715e+02, percent-clipped=0.0 2024-09-14 10:36:19,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=114849.16666666667, ans=0.125 2024-09-14 10:36:22,043 INFO [train.py:1198] (1/2) Epoch 7, batch 2200, loss[loss=0.3025, ctc_loss=0.2148, cr_loss=0.4384, over 21008.00 frames. ], tot_loss[loss=0.2917, ctc_loss=0.2095, cr_loss=0.4108, over 4083273.77 frames. ], batch size: 61, lr: 1.21e-02, grad_scale: 32.0 2024-09-14 10:36:54,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=114934.16666666667, ans=0.04949747468305833 2024-09-14 10:37:00,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=114934.16666666667, ans=15.0 2024-09-14 10:37:14,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=114962.5, ans=0.125 2024-09-14 10:37:26,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=114990.83333333333, ans=0.1 2024-09-14 10:37:29,355 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=114990.83333333333, ans=0.2 2024-09-14 10:37:39,023 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.98 vs. limit=15.0 2024-09-14 10:37:40,724 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.06 vs. limit=15.0 2024-09-14 10:37:41,435 INFO [train.py:1198] (1/2) Epoch 7, batch 2250, loss[loss=0.3313, ctc_loss=0.241, cr_loss=0.4515, over 20963.00 frames. ], tot_loss[loss=0.2908, ctc_loss=0.2088, cr_loss=0.4102, over 4092784.12 frames. 
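
Across this stretch the logged totals satisfy loss = ctc_loss + 0.2 * cr_loss (e.g. 0.2074 + 0.2 * 0.4088 = 0.2892, matching the 0.2891 in the entry just above to rounding): a CTC objective plus a down-weighted consistency term between two differently augmented views of each batch. A sketch of that combination under those assumptions; model, the dual views, and the symmetric-KL consistency term are stand-ins, not the code used here.

    # Sketch: CR-CTC style objective, combined as ctc + cr_loss_scale * cr.
    # The 0.2 scale matches the arithmetic of the tot_loss entries above;
    # the model call and the KL-based consistency term are assumptions.
    import torch
    import torch.nn.functional as F

    def cr_ctc_loss(model, feats_a, feats_b, targets, input_lens, target_lens,
                    cr_loss_scale: float = 0.2):
        log_probs_a = model(feats_a)   # (T, N, V) log-probs, augmented view A
        log_probs_b = model(feats_b)   # same utterances, different masking
        ctc = 0.5 * (
            F.ctc_loss(log_probs_a, targets, input_lens, target_lens,
                       zero_infinity=True)
            + F.ctc_loss(log_probs_b, targets, input_lens, target_lens,
                         zero_infinity=True)
        )
        # Symmetric KL between the two views' frame posteriors; each side
        # serves as a detached target for the other.
        cr = 0.5 * (
            F.kl_div(log_probs_a, log_probs_b.detach(), log_target=True,
                     reduction="batchmean")
            + F.kl_div(log_probs_b, log_probs_a.detach(), log_target=True,
                       reduction="batchmean")
        )
        return ctc + cr_loss_scale * cr, ctc, cr
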
], batch size: 58, lr: 1.21e-02, grad_scale: 32.0 2024-09-14 10:38:15,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=115075.83333333333, ans=0.125 2024-09-14 10:38:22,861 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.36 vs. limit=15.0 2024-09-14 10:38:39,806 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.09 vs. limit=15.0 2024-09-14 10:38:47,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=115132.5, ans=0.2 2024-09-14 10:38:49,939 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.837e+02 2.138e+02 2.343e+02 2.723e+02 4.756e+02, threshold=4.687e+02, percent-clipped=1.0 2024-09-14 10:38:57,408 INFO [train.py:1198] (1/2) Epoch 7, batch 2300, loss[loss=0.2369, ctc_loss=0.169, cr_loss=0.3395, over 20977.00 frames. ], tot_loss[loss=0.2889, ctc_loss=0.2074, cr_loss=0.4073, over 4088100.56 frames. ], batch size: 48, lr: 1.21e-02, grad_scale: 32.0 2024-09-14 10:39:23,803 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.01 vs. limit=15.0 2024-09-14 10:39:27,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=115217.5, ans=0.125 2024-09-14 10:39:45,529 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=115245.83333333333, ans=0.0 2024-09-14 10:40:12,466 INFO [train.py:1198] (1/2) Epoch 7, batch 2350, loss[loss=0.376, ctc_loss=0.2859, cr_loss=0.4504, over 13867.00 frames. ], tot_loss[loss=0.2902, ctc_loss=0.2085, cr_loss=0.4087, over 4084100.95 frames. ], batch size: 150, lr: 1.21e-02, grad_scale: 32.0 2024-09-14 10:40:26,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=115330.83333333333, ans=0.125 2024-09-14 10:40:34,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=115330.83333333333, ans=0.125 2024-09-14 10:40:46,591 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.83 vs. limit=15.0 2024-09-14 10:40:48,076 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.51 vs. limit=15.0 2024-09-14 10:40:58,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=115359.16666666667, ans=0.015 2024-09-14 10:40:58,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=115359.16666666667, ans=0.1 2024-09-14 10:40:58,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=115359.16666666667, ans=0.0 2024-09-14 10:41:00,331 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=16.34 vs. 
limit=15.0 2024-09-14 10:41:06,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=115387.5, ans=0.125 2024-09-14 10:41:10,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=115387.5, ans=0.1 2024-09-14 10:41:13,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=115387.5, ans=0.125 2024-09-14 10:41:19,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=115415.83333333333, ans=0.5 2024-09-14 10:41:23,771 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.774e+02 2.116e+02 2.345e+02 2.707e+02 4.552e+02, threshold=4.689e+02, percent-clipped=0.0 2024-09-14 10:41:31,624 INFO [train.py:1198] (1/2) Epoch 7, batch 2400, loss[loss=0.2554, ctc_loss=0.1815, cr_loss=0.3695, over 20975.00 frames. ], tot_loss[loss=0.2897, ctc_loss=0.208, cr_loss=0.4085, over 4084788.29 frames. ], batch size: 49, lr: 1.21e-02, grad_scale: 32.0 2024-09-14 10:42:49,853 INFO [train.py:1198] (1/2) Epoch 7, batch 2450, loss[loss=0.2704, ctc_loss=0.1903, cr_loss=0.4007, over 20961.00 frames. ], tot_loss[loss=0.2894, ctc_loss=0.2077, cr_loss=0.4085, over 4075003.30 frames. ], batch size: 48, lr: 1.21e-02, grad_scale: 32.0 2024-09-14 10:42:57,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=115585.83333333333, ans=0.125 2024-09-14 10:43:04,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=115614.16666666667, ans=0.125 2024-09-14 10:43:28,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=115642.5, ans=0.125 2024-09-14 10:43:46,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=115670.83333333333, ans=0.1 2024-09-14 10:43:51,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=115699.16666666667, ans=0.0 2024-09-14 10:43:58,243 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.803e+02 2.055e+02 2.185e+02 2.524e+02 3.228e+02, threshold=4.370e+02, percent-clipped=0.0 2024-09-14 10:44:03,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=115699.16666666667, ans=0.125 2024-09-14 10:44:06,161 INFO [train.py:1198] (1/2) Epoch 7, batch 2500, loss[loss=0.2578, ctc_loss=0.1794, cr_loss=0.392, over 20970.00 frames. ], tot_loss[loss=0.2884, ctc_loss=0.2068, cr_loss=0.408, over 4083720.39 frames. ], batch size: 50, lr: 1.21e-02, grad_scale: 32.0 2024-09-14 10:44:26,897 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.26 vs. 
limit=15.0 2024-09-14 10:44:38,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=115784.16666666667, ans=0.1 2024-09-14 10:44:54,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=115812.5, ans=0.2 2024-09-14 10:44:56,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=115812.5, ans=0.125 2024-09-14 10:44:56,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=115812.5, ans=0.125 2024-09-14 10:44:56,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=115812.5, ans=0.0 2024-09-14 10:45:22,097 INFO [train.py:1198] (1/2) Epoch 7, batch 2550, loss[loss=0.299, ctc_loss=0.2116, cr_loss=0.4371, over 20822.00 frames. ], tot_loss[loss=0.2884, ctc_loss=0.2067, cr_loss=0.4083, over 4080802.06 frames. ], batch size: 59, lr: 1.21e-02, grad_scale: 32.0 2024-09-14 10:45:32,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=115869.16666666667, ans=0.04949747468305833 2024-09-14 10:45:48,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=115897.5, ans=0.0 2024-09-14 10:45:57,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=115925.83333333333, ans=0.125 2024-09-14 10:46:33,243 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.817e+02 2.166e+02 2.412e+02 2.780e+02 5.008e+02, threshold=4.825e+02, percent-clipped=1.0 2024-09-14 10:46:40,833 INFO [train.py:1198] (1/2) Epoch 7, batch 2600, loss[loss=0.2987, ctc_loss=0.2158, cr_loss=0.4145, over 21064.00 frames. ], tot_loss[loss=0.2876, ctc_loss=0.2061, cr_loss=0.4075, over 4087522.86 frames. ], batch size: 59, lr: 1.21e-02, grad_scale: 32.0 2024-09-14 10:47:02,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=116039.16666666667, ans=0.125 2024-09-14 10:47:17,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=116067.5, ans=0.2 2024-09-14 10:47:25,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=116095.83333333333, ans=0.0 2024-09-14 10:47:52,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=116124.16666666667, ans=0.2 2024-09-14 10:47:53,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=116124.16666666667, ans=0.1 2024-09-14 10:47:56,312 INFO [train.py:1198] (1/2) Epoch 7, batch 2650, loss[loss=0.2508, ctc_loss=0.1756, cr_loss=0.3758, over 19879.00 frames. ], tot_loss[loss=0.2876, ctc_loss=0.2061, cr_loss=0.4079, over 4095917.57 frames. 
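
tot_loss[...] is a frame-weighted running average: each batch contributes its losses weighted by its frame count, which is why the "over N frames" counter climbs toward roughly 4.1e6 across these entries. A small sketch of such a tracker; the reset policy that keeps the counter bounded is assumed, not shown.

    # Sketch: frame-weighted running averages behind the tot_loss[...] fields.
    class MetricsTracker:
        def __init__(self):
            self.frames = 0.0
            self.sums = {}  # metric name -> frame-weighted sum

        def update(self, num_frames: float, **losses: float):
            self.frames += num_frames
            for name, value in losses.items():
                self.sums[name] = self.sums.get(name, 0.0) + value * num_frames

        def averages(self) -> dict:
            return {k: v / self.frames for k, v in self.sums.items()}

    tracker = MetricsTracker()
    tracker.update(18085, loss=0.3238, ctc_loss=0.2415, cr_loss=0.4117)
    print(tracker.averages(), f"over {tracker.frames} frames")
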
], batch size: 44, lr: 1.20e-02, grad_scale: 32.0 2024-09-14 10:47:59,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=116152.5, ans=0.0 2024-09-14 10:48:01,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=116152.5, ans=0.125 2024-09-14 10:48:20,078 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.09 vs. limit=15.0 2024-09-14 10:48:33,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=116209.16666666667, ans=0.1 2024-09-14 10:48:39,499 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.06 vs. limit=10.0 2024-09-14 10:48:54,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=116237.5, ans=0.025 2024-09-14 10:49:07,386 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.719e+02 2.066e+02 2.216e+02 2.463e+02 5.166e+02, threshold=4.432e+02, percent-clipped=1.0 2024-09-14 10:49:15,234 INFO [train.py:1198] (1/2) Epoch 7, batch 2700, loss[loss=0.2641, ctc_loss=0.1951, cr_loss=0.3449, over 21054.00 frames. ], tot_loss[loss=0.2874, ctc_loss=0.2061, cr_loss=0.4064, over 4086419.95 frames. ], batch size: 53, lr: 1.20e-02, grad_scale: 32.0 2024-09-14 10:49:27,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=116294.16666666667, ans=0.025 2024-09-14 10:49:27,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=116294.16666666667, ans=0.0 2024-09-14 10:49:48,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=116350.83333333333, ans=0.1 2024-09-14 10:49:50,878 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. limit=6.0 2024-09-14 10:50:22,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=116407.5, ans=0.125 2024-09-14 10:50:25,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=116407.5, ans=0.04949747468305833 2024-09-14 10:50:30,909 INFO [train.py:1198] (1/2) Epoch 7, batch 2750, loss[loss=0.266, ctc_loss=0.1856, cr_loss=0.4022, over 20383.00 frames. ], tot_loss[loss=0.2867, ctc_loss=0.2056, cr_loss=0.4057, over 4086000.50 frames. 
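
The grad_scale field moves between 32.0 and 16.0 in the surrounding entries, the signature of dynamic loss scaling for float16 training: the scale is halved when scaled gradients overflow and grown back after a run of clean steps. A sketch of the standard torch.cuda.amp pattern that produces this behavior; model, optimizer, batch, and compute_loss are placeholders.

    # Sketch: dynamic loss scaling with torch.cuda.amp, which yields the
    # kind of grad_scale movement (32.0 <-> 16.0) visible above.
    import torch

    scaler = torch.cuda.amp.GradScaler(enabled=True)

    def train_step(model, optimizer, batch, compute_loss):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = compute_loss(model, batch)
        scaler.scale(loss).backward()  # backprop on the scaled loss
        scaler.step(optimizer)         # skips the step if grads overflowed
        scaler.update()                # halve scale on overflow, grow later
        return loss.detach(), scaler.get_scale()
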
], batch size: 45, lr: 1.20e-02, grad_scale: 32.0 2024-09-14 10:50:35,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=116435.83333333333, ans=0.1 2024-09-14 10:51:16,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=116520.83333333333, ans=0.0 2024-09-14 10:51:25,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=116520.83333333333, ans=0.2 2024-09-14 10:51:38,723 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.778e+02 2.148e+02 2.366e+02 2.624e+02 4.175e+02, threshold=4.733e+02, percent-clipped=0.0 2024-09-14 10:51:40,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=116549.16666666667, ans=0.125 2024-09-14 10:51:46,177 INFO [train.py:1198] (1/2) Epoch 7, batch 2800, loss[loss=0.3095, ctc_loss=0.217, cr_loss=0.4627, over 20292.00 frames. ], tot_loss[loss=0.2887, ctc_loss=0.2072, cr_loss=0.4075, over 4085223.38 frames. ], batch size: 74, lr: 1.20e-02, grad_scale: 32.0 2024-09-14 10:51:46,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=116577.5, ans=0.0 2024-09-14 10:52:39,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=116662.5, ans=0.125 2024-09-14 10:53:04,518 INFO [train.py:1198] (1/2) Epoch 7, batch 2850, loss[loss=0.2887, ctc_loss=0.209, cr_loss=0.3986, over 20831.00 frames. ], tot_loss[loss=0.2893, ctc_loss=0.2077, cr_loss=0.408, over 4084396.32 frames. ], batch size: 59, lr: 1.20e-02, grad_scale: 32.0 2024-09-14 10:54:06,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=116832.5, ans=0.125 2024-09-14 10:54:15,452 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.782e+02 2.089e+02 2.446e+02 2.967e+02 4.494e+02, threshold=4.891e+02, percent-clipped=0.0 2024-09-14 10:54:16,213 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.49 vs. limit=15.0 2024-09-14 10:54:22,767 INFO [train.py:1198] (1/2) Epoch 7, batch 2900, loss[loss=0.3079, ctc_loss=0.219, cr_loss=0.4444, over 20060.00 frames. ], tot_loss[loss=0.2887, ctc_loss=0.2072, cr_loss=0.4072, over 4071199.09 frames. ], batch size: 80, lr: 1.20e-02, grad_scale: 32.0 2024-09-14 10:54:29,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=116860.83333333333, ans=0.125 2024-09-14 10:54:53,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=116917.5, ans=0.125 2024-09-14 10:55:38,991 INFO [train.py:1198] (1/2) Epoch 7, batch 2950, loss[loss=0.2992, ctc_loss=0.2172, cr_loss=0.41, over 21026.00 frames. ], tot_loss[loss=0.2888, ctc_loss=0.2074, cr_loss=0.407, over 4068295.86 frames. ], batch size: 63, lr: 1.20e-02, grad_scale: 32.0 2024-09-14 10:55:42,677 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.42 vs. 
limit=10.0 2024-09-14 10:56:26,181 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.07 vs. limit=15.0 2024-09-14 10:56:34,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=117087.5, ans=0.125 2024-09-14 10:56:40,194 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.36 vs. limit=15.0 2024-09-14 10:56:41,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=117115.83333333333, ans=0.0 2024-09-14 10:56:46,987 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.855e+02 2.145e+02 2.311e+02 2.716e+02 4.685e+02, threshold=4.622e+02, percent-clipped=0.0 2024-09-14 10:56:47,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=117115.83333333333, ans=0.2 2024-09-14 10:56:53,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=117144.16666666667, ans=0.1 2024-09-14 10:56:54,620 INFO [train.py:1198] (1/2) Epoch 7, batch 3000, loss[loss=0.2809, ctc_loss=0.2001, cr_loss=0.4043, over 20672.00 frames. ], tot_loss[loss=0.2879, ctc_loss=0.2065, cr_loss=0.407, over 4080474.66 frames. ], batch size: 68, lr: 1.20e-02, grad_scale: 32.0 2024-09-14 10:56:54,621 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-14 10:57:16,784 INFO [train.py:1230] (1/2) Epoch 7, validation: loss=0.06173, ctc_loss=0.06173, cr_loss=8.781e-15, over 944034.00 frames. 2024-09-14 10:57:16,785 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-14 10:57:41,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=117172.5, ans=0.125 2024-09-14 10:57:44,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=117172.5, ans=0.0 2024-09-14 10:57:45,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=117200.83333333333, ans=0.125 2024-09-14 10:58:06,000 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.34 vs. limit=15.0 2024-09-14 10:58:08,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=117229.16666666667, ans=0.1 2024-09-14 10:58:22,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=117257.5, ans=0.125 2024-09-14 10:58:29,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=117257.5, ans=0.04949747468305833 2024-09-14 10:58:32,603 INFO [train.py:1198] (1/2) Epoch 7, batch 3050, loss[loss=0.3132, ctc_loss=0.2211, cr_loss=0.4605, over 20873.00 frames. ], tot_loss[loss=0.2879, ctc_loss=0.2064, cr_loss=0.4075, over 4086581.21 frames. 
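
In the validation entry above, ctc_loss carries the entire loss while cr_loss is 8.781e-15: consistent with augmentation being disabled at validation time, so the "two views" of each utterance coincide and the consistency term collapses to float noise. A sketch of the surrounding evaluation pattern, with assumed names:

    # Sketch: periodic validation pass.  No masking/augmentation is applied,
    # so a CR-style consistency term between identical views degenerates
    # to ~0 (the cr_loss=8.781e-15 above is numerical noise).
    import torch

    @torch.no_grad()
    def compute_validation_loss(model, valid_loader, compute_loss):
        model.eval()
        tot, frames = {}, 0.0
        for batch in valid_loader:
            losses, num_frames = compute_loss(model, batch)  # dict of floats
            frames += num_frames
            for k, v in losses.items():
                tot[k] = tot.get(k, 0.0) + v * num_frames
        model.train()
        return {k: v / frames for k, v in tot.items()}, frames
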
], batch size: 57, lr: 1.20e-02, grad_scale: 32.0 2024-09-14 10:58:32,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=117285.83333333333, ans=0.0 2024-09-14 10:59:38,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=117399.16666666667, ans=0.0 2024-09-14 10:59:44,215 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.848e+02 2.268e+02 2.424e+02 2.793e+02 4.428e+02, threshold=4.848e+02, percent-clipped=0.0 2024-09-14 10:59:51,832 INFO [train.py:1198] (1/2) Epoch 7, batch 3100, loss[loss=0.2956, ctc_loss=0.2117, cr_loss=0.4192, over 20688.00 frames. ], tot_loss[loss=0.2883, ctc_loss=0.2068, cr_loss=0.4074, over 4085653.91 frames. ], batch size: 71, lr: 1.20e-02, grad_scale: 32.0 2024-09-14 11:00:02,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=117427.5, ans=0.125 2024-09-14 11:00:51,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=117540.83333333333, ans=0.2 2024-09-14 11:01:07,786 INFO [train.py:1198] (1/2) Epoch 7, batch 3150, loss[loss=0.2435, ctc_loss=0.1713, cr_loss=0.3606, over 20277.00 frames. ], tot_loss[loss=0.288, ctc_loss=0.2066, cr_loss=0.4067, over 4078892.60 frames. ], batch size: 45, lr: 1.20e-02, grad_scale: 32.0 2024-09-14 11:01:31,065 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.21 vs. limit=15.0 2024-09-14 11:02:15,664 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.768e+02 2.073e+02 2.278e+02 2.514e+02 3.593e+02, threshold=4.556e+02, percent-clipped=0.0 2024-09-14 11:02:22,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=117710.83333333333, ans=10.0 2024-09-14 11:02:23,432 INFO [train.py:1198] (1/2) Epoch 7, batch 3200, loss[loss=0.2504, ctc_loss=0.18, cr_loss=0.3522, over 20997.00 frames. ], tot_loss[loss=0.2887, ctc_loss=0.2071, cr_loss=0.4078, over 4097603.08 frames. ], batch size: 52, lr: 1.20e-02, grad_scale: 32.0 2024-09-14 11:02:35,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=117710.83333333333, ans=0.0 2024-09-14 11:02:51,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=117739.16666666667, ans=0.2 2024-09-14 11:03:04,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=117767.5, ans=0.125 2024-09-14 11:03:09,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=117767.5, ans=0.025 2024-09-14 11:03:23,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=117795.83333333333, ans=0.0 2024-09-14 11:03:38,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=117824.16666666667, ans=0.07 2024-09-14 11:03:41,493 INFO [train.py:1198] (1/2) Epoch 7, batch 3250, loss[loss=0.2681, ctc_loss=0.1932, cr_loss=0.3743, over 18978.00 frames. ], tot_loss[loss=0.2893, ctc_loss=0.2074, cr_loss=0.4096, over 4101905.82 frames. 
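
The lr field decays smoothly through this stretch (1.22e-02 near batch 1600, 1.20e-02 here, on its way to 1.18e-02) rather than in steps. A hedged sketch of an Eden-style schedule that decays as an inverse quarter power in both batch count and epoch; the functional form and every constant below are assumptions for illustration and are not fitted to the printed values.

    # Sketch: a smooth LR schedule decaying in both batch count and epoch.
    # Exponent and the lr_batches/lr_epochs constants are assumed.
    def eden_like_lr(base_lr: float, batch: int, epoch: float,
                     lr_batches: float = 5000.0, lr_epochs: float = 4.0) -> float:
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    # Illustrative only; does not reproduce the exact lr values in the log.
    for b in (113000, 116000, 120000):
        print(f"batch {b}: lr={eden_like_lr(0.05, b, 7.0):.3e}")
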
], batch size: 42, lr: 1.20e-02, grad_scale: 32.0 2024-09-14 11:03:58,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=117880.83333333333, ans=0.2 2024-09-14 11:04:02,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=117880.83333333333, ans=0.07 2024-09-14 11:04:12,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=117909.16666666667, ans=0.125 2024-09-14 11:04:22,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=117909.16666666667, ans=0.125 2024-09-14 11:04:39,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=117937.5, ans=0.025 2024-09-14 11:04:49,675 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.890e+02 2.120e+02 2.281e+02 2.578e+02 3.524e+02, threshold=4.562e+02, percent-clipped=0.0 2024-09-14 11:04:50,501 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.98 vs. limit=15.0 2024-09-14 11:04:51,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=117965.83333333333, ans=0.1 2024-09-14 11:04:51,957 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.47 vs. limit=22.5 2024-09-14 11:04:55,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=117994.16666666667, ans=0.125 2024-09-14 11:04:57,212 INFO [train.py:1198] (1/2) Epoch 7, batch 3300, loss[loss=0.2577, ctc_loss=0.1844, cr_loss=0.3668, over 20970.00 frames. ], tot_loss[loss=0.29, ctc_loss=0.2079, cr_loss=0.4102, over 4096576.91 frames. ], batch size: 52, lr: 1.20e-02, grad_scale: 32.0 2024-09-14 11:05:25,522 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.49 vs. limit=15.0 2024-09-14 11:05:28,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=118022.5, ans=0.125 2024-09-14 11:05:38,913 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 11:06:15,984 INFO [train.py:1198] (1/2) Epoch 7, batch 3350, loss[loss=0.2892, ctc_loss=0.2094, cr_loss=0.3989, over 20980.00 frames. ], tot_loss[loss=0.2908, ctc_loss=0.2086, cr_loss=0.411, over 4099146.73 frames. ], batch size: 64, lr: 1.19e-02, grad_scale: 32.0 2024-09-14 11:06:18,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=118135.83333333333, ans=0.0 2024-09-14 11:07:10,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=118220.83333333333, ans=0.125 2024-09-14 11:07:11,435 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.22 vs. 
limit=15.0 2024-09-14 11:07:24,164 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.846e+02 2.119e+02 2.422e+02 2.854e+02 5.353e+02, threshold=4.844e+02, percent-clipped=1.0 2024-09-14 11:07:29,574 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.60 vs. limit=15.0 2024-09-14 11:07:31,626 INFO [train.py:1198] (1/2) Epoch 7, batch 3400, loss[loss=0.2931, ctc_loss=0.2106, cr_loss=0.4126, over 20809.00 frames. ], tot_loss[loss=0.2915, ctc_loss=0.2091, cr_loss=0.4123, over 4107812.27 frames. ], batch size: 59, lr: 1.19e-02, grad_scale: 32.0 2024-09-14 11:07:48,996 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.56 vs. limit=10.0 2024-09-14 11:08:18,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=118362.5, ans=0.0 2024-09-14 11:08:31,666 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.87 vs. limit=15.0 2024-09-14 11:08:50,950 INFO [train.py:1198] (1/2) Epoch 7, batch 3450, loss[loss=0.2917, ctc_loss=0.2133, cr_loss=0.3921, over 20940.00 frames. ], tot_loss[loss=0.2911, ctc_loss=0.2088, cr_loss=0.4114, over 4105036.51 frames. ], batch size: 60, lr: 1.19e-02, grad_scale: 32.0 2024-09-14 11:08:54,840 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.41 vs. limit=15.0 2024-09-14 11:09:26,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=118475.83333333333, ans=0.125 2024-09-14 11:09:32,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=118475.83333333333, ans=0.035 2024-09-14 11:09:58,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=118532.5, ans=0.07 2024-09-14 11:09:59,507 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.847e+02 2.162e+02 2.361e+02 2.765e+02 4.066e+02, threshold=4.723e+02, percent-clipped=0.0 2024-09-14 11:10:06,961 INFO [train.py:1198] (1/2) Epoch 7, batch 3500, loss[loss=0.2732, ctc_loss=0.1978, cr_loss=0.3769, over 20785.00 frames. ], tot_loss[loss=0.2906, ctc_loss=0.2084, cr_loss=0.4108, over 4098119.91 frames. 
], batch size: 56, lr: 1.19e-02, grad_scale: 32.0 2024-09-14 11:10:48,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=118617.5, ans=0.025 2024-09-14 11:10:48,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=118617.5, ans=0.125 2024-09-14 11:11:03,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=118645.83333333333, ans=0.125 2024-09-14 11:11:11,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=118674.16666666667, ans=0.125 2024-09-14 11:11:16,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=118674.16666666667, ans=0.0 2024-09-14 11:11:24,906 INFO [train.py:1198] (1/2) Epoch 7, batch 3550, loss[loss=0.2748, ctc_loss=0.193, cr_loss=0.4087, over 20958.00 frames. ], tot_loss[loss=0.292, ctc_loss=0.2095, cr_loss=0.4123, over 4093785.44 frames. ], batch size: 64, lr: 1.19e-02, grad_scale: 32.0 2024-09-14 11:12:01,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=118759.16666666667, ans=0.0 2024-09-14 11:12:32,699 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.857e+02 2.194e+02 2.382e+02 2.801e+02 4.228e+02, threshold=4.763e+02, percent-clipped=0.0 2024-09-14 11:12:40,386 INFO [train.py:1198] (1/2) Epoch 7, batch 3600, loss[loss=0.3137, ctc_loss=0.2269, cr_loss=0.4339, over 20641.00 frames. ], tot_loss[loss=0.2919, ctc_loss=0.2096, cr_loss=0.4115, over 4070711.50 frames. ], batch size: 68, lr: 1.19e-02, grad_scale: 32.0 2024-09-14 11:13:26,625 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 11:13:26,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=118929.16666666667, ans=0.5 2024-09-14 11:13:32,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=118929.16666666667, ans=0.0 2024-09-14 11:13:33,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=118929.16666666667, ans=0.125 2024-09-14 11:13:56,352 INFO [train.py:1198] (1/2) Epoch 7, batch 3650, loss[loss=0.265, ctc_loss=0.1866, cr_loss=0.3923, over 21005.00 frames. ], tot_loss[loss=0.2926, ctc_loss=0.2102, cr_loss=0.4122, over 4079665.05 frames. ], batch size: 55, lr: 1.19e-02, grad_scale: 32.0 2024-09-14 11:14:08,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=118985.83333333333, ans=0.035 2024-09-14 11:14:35,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=119042.5, ans=0.125 2024-09-14 11:14:46,694 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.66 vs. limit=12.0 2024-09-14 11:14:59,365 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.94 vs. 
limit=22.5 2024-09-14 11:15:07,270 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.768e+02 2.101e+02 2.330e+02 2.666e+02 3.785e+02, threshold=4.661e+02, percent-clipped=0.0 2024-09-14 11:15:14,990 INFO [train.py:1198] (1/2) Epoch 7, batch 3700, loss[loss=0.3113, ctc_loss=0.2243, cr_loss=0.4346, over 20968.00 frames. ], tot_loss[loss=0.2924, ctc_loss=0.2099, cr_loss=0.4123, over 4082147.95 frames. ], batch size: 64, lr: 1.19e-02, grad_scale: 32.0 2024-09-14 11:15:19,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=119127.5, ans=0.2 2024-09-14 11:16:22,401 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.75 vs. limit=12.0 2024-09-14 11:16:31,288 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 11:16:33,745 INFO [train.py:1198] (1/2) Epoch 7, batch 3750, loss[loss=0.308, ctc_loss=0.2216, cr_loss=0.4321, over 20825.00 frames. ], tot_loss[loss=0.2927, ctc_loss=0.2103, cr_loss=0.412, over 4073666.20 frames. ], batch size: 59, lr: 1.19e-02, grad_scale: 32.0 2024-09-14 11:16:52,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=119297.5, ans=0.0 2024-09-14 11:16:58,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=119297.5, ans=0.0 2024-09-14 11:17:28,345 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=119354.16666666667, ans=0.1 2024-09-14 11:17:41,410 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.795e+02 2.097e+02 2.294e+02 2.621e+02 4.657e+02, threshold=4.588e+02, percent-clipped=0.0 2024-09-14 11:17:46,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=119382.5, ans=0.2 2024-09-14 11:17:48,807 INFO [train.py:1198] (1/2) Epoch 7, batch 3800, loss[loss=0.2946, ctc_loss=0.2094, cr_loss=0.4258, over 21036.00 frames. ], tot_loss[loss=0.2932, ctc_loss=0.2106, cr_loss=0.4131, over 4088029.90 frames. ], batch size: 62, lr: 1.19e-02, grad_scale: 32.0 2024-09-14 11:19:04,577 INFO [train.py:1198] (1/2) Epoch 7, batch 3850, loss[loss=0.2494, ctc_loss=0.1702, cr_loss=0.3957, over 20990.00 frames. ], tot_loss[loss=0.2912, ctc_loss=0.209, cr_loss=0.411, over 4087959.93 frames. ], batch size: 52, lr: 1.19e-02, grad_scale: 32.0 2024-09-14 11:19:28,330 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.65 vs. limit=12.0 2024-09-14 11:20:02,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=119637.5, ans=0.2 2024-09-14 11:20:02,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=119637.5, ans=0.125 2024-09-14 11:20:16,900 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.841e+02 2.149e+02 2.314e+02 2.701e+02 4.576e+02, threshold=4.628e+02, percent-clipped=0.0 2024-09-14 11:20:22,913 INFO [train.py:1198] (1/2) Epoch 7, batch 3900, loss[loss=0.3262, ctc_loss=0.2344, cr_loss=0.4589, over 20673.00 frames. ], tot_loss[loss=0.2904, ctc_loss=0.2083, cr_loss=0.4106, over 4091711.19 frames. 
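
The WithLoss lines above report an auxiliary penalty attached to attention weights, here summing to zero. A hedged sketch of the wrapper idea: pass the tensor through unchanged while exposing a penalty term for the training loss and a running sum for logging. The concrete penalty used below (probability mass above a cap) is a stand-in, not the scaling.py computation.

    # Sketch: attach an auxiliary penalty to an intermediate tensor and log
    # its running sum, in the spirit of "WithLoss ... loss-sum=..." above.
    import torch
    import torch.nn as nn

    class WithAuxLoss(nn.Module):
        def __init__(self, name: str, cap: float = 0.9):
            super().__init__()
            self.name, self.cap = name, cap
            self.loss_sum = 0.0  # the value printed as loss-sum

        def forward(self, attn_weights: torch.Tensor):
            # Stand-in penalty: mass above `cap`; zero whenever the weights
            # are well behaved, matching the loss-sum=0.000e+00 entries.
            penalty = (attn_weights - self.cap).clamp(min=0.0).sum()
            self.loss_sum += float(penalty.detach())
            return attn_weights, penalty  # caller adds penalty to the loss
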
], batch size: 68, lr: 1.19e-02, grad_scale: 32.0 2024-09-14 11:20:29,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=119694.16666666667, ans=0.015 2024-09-14 11:21:16,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=119779.16666666667, ans=0.0 2024-09-14 11:21:16,875 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.52 vs. limit=12.0 2024-09-14 11:21:25,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=119807.5, ans=0.025 2024-09-14 11:21:30,790 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.62 vs. limit=10.0 2024-09-14 11:21:31,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=119807.5, ans=0.1 2024-09-14 11:21:39,028 INFO [train.py:1198] (1/2) Epoch 7, batch 3950, loss[loss=0.3024, ctc_loss=0.2194, cr_loss=0.4148, over 20688.00 frames. ], tot_loss[loss=0.2898, ctc_loss=0.2079, cr_loss=0.4098, over 4091650.10 frames. ], batch size: 68, lr: 1.19e-02, grad_scale: 32.0 2024-09-14 11:22:47,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=119949.16666666667, ans=0.125 2024-09-14 11:22:51,315 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.825e+02 2.127e+02 2.414e+02 2.679e+02 4.173e+02, threshold=4.827e+02, percent-clipped=0.0 2024-09-14 11:22:57,402 INFO [train.py:1198] (1/2) Epoch 7, batch 4000, loss[loss=0.2649, ctc_loss=0.1869, cr_loss=0.3898, over 20991.00 frames. ], tot_loss[loss=0.2898, ctc_loss=0.2079, cr_loss=0.4092, over 4089368.78 frames. ], batch size: 52, lr: 1.19e-02, grad_scale: 32.0 2024-09-14 11:23:02,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=119977.5, ans=0.2 2024-09-14 11:23:03,142 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.31 vs. limit=12.0 2024-09-14 11:23:15,088 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.85 vs. limit=15.0 2024-09-14 11:23:19,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=120005.83333333333, ans=0.0 2024-09-14 11:23:19,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=120005.83333333333, ans=0.0 2024-09-14 11:23:43,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=120062.5, ans=0.025 2024-09-14 11:23:54,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=120062.5, ans=0.5 2024-09-14 11:23:57,947 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.88 vs. limit=22.5 2024-09-14 11:24:08,399 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.41 vs. 
limit=12.0 2024-09-14 11:24:13,812 INFO [train.py:1198] (1/2) Epoch 7, batch 4050, loss[loss=0.2567, ctc_loss=0.1776, cr_loss=0.3953, over 20980.00 frames. ], tot_loss[loss=0.2892, ctc_loss=0.2073, cr_loss=0.4091, over 4091621.05 frames. ], batch size: 55, lr: 1.19e-02, grad_scale: 32.0 2024-09-14 11:24:25,276 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.09 vs. limit=15.0 2024-09-14 11:24:33,925 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 11:24:59,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=120204.16666666667, ans=0.2 2024-09-14 11:25:23,638 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.850e+02 2.146e+02 2.354e+02 2.750e+02 3.593e+02, threshold=4.707e+02, percent-clipped=0.0 2024-09-14 11:25:29,722 INFO [train.py:1198] (1/2) Epoch 7, batch 4100, loss[loss=0.2691, ctc_loss=0.1893, cr_loss=0.3992, over 21043.00 frames. ], tot_loss[loss=0.2892, ctc_loss=0.2073, cr_loss=0.4092, over 4092280.32 frames. ], batch size: 53, lr: 1.18e-02, grad_scale: 32.0 2024-09-14 11:25:31,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=120260.83333333333, ans=0.125 2024-09-14 11:26:20,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=120345.83333333333, ans=0.025 2024-09-14 11:26:28,924 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.15 vs. limit=8.0 2024-09-14 11:26:44,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=120374.16666666667, ans=0.0 2024-09-14 11:26:49,162 INFO [train.py:1198] (1/2) Epoch 7, batch 4150, loss[loss=0.3136, ctc_loss=0.227, cr_loss=0.4328, over 20674.00 frames. ], tot_loss[loss=0.2879, ctc_loss=0.2065, cr_loss=0.4073, over 4080665.66 frames. ], batch size: 71, lr: 1.18e-02, grad_scale: 32.0 2024-09-14 11:27:04,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=120430.83333333333, ans=0.2 2024-09-14 11:27:16,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=120430.83333333333, ans=0.2 2024-09-14 11:28:01,360 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.859e+02 2.172e+02 2.521e+02 2.902e+02 4.553e+02, threshold=5.042e+02, percent-clipped=0.0 2024-09-14 11:28:07,616 INFO [train.py:1198] (1/2) Epoch 7, batch 4200, loss[loss=0.3141, ctc_loss=0.2268, cr_loss=0.4366, over 21049.00 frames. ], tot_loss[loss=0.2869, ctc_loss=0.2055, cr_loss=0.4068, over 4099302.04 frames. ], batch size: 62, lr: 1.18e-02, grad_scale: 32.0 2024-09-14 11:28:12,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=120544.16666666667, ans=0.0 2024-09-14 11:28:59,340 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 11:29:22,770 INFO [train.py:1198] (1/2) Epoch 7, batch 4250, loss[loss=0.2644, ctc_loss=0.1844, cr_loss=0.3999, over 21055.00 frames. ], tot_loss[loss=0.2876, ctc_loss=0.2061, cr_loss=0.4072, over 4087248.50 frames. 
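
Batch size swings widely in these entries (from the low forties up to around 150) while every other knob is fixed, the expected behavior when batches are capped by total audio duration: short utterances pack more cuts per batch. A minimal sketch of duration-capped batching with assumed names:

    # Sketch: duration-capped batching, which yields the variable
    # "batch size: ..." counts seen above.
    from typing import Iterable, Iterator, List, Tuple

    Cut = Tuple[str, float]  # (utterance id, duration in seconds)

    def duration_capped_batches(cuts: Iterable[Cut],
                                max_duration: float) -> Iterator[List[Cut]]:
        batch, total = [], 0.0
        for cut in cuts:
            if batch and total + cut[1] > max_duration:
                yield batch
                batch, total = [], 0.0
            batch.append(cut)
            total += cut[1]
        if batch:
            yield batch

    cuts = [("utt%d" % i, d) for i, d in enumerate([3.1, 14.9, 2.2, 7.5, 9.8])]
    for b in duration_capped_batches(cuts, max_duration=15.0):
        print(len(b), sum(d for _, d in b))
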
], batch size: 56, lr: 1.18e-02, grad_scale: 32.0 2024-09-14 11:29:38,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=120714.16666666667, ans=0.125 2024-09-14 11:29:44,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=120714.16666666667, ans=0.125 2024-09-14 11:29:45,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=120714.16666666667, ans=0.125 2024-09-14 11:29:52,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=120742.5, ans=0.125 2024-09-14 11:29:53,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=120742.5, ans=0.0 2024-09-14 11:30:02,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=120742.5, ans=0.015 2024-09-14 11:30:32,287 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.744e+02 2.136e+02 2.310e+02 2.620e+02 1.080e+03, threshold=4.620e+02, percent-clipped=1.0 2024-09-14 11:30:38,396 INFO [train.py:1198] (1/2) Epoch 7, batch 4300, loss[loss=0.2887, ctc_loss=0.2049, cr_loss=0.4189, over 21075.00 frames. ], tot_loss[loss=0.2873, ctc_loss=0.2059, cr_loss=0.4071, over 4080478.56 frames. ], batch size: 56, lr: 1.18e-02, grad_scale: 32.0 2024-09-14 11:30:51,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=120827.5, ans=0.07 2024-09-14 11:30:53,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=120855.83333333333, ans=0.0 2024-09-14 11:31:01,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=120855.83333333333, ans=0.125 2024-09-14 11:31:32,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=120912.5, ans=0.0 2024-09-14 11:31:34,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=120912.5, ans=0.0 2024-09-14 11:31:55,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=120969.16666666667, ans=0.125 2024-09-14 11:31:55,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=120969.16666666667, ans=0.125 2024-09-14 11:31:56,613 INFO [train.py:1198] (1/2) Epoch 7, batch 4350, loss[loss=0.3291, ctc_loss=0.24, cr_loss=0.4459, over 20233.00 frames. ], tot_loss[loss=0.2888, ctc_loss=0.2069, cr_loss=0.4092, over 4091077.90 frames. ], batch size: 74, lr: 1.18e-02, grad_scale: 32.0 2024-09-14 11:31:57,690 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.90 vs. 
limit=22.5 2024-09-14 11:32:09,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=120969.16666666667, ans=0.125 2024-09-14 11:32:12,785 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=120997.5, ans=15.0 2024-09-14 11:32:23,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=120997.5, ans=0.0 2024-09-14 11:32:54,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=121054.16666666667, ans=0.025 2024-09-14 11:32:55,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=121082.5, ans=0.04949747468305833 2024-09-14 11:33:06,118 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.829e+02 2.123e+02 2.299e+02 2.704e+02 4.882e+02, threshold=4.598e+02, percent-clipped=1.0 2024-09-14 11:33:09,918 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.71 vs. limit=15.0 2024-09-14 11:33:12,244 INFO [train.py:1198] (1/2) Epoch 7, batch 4400, loss[loss=0.299, ctc_loss=0.2176, cr_loss=0.4069, over 20828.00 frames. ], tot_loss[loss=0.2885, ctc_loss=0.2067, cr_loss=0.409, over 4106970.68 frames. ], batch size: 59, lr: 1.18e-02, grad_scale: 32.0 2024-09-14 11:33:15,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=121110.83333333333, ans=0.125 2024-09-14 11:33:15,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=121110.83333333333, ans=0.1 2024-09-14 11:33:18,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=121110.83333333333, ans=0.1 2024-09-14 11:33:23,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=121110.83333333333, ans=0.025 2024-09-14 11:33:24,746 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 11:34:00,139 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.14 vs. limit=15.0 2024-09-14 11:34:22,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=121224.16666666667, ans=0.125 2024-09-14 11:34:30,702 INFO [train.py:1198] (1/2) Epoch 7, batch 4450, loss[loss=0.2615, ctc_loss=0.1873, cr_loss=0.3709, over 20792.00 frames. ], tot_loss[loss=0.286, ctc_loss=0.2047, cr_loss=0.4063, over 4107515.13 frames. ], batch size: 53, lr: 1.18e-02, grad_scale: 16.0 2024-09-14 11:34:46,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=121280.83333333333, ans=0.125 2024-09-14 11:34:50,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=121280.83333333333, ans=0.05 2024-09-14 11:35:25,481 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.21 vs. 
2024-09-14 11:35:41,477 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.824e+02 2.113e+02 2.333e+02 2.634e+02 4.859e+02, threshold=4.665e+02, percent-clipped=1.0
2024-09-14 11:35:43,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=121365.83333333333, ans=0.2
2024-09-14 11:35:46,204 INFO [train.py:1198] (1/2) Epoch 7, batch 4500, loss[loss=0.2415, ctc_loss=0.1714, cr_loss=0.3504, over 19963.00 frames. ], tot_loss[loss=0.2864, ctc_loss=0.205, cr_loss=0.4069, over 4108070.05 frames. ], batch size: 44, lr: 1.18e-02, grad_scale: 16.0
2024-09-14 11:36:39,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=121479.16666666667, ans=0.1
2024-09-14 11:37:02,239 INFO [train.py:1198] (1/2) Epoch 7, batch 4550, loss[loss=0.2713, ctc_loss=0.1946, cr_loss=0.3835, over 20980.00 frames. ], tot_loss[loss=0.2867, ctc_loss=0.2052, cr_loss=0.4077, over 4102834.42 frames. ], batch size: 58, lr: 1.18e-02, grad_scale: 16.0
2024-09-14 11:37:30,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=121564.16666666667, ans=0.2
2024-09-14 11:37:44,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=121592.5, ans=0.0
2024-09-14 11:38:08,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=121649.16666666667, ans=0.025
2024-09-14 11:38:17,418 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.901e+02 2.164e+02 2.393e+02 2.614e+02 3.644e+02, threshold=4.786e+02, percent-clipped=0.0
2024-09-14 11:38:21,879 INFO [train.py:1198] (1/2) Epoch 7, batch 4600, loss[loss=0.3374, ctc_loss=0.251, cr_loss=0.4319, over 13776.00 frames. ], tot_loss[loss=0.2857, ctc_loss=0.2044, cr_loss=0.4062, over 4089909.92 frames. ], batch size: 149, lr: 1.18e-02, grad_scale: 16.0
2024-09-14 11:38:37,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=121705.83333333333, ans=0.0
2024-09-14 11:38:48,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=121705.83333333333, ans=0.2
2024-09-14 11:38:51,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=121734.16666666667, ans=0.125
2024-09-14 11:39:14,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=121762.5, ans=0.125
2024-09-14 11:39:24,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=121790.83333333333, ans=0.125
2024-09-14 11:39:41,030 INFO [train.py:1198] (1/2) Epoch 7, batch 4650, loss[loss=0.2396, ctc_loss=0.1718, cr_loss=0.3392, over 20946.00 frames. ], tot_loss[loss=0.287, ctc_loss=0.2054, cr_loss=0.4076, over 4095862.43 frames.
], batch size: 49, lr: 1.18e-02, grad_scale: 16.0
2024-09-14 11:39:48,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=121819.16666666667, ans=0.125
2024-09-14 11:39:51,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=121819.16666666667, ans=0.07
2024-09-14 11:40:19,157 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-14 11:40:26,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=121904.16666666667, ans=0.0
2024-09-14 11:40:30,422 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.32 vs. limit=15.0
2024-09-14 11:40:38,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=121904.16666666667, ans=0.2
2024-09-14 11:40:43,638 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.77 vs. limit=22.5
2024-09-14 11:40:46,295 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-14 11:40:51,943 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.732e+02 2.133e+02 2.347e+02 2.729e+02 3.956e+02, threshold=4.693e+02, percent-clipped=0.0
2024-09-14 11:40:56,633 INFO [train.py:1198] (1/2) Epoch 7, batch 4700, loss[loss=0.3362, ctc_loss=0.2477, cr_loss=0.4428, over 18240.00 frames. ], tot_loss[loss=0.2866, ctc_loss=0.2053, cr_loss=0.4068, over 4090323.49 frames. ], batch size: 108, lr: 1.18e-02, grad_scale: 16.0
2024-09-14 11:40:57,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=121960.83333333333, ans=0.2
2024-09-14 11:41:37,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=122017.5, ans=0.125
2024-09-14 11:42:11,393 INFO [train.py:1198] (1/2) Epoch 7, batch 4750, loss[loss=0.2919, ctc_loss=0.2038, cr_loss=0.4407, over 20829.00 frames. ], tot_loss[loss=0.2858, ctc_loss=0.2046, cr_loss=0.4061, over 4098083.09 frames. ], batch size: 59, lr: 1.18e-02, grad_scale: 16.0
2024-09-14 11:42:13,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=122102.5, ans=0.0
2024-09-14 11:42:19,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=122102.5, ans=0.125
2024-09-14 11:42:25,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=122130.83333333333, ans=0.125
2024-09-14 11:42:29,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=122130.83333333333, ans=0.0
2024-09-14 11:42:47,892 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.92 vs. limit=6.0
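The Whitening lines compare a per-module metric against a scheduled limit (e.g. metric=2.92 vs. limit=6.0 just above). The metric plausibly measures how far the channel covariance of an activation is from isotropic ("white"); the sketch below assumes an eigenvalue-ratio definition and contiguous channel grouping, which is only a stand-in for whatever scaling.py actually computes.

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """Sketch: anisotropy of the channel covariance, per group.
    A perfectly 'white' activation (identity covariance) gives 1.0;
    larger values mean a few directions dominate. Assumed reading of
    the 'metric=... vs. limit=...' lines, not the actual code."""
    flat = x.reshape(-1, x.shape[-1])
    num_frames, num_channels = flat.shape
    grouped = flat.reshape(num_frames, num_groups, num_channels // num_groups)
    metrics = []
    for g in range(num_groups):
        xg = grouped[:, g, :] - grouped[:, g, :].mean(dim=0)
        cov = (xg.T @ xg) / num_frames
        eigs = torch.linalg.eigvalsh(cov)
        metrics.append((eigs.max() / eigs.mean().clamp(min=1e-20)).item())
    return max(metrics)

# Usage: log (or penalize) only when the metric exceeds the scheduled limit,
# which would explain why these entries appear sporadically.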
2024-09-14 11:42:56,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=122187.5, ans=0.125
2024-09-14 11:43:11,897 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.66 vs. limit=10.0
2024-09-14 11:43:14,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=122215.83333333333, ans=0.125
2024-09-14 11:43:24,758 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.848e+02 2.214e+02 2.404e+02 2.623e+02 6.465e+02, threshold=4.809e+02, percent-clipped=1.0
2024-09-14 11:43:28,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=122244.16666666667, ans=0.1
2024-09-14 11:43:29,335 INFO [train.py:1198] (1/2) Epoch 7, batch 4800, loss[loss=0.3007, ctc_loss=0.2162, cr_loss=0.4226, over 20814.00 frames. ], tot_loss[loss=0.2881, ctc_loss=0.2064, cr_loss=0.4086, over 4089221.38 frames. ], batch size: 53, lr: 1.18e-02, grad_scale: 32.0
2024-09-14 11:44:48,776 INFO [train.py:1198] (1/2) Epoch 7, batch 4850, loss[loss=0.2995, ctc_loss=0.2167, cr_loss=0.4143, over 21011.00 frames. ], tot_loss[loss=0.288, ctc_loss=0.2063, cr_loss=0.4082, over 4082636.79 frames. ], batch size: 61, lr: 1.17e-02, grad_scale: 32.0
2024-09-14 11:45:59,179 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.859e+02 2.137e+02 2.416e+02 2.758e+02 3.946e+02, threshold=4.833e+02, percent-clipped=0.0
2024-09-14 11:46:03,544 INFO [train.py:1198] (1/2) Epoch 7, batch 4900, loss[loss=0.3089, ctc_loss=0.2252, cr_loss=0.4185, over 20341.00 frames. ], tot_loss[loss=0.288, ctc_loss=0.2063, cr_loss=0.4084, over 4094255.89 frames. ], batch size: 74, lr: 1.17e-02, grad_scale: 32.0
2024-09-14 11:46:04,486 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.88 vs. limit=15.0
2024-09-14 11:46:21,964 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.04 vs. limit=15.0
2024-09-14 11:46:24,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=122555.83333333333, ans=0.1
2024-09-14 11:46:39,779 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.52 vs. limit=15.0
2024-09-14 11:46:59,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=122612.5, ans=0.125
2024-09-14 11:47:14,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=122640.83333333333, ans=0.0
2024-09-14 11:47:17,309 INFO [train.py:1198] (1/2) Epoch 7, batch 4950, loss[loss=0.2969, ctc_loss=0.2169, cr_loss=0.4, over 21021.00 frames. ], tot_loss[loss=0.2881, ctc_loss=0.2064, cr_loss=0.4084, over 4102202.65 frames.
], batch size: 63, lr: 1.17e-02, grad_scale: 32.0
2024-09-14 11:47:19,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=122669.16666666667, ans=0.1
2024-09-14 11:47:31,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=122697.5, ans=0.125
2024-09-14 11:47:39,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=122697.5, ans=0.025
2024-09-14 11:47:45,016 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.39 vs. limit=15.0
2024-09-14 11:48:03,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=122754.16666666667, ans=0.125
2024-09-14 11:48:10,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=122754.16666666667, ans=0.125
2024-09-14 11:48:27,707 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.669e+02 2.097e+02 2.353e+02 2.641e+02 7.950e+02, threshold=4.707e+02, percent-clipped=1.0
2024-09-14 11:48:32,126 INFO [train.py:1198] (1/2) Epoch 7, batch 5000, loss[loss=0.2504, ctc_loss=0.1721, cr_loss=0.3919, over 20952.00 frames. ], tot_loss[loss=0.2875, ctc_loss=0.206, cr_loss=0.4076, over 4106715.82 frames. ], batch size: 51, lr: 1.17e-02, grad_scale: 32.0
2024-09-14 11:48:44,399 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=122810.83333333333, ans=0.125
2024-09-14 11:48:45,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=122839.16666666667, ans=0.025
2024-09-14 11:48:55,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=122839.16666666667, ans=0.5
2024-09-14 11:49:46,395 INFO [train.py:1198] (1/2) Epoch 7, batch 5050, loss[loss=0.3182, ctc_loss=0.231, cr_loss=0.4356, over 20176.00 frames. ], tot_loss[loss=0.2878, ctc_loss=0.2062, cr_loss=0.4082, over 4107878.48 frames. ], batch size: 80, lr: 1.17e-02, grad_scale: 32.0
2024-09-14 11:49:54,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=122952.5, ans=0.125
2024-09-14 11:49:56,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=122952.5, ans=0.125
2024-09-14 11:50:11,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=122980.83333333333, ans=0.125
2024-09-14 11:50:42,352 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.43 vs. limit=15.0
2024-09-14 11:50:49,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=123065.83333333333, ans=0.1
2024-09-14 11:50:56,427 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.875e+02 2.133e+02 2.354e+02 2.967e+02 5.718e+02, threshold=4.709e+02, percent-clipped=3.0
2024-09-14 11:51:00,820 INFO [train.py:1198] (1/2) Epoch 7, batch 5100, loss[loss=0.2865, ctc_loss=0.1984, cr_loss=0.4404, over 20885.00 frames. ], tot_loss[loss=0.2879, ctc_loss=0.2062, cr_loss=0.4086, over 4102094.14 frames. ], batch size: 57, lr: 1.17e-02, grad_scale: 32.0
2024-09-14 11:51:06,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=123094.16666666667, ans=0.125
2024-09-14 11:51:13,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=123094.16666666667, ans=0.2
2024-09-14 11:51:26,620 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.90 vs. limit=6.0
2024-09-14 11:51:34,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=123150.83333333333, ans=0.05
2024-09-14 11:51:34,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=123150.83333333333, ans=0.0
2024-09-14 11:51:58,125 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.19 vs. limit=15.0
2024-09-14 11:52:03,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=123207.5, ans=0.2
2024-09-14 11:52:18,044 INFO [train.py:1198] (1/2) Epoch 7, batch 5150, loss[loss=0.2822, ctc_loss=0.1982, cr_loss=0.4197, over 20897.00 frames. ], tot_loss[loss=0.2882, ctc_loss=0.2064, cr_loss=0.409, over 4105632.68 frames. ], batch size: 54, lr: 1.17e-02, grad_scale: 32.0
2024-09-14 11:52:43,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=123264.16666666667, ans=0.125
2024-09-14 11:53:14,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=123320.83333333333, ans=0.125
2024-09-14 11:53:27,494 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.911e+02 2.173e+02 2.363e+02 2.756e+02 5.802e+02, threshold=4.725e+02, percent-clipped=3.0
2024-09-14 11:53:28,195 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.35 vs. limit=15.0
2024-09-14 11:53:32,060 INFO [train.py:1198] (1/2) Epoch 7, batch 5200, loss[loss=0.2757, ctc_loss=0.1939, cr_loss=0.4094, over 20994.00 frames. ], tot_loss[loss=0.2879, ctc_loss=0.2062, cr_loss=0.4087, over 4094097.98 frames. ], batch size: 61, lr: 1.17e-02, grad_scale: 32.0
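Each train.py:1198 entry reports two losses: loss[...] for the current batch and tot_loss[...] aggregated "over" roughly 4.1M frames, a count that drifts up and down rather than growing without bound and is fractional (e.g. 4094097.98). Both observations fit a frame-weighted running average with exponential decay; the decay factor in the sketch below is hypothetical.

class RunningLoss:
    """Sketch: frame-weighted running average with exponential decay,
    one plausible reading of the fractional 'over N frames' counts in
    the tot_loss[...] entries. The decay factor is an assumption."""
    def __init__(self, decay: float = 0.999):
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> float:
        self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames
        return self.loss_sum / self.frames  # the reported tot_loss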
2024-09-14 11:53:38,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=123377.5, ans=0.125
2024-09-14 11:54:05,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=123434.16666666667, ans=0.125
2024-09-14 11:54:06,412 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.07 vs. limit=15.0
2024-09-14 11:54:37,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=123490.83333333333, ans=0.1
2024-09-14 11:54:46,410 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.37 vs. limit=15.0
2024-09-14 11:54:48,576 INFO [train.py:1198] (1/2) Epoch 7, batch 5250, loss[loss=0.3729, ctc_loss=0.2869, cr_loss=0.4301, over 14137.00 frames. ], tot_loss[loss=0.2878, ctc_loss=0.2061, cr_loss=0.4085, over 4085732.19 frames. ], batch size: 149, lr: 1.17e-02, grad_scale: 32.0
2024-09-14 11:55:03,308 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.54 vs. limit=10.0
2024-09-14 11:55:05,637 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=123547.5, ans=0.0
2024-09-14 11:55:40,246 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.98 vs. limit=22.5
2024-09-14 11:55:44,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=123604.16666666667, ans=0.0
2024-09-14 11:55:55,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=123632.5, ans=0.5
2024-09-14 11:55:58,541 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.831e+02 2.108e+02 2.293e+02 2.539e+02 4.247e+02, threshold=4.585e+02, percent-clipped=0.0
2024-09-14 11:56:02,958 INFO [train.py:1198] (1/2) Epoch 7, batch 5300, loss[loss=0.3062, ctc_loss=0.221, cr_loss=0.426, over 20964.00 frames. ], tot_loss[loss=0.2868, ctc_loss=0.2054, cr_loss=0.4072, over 4097246.19 frames. ], batch size: 64, lr: 1.17e-02, grad_scale: 32.0
2024-09-14 11:56:07,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=123660.83333333333, ans=0.0
2024-09-14 11:56:07,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=123660.83333333333, ans=0.0
2024-09-14 11:56:08,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=123660.83333333333, ans=0.05
2024-09-14 11:56:29,988 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.31 vs. limit=15.0
2024-09-14 11:56:44,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=123717.5, ans=0.025
2024-09-14 11:57:05,766 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.27 vs. limit=22.5
2024-09-14 11:57:17,108 INFO [train.py:1198] (1/2) Epoch 7, batch 5350, loss[loss=0.2692, ctc_loss=0.1909, cr_loss=0.3913, over 20339.00 frames. ], tot_loss[loss=0.2852, ctc_loss=0.2041, cr_loss=0.4059, over 4111848.87 frames. ], batch size: 45, lr: 1.17e-02, grad_scale: 32.0
2024-09-14 11:57:26,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=123802.5, ans=0.2
2024-09-14 11:57:48,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=123859.16666666667, ans=0.125
2024-09-14 11:58:00,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=123887.5, ans=0.0
2024-09-14 11:58:12,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=123887.5, ans=0.125
2024-09-14 11:58:16,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=123915.83333333333, ans=0.125
2024-09-14 11:58:26,678 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.869e+02 2.218e+02 2.456e+02 2.819e+02 4.306e+02, threshold=4.913e+02, percent-clipped=0.0
2024-09-14 11:58:31,055 INFO [train.py:1198] (1/2) Epoch 7, batch 5400, loss[loss=0.323, ctc_loss=0.2335, cr_loss=0.4478, over 20340.00 frames. ], tot_loss[loss=0.286, ctc_loss=0.2046, cr_loss=0.4069, over 4118239.67 frames. ], batch size: 74, lr: 1.17e-02, grad_scale: 32.0
2024-09-14 11:58:35,081 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.71 vs. limit=15.0
2024-09-14 11:58:37,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=123944.16666666667, ans=0.125
2024-09-14 11:58:52,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=123972.5, ans=0.125
2024-09-14 11:58:53,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=123972.5, ans=0.0
2024-09-14 11:58:56,941 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.54 vs. limit=15.0
2024-09-14 11:59:24,922 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.67 vs. limit=15.0
2024-09-14 11:59:33,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=124057.5, ans=0.035
2024-09-14 11:59:44,993 INFO [train.py:1198] (1/2) Epoch 7, batch 5450, loss[loss=0.2575, ctc_loss=0.1856, cr_loss=0.3595, over 20892.00 frames. ], tot_loss[loss=0.2849, ctc_loss=0.2037, cr_loss=0.4059, over 4120639.48 frames. ], batch size: 54, lr: 1.17e-02, grad_scale: 32.0
2024-09-14 11:59:54,751 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.90 vs. limit=12.0
2024-09-14 11:59:55,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=124085.83333333333, ans=0.0
2024-09-14 12:00:04,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=124114.16666666667, ans=0.0
2024-09-14 12:00:17,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=124142.5, ans=0.0
2024-09-14 12:00:18,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=124142.5, ans=0.125
2024-09-14 12:00:44,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=124170.83333333333, ans=0.125
2024-09-14 12:00:57,287 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.716e+02 2.125e+02 2.257e+02 2.638e+02 4.621e+02, threshold=4.514e+02, percent-clipped=0.0
2024-09-14 12:01:01,671 INFO [train.py:1198] (1/2) Epoch 7, batch 5500, loss[loss=0.264, ctc_loss=0.186, cr_loss=0.39, over 20980.00 frames. ], tot_loss[loss=0.2846, ctc_loss=0.2036, cr_loss=0.4052, over 4109390.34 frames. ], batch size: 52, lr: 1.17e-02, grad_scale: 32.0
2024-09-14 12:01:33,505 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-14 12:02:01,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=124340.83333333333, ans=0.125
2024-09-14 12:02:04,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=124340.83333333333, ans=0.125
2024-09-14 12:02:07,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=124340.83333333333, ans=0.125
2024-09-14 12:02:16,239 INFO [train.py:1198] (1/2) Epoch 7, batch 5550, loss[loss=0.3052, ctc_loss=0.2163, cr_loss=0.4445, over 20931.00 frames. ], tot_loss[loss=0.2852, ctc_loss=0.204, cr_loss=0.4062, over 4103310.19 frames. ], batch size: 60, lr: 1.17e-02, grad_scale: 32.0
2024-09-14 12:02:18,387 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.49 vs. limit=22.5
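The grad_scale field in the training entries moves between 16.0 and 32.0 over this span, which is characteristic of dynamic loss scaling for fp16 training: the scale is halved when scaled gradients overflow and grown back after a run of finite steps. A self-contained sketch using PyTorch's stock torch.cuda.amp.GradScaler follows; the toy model and loop are placeholders, not the recipe's actual training loop.

import torch

# Minimal sketch of dynamic loss scaling, the likely source of the
# logged grad_scale values (halves on fp16 overflow, grows back later).
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(80, 500).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

for step in range(3):
    x = torch.randn(8, 80, device=device)
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(enabled=(device == "cuda")):
        loss = model(x).square().mean()
    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(optimizer)          # unscales grads; skips step on inf/nan
    scaler.update()                 # adjusts the scale (the logged grad_scale)
    print(step, scaler.get_scale())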
2024-09-14 12:02:19,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=124369.16666666667, ans=0.0
2024-09-14 12:02:32,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=124397.5, ans=0.125
2024-09-14 12:02:43,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=124397.5, ans=0.2
2024-09-14 12:02:47,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=124425.83333333333, ans=0.1
2024-09-14 12:03:30,122 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.731e+02 2.094e+02 2.256e+02 2.546e+02 3.746e+02, threshold=4.513e+02, percent-clipped=0.0
2024-09-14 12:03:31,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=124510.83333333333, ans=0.2
2024-09-14 12:03:33,029 INFO [train.py:1198] (1/2) Epoch 7, batch 5600, loss[loss=0.2778, ctc_loss=0.1957, cr_loss=0.4101, over 20779.00 frames. ], tot_loss[loss=0.2874, ctc_loss=0.2056, cr_loss=0.4091, over 4101493.08 frames. ], batch size: 56, lr: 1.16e-02, grad_scale: 32.0
2024-09-14 12:03:51,148 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-14 12:03:52,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=124539.16666666667, ans=0.0
2024-09-14 12:03:57,377 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.85 vs. limit=15.0
2024-09-14 12:04:03,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=124567.5, ans=0.0
2024-09-14 12:04:17,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=124595.83333333333, ans=0.2
2024-09-14 12:04:46,684 INFO [train.py:1198] (1/2) Epoch 7, batch 5650, loss[loss=0.2923, ctc_loss=0.2074, cr_loss=0.4247, over 20863.00 frames. ], tot_loss[loss=0.2873, ctc_loss=0.2055, cr_loss=0.409, over 4101464.84 frames. ], batch size: 57, lr: 1.16e-02, grad_scale: 32.0
2024-09-14 12:04:48,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=124652.5, ans=0.125
2024-09-14 12:05:52,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=124765.83333333333, ans=10.0
2024-09-14 12:05:58,340 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.755e+02 2.077e+02 2.282e+02 2.474e+02 3.865e+02, threshold=4.564e+02, percent-clipped=0.0
2024-09-14 12:06:01,315 INFO [train.py:1198] (1/2) Epoch 7, batch 5700, loss[loss=0.2892, ctc_loss=0.2088, cr_loss=0.4017, over 21048.00 frames. ], tot_loss[loss=0.2879, ctc_loss=0.206, cr_loss=0.4099, over 4094714.39 frames. ], batch size: 62, lr: 1.16e-02, grad_scale: 32.0
2024-09-14 12:06:16,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=124822.5, ans=0.1
2024-09-14 12:07:07,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=124907.5, ans=0.125
2024-09-14 12:07:15,887 INFO [train.py:1198] (1/2) Epoch 7, batch 5750, loss[loss=0.282, ctc_loss=0.2022, cr_loss=0.3989, over 20979.00 frames. ], tot_loss[loss=0.2881, ctc_loss=0.2062, cr_loss=0.4096, over 4100229.99 frames. ], batch size: 58, lr: 1.16e-02, grad_scale: 32.0
2024-09-14 12:08:14,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=125049.16666666667, ans=0.0
2024-09-14 12:08:27,134 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.825e+02 2.111e+02 2.305e+02 2.520e+02 6.733e+02, threshold=4.611e+02, percent-clipped=1.0
2024-09-14 12:08:29,920 INFO [train.py:1198] (1/2) Epoch 7, batch 5800, loss[loss=0.3318, ctc_loss=0.2404, cr_loss=0.457, over 18451.00 frames. ], tot_loss[loss=0.2883, ctc_loss=0.2063, cr_loss=0.4101, over 4097847.70 frames. ], batch size: 108, lr: 1.16e-02, grad_scale: 32.0
2024-09-14 12:09:30,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=125190.83333333333, ans=0.125
2024-09-14 12:09:36,858 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.45 vs. limit=10.0
2024-09-14 12:09:46,409 INFO [train.py:1198] (1/2) Epoch 7, batch 5850, loss[loss=0.284, ctc_loss=0.204, cr_loss=0.4001, over 20765.00 frames. ], tot_loss[loss=0.2882, ctc_loss=0.2063, cr_loss=0.4097, over 4106577.31 frames. ], batch size: 56, lr: 1.16e-02, grad_scale: 32.0
2024-09-14 12:09:48,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=125219.16666666667, ans=0.04949747468305833
2024-09-14 12:10:00,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=125247.5, ans=0.0
2024-09-14 12:10:24,167 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.04 vs. limit=15.0
2024-09-14 12:10:57,471 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.767e+02 2.185e+02 2.478e+02 2.800e+02 3.985e+02, threshold=4.957e+02, percent-clipped=0.0
2024-09-14 12:11:00,605 INFO [train.py:1198] (1/2) Epoch 7, batch 5900, loss[loss=0.2707, ctc_loss=0.1864, cr_loss=0.4213, over 20958.00 frames. ], tot_loss[loss=0.2875, ctc_loss=0.2057, cr_loss=0.4087, over 4103470.14 frames. ], batch size: 58, lr: 1.16e-02, grad_scale: 32.0
2024-09-14 12:11:47,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=125445.83333333333, ans=0.125
2024-09-14 12:12:11,995 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.97 vs. limit=15.0
2024-09-14 12:12:16,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=125502.5, ans=0.125
2024-09-14 12:12:17,244 INFO [train.py:1198] (1/2) Epoch 7, batch 5950, loss[loss=0.2901, ctc_loss=0.2092, cr_loss=0.4042, over 20838.00 frames. ], tot_loss[loss=0.2861, ctc_loss=0.2047, cr_loss=0.4069, over 4098257.14 frames. ], batch size: 65, lr: 1.16e-02, grad_scale: 32.0
2024-09-14 12:12:24,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=125502.5, ans=0.0
2024-09-14 12:13:18,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=125615.83333333333, ans=0.125
2024-09-14 12:13:21,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=125615.83333333333, ans=0.1
2024-09-14 12:13:28,738 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.826e+02 2.117e+02 2.312e+02 2.797e+02 7.411e+02, threshold=4.624e+02, percent-clipped=1.0
2024-09-14 12:13:31,594 INFO [train.py:1198] (1/2) Epoch 7, batch 6000, loss[loss=0.3055, ctc_loss=0.2214, cr_loss=0.4206, over 20953.00 frames. ], tot_loss[loss=0.2857, ctc_loss=0.2043, cr_loss=0.4066, over 4111878.38 frames. ], batch size: 64, lr: 1.16e-02, grad_scale: 32.0
2024-09-14 12:13:31,595 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-14 12:13:53,200 INFO [train.py:1230] (1/2) Epoch 7, validation: loss=0.06086, ctc_loss=0.06086, cr_loss=9.403e-15, over 944034.00 frames.
2024-09-14 12:13:53,201 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB
2024-09-14 12:14:10,445 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.35 vs. limit=15.0
2024-09-14 12:14:35,912 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.61 vs. limit=22.5
2024-09-14 12:14:36,457 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.42 vs. limit=5.0
2024-09-14 12:15:08,838 INFO [train.py:1198] (1/2) Epoch 7, batch 6050, loss[loss=0.31, ctc_loss=0.2219, cr_loss=0.4403, over 21013.00 frames. ], tot_loss[loss=0.2851, ctc_loss=0.2039, cr_loss=0.4059, over 4101203.17 frames. ], batch size: 61, lr: 1.16e-02, grad_scale: 32.0
2024-09-14 12:15:41,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=125842.5, ans=0.125
2024-09-14 12:16:19,622 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.797e+02 2.109e+02 2.341e+02 2.718e+02 4.092e+02, threshold=4.682e+02, percent-clipped=0.0
2024-09-14 12:16:22,624 INFO [train.py:1198] (1/2) Epoch 7, batch 6100, loss[loss=0.2603, ctc_loss=0.1861, cr_loss=0.3711, over 21003.00 frames. ], tot_loss[loss=0.2862, ctc_loss=0.2047, cr_loss=0.4076, over 4104009.51 frames. ], batch size: 61, lr: 1.16e-02, grad_scale: 32.0
2024-09-14 12:17:38,035 INFO [train.py:1198] (1/2) Epoch 7, batch 6150, loss[loss=0.3058, ctc_loss=0.2223, cr_loss=0.4173, over 19221.00 frames. ], tot_loss[loss=0.2856, ctc_loss=0.2043, cr_loss=0.4067, over 4088016.64 frames. ], batch size: 90, lr: 1.16e-02, grad_scale: 32.0
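Two regularities are visible in the loss fields. First, every training entry satisfies loss = ctc_loss + 0.2 * cr_loss (e.g. 0.2049 + 0.2 * 0.4189 = 0.2887 at batch 4300), so the consistency-regularization term enters with a fixed weight of 0.2. Second, the validation entry above shows cr_loss=9.403e-15, i.e. numerically zero: with a single unaugmented pass there is no second view to be consistent with. The sketch below reproduces that weighting; the symmetric-KL form of the consistency term between two differently-masked views is an assumption, not necessarily the recipe's exact definition.

import torch.nn.functional as F

def combined_loss(log_probs_a, log_probs_b, ctc_loss_a, ctc_loss_b,
                  cr_loss_scale=0.2):
    """Sketch of loss = ctc_loss + 0.2 * cr_loss, as implied by the log.
    log_probs_a/b: CTC log-posteriors of two differently-masked views.
    The KL form of the consistency term is an assumed choice."""
    ctc_loss = 0.5 * (ctc_loss_a + ctc_loss_b)
    cr_loss = 0.5 * (
        F.kl_div(log_probs_a, log_probs_b.detach(),
                 log_target=True, reduction="batchmean")
        + F.kl_div(log_probs_b, log_probs_a.detach(),
                   log_target=True, reduction="batchmean"))
    return ctc_loss + cr_loss_scale * cr_loss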
2024-09-14 12:17:38,895 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.22 vs. limit=10.0
2024-09-14 12:17:39,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=126069.16666666667, ans=0.0
2024-09-14 12:18:02,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=126097.5, ans=0.0
2024-09-14 12:18:25,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=126154.16666666667, ans=0.04949747468305833
2024-09-14 12:18:36,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=126182.5, ans=0.0
2024-09-14 12:18:49,066 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.871e+02 2.159e+02 2.331e+02 2.631e+02 5.856e+02, threshold=4.662e+02, percent-clipped=1.0
2024-09-14 12:18:52,013 INFO [train.py:1198] (1/2) Epoch 7, batch 6200, loss[loss=0.2393, ctc_loss=0.1674, cr_loss=0.3596, over 20943.00 frames. ], tot_loss[loss=0.2866, ctc_loss=0.2051, cr_loss=0.4073, over 4080551.08 frames. ], batch size: 50, lr: 1.16e-02, grad_scale: 32.0
2024-09-14 12:18:54,338 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.32 vs. limit=15.0
2024-09-14 12:19:10,945 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.43 vs. limit=15.0
2024-09-14 12:20:00,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=126324.16666666667, ans=0.0
2024-09-14 12:20:06,101 INFO [train.py:1198] (1/2) Epoch 7, batch 6250, loss[loss=0.2763, ctc_loss=0.1929, cr_loss=0.4169, over 20987.00 frames. ], tot_loss[loss=0.2879, ctc_loss=0.2063, cr_loss=0.4078, over 4063958.06 frames. ], batch size: 58, lr: 1.16e-02, grad_scale: 32.0
2024-09-14 12:21:10,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=126465.83333333333, ans=0.025
2024-09-14 12:21:17,119 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.803e+02 2.181e+02 2.370e+02 2.742e+02 4.115e+02, threshold=4.741e+02, percent-clipped=0.0
2024-09-14 12:21:20,027 INFO [train.py:1198] (1/2) Epoch 7, batch 6300, loss[loss=0.3341, ctc_loss=0.2463, cr_loss=0.4387, over 19468.00 frames. ], tot_loss[loss=0.2928, ctc_loss=0.2105, cr_loss=0.4115, over 4010279.12 frames. ], batch size: 90, lr: 1.16e-02, grad_scale: 32.0
2024-09-14 12:21:22,404 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.16 vs. limit=22.5
2024-09-14 12:21:33,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=126522.5, ans=0.125
2024-09-14 12:22:17,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=126607.5, ans=10.0
2024-09-14 12:22:30,904 INFO [train.py:1198] (1/2) Epoch 7, batch 6350, loss[loss=0.3285, ctc_loss=0.2438, cr_loss=0.4236, over 17935.00 frames. ], tot_loss[loss=0.2991, ctc_loss=0.2162, cr_loss=0.4146, over 3907442.61 frames. ], batch size: 108, lr: 1.16e-02, grad_scale: 32.0
2024-09-14 12:22:35,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=126635.83333333333, ans=0.125
2024-09-14 12:22:43,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=126664.16666666667, ans=0.125
2024-09-14 12:22:52,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=126664.16666666667, ans=0.0
2024-09-14 12:24:19,570 INFO [train.py:1198] (1/2) Epoch 8, batch 0, loss[loss=0.3028, ctc_loss=0.2205, cr_loss=0.4116, over 20366.00 frames. ], tot_loss[loss=0.3028, ctc_loss=0.2205, cr_loss=0.4116, over 20366.00 frames. ], batch size: 74, lr: 1.09e-02, grad_scale: 32.0
2024-09-14 12:24:19,571 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-14 12:24:37,918 INFO [train.py:1230] (1/2) Epoch 8, validation: loss=0.06254, ctc_loss=0.06254, cr_loss=8.928e-15, over 944034.00 frames.
2024-09-14 12:24:37,918 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB
2024-09-14 12:24:48,203 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.843e+02 2.280e+02 2.532e+02 2.915e+02 3.914e+02, threshold=5.064e+02, percent-clipped=0.0
2024-09-14 12:25:03,607 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.08 vs. limit=22.5
2024-09-14 12:25:45,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=126865.33333333333, ans=0.0
2024-09-14 12:25:54,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=126893.66666666667, ans=0.0
2024-09-14 12:25:55,313 INFO [train.py:1198] (1/2) Epoch 8, batch 50, loss[loss=0.2341, ctc_loss=0.1612, cr_loss=0.3643, over 21015.00 frames. ], tot_loss[loss=0.2915, ctc_loss=0.2087, cr_loss=0.4137, over 929523.30 frames. ], batch size: 52, lr: 1.09e-02, grad_scale: 32.0
2024-09-14 12:25:55,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=126893.66666666667, ans=0.1
2024-09-14 12:26:06,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=126893.66666666667, ans=0.125
2024-09-14 12:26:36,122 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.70 vs. limit=15.0
2024-09-14 12:26:50,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=126978.66666666667, ans=0.0
2024-09-14 12:26:56,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=127007.0, ans=0.1
2024-09-14 12:27:11,375 INFO [train.py:1198] (1/2) Epoch 8, batch 100, loss[loss=0.3495, ctc_loss=0.2607, cr_loss=0.4438, over 14517.00 frames. ], tot_loss[loss=0.29, ctc_loss=0.2075, cr_loss=0.4122, over 1630261.59 frames. ], batch size: 149, lr: 1.09e-02, grad_scale: 16.0
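The learning rate decays slowly within an epoch (1.18e-02 at the start of this span down to 1.16e-02 by batch 6350) and then steps down at the epoch boundary (1.09e-02 from Epoch 8, batch 0), so the schedule evidently depends on both the batch count and the epoch index. A generic two-factor inverse-power decay would produce this shape; the functional form and constants below are assumptions, not the recipe's verified schedule.

def learning_rate(base_lr, batch, epoch, lr_batches, lr_epochs):
    """Sketch: lr decays in both batch count and epoch, matching the
    slow within-epoch drift and the step at the epoch boundary seen
    above. All constants and exponents are hypothetical."""
    batch_factor = (1.0 + (batch / lr_batches) ** 2) ** -0.25
    epoch_factor = (1.0 + (epoch / lr_epochs) ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor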
2024-09-14 12:27:23,366 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.808e+02 2.092e+02 2.283e+02 2.878e+02 5.109e+02, threshold=4.566e+02, percent-clipped=1.0
2024-09-14 12:27:29,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=127063.66666666667, ans=0.1
2024-09-14 12:27:52,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=127092.0, ans=0.125
2024-09-14 12:28:26,952 INFO [train.py:1198] (1/2) Epoch 8, batch 150, loss[loss=0.2492, ctc_loss=0.1747, cr_loss=0.3727, over 20941.00 frames. ], tot_loss[loss=0.2908, ctc_loss=0.208, cr_loss=0.4139, over 2165009.26 frames. ], batch size: 49, lr: 1.09e-02, grad_scale: 16.0
2024-09-14 12:29:11,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=127233.66666666667, ans=0.125
2024-09-14 12:29:40,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=127290.33333333333, ans=0.125
2024-09-14 12:29:45,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=127318.66666666667, ans=0.0
2024-09-14 12:29:46,023 INFO [train.py:1198] (1/2) Epoch 8, batch 200, loss[loss=0.3056, ctc_loss=0.2185, cr_loss=0.4358, over 20418.00 frames. ], tot_loss[loss=0.2864, ctc_loss=0.2044, cr_loss=0.4102, over 2599563.31 frames. ], batch size: 74, lr: 1.09e-02, grad_scale: 16.0
2024-09-14 12:29:58,039 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.809e+02 2.081e+02 2.253e+02 2.626e+02 5.255e+02, threshold=4.506e+02, percent-clipped=1.0
2024-09-14 12:29:58,827 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.04 vs. limit=12.0
2024-09-14 12:30:43,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=127403.66666666667, ans=10.0
2024-09-14 12:30:48,982 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.35 vs. limit=15.0
2024-09-14 12:30:51,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=127432.0, ans=0.0
2024-09-14 12:31:05,158 INFO [train.py:1198] (1/2) Epoch 8, batch 250, loss[loss=0.3108, ctc_loss=0.2238, cr_loss=0.4348, over 20611.00 frames. ], tot_loss[loss=0.2845, ctc_loss=0.2031, cr_loss=0.4071, over 2928451.59 frames. ], batch size: 68, lr: 1.08e-02, grad_scale: 16.0
2024-09-14 12:31:45,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=127517.0, ans=0.0
2024-09-14 12:32:04,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=127573.66666666667, ans=0.125
2024-09-14 12:32:17,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=127573.66666666667, ans=0.05
2024-09-14 12:32:20,665 INFO [train.py:1198] (1/2) Epoch 8, batch 300, loss[loss=0.2874, ctc_loss=0.201, cr_loss=0.4321, over 21004.00 frames. ], tot_loss[loss=0.286, ctc_loss=0.2043, cr_loss=0.4086, over 3182501.81 frames. ], batch size: 63, lr: 1.08e-02, grad_scale: 16.0
2024-09-14 12:32:28,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=127602.0, ans=0.0
2024-09-14 12:32:32,487 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.837e+02 2.046e+02 2.173e+02 2.391e+02 4.775e+02, threshold=4.346e+02, percent-clipped=1.0
2024-09-14 12:32:37,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=127630.33333333333, ans=0.025
2024-09-14 12:33:18,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=127687.0, ans=0.025
2024-09-14 12:33:21,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=127715.33333333333, ans=0.125
2024-09-14 12:33:22,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=127715.33333333333, ans=0.125
2024-09-14 12:33:35,996 INFO [train.py:1198] (1/2) Epoch 8, batch 350, loss[loss=0.3367, ctc_loss=0.2489, cr_loss=0.439, over 13926.00 frames. ], tot_loss[loss=0.2881, ctc_loss=0.2061, cr_loss=0.41, over 3361664.48 frames. ], batch size: 149, lr: 1.08e-02, grad_scale: 16.0
2024-09-14 12:33:42,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=127743.66666666667, ans=0.0
2024-09-14 12:34:29,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=127828.66666666667, ans=0.2
2024-09-14 12:34:41,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=127857.0, ans=0.125
2024-09-14 12:34:55,075 INFO [train.py:1198] (1/2) Epoch 8, batch 400, loss[loss=0.3058, ctc_loss=0.2205, cr_loss=0.4263, over 21008.00 frames. ], tot_loss[loss=0.2878, ctc_loss=0.2059, cr_loss=0.4096, over 3524689.13 frames. ], batch size: 61, lr: 1.08e-02, grad_scale: 32.0
2024-09-14 12:35:02,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=127885.33333333333, ans=0.0
2024-09-14 12:35:03,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=127885.33333333333, ans=6.0
2024-09-14 12:35:07,249 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.808e+02 2.102e+02 2.355e+02 2.791e+02 4.628e+02, threshold=4.709e+02, percent-clipped=1.0
2024-09-14 12:35:25,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=127942.0, ans=0.125
2024-09-14 12:36:11,269 INFO [train.py:1198] (1/2) Epoch 8, batch 450, loss[loss=0.2237, ctc_loss=0.1559, cr_loss=0.3391, over 20939.00 frames. ], tot_loss[loss=0.286, ctc_loss=0.2045, cr_loss=0.4079, over 3643796.83 frames. ], batch size: 49, lr: 1.08e-02, grad_scale: 32.0
2024-09-14 12:36:46,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=128083.66666666667, ans=0.125
2024-09-14 12:36:48,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=128083.66666666667, ans=0.0
2024-09-14 12:37:06,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=128112.0, ans=0.1
2024-09-14 12:37:11,585 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.84 vs. limit=15.0
2024-09-14 12:37:30,536 INFO [train.py:1198] (1/2) Epoch 8, batch 500, loss[loss=0.3515, ctc_loss=0.261, cr_loss=0.4523, over 14843.00 frames. ], tot_loss[loss=0.2864, ctc_loss=0.2047, cr_loss=0.4084, over 3739956.54 frames. ], batch size: 150, lr: 1.08e-02, grad_scale: 32.0
2024-09-14 12:37:35,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=128168.66666666667, ans=0.125
2024-09-14 12:37:41,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=128168.66666666667, ans=0.1
2024-09-14 12:37:42,383 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.847e+02 2.135e+02 2.293e+02 2.573e+02 3.481e+02, threshold=4.587e+02, percent-clipped=0.0
2024-09-14 12:38:11,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=128225.33333333333, ans=0.125
2024-09-14 12:38:13,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=128225.33333333333, ans=15.0
2024-09-14 12:38:46,165 INFO [train.py:1198] (1/2) Epoch 8, batch 550, loss[loss=0.2762, ctc_loss=0.1949, cr_loss=0.4066, over 20983.00 frames. ], tot_loss[loss=0.2863, ctc_loss=0.2046, cr_loss=0.4088, over 3816623.05 frames. ], batch size: 64, lr: 1.08e-02, grad_scale: 32.0
2024-09-14 12:39:18,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=128367.0, ans=0.05
2024-09-14 12:40:02,909 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.04 vs. limit=15.0
2024-09-14 12:40:05,085 INFO [train.py:1198] (1/2) Epoch 8, batch 600, loss[loss=0.3362, ctc_loss=0.2571, cr_loss=0.3952, over 14613.00 frames. ], tot_loss[loss=0.2852, ctc_loss=0.2037, cr_loss=0.4076, over 3883182.01 frames. ], batch size: 150, lr: 1.08e-02, grad_scale: 32.0
2024-09-14 12:40:10,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=128452.0, ans=0.125
2024-09-14 12:40:17,350 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.661e+02 2.068e+02 2.286e+02 2.632e+02 3.343e+02, threshold=4.573e+02, percent-clipped=0.0
2024-09-14 12:41:10,292 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.87 vs. limit=15.0
2024-09-14 12:41:19,960 INFO [train.py:1198] (1/2) Epoch 8, batch 650, loss[loss=0.3405, ctc_loss=0.2475, cr_loss=0.4652, over 18417.00 frames. ], tot_loss[loss=0.2851, ctc_loss=0.2036, cr_loss=0.4076, over 3914207.14 frames. ], batch size: 108, lr: 1.08e-02, grad_scale: 32.0
2024-09-14 12:41:32,869 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.91 vs. limit=15.0
2024-09-14 12:41:32,899 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.58 vs. limit=10.0
2024-09-14 12:41:40,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=128622.0, ans=0.1
2024-09-14 12:41:57,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=128650.33333333333, ans=0.125
2024-09-14 12:42:08,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=128678.66666666667, ans=0.125
2024-09-14 12:42:20,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=128678.66666666667, ans=0.125
2024-09-14 12:42:30,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=128707.0, ans=0.125
2024-09-14 12:42:38,728 INFO [train.py:1198] (1/2) Epoch 8, batch 700, loss[loss=0.2625, ctc_loss=0.1892, cr_loss=0.3666, over 20943.00 frames. ], tot_loss[loss=0.2854, ctc_loss=0.2038, cr_loss=0.408, over 3949863.50 frames. ], batch size: 50, lr: 1.08e-02, grad_scale: 32.0
2024-09-14 12:42:50,662 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.823e+02 2.090e+02 2.311e+02 2.574e+02 3.363e+02, threshold=4.623e+02, percent-clipped=0.0
2024-09-14 12:42:55,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=128763.66666666667, ans=0.0
2024-09-14 12:43:03,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=128763.66666666667, ans=0.2
2024-09-14 12:43:54,513 INFO [train.py:1198] (1/2) Epoch 8, batch 750, loss[loss=0.2698, ctc_loss=0.1913, cr_loss=0.3924, over 20986.00 frames. ], tot_loss[loss=0.2842, ctc_loss=0.203, cr_loss=0.4061, over 3975142.71 frames. ], batch size: 51, lr: 1.08e-02, grad_scale: 32.0
2024-09-14 12:44:21,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=128905.33333333333, ans=0.125
2024-09-14 12:44:29,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=128933.66666666667, ans=0.1
2024-09-14 12:44:38,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=128962.0, ans=0.125
2024-09-14 12:45:09,421 INFO [train.py:1198] (1/2) Epoch 8, batch 800, loss[loss=0.3219, ctc_loss=0.2333, cr_loss=0.443, over 20671.00 frames. ], tot_loss[loss=0.2847, ctc_loss=0.2032, cr_loss=0.4071, over 4006741.21 frames. ], batch size: 68, lr: 1.08e-02, grad_scale: 32.0
2024-09-14 12:45:21,417 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.841e+02 2.080e+02 2.241e+02 2.579e+02 3.337e+02, threshold=4.481e+02, percent-clipped=0.0
2024-09-14 12:45:44,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=129075.33333333333, ans=0.0
2024-09-14 12:46:20,853 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.77 vs. limit=15.0
2024-09-14 12:46:27,509 INFO [train.py:1198] (1/2) Epoch 8, batch 850, loss[loss=0.324, ctc_loss=0.2355, cr_loss=0.4424, over 20108.00 frames. ], tot_loss[loss=0.2844, ctc_loss=0.203, cr_loss=0.4072, over 4027416.21 frames. ], batch size: 80, lr: 1.08e-02, grad_scale: 32.0
2024-09-14 12:46:36,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=129160.33333333333, ans=0.1
2024-09-14 12:47:04,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=129217.0, ans=0.0
2024-09-14 12:47:07,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=129217.0, ans=0.125
2024-09-14 12:47:43,061 INFO [train.py:1198] (1/2) Epoch 8, batch 900, loss[loss=0.3142, ctc_loss=0.2268, cr_loss=0.4367, over 20093.00 frames. ], tot_loss[loss=0.2841, ctc_loss=0.2028, cr_loss=0.4066, over 4043001.17 frames. ], batch size: 80, lr: 1.08e-02, grad_scale: 32.0
2024-09-14 12:47:58,277 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.804e+02 2.030e+02 2.200e+02 2.387e+02 3.357e+02, threshold=4.400e+02, percent-clipped=0.0
2024-09-14 12:48:32,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=129387.0, ans=0.125
2024-09-14 12:48:32,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=129387.0, ans=0.1
2024-09-14 12:48:36,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=129387.0, ans=0.0
2024-09-14 12:49:00,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=129443.66666666667, ans=0.125
2024-09-14 12:49:01,992 INFO [train.py:1198] (1/2) Epoch 8, batch 950, loss[loss=0.2922, ctc_loss=0.2094, cr_loss=0.4138, over 20995.00 frames. ], tot_loss[loss=0.2833, ctc_loss=0.2021, cr_loss=0.406, over 4062072.63 frames. ], batch size: 63, lr: 1.08e-02, grad_scale: 32.0
2024-09-14 12:49:35,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=129500.33333333333, ans=0.125
2024-09-14 12:49:35,896 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.63 vs. limit=12.0
2024-09-14 12:50:17,682 INFO [train.py:1198] (1/2) Epoch 8, batch 1000, loss[loss=0.2168, ctc_loss=0.1539, cr_loss=0.3145, over 21011.00 frames. ], tot_loss[loss=0.2838, ctc_loss=0.2025, cr_loss=0.4068, over 4071550.59 frames. ], batch size: 48, lr: 1.08e-02, grad_scale: 16.0
], batch size: 48, lr: 1.08e-02, grad_scale: 16.0 2024-09-14 12:50:18,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=129585.33333333333, ans=0.1 2024-09-14 12:50:21,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=129585.33333333333, ans=0.125 2024-09-14 12:50:30,941 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.852e+02 2.115e+02 2.257e+02 2.479e+02 3.705e+02, threshold=4.515e+02, percent-clipped=0.0 2024-09-14 12:50:34,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=129613.66666666667, ans=0.0 2024-09-14 12:51:07,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=129670.33333333333, ans=0.125 2024-09-14 12:51:08,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=129670.33333333333, ans=0.1 2024-09-14 12:51:18,432 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 12:51:35,668 INFO [train.py:1198] (1/2) Epoch 8, batch 1050, loss[loss=0.2875, ctc_loss=0.2045, cr_loss=0.4152, over 20979.00 frames. ], tot_loss[loss=0.2846, ctc_loss=0.2031, cr_loss=0.4071, over 4061848.96 frames. ], batch size: 64, lr: 1.08e-02, grad_scale: 16.0 2024-09-14 12:51:41,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=129727.0, ans=0.125 2024-09-14 12:51:49,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=129755.33333333333, ans=0.05 2024-09-14 12:52:02,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=129755.33333333333, ans=15.0 2024-09-14 12:52:47,902 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.58 vs. limit=12.0 2024-09-14 12:52:51,757 INFO [train.py:1198] (1/2) Epoch 8, batch 1100, loss[loss=0.288, ctc_loss=0.2075, cr_loss=0.4027, over 21003.00 frames. ], tot_loss[loss=0.2836, ctc_loss=0.2024, cr_loss=0.4061, over 4076476.88 frames. ], batch size: 61, lr: 1.07e-02, grad_scale: 16.0 2024-09-14 12:53:04,972 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.771e+02 2.164e+02 2.395e+02 2.871e+02 6.856e+02, threshold=4.791e+02, percent-clipped=2.0 2024-09-14 12:53:09,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=129897.0, ans=0.125 2024-09-14 12:53:30,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=129925.33333333333, ans=0.125 2024-09-14 12:54:10,449 INFO [train.py:1198] (1/2) Epoch 8, batch 1150, loss[loss=0.2758, ctc_loss=0.195, cr_loss=0.4042, over 20925.00 frames. ], tot_loss[loss=0.283, ctc_loss=0.2019, cr_loss=0.4058, over 4070887.90 frames. 
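grad_scale halves from 32.0 to 16.0 at batch 1000 and is back at 32.0 by batch 1200 (and later grows to 64.0 near batch 3000 before halving again), which is the signature of dynamic loss scaling under fp16 AMP: halve on an inf/nan step, grow back after a run of clean steps. A toy sketch of that policy; the growth interval and factors are illustrative, not taken from this run:

```python
# Toy dynamic loss scaling: halve the scale when a step overflows in
# fp16, grow it back after `growth_interval` clean steps. Constants
# here are illustrative only.
class ToyGradScaler:
    def __init__(self, scale: float = 32.0, growth_interval: int = 1000):
        self.scale = scale
        self.growth_interval = growth_interval
        self._clean_steps = 0

    def update(self, found_inf: bool) -> None:
        if found_inf:
            self.scale *= 0.5          # e.g. 32.0 -> 16.0 at batch 1000
            self._clean_steps = 0
        else:
            self._clean_steps += 1
            if self._clean_steps >= self.growth_interval:
                self.scale *= 2.0      # e.g. back to 32.0 by batch 1200
                self._clean_steps = 0

scaler = ToyGradScaler()
scaler.update(found_inf=True)
print(scaler.scale)  # 16.0
```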
], batch size: 60, lr: 1.07e-02, grad_scale: 16.0 2024-09-14 12:54:27,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=130038.66666666667, ans=0.125 2024-09-14 12:54:35,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=130038.66666666667, ans=15.0 2024-09-14 12:54:56,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=130095.33333333333, ans=0.2 2024-09-14 12:55:02,796 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.57 vs. limit=15.0 2024-09-14 12:55:26,053 INFO [train.py:1198] (1/2) Epoch 8, batch 1200, loss[loss=0.3112, ctc_loss=0.2315, cr_loss=0.3986, over 18299.00 frames. ], tot_loss[loss=0.2827, ctc_loss=0.2017, cr_loss=0.405, over 4078618.79 frames. ], batch size: 108, lr: 1.07e-02, grad_scale: 32.0 2024-09-14 12:55:29,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=130152.0, ans=0.025 2024-09-14 12:55:39,749 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.755e+02 2.063e+02 2.289e+02 2.639e+02 4.616e+02, threshold=4.577e+02, percent-clipped=0.0 2024-09-14 12:56:13,812 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.32 vs. limit=15.0 2024-09-14 12:56:27,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=130265.33333333333, ans=0.0 2024-09-14 12:56:42,056 INFO [train.py:1198] (1/2) Epoch 8, batch 1250, loss[loss=0.2791, ctc_loss=0.1977, cr_loss=0.4069, over 20801.00 frames. ], tot_loss[loss=0.2836, ctc_loss=0.2024, cr_loss=0.4061, over 4084133.39 frames. ], batch size: 53, lr: 1.07e-02, grad_scale: 32.0 2024-09-14 12:57:09,796 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. limit=6.0 2024-09-14 12:57:32,477 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.36 vs. limit=15.0 2024-09-14 12:57:39,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=130378.66666666667, ans=0.125 2024-09-14 12:57:59,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=130435.33333333333, ans=0.125 2024-09-14 12:58:00,535 INFO [train.py:1198] (1/2) Epoch 8, batch 1300, loss[loss=0.2969, ctc_loss=0.2101, cr_loss=0.4342, over 20024.00 frames. ], tot_loss[loss=0.283, ctc_loss=0.2019, cr_loss=0.4058, over 4087496.08 frames. 
], batch size: 80, lr: 1.07e-02, grad_scale: 32.0 2024-09-14 12:58:11,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=130435.33333333333, ans=0.0 2024-09-14 12:58:14,259 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.789e+02 2.044e+02 2.200e+02 2.413e+02 2.893e+02, threshold=4.399e+02, percent-clipped=0.0 2024-09-14 12:58:32,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=130492.0, ans=0.1 2024-09-14 12:59:07,061 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.85 vs. limit=22.5 2024-09-14 12:59:16,754 INFO [train.py:1198] (1/2) Epoch 8, batch 1350, loss[loss=0.3093, ctc_loss=0.2223, cr_loss=0.4352, over 20978.00 frames. ], tot_loss[loss=0.2822, ctc_loss=0.2011, cr_loss=0.4056, over 4101513.29 frames. ], batch size: 64, lr: 1.07e-02, grad_scale: 32.0 2024-09-14 12:59:28,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=130577.0, ans=0.2 2024-09-14 13:00:35,650 INFO [train.py:1198] (1/2) Epoch 8, batch 1400, loss[loss=0.2948, ctc_loss=0.2107, cr_loss=0.42, over 20890.00 frames. ], tot_loss[loss=0.2825, ctc_loss=0.2014, cr_loss=0.4054, over 4095718.96 frames. ], batch size: 54, lr: 1.07e-02, grad_scale: 32.0 2024-09-14 13:00:49,092 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.890e+02 2.137e+02 2.285e+02 2.550e+02 5.423e+02, threshold=4.570e+02, percent-clipped=1.0 2024-09-14 13:01:01,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=130747.0, ans=0.0 2024-09-14 13:01:50,836 INFO [train.py:1198] (1/2) Epoch 8, batch 1450, loss[loss=0.2374, ctc_loss=0.1644, cr_loss=0.3651, over 19939.00 frames. ], tot_loss[loss=0.2817, ctc_loss=0.2007, cr_loss=0.4048, over 4093085.64 frames. ], batch size: 44, lr: 1.07e-02, grad_scale: 32.0 2024-09-14 13:02:13,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=130888.66666666667, ans=0.125 2024-09-14 13:02:18,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=130888.66666666667, ans=0.125 2024-09-14 13:03:00,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=130973.66666666667, ans=0.1 2024-09-14 13:03:08,956 INFO [train.py:1198] (1/2) Epoch 8, batch 1500, loss[loss=0.2708, ctc_loss=0.1917, cr_loss=0.3959, over 20894.00 frames. ], tot_loss[loss=0.2813, ctc_loss=0.2004, cr_loss=0.4044, over 4085403.23 frames. 
], batch size: 54, lr: 1.07e-02, grad_scale: 32.0 2024-09-14 13:03:22,500 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.767e+02 2.027e+02 2.176e+02 2.400e+02 4.808e+02, threshold=4.351e+02, percent-clipped=1.0 2024-09-14 13:03:26,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=131030.33333333333, ans=0.0 2024-09-14 13:03:31,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=131030.33333333333, ans=0.0 2024-09-14 13:03:44,380 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 13:04:02,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=131087.0, ans=0.1 2024-09-14 13:04:25,288 INFO [train.py:1198] (1/2) Epoch 8, batch 1550, loss[loss=0.3221, ctc_loss=0.2322, cr_loss=0.4494, over 21001.00 frames. ], tot_loss[loss=0.281, ctc_loss=0.2003, cr_loss=0.4035, over 4091149.95 frames. ], batch size: 63, lr: 1.07e-02, grad_scale: 32.0 2024-09-14 13:04:42,584 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 13:05:13,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=131228.66666666666, ans=0.125 2024-09-14 13:05:38,235 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.30 vs. limit=22.5 2024-09-14 13:05:44,834 INFO [train.py:1198] (1/2) Epoch 8, batch 1600, loss[loss=0.2384, ctc_loss=0.1659, cr_loss=0.3626, over 20962.00 frames. ], tot_loss[loss=0.2797, ctc_loss=0.1993, cr_loss=0.402, over 4081081.75 frames. ], batch size: 50, lr: 1.07e-02, grad_scale: 32.0 2024-09-14 13:05:58,171 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.726e+02 2.124e+02 2.287e+02 2.526e+02 8.912e+02, threshold=4.575e+02, percent-clipped=1.0 2024-09-14 13:06:14,558 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.97 vs. limit=22.5 2024-09-14 13:06:33,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=131370.33333333334, ans=0.125 2024-09-14 13:06:56,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=131398.66666666666, ans=0.0 2024-09-14 13:06:57,967 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.06 vs. limit=22.5 2024-09-14 13:06:59,404 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.47 vs. limit=15.0 2024-09-14 13:07:00,199 INFO [train.py:1198] (1/2) Epoch 8, batch 1650, loss[loss=0.2755, ctc_loss=0.1956, cr_loss=0.3997, over 20967.00 frames. ], tot_loss[loss=0.2792, ctc_loss=0.1989, cr_loss=0.4017, over 4084106.98 frames. 
], batch size: 55, lr: 1.07e-02, grad_scale: 32.0 2024-09-14 13:07:03,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=131427.0, ans=0.0 2024-09-14 13:07:21,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=131455.33333333334, ans=0.1 2024-09-14 13:07:25,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=131455.33333333334, ans=0.125 2024-09-14 13:07:32,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=131483.66666666666, ans=0.0 2024-09-14 13:08:16,084 INFO [train.py:1198] (1/2) Epoch 8, batch 1700, loss[loss=0.3194, ctc_loss=0.2292, cr_loss=0.4507, over 20671.00 frames. ], tot_loss[loss=0.2788, ctc_loss=0.1985, cr_loss=0.4014, over 4089616.44 frames. ], batch size: 66, lr: 1.07e-02, grad_scale: 32.0 2024-09-14 13:08:32,942 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.822e+02 2.126e+02 2.303e+02 2.588e+02 5.770e+02, threshold=4.606e+02, percent-clipped=1.0 2024-09-14 13:09:34,894 INFO [train.py:1198] (1/2) Epoch 8, batch 1750, loss[loss=0.3096, ctc_loss=0.2242, cr_loss=0.4266, over 20677.00 frames. ], tot_loss[loss=0.2804, ctc_loss=0.1996, cr_loss=0.4039, over 4098035.04 frames. ], batch size: 71, lr: 1.07e-02, grad_scale: 32.0 2024-09-14 13:09:41,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=131710.33333333334, ans=0.025 2024-09-14 13:09:44,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=131710.33333333334, ans=0.125 2024-09-14 13:10:50,904 INFO [train.py:1198] (1/2) Epoch 8, batch 1800, loss[loss=0.2626, ctc_loss=0.179, cr_loss=0.4184, over 20975.00 frames. ], tot_loss[loss=0.2802, ctc_loss=0.1994, cr_loss=0.404, over 4100857.98 frames. ], batch size: 58, lr: 1.07e-02, grad_scale: 32.0 2024-09-14 13:10:55,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=131852.0, ans=0.125 2024-09-14 13:11:07,980 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.762e+02 2.117e+02 2.382e+02 2.705e+02 4.221e+02, threshold=4.764e+02, percent-clipped=0.0 2024-09-14 13:11:15,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=131880.33333333334, ans=0.125 2024-09-14 13:11:24,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=131908.66666666666, ans=0.0 2024-09-14 13:11:24,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=131908.66666666666, ans=0.1 2024-09-14 13:12:09,198 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.38 vs. limit=22.5 2024-09-14 13:12:09,957 INFO [train.py:1198] (1/2) Epoch 8, batch 1850, loss[loss=0.322, ctc_loss=0.2365, cr_loss=0.4277, over 18579.00 frames. ], tot_loss[loss=0.2817, ctc_loss=0.2007, cr_loss=0.4053, over 4086443.41 frames. 
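The ScheduledFloat lines report hyperparameters (dropout p, skip rates, balancer probs) whose current value `ans` is a function of batch_count. A plausible minimal mechanism is piecewise-linear interpolation between (batch_count, value) breakpoints; the breakpoints below are invented for illustration, and only the logged batch_count is from the run:

```python
import bisect

# A float hyperparameter interpolated piecewise-linearly over
# batch_count; breakpoints are invented for illustration.
def scheduled_float(batch_count, points):
    xs = [x for x, _ in points]
    i = bisect.bisect_right(xs, batch_count)
    if i == 0:
        return points[0][1]
    if i == len(points):
        return points[-1][1]
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

# e.g. a skip rate decaying from 0.5 to 0.0 over the first 20k batches
print(scheduled_float(500.0, [(0.0, 0.5), (20000.0, 0.0)]))      # 0.4875
print(scheduled_float(131427.0, [(0.0, 0.5), (20000.0, 0.0)]))   # 0.0 by this point in the log
```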
], batch size: 108, lr: 1.07e-02, grad_scale: 32.0 2024-09-14 13:12:43,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=132050.33333333334, ans=0.125 2024-09-14 13:12:53,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=132078.66666666666, ans=0.125 2024-09-14 13:12:54,375 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.18 vs. limit=15.0 2024-09-14 13:13:14,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=132107.0, ans=0.0 2024-09-14 13:13:25,058 INFO [train.py:1198] (1/2) Epoch 8, batch 1900, loss[loss=0.2772, ctc_loss=0.1966, cr_loss=0.4026, over 20781.00 frames. ], tot_loss[loss=0.2834, ctc_loss=0.2019, cr_loss=0.4073, over 4082687.30 frames. ], batch size: 56, lr: 1.07e-02, grad_scale: 32.0 2024-09-14 13:13:38,736 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.789e+02 2.078e+02 2.357e+02 2.693e+02 3.869e+02, threshold=4.715e+02, percent-clipped=0.0 2024-09-14 13:13:58,911 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.10 vs. limit=15.0 2024-09-14 13:14:39,107 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.91 vs. limit=12.0 2024-09-14 13:14:42,908 INFO [train.py:1198] (1/2) Epoch 8, batch 1950, loss[loss=0.3762, ctc_loss=0.286, cr_loss=0.4509, over 14639.00 frames. ], tot_loss[loss=0.2836, ctc_loss=0.2021, cr_loss=0.4076, over 4084563.74 frames. ], batch size: 149, lr: 1.07e-02, grad_scale: 32.0 2024-09-14 13:15:02,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=132305.33333333334, ans=0.125 2024-09-14 13:15:09,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=132305.33333333334, ans=0.07 2024-09-14 13:15:26,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=132333.66666666666, ans=0.0 2024-09-14 13:15:29,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=132362.0, ans=0.125 2024-09-14 13:15:29,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=132362.0, ans=0.125 2024-09-14 13:15:58,828 INFO [train.py:1198] (1/2) Epoch 8, batch 2000, loss[loss=0.3101, ctc_loss=0.2189, cr_loss=0.4558, over 20653.00 frames. ], tot_loss[loss=0.283, ctc_loss=0.2016, cr_loss=0.4071, over 4092244.87 frames. ], batch size: 66, lr: 1.06e-02, grad_scale: 32.0 2024-09-14 13:16:05,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=132418.66666666666, ans=0.125 2024-09-14 13:16:12,131 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.810e+02 2.034e+02 2.190e+02 2.424e+02 4.015e+02, threshold=4.381e+02, percent-clipped=0.0 2024-09-14 13:17:12,320 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.00 vs. 
limit=10.0 2024-09-14 13:17:17,441 INFO [train.py:1198] (1/2) Epoch 8, batch 2050, loss[loss=0.2926, ctc_loss=0.2051, cr_loss=0.4378, over 20972.00 frames. ], tot_loss[loss=0.2828, ctc_loss=0.2016, cr_loss=0.406, over 4082578.81 frames. ], batch size: 58, lr: 1.06e-02, grad_scale: 32.0 2024-09-14 13:17:23,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=132560.33333333334, ans=0.0 2024-09-14 13:17:23,991 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.85 vs. limit=15.0 2024-09-14 13:17:40,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=132588.66666666666, ans=0.125 2024-09-14 13:18:04,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=132645.33333333334, ans=0.0 2024-09-14 13:18:05,130 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.87 vs. limit=15.0 2024-09-14 13:18:27,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=132673.66666666666, ans=0.125 2024-09-14 13:18:33,100 INFO [train.py:1198] (1/2) Epoch 8, batch 2100, loss[loss=0.3152, ctc_loss=0.2256, cr_loss=0.4482, over 20978.00 frames. ], tot_loss[loss=0.2831, ctc_loss=0.2017, cr_loss=0.4066, over 4085338.14 frames. ], batch size: 67, lr: 1.06e-02, grad_scale: 32.0 2024-09-14 13:18:46,681 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.825e+02 2.167e+02 2.422e+02 2.724e+02 4.702e+02, threshold=4.844e+02, percent-clipped=1.0 2024-09-14 13:18:53,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=132730.33333333334, ans=0.0 2024-09-14 13:19:09,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=132758.66666666666, ans=0.0 2024-09-14 13:19:11,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=132758.66666666666, ans=0.125 2024-09-14 13:19:40,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=132815.33333333334, ans=0.0 2024-09-14 13:19:51,613 INFO [train.py:1198] (1/2) Epoch 8, batch 2150, loss[loss=0.2556, ctc_loss=0.1784, cr_loss=0.3861, over 20950.00 frames. ], tot_loss[loss=0.2829, ctc_loss=0.2015, cr_loss=0.4071, over 4098759.45 frames. ], batch size: 49, lr: 1.06e-02, grad_scale: 32.0 2024-09-14 13:19:53,478 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=132843.66666666666, ans=0.125 2024-09-14 13:19:59,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=132843.66666666666, ans=0.2 2024-09-14 13:20:16,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=132872.0, ans=0.125 2024-09-14 13:20:39,244 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.54 vs. 
limit=22.5 2024-09-14 13:20:44,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=132928.66666666666, ans=0.1 2024-09-14 13:20:58,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=132957.0, ans=0.125 2024-09-14 13:21:02,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=132957.0, ans=0.0 2024-09-14 13:21:07,107 INFO [train.py:1198] (1/2) Epoch 8, batch 2200, loss[loss=0.2887, ctc_loss=0.2014, cr_loss=0.4363, over 20948.00 frames. ], tot_loss[loss=0.2813, ctc_loss=0.2003, cr_loss=0.405, over 4087814.74 frames. ], batch size: 64, lr: 1.06e-02, grad_scale: 32.0 2024-09-14 13:21:20,680 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.645e+02 2.092e+02 2.250e+02 2.431e+02 3.576e+02, threshold=4.499e+02, percent-clipped=0.0 2024-09-14 13:21:22,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=133013.66666666666, ans=0.1 2024-09-14 13:22:12,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=133098.66666666666, ans=0.1 2024-09-14 13:22:26,038 INFO [train.py:1198] (1/2) Epoch 8, batch 2250, loss[loss=0.281, ctc_loss=0.1994, cr_loss=0.4082, over 20886.00 frames. ], tot_loss[loss=0.2799, ctc_loss=0.1991, cr_loss=0.4041, over 4101307.22 frames. ], batch size: 57, lr: 1.06e-02, grad_scale: 32.0 2024-09-14 13:22:26,858 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.56 vs. limit=22.5 2024-09-14 13:22:49,259 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. limit=6.0 2024-09-14 13:23:17,649 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 13:23:41,201 INFO [train.py:1198] (1/2) Epoch 8, batch 2300, loss[loss=0.2786, ctc_loss=0.1964, cr_loss=0.4111, over 20783.00 frames. ], tot_loss[loss=0.2799, ctc_loss=0.1991, cr_loss=0.4042, over 4107950.09 frames. ], batch size: 56, lr: 1.06e-02, grad_scale: 32.0 2024-09-14 13:23:54,885 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.804e+02 2.101e+02 2.369e+02 2.709e+02 3.780e+02, threshold=4.739e+02, percent-clipped=0.0 2024-09-14 13:24:03,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=133297.0, ans=0.0 2024-09-14 13:24:08,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=133297.0, ans=0.1 2024-09-14 13:24:33,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=133353.66666666666, ans=0.125 2024-09-14 13:24:57,191 INFO [train.py:1198] (1/2) Epoch 8, batch 2350, loss[loss=0.2497, ctc_loss=0.1774, cr_loss=0.3617, over 21019.00 frames. ], tot_loss[loss=0.2801, ctc_loss=0.1992, cr_loss=0.4044, over 4107866.19 frames. 
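The Whitening lines compare a per-module metric against a whitening_limit (here 20.56 vs. 22.5 and 2.91 vs. 6.0). One plausible reading is that the metric measures how far the feature covariance of each group is from isotropic, approaching 1.0 for perfectly white features and growing with anisotropy; a hedged sketch of such a metric, which may differ from the exact formula in scaling.py:

```python
import torch

# One way to define a whitening metric: ratio of the mean squared
# eigenvalue of the feature covariance to its squared mean eigenvalue.
# It tends to 1.0 for white features and grows with anisotropy; the
# exact formula in scaling.py may differ.
def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) features of one whitening group
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    d = cov.shape[0]
    mean_sq_eig = (cov @ cov).diagonal().sum() / d   # trace(C @ C) / d
    sq_mean_eig = (cov.diagonal().sum() / d) ** 2    # (trace(C) / d) ** 2
    return float(mean_sq_eig / sq_mean_eig)

# ~1.25 for 1000 frames of white noise (sampling spread); -> 1.0 as
# frames grow. The log warns as a module's metric nears its limit.
print(whitening_metric(torch.randn(1000, 256)))
```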
], batch size: 48, lr: 1.06e-02, grad_scale: 32.0 2024-09-14 13:25:00,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=133410.33333333334, ans=0.125 2024-09-14 13:25:19,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=133438.66666666666, ans=0.125 2024-09-14 13:25:36,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=133467.0, ans=0.125 2024-09-14 13:25:49,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=133495.33333333334, ans=0.2 2024-09-14 13:25:53,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=133495.33333333334, ans=0.2 2024-09-14 13:26:05,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=133523.66666666666, ans=0.125 2024-09-14 13:26:09,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=133523.66666666666, ans=10.0 2024-09-14 13:26:10,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=133523.66666666666, ans=0.125 2024-09-14 13:26:15,906 INFO [train.py:1198] (1/2) Epoch 8, batch 2400, loss[loss=0.2779, ctc_loss=0.2003, cr_loss=0.388, over 21018.00 frames. ], tot_loss[loss=0.2806, ctc_loss=0.1997, cr_loss=0.4047, over 4101443.09 frames. ], batch size: 62, lr: 1.06e-02, grad_scale: 32.0 2024-09-14 13:26:29,506 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.752e+02 2.062e+02 2.325e+02 2.560e+02 3.645e+02, threshold=4.650e+02, percent-clipped=0.0 2024-09-14 13:26:55,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=133608.66666666666, ans=0.0 2024-09-14 13:27:31,802 INFO [train.py:1198] (1/2) Epoch 8, batch 2450, loss[loss=0.2517, ctc_loss=0.1765, cr_loss=0.3759, over 21056.00 frames. ], tot_loss[loss=0.2818, ctc_loss=0.2007, cr_loss=0.4054, over 4107236.95 frames. ], batch size: 53, lr: 1.06e-02, grad_scale: 32.0 2024-09-14 13:27:46,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=133722.0, ans=0.125 2024-09-14 13:27:52,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=133722.0, ans=0.125 2024-09-14 13:28:34,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=133778.66666666666, ans=0.0 2024-09-14 13:28:52,468 INFO [train.py:1198] (1/2) Epoch 8, batch 2500, loss[loss=0.2525, ctc_loss=0.1766, cr_loss=0.3796, over 20790.00 frames. ], tot_loss[loss=0.2813, ctc_loss=0.2005, cr_loss=0.4039, over 4102225.98 frames. 
], batch size: 53, lr: 1.06e-02, grad_scale: 32.0 2024-09-14 13:29:05,932 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.872e+02 2.145e+02 2.275e+02 2.498e+02 4.011e+02, threshold=4.549e+02, percent-clipped=0.0 2024-09-14 13:29:18,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=133863.66666666666, ans=0.125 2024-09-14 13:29:23,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=133892.0, ans=0.125 2024-09-14 13:29:56,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=133948.66666666666, ans=0.1 2024-09-14 13:30:04,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=133948.66666666666, ans=0.125 2024-09-14 13:30:08,589 INFO [train.py:1198] (1/2) Epoch 8, batch 2550, loss[loss=0.2626, ctc_loss=0.1829, cr_loss=0.3985, over 21021.00 frames. ], tot_loss[loss=0.2817, ctc_loss=0.2009, cr_loss=0.4044, over 4099412.96 frames. ], batch size: 61, lr: 1.06e-02, grad_scale: 32.0 2024-09-14 13:30:10,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=133977.0, ans=0.125 2024-09-14 13:30:40,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=134033.66666666666, ans=10.0 2024-09-14 13:30:49,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=134033.66666666666, ans=0.125 2024-09-14 13:30:51,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=134033.66666666666, ans=0.125 2024-09-14 13:31:26,739 INFO [train.py:1198] (1/2) Epoch 8, batch 2600, loss[loss=0.2891, ctc_loss=0.2062, cr_loss=0.4144, over 21054.00 frames. ], tot_loss[loss=0.2822, ctc_loss=0.2012, cr_loss=0.4046, over 4092038.91 frames. ], batch size: 56, lr: 1.06e-02, grad_scale: 32.0 2024-09-14 13:31:40,249 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.786e+02 2.108e+02 2.323e+02 2.656e+02 4.347e+02, threshold=4.645e+02, percent-clipped=0.0 2024-09-14 13:31:48,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=134147.0, ans=0.025 2024-09-14 13:32:42,099 INFO [train.py:1198] (1/2) Epoch 8, batch 2650, loss[loss=0.2538, ctc_loss=0.1791, cr_loss=0.3735, over 21002.00 frames. ], tot_loss[loss=0.2817, ctc_loss=0.2008, cr_loss=0.4045, over 4093941.08 frames. ], batch size: 52, lr: 1.06e-02, grad_scale: 32.0 2024-09-14 13:33:11,475 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 13:33:21,239 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. limit=6.0 2024-09-14 13:33:52,687 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.71 vs. limit=12.0 2024-09-14 13:34:02,474 INFO [train.py:1198] (1/2) Epoch 8, batch 2700, loss[loss=0.3125, ctc_loss=0.2246, cr_loss=0.4397, over 20633.00 frames. 
], tot_loss[loss=0.2814, ctc_loss=0.2006, cr_loss=0.4043, over 4082118.39 frames. ], batch size: 66, lr: 1.06e-02, grad_scale: 32.0 2024-09-14 13:34:13,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=134402.0, ans=0.2 2024-09-14 13:34:16,089 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.781e+02 2.096e+02 2.304e+02 2.611e+02 4.523e+02, threshold=4.607e+02, percent-clipped=0.0 2024-09-14 13:34:32,928 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 13:34:34,993 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.40 vs. limit=22.5 2024-09-14 13:35:17,965 INFO [train.py:1198] (1/2) Epoch 8, batch 2750, loss[loss=0.295, ctc_loss=0.211, cr_loss=0.42, over 20627.00 frames. ], tot_loss[loss=0.2823, ctc_loss=0.2013, cr_loss=0.4053, over 4088500.03 frames. ], batch size: 66, lr: 1.06e-02, grad_scale: 32.0 2024-09-14 13:35:19,043 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.35 vs. limit=15.0 2024-09-14 13:35:31,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=134572.0, ans=0.125 2024-09-14 13:35:32,043 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.87 vs. limit=10.0 2024-09-14 13:36:14,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=134628.66666666666, ans=0.2 2024-09-14 13:36:14,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=134628.66666666666, ans=0.1 2024-09-14 13:36:31,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=134657.0, ans=0.1 2024-09-14 13:36:34,263 INFO [train.py:1198] (1/2) Epoch 8, batch 2800, loss[loss=0.3488, ctc_loss=0.2655, cr_loss=0.4165, over 14122.00 frames. ], tot_loss[loss=0.2829, ctc_loss=0.2018, cr_loss=0.4058, over 4075344.43 frames. 
], batch size: 150, lr: 1.06e-02, grad_scale: 32.0 2024-09-14 13:36:34,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=134685.33333333334, ans=0.125 2024-09-14 13:36:37,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=134685.33333333334, ans=0.125 2024-09-14 13:36:48,273 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.827e+02 2.114e+02 2.371e+02 2.689e+02 4.577e+02, threshold=4.741e+02, percent-clipped=0.0 2024-09-14 13:37:05,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=134713.66666666666, ans=0.1 2024-09-14 13:37:23,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=134770.33333333334, ans=0.125 2024-09-14 13:37:29,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=134770.33333333334, ans=0.0 2024-09-14 13:37:43,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=134798.66666666666, ans=0.125 2024-09-14 13:37:53,365 INFO [train.py:1198] (1/2) Epoch 8, batch 2850, loss[loss=0.2584, ctc_loss=0.1823, cr_loss=0.3805, over 20880.00 frames. ], tot_loss[loss=0.2813, ctc_loss=0.2003, cr_loss=0.405, over 4091758.47 frames. ], batch size: 54, lr: 1.06e-02, grad_scale: 32.0 2024-09-14 13:38:23,059 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.41 vs. limit=15.0 2024-09-14 13:39:09,898 INFO [train.py:1198] (1/2) Epoch 8, batch 2900, loss[loss=0.2674, ctc_loss=0.1894, cr_loss=0.3904, over 20791.00 frames. ], tot_loss[loss=0.2805, ctc_loss=0.1996, cr_loss=0.4046, over 4099645.85 frames. ], batch size: 53, lr: 1.05e-02, grad_scale: 32.0 2024-09-14 13:39:22,243 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=134968.66666666666, ans=0.2 2024-09-14 13:39:23,490 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.818e+02 2.121e+02 2.307e+02 2.594e+02 4.069e+02, threshold=4.614e+02, percent-clipped=0.0 2024-09-14 13:39:34,672 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2024-09-14 13:40:03,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=135053.66666666666, ans=0.125 2024-09-14 13:40:06,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=135053.66666666666, ans=0.2 2024-09-14 13:40:07,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=135053.66666666666, ans=0.2 2024-09-14 13:40:29,831 INFO [train.py:1198] (1/2) Epoch 8, batch 2950, loss[loss=0.3214, ctc_loss=0.2359, cr_loss=0.4276, over 20826.00 frames. ], tot_loss[loss=0.2813, ctc_loss=0.2004, cr_loss=0.4047, over 4092109.06 frames. 
], batch size: 59, lr: 1.05e-02, grad_scale: 32.0 2024-09-14 13:40:34,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=135110.33333333334, ans=0.2 2024-09-14 13:41:06,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=135167.0, ans=0.0 2024-09-14 13:41:24,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=135195.33333333334, ans=0.125 2024-09-14 13:41:44,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=135252.0, ans=0.0 2024-09-14 13:41:45,800 INFO [train.py:1198] (1/2) Epoch 8, batch 3000, loss[loss=0.2743, ctc_loss=0.2004, cr_loss=0.37, over 21042.00 frames. ], tot_loss[loss=0.2808, ctc_loss=0.2, cr_loss=0.4041, over 4098284.24 frames. ], batch size: 56, lr: 1.05e-02, grad_scale: 64.0 2024-09-14 13:41:45,800 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-14 13:42:16,214 INFO [train.py:1230] (1/2) Epoch 8, validation: loss=0.05931, ctc_loss=0.05931, cr_loss=9.029e-15, over 944034.00 frames. 2024-09-14 13:42:16,215 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-14 13:42:29,789 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.755e+02 2.154e+02 2.318e+02 2.651e+02 3.722e+02, threshold=4.636e+02, percent-clipped=0.0 2024-09-14 13:43:01,719 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=135308.66666666666, ans=0.0 2024-09-14 13:43:18,775 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.84 vs. limit=22.5 2024-09-14 13:43:34,689 INFO [train.py:1198] (1/2) Epoch 8, batch 3050, loss[loss=0.2876, ctc_loss=0.2092, cr_loss=0.392, over 21024.00 frames. ], tot_loss[loss=0.2802, ctc_loss=0.1994, cr_loss=0.4042, over 4099163.80 frames. ], batch size: 61, lr: 1.05e-02, grad_scale: 64.0 2024-09-14 13:43:38,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=135393.66666666666, ans=0.0 2024-09-14 13:44:15,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=135450.33333333334, ans=0.2 2024-09-14 13:44:28,344 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.82 vs. limit=6.0 2024-09-14 13:44:50,753 INFO [train.py:1198] (1/2) Epoch 8, batch 3100, loss[loss=0.2563, ctc_loss=0.1799, cr_loss=0.3818, over 21049.00 frames. ], tot_loss[loss=0.2797, ctc_loss=0.199, cr_loss=0.4035, over 4096747.18 frames. 
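In the validation entry above, ctc_loss carries the entire loss while cr_loss is ~9e-15. That is consistent with a consistency-regularization term that compares CTC posteriors from two augmented views of each utterance: at validation the views coincide, so their divergence vanishes up to float rounding. A hedged sketch of such a symmetric term, not the exact train.py implementation:

```python
import torch
import torch.nn.functional as F

# A symmetric KL consistency term between CTC log-posteriors of two
# views of the same batch; with identical views (no augmentation, as
# at validation) it is zero up to float rounding.
def cr_loss(logp_a: torch.Tensor, logp_b: torch.Tensor) -> torch.Tensor:
    # logp_*: (batch, time, vocab) log-probabilities
    kl_ab = F.kl_div(logp_a, logp_b, log_target=True, reduction="batchmean")
    kl_ba = F.kl_div(logp_b, logp_a, log_target=True, reduction="batchmean")
    return 0.5 * (kl_ab + kl_ba)

logp = torch.randn(4, 100, 500).log_softmax(dim=-1)
print(cr_loss(logp, logp))  # ~0, matching the ~9e-15 validation cr_loss above
```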
], batch size: 56, lr: 1.05e-02, grad_scale: 32.0 2024-09-14 13:44:56,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=135535.33333333334, ans=0.125 2024-09-14 13:45:05,711 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.780e+02 2.084e+02 2.281e+02 2.519e+02 3.592e+02, threshold=4.562e+02, percent-clipped=0.0 2024-09-14 13:45:28,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=135592.0, ans=0.125 2024-09-14 13:45:31,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=135592.0, ans=0.025 2024-09-14 13:45:53,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=135648.66666666666, ans=0.125 2024-09-14 13:46:08,444 INFO [train.py:1198] (1/2) Epoch 8, batch 3150, loss[loss=0.3188, ctc_loss=0.2313, cr_loss=0.4379, over 19533.00 frames. ], tot_loss[loss=0.282, ctc_loss=0.2008, cr_loss=0.406, over 4081908.55 frames. ], batch size: 90, lr: 1.05e-02, grad_scale: 32.0 2024-09-14 13:46:58,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=135762.0, ans=0.1 2024-09-14 13:47:07,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=135790.33333333334, ans=0.125 2024-09-14 13:47:09,698 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.57 vs. limit=22.5 2024-09-14 13:47:24,234 INFO [train.py:1198] (1/2) Epoch 8, batch 3200, loss[loss=0.2955, ctc_loss=0.2043, cr_loss=0.4561, over 21030.00 frames. ], tot_loss[loss=0.2813, ctc_loss=0.2003, cr_loss=0.4052, over 4087334.38 frames. ], batch size: 63, lr: 1.05e-02, grad_scale: 32.0 2024-09-14 13:47:39,142 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.744e+02 2.091e+02 2.295e+02 2.588e+02 5.097e+02, threshold=4.590e+02, percent-clipped=1.0 2024-09-14 13:48:25,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=135932.0, ans=0.125 2024-09-14 13:48:26,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=135932.0, ans=0.125 2024-09-14 13:48:31,638 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.95 vs. limit=15.0 2024-09-14 13:48:35,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=135932.0, ans=0.125 2024-09-14 13:48:42,104 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.03 vs. limit=15.0 2024-09-14 13:48:42,722 INFO [train.py:1198] (1/2) Epoch 8, batch 3250, loss[loss=0.3265, ctc_loss=0.2379, cr_loss=0.4432, over 19407.00 frames. ], tot_loss[loss=0.2797, ctc_loss=0.1989, cr_loss=0.4043, over 4098449.14 frames. 
], batch size: 90, lr: 1.05e-02, grad_scale: 32.0 2024-09-14 13:49:42,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=136073.66666666666, ans=0.0 2024-09-14 13:49:54,841 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.80 vs. limit=15.0 2024-09-14 13:49:58,967 INFO [train.py:1198] (1/2) Epoch 8, batch 3300, loss[loss=0.3615, ctc_loss=0.2772, cr_loss=0.4216, over 14158.00 frames. ], tot_loss[loss=0.279, ctc_loss=0.1982, cr_loss=0.4037, over 4099637.90 frames. ], batch size: 149, lr: 1.05e-02, grad_scale: 32.0 2024-09-14 13:50:14,087 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.745e+02 2.100e+02 2.272e+02 2.491e+02 3.171e+02, threshold=4.544e+02, percent-clipped=0.0 2024-09-14 13:50:29,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=136158.66666666666, ans=0.2 2024-09-14 13:50:32,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=136158.66666666666, ans=0.125 2024-09-14 13:50:40,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=136158.66666666666, ans=0.1 2024-09-14 13:51:16,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=136243.66666666666, ans=0.125 2024-09-14 13:51:17,248 INFO [train.py:1198] (1/2) Epoch 8, batch 3350, loss[loss=0.2634, ctc_loss=0.1887, cr_loss=0.3734, over 21063.00 frames. ], tot_loss[loss=0.2789, ctc_loss=0.1982, cr_loss=0.4038, over 4105755.83 frames. ], batch size: 56, lr: 1.05e-02, grad_scale: 32.0 2024-09-14 13:52:27,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=136357.0, ans=0.125 2024-09-14 13:52:32,978 INFO [train.py:1198] (1/2) Epoch 8, batch 3400, loss[loss=0.283, ctc_loss=0.2021, cr_loss=0.4045, over 21089.00 frames. ], tot_loss[loss=0.2785, ctc_loss=0.1979, cr_loss=0.4029, over 4114570.46 frames. ], batch size: 59, lr: 1.05e-02, grad_scale: 32.0 2024-09-14 13:52:48,278 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.756e+02 2.158e+02 2.412e+02 2.883e+02 4.591e+02, threshold=4.824e+02, percent-clipped=1.0 2024-09-14 13:52:51,838 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=16.09 vs. limit=15.0 2024-09-14 13:53:09,815 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 13:53:11,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=136442.0, ans=0.0 2024-09-14 13:53:11,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=136442.0, ans=0.125 2024-09-14 13:53:48,355 INFO [train.py:1198] (1/2) Epoch 8, batch 3450, loss[loss=0.2998, ctc_loss=0.2149, cr_loss=0.4248, over 21017.00 frames. ], tot_loss[loss=0.2787, ctc_loss=0.1981, cr_loss=0.4031, over 4102697.99 frames. 
], batch size: 63, lr: 1.05e-02, grad_scale: 32.0 2024-09-14 13:54:05,148 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=136555.33333333334, ans=0.0 2024-09-14 13:54:06,940 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.04 vs. limit=15.0 2024-09-14 13:54:14,810 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.00 vs. limit=15.0 2024-09-14 13:54:44,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=136612.0, ans=0.1 2024-09-14 13:54:47,569 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.67 vs. limit=10.0 2024-09-14 13:55:00,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=136640.33333333334, ans=0.0 2024-09-14 13:55:06,226 INFO [train.py:1198] (1/2) Epoch 8, batch 3500, loss[loss=0.2171, ctc_loss=0.1548, cr_loss=0.3117, over 21005.00 frames. ], tot_loss[loss=0.2778, ctc_loss=0.1974, cr_loss=0.4024, over 4104578.85 frames. ], batch size: 48, lr: 1.05e-02, grad_scale: 32.0 2024-09-14 13:55:06,999 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.09 vs. limit=10.0 2024-09-14 13:55:20,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=136697.0, ans=0.125 2024-09-14 13:55:21,589 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.814e+02 2.050e+02 2.209e+02 2.507e+02 3.525e+02, threshold=4.418e+02, percent-clipped=0.0 2024-09-14 13:55:34,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=136697.0, ans=0.125 2024-09-14 13:55:52,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=136753.66666666666, ans=0.125 2024-09-14 13:56:17,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=136782.0, ans=0.0 2024-09-14 13:56:22,302 INFO [train.py:1198] (1/2) Epoch 8, batch 3550, loss[loss=0.2463, ctc_loss=0.172, cr_loss=0.3712, over 21014.00 frames. ], tot_loss[loss=0.2793, ctc_loss=0.1985, cr_loss=0.4038, over 4092125.64 frames. 
], batch size: 52, lr: 1.05e-02, grad_scale: 32.0 2024-09-14 13:56:25,691 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=136810.33333333334, ans=0.025 2024-09-14 13:56:31,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=136810.33333333334, ans=0.125 2024-09-14 13:56:36,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=136838.66666666666, ans=0.1 2024-09-14 13:56:51,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=136867.0, ans=0.5 2024-09-14 13:57:07,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=136867.0, ans=0.125 2024-09-14 13:57:14,050 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.98 vs. limit=12.0 2024-09-14 13:57:28,940 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 13:57:40,707 INFO [train.py:1198] (1/2) Epoch 8, batch 3600, loss[loss=0.2436, ctc_loss=0.1721, cr_loss=0.3577, over 20965.00 frames. ], tot_loss[loss=0.278, ctc_loss=0.1975, cr_loss=0.4025, over 4097291.91 frames. ], batch size: 48, lr: 1.05e-02, grad_scale: 32.0 2024-09-14 13:57:55,640 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.823e+02 2.061e+02 2.168e+02 2.358e+02 4.249e+02, threshold=4.337e+02, percent-clipped=0.0 2024-09-14 13:58:50,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=137065.33333333334, ans=0.125 2024-09-14 13:58:55,980 INFO [train.py:1198] (1/2) Epoch 8, batch 3650, loss[loss=0.309, ctc_loss=0.2223, cr_loss=0.4339, over 20650.00 frames. ], tot_loss[loss=0.279, ctc_loss=0.1984, cr_loss=0.4032, over 4073869.28 frames. ], batch size: 68, lr: 1.05e-02, grad_scale: 32.0 2024-09-14 14:00:01,769 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.47 vs. limit=15.0 2024-09-14 14:00:14,606 INFO [train.py:1198] (1/2) Epoch 8, batch 3700, loss[loss=0.298, ctc_loss=0.2115, cr_loss=0.4326, over 20859.00 frames. ], tot_loss[loss=0.2806, ctc_loss=0.1999, cr_loss=0.4033, over 4071790.73 frames. ], batch size: 65, lr: 1.05e-02, grad_scale: 32.0 2024-09-14 14:00:29,416 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.860e+02 2.163e+02 2.326e+02 2.877e+02 5.372e+02, threshold=4.652e+02, percent-clipped=3.0 2024-09-14 14:00:40,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=137263.66666666666, ans=0.025 2024-09-14 14:01:29,720 INFO [train.py:1198] (1/2) Epoch 8, batch 3750, loss[loss=0.2426, ctc_loss=0.1724, cr_loss=0.3508, over 21041.00 frames. ], tot_loss[loss=0.2795, ctc_loss=0.1989, cr_loss=0.4029, over 4084735.38 frames. 
], batch size: 53, lr: 1.05e-02, grad_scale: 32.0 2024-09-14 14:02:06,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=137433.66666666666, ans=0.0 2024-09-14 14:02:09,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=137433.66666666666, ans=0.2 2024-09-14 14:02:12,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=137433.66666666666, ans=0.0 2024-09-14 14:02:48,068 INFO [train.py:1198] (1/2) Epoch 8, batch 3800, loss[loss=0.2854, ctc_loss=0.2072, cr_loss=0.3912, over 20703.00 frames. ], tot_loss[loss=0.2796, ctc_loss=0.199, cr_loss=0.403, over 4081407.00 frames. ], batch size: 71, lr: 1.05e-02, grad_scale: 32.0 2024-09-14 14:02:53,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=137518.66666666666, ans=0.0 2024-09-14 14:02:54,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=137518.66666666666, ans=0.125 2024-09-14 14:03:03,267 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.791e+02 2.171e+02 2.365e+02 2.666e+02 4.854e+02, threshold=4.729e+02, percent-clipped=1.0 2024-09-14 14:03:37,148 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=137603.66666666666, ans=0.0 2024-09-14 14:03:51,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=137632.0, ans=0.0 2024-09-14 14:04:04,222 INFO [train.py:1198] (1/2) Epoch 8, batch 3850, loss[loss=0.3049, ctc_loss=0.2165, cr_loss=0.4421, over 20688.00 frames. ], tot_loss[loss=0.2806, ctc_loss=0.1997, cr_loss=0.4044, over 4079513.48 frames. ], batch size: 71, lr: 1.04e-02, grad_scale: 32.0 2024-09-14 14:04:06,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=137660.33333333334, ans=0.125 2024-09-14 14:04:26,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=137688.66666666666, ans=0.2 2024-09-14 14:04:33,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=137717.0, ans=0.1 2024-09-14 14:04:49,452 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.26 vs. limit=15.0 2024-09-14 14:05:20,417 INFO [train.py:1198] (1/2) Epoch 8, batch 3900, loss[loss=0.3086, ctc_loss=0.2193, cr_loss=0.4462, over 20684.00 frames. ], tot_loss[loss=0.2804, ctc_loss=0.1997, cr_loss=0.4038, over 4070092.72 frames. 
], batch size: 66, lr: 1.04e-02, grad_scale: 32.0 2024-09-14 14:05:25,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=137802.0, ans=0.125 2024-09-14 14:05:35,567 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.773e+02 2.096e+02 2.322e+02 2.626e+02 3.932e+02, threshold=4.644e+02, percent-clipped=0.0 2024-09-14 14:05:41,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=137830.33333333334, ans=0.125 2024-09-14 14:05:56,052 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. limit=6.0 2024-09-14 14:06:15,715 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.02 vs. limit=5.0 2024-09-14 14:06:31,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=137915.33333333334, ans=0.125 2024-09-14 14:06:38,405 INFO [train.py:1198] (1/2) Epoch 8, batch 3950, loss[loss=0.3009, ctc_loss=0.2126, cr_loss=0.4414, over 20258.00 frames. ], tot_loss[loss=0.281, ctc_loss=0.1999, cr_loss=0.4053, over 4078660.28 frames. ], batch size: 74, lr: 1.04e-02, grad_scale: 32.0 2024-09-14 14:06:41,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=137943.66666666666, ans=0.2 2024-09-14 14:06:54,492 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.78 vs. limit=15.0 2024-09-14 14:07:27,509 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.78 vs. limit=12.0 2024-09-14 14:07:47,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=138057.0, ans=0.0 2024-09-14 14:07:54,303 INFO [train.py:1198] (1/2) Epoch 8, batch 4000, loss[loss=0.2774, ctc_loss=0.1908, cr_loss=0.4332, over 20917.00 frames. ], tot_loss[loss=0.2804, ctc_loss=0.1993, cr_loss=0.4054, over 4091901.33 frames. ], batch size: 60, lr: 1.04e-02, grad_scale: 32.0 2024-09-14 14:08:07,340 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.30 vs. limit=12.0 2024-09-14 14:08:12,490 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.794e+02 2.071e+02 2.262e+02 2.472e+02 3.920e+02, threshold=4.525e+02, percent-clipped=0.0 2024-09-14 14:08:12,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=138113.66666666666, ans=0.025 2024-09-14 14:09:03,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=138198.66666666666, ans=0.0 2024-09-14 14:09:13,519 INFO [train.py:1198] (1/2) Epoch 8, batch 4050, loss[loss=0.2243, ctc_loss=0.1574, cr_loss=0.3343, over 20971.00 frames. ], tot_loss[loss=0.2798, ctc_loss=0.1989, cr_loss=0.4046, over 4078129.57 frames. 
], batch size: 48, lr: 1.04e-02, grad_scale: 32.0 2024-09-14 14:09:15,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=138227.0, ans=0.1 2024-09-14 14:09:19,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=138227.0, ans=0.0 2024-09-14 14:09:30,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=138255.33333333334, ans=0.0 2024-09-14 14:10:11,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=138312.0, ans=0.125 2024-09-14 14:10:29,544 INFO [train.py:1198] (1/2) Epoch 8, batch 4100, loss[loss=0.3088, ctc_loss=0.2231, cr_loss=0.4286, over 21000.00 frames. ], tot_loss[loss=0.2801, ctc_loss=0.1991, cr_loss=0.405, over 4083500.63 frames. ], batch size: 63, lr: 1.04e-02, grad_scale: 32.0 2024-09-14 14:10:31,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=138368.66666666666, ans=0.125 2024-09-14 14:10:44,900 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.898e+02 2.218e+02 2.557e+02 2.938e+02 5.683e+02, threshold=5.114e+02, percent-clipped=2.0 2024-09-14 14:11:03,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=138425.33333333334, ans=0.0 2024-09-14 14:11:48,044 INFO [train.py:1198] (1/2) Epoch 8, batch 4150, loss[loss=0.2654, ctc_loss=0.1878, cr_loss=0.3879, over 20991.00 frames. ], tot_loss[loss=0.2792, ctc_loss=0.1985, cr_loss=0.4037, over 4087096.17 frames. ], batch size: 55, lr: 1.04e-02, grad_scale: 32.0 2024-09-14 14:12:32,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=138595.33333333334, ans=0.0 2024-09-14 14:12:37,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=138595.33333333334, ans=0.125 2024-09-14 14:12:47,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=138623.66666666666, ans=0.04949747468305833 2024-09-14 14:12:56,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=138623.66666666666, ans=0.125 2024-09-14 14:13:03,914 INFO [train.py:1198] (1/2) Epoch 8, batch 4200, loss[loss=0.3474, ctc_loss=0.251, cr_loss=0.4818, over 18301.00 frames. ], tot_loss[loss=0.2785, ctc_loss=0.1977, cr_loss=0.404, over 4090406.63 frames. 
], batch size: 108, lr: 1.04e-02, grad_scale: 16.0 2024-09-14 14:13:04,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=138652.0, ans=0.125 2024-09-14 14:13:13,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=138652.0, ans=0.1 2024-09-14 14:13:20,562 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.772e+02 2.160e+02 2.378e+02 2.746e+02 4.623e+02, threshold=4.757e+02, percent-clipped=0.0 2024-09-14 14:13:33,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=138708.66666666666, ans=0.05 2024-09-14 14:13:42,585 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.82 vs. limit=15.0 2024-09-14 14:13:46,871 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 14:14:21,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=138793.66666666666, ans=0.0 2024-09-14 14:14:23,009 INFO [train.py:1198] (1/2) Epoch 8, batch 4250, loss[loss=0.2667, ctc_loss=0.1891, cr_loss=0.3879, over 20680.00 frames. ], tot_loss[loss=0.278, ctc_loss=0.1973, cr_loss=0.4033, over 4086590.57 frames. ], batch size: 68, lr: 1.04e-02, grad_scale: 16.0 2024-09-14 14:14:32,947 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.00 vs. limit=22.5 2024-09-14 14:15:05,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=138850.33333333334, ans=0.1 2024-09-14 14:15:38,156 INFO [train.py:1198] (1/2) Epoch 8, batch 4300, loss[loss=0.2938, ctc_loss=0.2089, cr_loss=0.4247, over 20682.00 frames. ], tot_loss[loss=0.2779, ctc_loss=0.1972, cr_loss=0.4035, over 4088404.59 frames. ], batch size: 71, lr: 1.04e-02, grad_scale: 16.0 2024-09-14 14:15:54,790 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.834e+02 2.101e+02 2.235e+02 2.576e+02 3.487e+02, threshold=4.469e+02, percent-clipped=0.0 2024-09-14 14:15:56,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=138963.66666666666, ans=0.1 2024-09-14 14:16:19,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=138992.0, ans=0.125 2024-09-14 14:16:43,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=139048.66666666666, ans=0.025 2024-09-14 14:16:53,804 INFO [train.py:1198] (1/2) Epoch 8, batch 4350, loss[loss=0.2643, ctc_loss=0.1852, cr_loss=0.3953, over 20951.00 frames. ], tot_loss[loss=0.2795, ctc_loss=0.1987, cr_loss=0.4044, over 4085729.20 frames. ], batch size: 50, lr: 1.04e-02, grad_scale: 16.0 2024-09-14 14:16:57,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=139077.0, ans=0.0 2024-09-14 14:17:10,045 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.04 vs. 
limit=15.0 2024-09-14 14:17:49,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=139162.0, ans=0.1 2024-09-14 14:18:00,964 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.74 vs. limit=22.5 2024-09-14 14:18:04,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=139190.33333333334, ans=0.0 2024-09-14 14:18:04,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=139190.33333333334, ans=0.025 2024-09-14 14:18:05,414 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.07 vs. limit=15.0 2024-09-14 14:18:12,243 INFO [train.py:1198] (1/2) Epoch 8, batch 4400, loss[loss=0.2587, ctc_loss=0.1849, cr_loss=0.369, over 20984.00 frames. ], tot_loss[loss=0.28, ctc_loss=0.1989, cr_loss=0.4055, over 4095890.16 frames. ], batch size: 48, lr: 1.04e-02, grad_scale: 32.0 2024-09-14 14:18:29,133 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.752e+02 2.208e+02 2.365e+02 2.855e+02 4.961e+02, threshold=4.731e+02, percent-clipped=3.0 2024-09-14 14:18:50,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=139275.33333333334, ans=0.0 2024-09-14 14:19:03,354 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.81 vs. limit=12.0 2024-09-14 14:19:04,807 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.55 vs. limit=15.0 2024-09-14 14:19:31,122 INFO [train.py:1198] (1/2) Epoch 8, batch 4450, loss[loss=0.3096, ctc_loss=0.2249, cr_loss=0.4237, over 19333.00 frames. ], tot_loss[loss=0.2798, ctc_loss=0.1988, cr_loss=0.4047, over 4088776.18 frames. ], batch size: 90, lr: 1.04e-02, grad_scale: 32.0 2024-09-14 14:19:41,998 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=7.72 vs. limit=22.5 2024-09-14 14:19:58,393 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 14:20:22,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=139445.33333333334, ans=0.1 2024-09-14 14:20:46,232 INFO [train.py:1198] (1/2) Epoch 8, batch 4500, loss[loss=0.3097, ctc_loss=0.2236, cr_loss=0.4307, over 18374.00 frames. ], tot_loss[loss=0.2815, ctc_loss=0.2004, cr_loss=0.4058, over 4085669.34 frames. ], batch size: 108, lr: 1.04e-02, grad_scale: 32.0 2024-09-14 14:20:46,998 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.61 vs. 
limit=15.0 2024-09-14 14:21:02,857 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.720e+02 2.138e+02 2.348e+02 2.604e+02 6.128e+02, threshold=4.697e+02, percent-clipped=1.0 2024-09-14 14:21:10,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=139530.33333333334, ans=10.0 2024-09-14 14:21:27,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=139558.66666666666, ans=0.0 2024-09-14 14:22:01,677 INFO [train.py:1198] (1/2) Epoch 8, batch 4550, loss[loss=0.2533, ctc_loss=0.1776, cr_loss=0.3782, over 20938.00 frames. ], tot_loss[loss=0.2804, ctc_loss=0.1995, cr_loss=0.4045, over 4086939.75 frames. ], batch size: 60, lr: 1.04e-02, grad_scale: 32.0 2024-09-14 14:22:08,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=139643.66666666666, ans=0.125 2024-09-14 14:22:38,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=139700.33333333334, ans=0.035 2024-09-14 14:23:17,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=139757.0, ans=0.1 2024-09-14 14:23:20,281 INFO [train.py:1198] (1/2) Epoch 8, batch 4600, loss[loss=0.2651, ctc_loss=0.1846, cr_loss=0.4024, over 20979.00 frames. ], tot_loss[loss=0.2795, ctc_loss=0.1989, cr_loss=0.4029, over 4094503.46 frames. ], batch size: 51, lr: 1.04e-02, grad_scale: 32.0 2024-09-14 14:23:35,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=139813.66666666666, ans=0.125 2024-09-14 14:23:37,062 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.815e+02 2.082e+02 2.376e+02 2.603e+02 3.739e+02, threshold=4.753e+02, percent-clipped=0.0 2024-09-14 14:24:04,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=139870.33333333334, ans=0.125 2024-09-14 14:24:12,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=139870.33333333334, ans=0.125 2024-09-14 14:24:13,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=139870.33333333334, ans=0.05 2024-09-14 14:24:36,059 INFO [train.py:1198] (1/2) Epoch 8, batch 4650, loss[loss=0.2962, ctc_loss=0.2068, cr_loss=0.4469, over 21075.00 frames. ], tot_loss[loss=0.2799, ctc_loss=0.1992, cr_loss=0.404, over 4106584.67 frames. ], batch size: 59, lr: 1.04e-02, grad_scale: 16.0 2024-09-14 14:24:48,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=139927.0, ans=0.125 2024-09-14 14:24:56,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=139955.33333333334, ans=0.0 2024-09-14 14:25:54,671 INFO [train.py:1198] (1/2) Epoch 8, batch 4700, loss[loss=0.2505, ctc_loss=0.1771, cr_loss=0.3673, over 20938.00 frames. ], tot_loss[loss=0.2789, ctc_loss=0.1983, cr_loss=0.4032, over 4101776.39 frames. 
], batch size: 49, lr: 1.04e-02, grad_scale: 16.0 2024-09-14 14:25:58,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=140068.66666666666, ans=0.2 2024-09-14 14:26:12,623 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.675e+02 2.076e+02 2.230e+02 2.548e+02 4.187e+02, threshold=4.461e+02, percent-clipped=0.0 2024-09-14 14:26:20,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=140097.0, ans=0.125 2024-09-14 14:26:39,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=140153.66666666666, ans=0.0 2024-09-14 14:27:02,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=140182.0, ans=0.07 2024-09-14 14:27:06,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=140182.0, ans=0.125 2024-09-14 14:27:09,560 INFO [train.py:1198] (1/2) Epoch 8, batch 4750, loss[loss=0.2444, ctc_loss=0.1732, cr_loss=0.3563, over 20992.00 frames. ], tot_loss[loss=0.2794, ctc_loss=0.1987, cr_loss=0.4033, over 4102136.58 frames. ], batch size: 52, lr: 1.04e-02, grad_scale: 16.0 2024-09-14 14:27:21,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=140210.33333333334, ans=0.0 2024-09-14 14:27:28,159 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.27 vs. limit=10.0 2024-09-14 14:27:52,141 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 14:28:01,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=140295.33333333334, ans=0.0 2024-09-14 14:28:01,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=140295.33333333334, ans=0.125 2024-09-14 14:28:23,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=140352.0, ans=0.125 2024-09-14 14:28:24,525 INFO [train.py:1198] (1/2) Epoch 8, batch 4800, loss[loss=0.311, ctc_loss=0.2164, cr_loss=0.4731, over 20883.00 frames. ], tot_loss[loss=0.2804, ctc_loss=0.1996, cr_loss=0.4042, over 4096001.54 frames. ], batch size: 65, lr: 1.03e-02, grad_scale: 32.0 2024-09-14 14:28:38,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=140380.33333333334, ans=0.125 2024-09-14 14:28:42,537 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.815e+02 2.125e+02 2.335e+02 2.659e+02 3.866e+02, threshold=4.671e+02, percent-clipped=0.0 2024-09-14 14:28:46,519 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.42 vs. limit=15.0 2024-09-14 14:29:07,279 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.60 vs. 
limit=15.0 2024-09-14 14:29:23,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=140437.0, ans=0.125 2024-09-14 14:29:28,571 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.40 vs. limit=15.0 2024-09-14 14:29:43,061 INFO [train.py:1198] (1/2) Epoch 8, batch 4850, loss[loss=0.2834, ctc_loss=0.1975, cr_loss=0.4294, over 21016.00 frames. ], tot_loss[loss=0.2799, ctc_loss=0.1991, cr_loss=0.404, over 4095654.68 frames. ], batch size: 63, lr: 1.03e-02, grad_scale: 32.0 2024-09-14 14:30:16,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=140550.33333333334, ans=0.0 2024-09-14 14:30:58,698 INFO [train.py:1198] (1/2) Epoch 8, batch 4900, loss[loss=0.2406, ctc_loss=0.1687, cr_loss=0.3595, over 20983.00 frames. ], tot_loss[loss=0.2795, ctc_loss=0.1986, cr_loss=0.4045, over 4107207.14 frames. ], batch size: 51, lr: 1.03e-02, grad_scale: 32.0 2024-09-14 14:31:19,768 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.834e+02 2.074e+02 2.271e+02 2.512e+02 3.637e+02, threshold=4.542e+02, percent-clipped=0.0 2024-09-14 14:31:24,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=140663.66666666666, ans=0.0 2024-09-14 14:31:28,155 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.79 vs. limit=15.0 2024-09-14 14:31:43,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=140692.0, ans=0.1 2024-09-14 14:32:01,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=140748.66666666666, ans=0.125 2024-09-14 14:32:16,009 INFO [train.py:1198] (1/2) Epoch 8, batch 4950, loss[loss=0.2598, ctc_loss=0.1847, cr_loss=0.3754, over 20959.00 frames. ], tot_loss[loss=0.2809, ctc_loss=0.1998, cr_loss=0.4056, over 4092982.85 frames. ], batch size: 58, lr: 1.03e-02, grad_scale: 32.0 2024-09-14 14:32:29,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=140805.33333333334, ans=0.125 2024-09-14 14:32:44,647 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.70 vs. limit=15.0 2024-09-14 14:32:54,840 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 14:33:07,195 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.88 vs. limit=22.5 2024-09-14 14:33:17,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=140890.33333333334, ans=0.125 2024-09-14 14:33:30,061 INFO [train.py:1198] (1/2) Epoch 8, batch 5000, loss[loss=0.256, ctc_loss=0.1778, cr_loss=0.3914, over 20764.00 frames. ], tot_loss[loss=0.2815, ctc_loss=0.2003, cr_loss=0.406, over 4084349.20 frames. 
], batch size: 53, lr: 1.03e-02, grad_scale: 32.0 2024-09-14 14:33:47,722 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.771e+02 2.139e+02 2.387e+02 2.772e+02 5.760e+02, threshold=4.774e+02, percent-clipped=2.0 2024-09-14 14:33:51,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=140947.0, ans=0.125 2024-09-14 14:33:58,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=140975.33333333334, ans=0.125 2024-09-14 14:34:11,953 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.64 vs. limit=22.5 2024-09-14 14:34:17,988 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.32 vs. limit=10.0 2024-09-14 14:34:21,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=141003.66666666666, ans=0.5 2024-09-14 14:34:27,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=141032.0, ans=0.2 2024-09-14 14:34:42,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=141060.33333333334, ans=0.125 2024-09-14 14:34:44,101 INFO [train.py:1198] (1/2) Epoch 8, batch 5050, loss[loss=0.2772, ctc_loss=0.197, cr_loss=0.4011, over 20884.00 frames. ], tot_loss[loss=0.2823, ctc_loss=0.201, cr_loss=0.4068, over 4086811.15 frames. ], batch size: 54, lr: 1.03e-02, grad_scale: 32.0 2024-09-14 14:34:57,015 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.12 vs. limit=15.0 2024-09-14 14:35:16,289 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.98 vs. limit=15.0 2024-09-14 14:35:56,309 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.32 vs. limit=22.5 2024-09-14 14:35:58,531 INFO [train.py:1198] (1/2) Epoch 8, batch 5100, loss[loss=0.2433, ctc_loss=0.1696, cr_loss=0.3688, over 20976.00 frames. ], tot_loss[loss=0.2806, ctc_loss=0.1995, cr_loss=0.4052, over 4089511.39 frames. ], batch size: 52, lr: 1.03e-02, grad_scale: 32.0 2024-09-14 14:36:00,790 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.00 vs. limit=22.5 2024-09-14 14:36:16,360 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.845e+02 2.119e+02 2.302e+02 2.585e+02 6.006e+02, threshold=4.603e+02, percent-clipped=1.0 2024-09-14 14:36:28,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=141258.66666666666, ans=0.1 2024-09-14 14:37:01,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=141315.33333333334, ans=0.2 2024-09-14 14:37:13,433 INFO [train.py:1198] (1/2) Epoch 8, batch 5150, loss[loss=0.3137, ctc_loss=0.2269, cr_loss=0.4339, over 21055.00 frames. ], tot_loss[loss=0.2794, ctc_loss=0.1986, cr_loss=0.4042, over 4095786.76 frames. 
], batch size: 59, lr: 1.03e-02, grad_scale: 32.0 2024-09-14 14:37:48,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=141400.33333333334, ans=0.0 2024-09-14 14:38:08,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=141428.66666666666, ans=0.1 2024-09-14 14:38:21,094 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.86 vs. limit=15.0 2024-09-14 14:38:21,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=141457.0, ans=0.0 2024-09-14 14:38:30,602 INFO [train.py:1198] (1/2) Epoch 8, batch 5200, loss[loss=0.2671, ctc_loss=0.1861, cr_loss=0.4049, over 20942.00 frames. ], tot_loss[loss=0.2789, ctc_loss=0.1981, cr_loss=0.4041, over 4097783.46 frames. ], batch size: 60, lr: 1.03e-02, grad_scale: 32.0 2024-09-14 14:38:48,203 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.852e+02 2.148e+02 2.359e+02 2.888e+02 3.578e+02, threshold=4.717e+02, percent-clipped=0.0 2024-09-14 14:39:04,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=141542.0, ans=0.0 2024-09-14 14:39:09,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=141542.0, ans=0.1 2024-09-14 14:39:10,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=141542.0, ans=0.5 2024-09-14 14:39:13,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=141570.33333333334, ans=0.0 2024-09-14 14:39:13,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=141570.33333333334, ans=0.2 2024-09-14 14:39:15,419 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.29 vs. limit=15.0 2024-09-14 14:39:23,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=141570.33333333334, ans=0.2 2024-09-14 14:39:37,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=141598.66666666666, ans=0.1 2024-09-14 14:39:38,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=141598.66666666666, ans=0.0 2024-09-14 14:39:44,523 INFO [train.py:1198] (1/2) Epoch 8, batch 5250, loss[loss=0.282, ctc_loss=0.2027, cr_loss=0.3963, over 21050.00 frames. ], tot_loss[loss=0.2795, ctc_loss=0.1986, cr_loss=0.4044, over 4095663.15 frames. ], batch size: 56, lr: 1.03e-02, grad_scale: 32.0 2024-09-14 14:39:47,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=141627.0, ans=0.0 2024-09-14 14:39:53,982 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.44 vs. 
limit=10.0 2024-09-14 14:40:01,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=141655.33333333334, ans=0.2 2024-09-14 14:40:06,046 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.49 vs. limit=10.0 2024-09-14 14:40:10,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=141655.33333333334, ans=0.0 2024-09-14 14:40:17,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=141683.66666666666, ans=0.0 2024-09-14 14:40:29,603 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.63 vs. limit=22.5 2024-09-14 14:41:00,968 INFO [train.py:1198] (1/2) Epoch 8, batch 5300, loss[loss=0.3101, ctc_loss=0.2241, cr_loss=0.4298, over 18452.00 frames. ], tot_loss[loss=0.281, ctc_loss=0.1999, cr_loss=0.4057, over 4090686.59 frames. ], batch size: 108, lr: 1.03e-02, grad_scale: 32.0 2024-09-14 14:41:18,650 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.832e+02 2.129e+02 2.379e+02 2.832e+02 5.826e+02, threshold=4.758e+02, percent-clipped=1.0 2024-09-14 14:41:29,748 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. limit=6.0 2024-09-14 14:41:29,955 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.71 vs. limit=6.0 2024-09-14 14:41:47,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=141853.66666666666, ans=0.0 2024-09-14 14:41:52,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=141853.66666666666, ans=0.2 2024-09-14 14:42:14,513 INFO [train.py:1198] (1/2) Epoch 8, batch 5350, loss[loss=0.3121, ctc_loss=0.2243, cr_loss=0.4386, over 20833.00 frames. ], tot_loss[loss=0.2816, ctc_loss=0.2005, cr_loss=0.4056, over 4081139.08 frames. ], batch size: 65, lr: 1.03e-02, grad_scale: 32.0 2024-09-14 14:42:43,358 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.52 vs. limit=22.5 2024-09-14 14:42:44,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=141967.0, ans=0.2 2024-09-14 14:42:50,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=141967.0, ans=0.125 2024-09-14 14:43:03,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=141995.33333333334, ans=0.1 2024-09-14 14:43:05,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=141995.33333333334, ans=0.025 2024-09-14 14:43:09,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=141995.33333333334, ans=0.125 2024-09-14 14:43:28,601 INFO [train.py:1198] (1/2) Epoch 8, batch 5400, loss[loss=0.263, ctc_loss=0.189, cr_loss=0.37, over 20780.00 frames. 
], tot_loss[loss=0.2804, ctc_loss=0.1993, cr_loss=0.4055, over 4095893.59 frames. ], batch size: 56, lr: 1.03e-02, grad_scale: 32.0 2024-09-14 14:43:38,228 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.55 vs. limit=12.0 2024-09-14 14:43:39,595 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.55 vs. limit=15.0 2024-09-14 14:43:46,503 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.842e+02 2.263e+02 2.496e+02 2.892e+02 4.621e+02, threshold=4.993e+02, percent-clipped=0.0 2024-09-14 14:43:58,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=142108.66666666666, ans=0.2 2024-09-14 14:44:09,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=142108.66666666666, ans=0.05 2024-09-14 14:44:42,432 INFO [train.py:1198] (1/2) Epoch 8, batch 5450, loss[loss=0.2586, ctc_loss=0.18, cr_loss=0.3933, over 21066.00 frames. ], tot_loss[loss=0.2811, ctc_loss=0.1999, cr_loss=0.406, over 4082643.24 frames. ], batch size: 56, lr: 1.03e-02, grad_scale: 32.0 2024-09-14 14:45:17,692 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.55 vs. limit=15.0 2024-09-14 14:45:34,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=142278.66666666666, ans=0.1 2024-09-14 14:45:36,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=142278.66666666666, ans=0.0 2024-09-14 14:45:51,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=142307.0, ans=0.125 2024-09-14 14:45:56,728 INFO [train.py:1198] (1/2) Epoch 8, batch 5500, loss[loss=0.3001, ctc_loss=0.2153, cr_loss=0.4242, over 21077.00 frames. ], tot_loss[loss=0.2807, ctc_loss=0.1995, cr_loss=0.4059, over 4082948.10 frames. ], batch size: 59, lr: 1.03e-02, grad_scale: 32.0 2024-09-14 14:45:57,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=142335.33333333334, ans=0.2 2024-09-14 14:46:14,267 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.715e+02 2.071e+02 2.368e+02 2.692e+02 7.477e+02, threshold=4.736e+02, percent-clipped=3.0 2024-09-14 14:46:14,858 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.19 vs. limit=15.0 2024-09-14 14:46:19,056 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 14:46:20,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=142363.66666666666, ans=0.125 2024-09-14 14:46:30,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=142392.0, ans=0.1 2024-09-14 14:46:37,269 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.12 vs. 
limit=15.0 2024-09-14 14:46:41,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=142420.33333333334, ans=0.125 2024-09-14 14:47:05,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=142448.66666666666, ans=0.125 2024-09-14 14:47:12,885 INFO [train.py:1198] (1/2) Epoch 8, batch 5550, loss[loss=0.2759, ctc_loss=0.1949, cr_loss=0.4051, over 20643.00 frames. ], tot_loss[loss=0.281, ctc_loss=0.1998, cr_loss=0.406, over 4074352.21 frames. ], batch size: 66, lr: 1.03e-02, grad_scale: 32.0 2024-09-14 14:47:28,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=142505.33333333334, ans=0.125 2024-09-14 14:48:10,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=142562.0, ans=0.0 2024-09-14 14:48:13,239 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 14:48:27,610 INFO [train.py:1198] (1/2) Epoch 8, batch 5600, loss[loss=0.2758, ctc_loss=0.1955, cr_loss=0.4014, over 20314.00 frames. ], tot_loss[loss=0.2797, ctc_loss=0.1988, cr_loss=0.4046, over 4080375.16 frames. ], batch size: 74, lr: 1.03e-02, grad_scale: 32.0 2024-09-14 14:48:45,234 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.756e+02 2.120e+02 2.314e+02 2.615e+02 3.772e+02, threshold=4.627e+02, percent-clipped=0.0 2024-09-14 14:49:15,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=142703.66666666666, ans=0.125 2024-09-14 14:49:44,218 INFO [train.py:1198] (1/2) Epoch 8, batch 5650, loss[loss=0.2762, ctc_loss=0.1954, cr_loss=0.4042, over 20665.00 frames. ], tot_loss[loss=0.2778, ctc_loss=0.1972, cr_loss=0.4031, over 4093666.01 frames. ], batch size: 71, lr: 1.03e-02, grad_scale: 32.0 2024-09-14 14:50:37,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=142845.33333333334, ans=0.1 2024-09-14 14:50:40,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=142845.33333333334, ans=0.125 2024-09-14 14:50:42,251 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 14:50:58,386 INFO [train.py:1198] (1/2) Epoch 8, batch 5700, loss[loss=0.3058, ctc_loss=0.2231, cr_loss=0.4133, over 20230.00 frames. ], tot_loss[loss=0.279, ctc_loss=0.1981, cr_loss=0.4047, over 4102277.87 frames. 
], batch size: 74, lr: 1.03e-02, grad_scale: 32.0 2024-09-14 14:51:04,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=142902.0, ans=0.125 2024-09-14 14:51:16,120 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.815e+02 2.109e+02 2.284e+02 2.557e+02 3.936e+02, threshold=4.569e+02, percent-clipped=0.0 2024-09-14 14:51:38,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=142958.66666666666, ans=10.0 2024-09-14 14:51:55,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=142987.0, ans=15.0 2024-09-14 14:52:02,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=143015.33333333334, ans=0.04949747468305833 2024-09-14 14:52:12,179 INFO [train.py:1198] (1/2) Epoch 8, batch 5750, loss[loss=0.2903, ctc_loss=0.2089, cr_loss=0.4069, over 20783.00 frames. ], tot_loss[loss=0.2786, ctc_loss=0.1977, cr_loss=0.4044, over 4106211.01 frames. ], batch size: 56, lr: 1.03e-02, grad_scale: 32.0 2024-09-14 14:52:13,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=143043.66666666666, ans=0.1 2024-09-14 14:52:19,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=143043.66666666666, ans=0.125 2024-09-14 14:52:58,038 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.28 vs. limit=15.0 2024-09-14 14:53:02,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=143128.66666666666, ans=0.125 2024-09-14 14:53:15,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=143157.0, ans=0.125 2024-09-14 14:53:25,571 INFO [train.py:1198] (1/2) Epoch 8, batch 5800, loss[loss=0.2602, ctc_loss=0.1824, cr_loss=0.3891, over 20940.00 frames. ], tot_loss[loss=0.279, ctc_loss=0.1981, cr_loss=0.4044, over 4102436.09 frames. ], batch size: 50, lr: 1.02e-02, grad_scale: 32.0 2024-09-14 14:53:25,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=143185.33333333334, ans=0.125 2024-09-14 14:53:41,000 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.64 vs. limit=15.0 2024-09-14 14:53:43,138 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.835e+02 2.048e+02 2.178e+02 2.439e+02 3.789e+02, threshold=4.356e+02, percent-clipped=0.0 2024-09-14 14:53:47,242 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.86 vs. limit=15.0 2024-09-14 14:53:57,538 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.94 vs. limit=15.0 2024-09-14 14:54:01,813 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.10 vs. 
limit=12.0 2024-09-14 14:54:12,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=143270.33333333334, ans=0.125 2024-09-14 14:54:31,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=143298.66666666666, ans=0.07 2024-09-14 14:54:39,506 INFO [train.py:1198] (1/2) Epoch 8, batch 5850, loss[loss=0.2778, ctc_loss=0.1916, cr_loss=0.4315, over 20953.00 frames. ], tot_loss[loss=0.2778, ctc_loss=0.1972, cr_loss=0.4029, over 4105285.88 frames. ], batch size: 60, lr: 1.02e-02, grad_scale: 32.0 2024-09-14 14:55:21,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=143383.66666666666, ans=0.1 2024-09-14 14:55:23,152 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.52 vs. limit=15.0 2024-09-14 14:55:55,940 INFO [train.py:1198] (1/2) Epoch 8, batch 5900, loss[loss=0.293, ctc_loss=0.2081, cr_loss=0.4244, over 20676.00 frames. ], tot_loss[loss=0.2786, ctc_loss=0.1979, cr_loss=0.4035, over 4095267.44 frames. ], batch size: 66, lr: 1.02e-02, grad_scale: 32.0 2024-09-14 14:56:05,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=143468.66666666666, ans=0.125 2024-09-14 14:56:13,619 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.817e+02 2.113e+02 2.342e+02 2.739e+02 4.298e+02, threshold=4.684e+02, percent-clipped=0.0 2024-09-14 14:57:06,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=143582.0, ans=0.125 2024-09-14 14:57:10,304 INFO [train.py:1198] (1/2) Epoch 8, batch 5950, loss[loss=0.3028, ctc_loss=0.2179, cr_loss=0.4246, over 18172.00 frames. ], tot_loss[loss=0.2789, ctc_loss=0.1981, cr_loss=0.4038, over 4092380.60 frames. ], batch size: 108, lr: 1.02e-02, grad_scale: 32.0 2024-09-14 14:57:10,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=143610.33333333334, ans=0.125 2024-09-14 14:57:14,049 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.54 vs. limit=15.0 2024-09-14 14:57:55,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=143695.33333333334, ans=0.2 2024-09-14 14:58:15,684 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.11 vs. limit=15.0 2024-09-14 14:58:26,701 INFO [train.py:1198] (1/2) Epoch 8, batch 6000, loss[loss=0.3093, ctc_loss=0.2245, cr_loss=0.4241, over 20665.00 frames. ], tot_loss[loss=0.2804, ctc_loss=0.1994, cr_loss=0.405, over 4080472.18 frames. ], batch size: 68, lr: 1.02e-02, grad_scale: 32.0 2024-09-14 14:58:26,701 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-14 14:58:42,102 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.4398, 3.9201, 2.9179, 3.5383], device='cuda:1') 2024-09-14 14:58:52,718 INFO [train.py:1230] (1/2) Epoch 8, validation: loss=0.05749, ctc_loss=0.05749, cr_loss=9.447e-15, over 944034.00 frames. 
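The loss[...] and tot_loss[...] records in this log combine a CTC loss with a consistency-regularization (CR) loss. The logged numbers are consistent with loss = ctc_loss + 0.2 * cr_loss (e.g. 0.2072 + 0.2 * 0.3912 = 0.2854 for the Epoch 8, batch 3800 record above) and with tot_loss[...] being a running average weighted by the number of frames in each batch; the validation record just above shows cr_loss ~ 1e-14, i.e. effectively zero, so the validation loss reduces to the CTC loss alone. A minimal sketch of this bookkeeping follows; the 0.2 weight is inferred from the logged numbers, not stated in this section, and the helper names are hypothetical rather than taken from train.py:

```python
# Hypothetical sketch of the loss bookkeeping seen in these log records.
# Assumptions (inferred from the logged numbers, not from the source code):
#   * combined loss = ctc_loss + 0.2 * cr_loss
#   * tot_loss is a running average weighted by frames per batch
from dataclasses import dataclass

CR_LOSS_SCALE = 0.2  # inferred: 0.2072 + 0.2 * 0.3912 = 0.2854 (batch 3800)

@dataclass
class RunningLoss:
    """Frames-weighted running average, like the tot_loss[...] records."""
    loss_sum: float = 0.0
    ctc_sum: float = 0.0
    cr_sum: float = 0.0
    frames: float = 0.0

    def update(self, ctc_loss: float, cr_loss: float, num_frames: float) -> None:
        # Accumulate each component weighted by the batch's frame count.
        loss = ctc_loss + CR_LOSS_SCALE * cr_loss
        self.loss_sum += loss * num_frames
        self.ctc_sum += ctc_loss * num_frames
        self.cr_sum += cr_loss * num_frames
        self.frames += num_frames

    def __str__(self) -> str:
        f = max(self.frames, 1.0)
        return (f"loss={self.loss_sum / f:.4g}, ctc_loss={self.ctc_sum / f:.4g}, "
                f"cr_loss={self.cr_sum / f:.4g}, over {self.frames:.2f} frames.")

# Per-batch record, e.g. the Epoch 8, batch 3800 entry above:
ctc, cr, frames = 0.2072, 0.3912, 20703.0
print(f"loss={ctc + CR_LOSS_SCALE * cr:.4f}")  # -> 0.2854, matching the log

tot = RunningLoss()
tot.update(ctc, cr, frames)
print(tot)
```

On the same reading, the grad_scale values in these records (16.0/32.0/64.0) are presumably the dynamic loss scale of fp16 mixed-precision training in the style of torch.cuda.amp.GradScaler, which halves on overflow and periodically doubles, matching the fluctuations logged above.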
2024-09-14 14:58:52,719 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-14 14:59:09,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=143780.33333333334, ans=0.0 2024-09-14 14:59:10,407 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.856e+02 2.175e+02 2.381e+02 2.748e+02 4.326e+02, threshold=4.762e+02, percent-clipped=0.0 2024-09-14 14:59:29,997 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 14:59:30,167 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.11 vs. limit=15.0 2024-09-14 15:00:07,235 INFO [train.py:1198] (1/2) Epoch 8, batch 6050, loss[loss=0.236, ctc_loss=0.1633, cr_loss=0.3634, over 20990.00 frames. ], tot_loss[loss=0.2792, ctc_loss=0.1984, cr_loss=0.4038, over 4091281.91 frames. ], batch size: 52, lr: 1.02e-02, grad_scale: 32.0 2024-09-14 15:00:17,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=143893.66666666666, ans=0.025 2024-09-14 15:00:28,443 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=143922.0, ans=0.125 2024-09-14 15:01:07,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=144007.0, ans=0.125 2024-09-14 15:01:20,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=144035.33333333334, ans=0.125 2024-09-14 15:01:21,781 INFO [train.py:1198] (1/2) Epoch 8, batch 6100, loss[loss=0.2883, ctc_loss=0.2035, cr_loss=0.4238, over 20992.00 frames. ], tot_loss[loss=0.2784, ctc_loss=0.1978, cr_loss=0.4029, over 4099262.87 frames. ], batch size: 63, lr: 1.02e-02, grad_scale: 32.0 2024-09-14 15:01:24,301 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.00 vs. limit=10.0 2024-09-14 15:01:39,508 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.837e+02 2.110e+02 2.250e+02 2.560e+02 3.709e+02, threshold=4.501e+02, percent-clipped=0.0 2024-09-14 15:02:16,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=144120.33333333334, ans=0.1 2024-09-14 15:02:34,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=144177.0, ans=0.0 2024-09-14 15:02:35,232 INFO [train.py:1198] (1/2) Epoch 8, batch 6150, loss[loss=0.2655, ctc_loss=0.1851, cr_loss=0.4023, over 21002.00 frames. ], tot_loss[loss=0.2786, ctc_loss=0.198, cr_loss=0.403, over 4098191.98 frames. ], batch size: 48, lr: 1.02e-02, grad_scale: 32.0 2024-09-14 15:02:48,232 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.44 vs. limit=15.0 2024-09-14 15:03:50,243 INFO [train.py:1198] (1/2) Epoch 8, batch 6200, loss[loss=0.2309, ctc_loss=0.1613, cr_loss=0.3478, over 20972.00 frames. ], tot_loss[loss=0.2764, ctc_loss=0.1963, cr_loss=0.4003, over 4095002.66 frames. 
], batch size: 48, lr: 1.02e-02, grad_scale: 32.0 2024-09-14 15:03:56,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=144318.66666666666, ans=0.0 2024-09-14 15:04:02,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=144318.66666666666, ans=0.0 2024-09-14 15:04:08,071 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.787e+02 2.046e+02 2.184e+02 2.414e+02 3.936e+02, threshold=4.368e+02, percent-clipped=0.0 2024-09-14 15:04:09,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=144347.0, ans=0.0 2024-09-14 15:04:17,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=144347.0, ans=0.5 2024-09-14 15:04:58,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=144432.0, ans=0.0 2024-09-14 15:04:58,847 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0 2024-09-14 15:05:01,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=144432.0, ans=0.04949747468305833 2024-09-14 15:05:03,862 INFO [train.py:1198] (1/2) Epoch 8, batch 6250, loss[loss=0.2741, ctc_loss=0.1939, cr_loss=0.4009, over 20869.00 frames. ], tot_loss[loss=0.2775, ctc_loss=0.1973, cr_loss=0.4007, over 4062837.20 frames. ], batch size: 57, lr: 1.02e-02, grad_scale: 32.0 2024-09-14 15:05:30,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=144488.66666666666, ans=0.02 2024-09-14 15:05:54,491 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.66 vs. limit=15.0 2024-09-14 15:06:17,050 INFO [train.py:1198] (1/2) Epoch 8, batch 6300, loss[loss=0.2376, ctc_loss=0.1682, cr_loss=0.3467, over 20985.00 frames. ], tot_loss[loss=0.2798, ctc_loss=0.1994, cr_loss=0.4018, over 4025556.30 frames. ], batch size: 52, lr: 1.02e-02, grad_scale: 32.0 2024-09-14 15:06:34,364 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.781e+02 2.136e+02 2.374e+02 2.738e+02 4.102e+02, threshold=4.749e+02, percent-clipped=0.0 2024-09-14 15:06:49,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=144658.66666666666, ans=0.2 2024-09-14 15:06:50,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=144658.66666666666, ans=0.015 2024-09-14 15:06:59,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=144687.0, ans=0.0 2024-09-14 15:07:08,235 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.87 vs. limit=15.0 2024-09-14 15:07:11,200 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.49 vs. 
limit=10.0 2024-09-14 15:07:14,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=144715.33333333334, ans=0.0 2024-09-14 15:07:19,419 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.93 vs. limit=22.5 2024-09-14 15:07:28,678 INFO [train.py:1198] (1/2) Epoch 8, batch 6350, loss[loss=0.3376, ctc_loss=0.2531, cr_loss=0.4225, over 14305.00 frames. ], tot_loss[loss=0.2878, ctc_loss=0.2068, cr_loss=0.4052, over 3852185.61 frames. ], batch size: 149, lr: 1.02e-02, grad_scale: 32.0 2024-09-14 15:07:56,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=144800.33333333334, ans=0.2 2024-09-14 15:09:14,800 INFO [train.py:1198] (1/2) Epoch 9, batch 0, loss[loss=0.2806, ctc_loss=0.1969, cr_loss=0.4184, over 21038.00 frames. ], tot_loss[loss=0.2806, ctc_loss=0.1969, cr_loss=0.4184, over 21038.00 frames. ], batch size: 62, lr: 9.65e-03, grad_scale: 32.0 2024-09-14 15:09:14,801 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-14 15:09:33,300 INFO [train.py:1230] (1/2) Epoch 9, validation: loss=0.05921, ctc_loss=0.05921, cr_loss=9.606e-15, over 944034.00 frames. 2024-09-14 15:09:33,301 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-14 15:09:37,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=144859.83333333334, ans=0.125 2024-09-14 15:09:43,078 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.20 vs. limit=15.0 2024-09-14 15:09:55,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=144888.16666666666, ans=0.025 2024-09-14 15:10:04,769 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.958e+02 2.383e+02 2.585e+02 2.854e+02 4.318e+02, threshold=5.171e+02, percent-clipped=0.0 2024-09-14 15:10:14,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=144916.5, ans=0.025 2024-09-14 15:10:17,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=144944.83333333334, ans=0.07 2024-09-14 15:10:46,232 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.17 vs. limit=22.5 2024-09-14 15:10:48,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=144973.16666666666, ans=0.1 2024-09-14 15:10:51,206 INFO [train.py:1198] (1/2) Epoch 9, batch 50, loss[loss=0.2569, ctc_loss=0.183, cr_loss=0.3694, over 21036.00 frames. ], tot_loss[loss=0.2765, ctc_loss=0.1959, cr_loss=0.403, over 928321.30 frames. ], batch size: 56, lr: 9.64e-03, grad_scale: 32.0 2024-09-14 15:10:54,787 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.79 vs. 
limit=15.0
2024-09-14 15:11:01,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=145001.5, ans=0.0
2024-09-14 15:11:05,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=145029.83333333334, ans=0.0
2024-09-14 15:11:08,135 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-14 15:11:32,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=145058.16666666666, ans=0.0
2024-09-14 15:11:32,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=145058.16666666666, ans=0.125
2024-09-14 15:12:06,014 INFO [train.py:1198] (1/2) Epoch 9, batch 100, loss[loss=0.2803, ctc_loss=0.1968, cr_loss=0.4173, over 21047.00 frames. ], tot_loss[loss=0.2761, ctc_loss=0.1956, cr_loss=0.4025, over 1627985.42 frames. ], batch size: 62, lr: 9.64e-03, grad_scale: 32.0
2024-09-14 15:12:08,039 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-14 15:12:20,144 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.19 vs. limit=22.5
2024-09-14 15:12:24,700 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.76 vs. limit=10.0
2024-09-14 15:12:37,460 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.709e+02 2.003e+02 2.113e+02 2.321e+02 3.266e+02, threshold=4.227e+02, percent-clipped=0.0
2024-09-14 15:12:45,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=145199.83333333334, ans=0.025
2024-09-14 15:12:51,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=145199.83333333334, ans=0.0
2024-09-14 15:12:57,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=145228.16666666666, ans=0.1
2024-09-14 15:13:06,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=145228.16666666666, ans=0.015
2024-09-14 15:13:12,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=145256.5, ans=0.125
2024-09-14 15:13:24,329 INFO [train.py:1198] (1/2) Epoch 9, batch 150, loss[loss=0.2501, ctc_loss=0.1787, cr_loss=0.3571, over 19855.00 frames. ], tot_loss[loss=0.2755, ctc_loss=0.1953, cr_loss=0.4011, over 2172852.68 frames. ], batch size: 44, lr: 9.63e-03, grad_scale: 32.0
2024-09-14 15:13:34,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=145284.83333333334, ans=0.2
2024-09-14 15:14:39,564 INFO [train.py:1198] (1/2) Epoch 9, batch 200, loss[loss=0.3256, ctc_loss=0.2358, cr_loss=0.4493, over 20677.00 frames. ], tot_loss[loss=0.2766, ctc_loss=0.196, cr_loss=0.4034, over 2613255.15 frames. ], batch size: 68, lr: 9.63e-03, grad_scale: 32.0
2024-09-14 15:14:41,930 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.71 vs. limit=15.0
2024-09-14 15:14:52,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=145426.5, ans=0.125
2024-09-14 15:14:58,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=145454.83333333334, ans=0.0
2024-09-14 15:15:04,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=145454.83333333334, ans=0.125
2024-09-14 15:15:11,410 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.763e+02 2.054e+02 2.202e+02 2.486e+02 5.208e+02, threshold=4.404e+02, percent-clipped=1.0
2024-09-14 15:15:16,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=145483.16666666666, ans=0.125
2024-09-14 15:15:21,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=145483.16666666666, ans=0.0
2024-09-14 15:15:39,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=145539.83333333334, ans=0.0
2024-09-14 15:15:51,828 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.32 vs. limit=15.0
2024-09-14 15:15:55,445 INFO [train.py:1198] (1/2) Epoch 9, batch 250, loss[loss=0.2517, ctc_loss=0.1743, cr_loss=0.3871, over 21051.00 frames. ], tot_loss[loss=0.2749, ctc_loss=0.1947, cr_loss=0.4011, over 2942646.03 frames. ], batch size: 53, lr: 9.62e-03, grad_scale: 32.0
2024-09-14 15:15:57,451 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-14 15:16:07,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=145568.16666666666, ans=0.125
2024-09-14 15:16:18,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=145596.5, ans=0.0
2024-09-14 15:16:25,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=145596.5, ans=0.125
2024-09-14 15:16:41,185 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.38 vs. limit=15.0
2024-09-14 15:17:14,112 INFO [train.py:1198] (1/2) Epoch 9, batch 300, loss[loss=0.3538, ctc_loss=0.2683, cr_loss=0.4275, over 13925.00 frames. ], tot_loss[loss=0.2751, ctc_loss=0.1947, cr_loss=0.4017, over 3205413.26 frames. ], batch size: 149, lr: 9.62e-03, grad_scale: 64.0
2024-09-14 15:17:20,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=145709.83333333334, ans=0.1
2024-09-14 15:17:33,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=145738.16666666666, ans=0.2
2024-09-14 15:17:45,566 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.776e+02 2.047e+02 2.187e+02 2.461e+02 3.338e+02, threshold=4.374e+02, percent-clipped=0.0
2024-09-14 15:17:46,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=145766.5, ans=0.1
2024-09-14 15:17:57,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=145794.83333333334, ans=0.2
2024-09-14 15:18:03,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=145794.83333333334, ans=0.1
2024-09-14 15:18:14,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=145823.16666666666, ans=0.125
2024-09-14 15:18:14,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=145823.16666666666, ans=0.0
2024-09-14 15:18:23,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=145823.16666666666, ans=0.025
2024-09-14 15:18:29,311 INFO [train.py:1198] (1/2) Epoch 9, batch 350, loss[loss=0.2591, ctc_loss=0.1827, cr_loss=0.3821, over 21058.00 frames. ], tot_loss[loss=0.2766, ctc_loss=0.1958, cr_loss=0.404, over 3404310.06 frames. ], batch size: 53, lr: 9.62e-03, grad_scale: 64.0
2024-09-14 15:18:44,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=145851.5, ans=0.125
2024-09-14 15:19:15,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=145936.5, ans=0.125
2024-09-14 15:19:16,549 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.09 vs. limit=12.0
2024-09-14 15:19:44,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=145964.83333333334, ans=0.0
2024-09-14 15:19:46,961 INFO [train.py:1198] (1/2) Epoch 9, batch 400, loss[loss=0.2863, ctc_loss=0.1988, cr_loss=0.4375, over 20936.00 frames. ], tot_loss[loss=0.2778, ctc_loss=0.1969, cr_loss=0.4048, over 3550143.22 frames. ], batch size: 60, lr: 9.61e-03, grad_scale: 64.0
2024-09-14 15:20:20,200 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.769e+02 2.122e+02 2.280e+02 2.528e+02 3.565e+02, threshold=4.559e+02, percent-clipped=0.0
2024-09-14 15:20:52,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=146106.5, ans=0.125
2024-09-14 15:21:01,980 INFO [train.py:1198] (1/2) Epoch 9, batch 450, loss[loss=0.2344, ctc_loss=0.1649, cr_loss=0.3473, over 20947.00 frames. ], tot_loss[loss=0.2772, ctc_loss=0.1963, cr_loss=0.4043, over 3675730.67 frames. ], batch size: 50, lr: 9.61e-03, grad_scale: 32.0
2024-09-14 15:21:02,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=146134.83333333334, ans=0.1
2024-09-14 15:21:08,140 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-14 15:21:27,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=146163.16666666666, ans=0.025
2024-09-14 15:21:29,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=146163.16666666666, ans=0.5
2024-09-14 15:22:20,018 INFO [train.py:1198] (1/2) Epoch 9, batch 500, loss[loss=0.2362, ctc_loss=0.1621, cr_loss=0.3709, over 19868.00 frames. ], tot_loss[loss=0.2753, ctc_loss=0.195, cr_loss=0.4015, over 3758322.48 frames. ], batch size: 44, lr: 9.60e-03, grad_scale: 16.0
2024-09-14 15:22:36,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=146304.83333333334, ans=0.0
2024-09-14 15:22:41,793 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.46 vs. limit=22.5
2024-09-14 15:22:50,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=146333.16666666666, ans=0.0
2024-09-14 15:22:54,600 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.700e+02 2.000e+02 2.260e+02 2.487e+02 3.645e+02, threshold=4.520e+02, percent-clipped=0.0
2024-09-14 15:23:08,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=146361.5, ans=0.0
2024-09-14 15:23:35,113 INFO [train.py:1198] (1/2) Epoch 9, batch 550, loss[loss=0.2721, ctc_loss=0.1915, cr_loss=0.4032, over 20326.00 frames. ], tot_loss[loss=0.2758, ctc_loss=0.1952, cr_loss=0.4028, over 3840639.55 frames. ], batch size: 74, lr: 9.60e-03, grad_scale: 16.0
2024-09-14 15:23:56,903 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.21 vs. limit=22.5
2024-09-14 15:24:52,696 INFO [train.py:1198] (1/2) Epoch 9, batch 600, loss[loss=0.263, ctc_loss=0.1829, cr_loss=0.4005, over 20894.00 frames. ], tot_loss[loss=0.2758, ctc_loss=0.1951, cr_loss=0.4035, over 3897671.46 frames. ], batch size: 54, lr: 9.59e-03, grad_scale: 16.0
2024-09-14 15:25:00,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=146559.83333333334, ans=0.125
2024-09-14 15:25:10,001 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.64 vs. limit=22.5
2024-09-14 15:25:12,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=146588.16666666666, ans=0.125
2024-09-14 15:25:26,995 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.810e+02 2.084e+02 2.253e+02 2.578e+02 7.002e+02, threshold=4.507e+02, percent-clipped=1.0
2024-09-14 15:25:47,216 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.11 vs. limit=15.0
2024-09-14 15:26:07,750 INFO [train.py:1198] (1/2) Epoch 9, batch 650, loss[loss=0.2477, ctc_loss=0.1726, cr_loss=0.3757, over 20963.00 frames. ], tot_loss[loss=0.2762, ctc_loss=0.1956, cr_loss=0.4029, over 3923882.91 frames. ], batch size: 51, lr: 9.59e-03, grad_scale: 16.0
2024-09-14 15:26:09,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=146701.5, ans=0.125
2024-09-14 15:26:14,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=146701.5, ans=0.125
2024-09-14 15:26:15,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=146701.5, ans=0.0
2024-09-14 15:27:23,150 INFO [train.py:1198] (1/2) Epoch 9, batch 700, loss[loss=0.2428, ctc_loss=0.1697, cr_loss=0.365, over 21063.00 frames. ], tot_loss[loss=0.2769, ctc_loss=0.1963, cr_loss=0.4031, over 3951128.67 frames. ], batch size: 53, lr: 9.58e-03, grad_scale: 16.0
2024-09-14 15:28:00,573 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.841e+02 2.100e+02 2.307e+02 2.630e+02 5.153e+02, threshold=4.614e+02, percent-clipped=1.0
2024-09-14 15:28:25,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=146956.5, ans=0.0
2024-09-14 15:28:33,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=146956.5, ans=0.125
2024-09-14 15:28:38,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=146956.5, ans=0.2
2024-09-14 15:28:40,995 INFO [train.py:1198] (1/2) Epoch 9, batch 750, loss[loss=0.2793, ctc_loss=0.1981, cr_loss=0.4056, over 20834.00 frames. ], tot_loss[loss=0.2756, ctc_loss=0.1952, cr_loss=0.4021, over 3993296.66 frames. ], batch size: 65, lr: 9.58e-03, grad_scale: 16.0
2024-09-14 15:28:46,701 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.00 vs. limit=22.5
2024-09-14 15:29:15,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=147041.5, ans=0.125
2024-09-14 15:29:17,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=147041.5, ans=0.125
2024-09-14 15:29:54,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=147098.16666666666, ans=0.07
2024-09-14 15:29:59,099 INFO [train.py:1198] (1/2) Epoch 9, batch 800, loss[loss=0.2562, ctc_loss=0.1781, cr_loss=0.3904, over 20903.00 frames. ], tot_loss[loss=0.2776, ctc_loss=0.1968, cr_loss=0.4041, over 4005607.65 frames. ], batch size: 54, lr: 9.57e-03, grad_scale: 32.0
2024-09-14 15:30:15,244 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.34 vs. limit=22.5
2024-09-14 15:30:33,722 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.818e+02 2.091e+02 2.348e+02 2.533e+02 4.670e+02, threshold=4.696e+02, percent-clipped=1.0
2024-09-14 15:30:44,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=147211.5, ans=0.125
2024-09-14 15:30:59,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=147239.83333333334, ans=0.1
2024-09-14 15:31:09,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=147239.83333333334, ans=0.0
2024-09-14 15:31:13,982 INFO [train.py:1198] (1/2) Epoch 9, batch 850, loss[loss=0.2679, ctc_loss=0.1886, cr_loss=0.3963, over 20300.00 frames. ], tot_loss[loss=0.2771, ctc_loss=0.1963, cr_loss=0.4036, over 4029143.48 frames. ], batch size: 74, lr: 9.57e-03, grad_scale: 32.0
2024-09-14 15:31:20,839 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.15 vs. limit=12.0
2024-09-14 15:31:43,033 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.82 vs. limit=15.0
2024-09-14 15:32:12,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=147353.16666666666, ans=0.125
2024-09-14 15:32:13,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=147381.5, ans=0.125
2024-09-14 15:32:25,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=147381.5, ans=0.125
2024-09-14 15:32:29,693 INFO [train.py:1198] (1/2) Epoch 9, batch 900, loss[loss=0.2442, ctc_loss=0.1752, cr_loss=0.3451, over 21070.00 frames. ], tot_loss[loss=0.2762, ctc_loss=0.1956, cr_loss=0.4027, over 4047934.39 frames. ], batch size: 53, lr: 9.57e-03, grad_scale: 32.0
2024-09-14 15:32:31,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=147409.83333333334, ans=0.125
2024-09-14 15:32:43,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=147438.16666666666, ans=0.125
2024-09-14 15:32:47,103 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=15.0
2024-09-14 15:33:01,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=147466.5, ans=0.1
2024-09-14 15:33:04,514 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.720e+02 2.072e+02 2.211e+02 2.432e+02 3.229e+02, threshold=4.421e+02, percent-clipped=0.0
2024-09-14 15:33:46,582 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=147523.16666666666, ans=0.2
2024-09-14 15:33:49,184 INFO [train.py:1198] (1/2) Epoch 9, batch 950, loss[loss=0.2578, ctc_loss=0.1829, cr_loss=0.3748, over 20836.00 frames. ], tot_loss[loss=0.2762, ctc_loss=0.1955, cr_loss=0.4031, over 4052258.65 frames. ], batch size: 59, lr: 9.56e-03, grad_scale: 32.0
2024-09-14 15:34:51,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=147664.83333333334, ans=0.125
2024-09-14 15:35:04,778 INFO [train.py:1198] (1/2) Epoch 9, batch 1000, loss[loss=0.3014, ctc_loss=0.2196, cr_loss=0.4088, over 18260.00 frames. ], tot_loss[loss=0.2757, ctc_loss=0.195, cr_loss=0.4032, over 4070355.91 frames. ], batch size: 108, lr: 9.56e-03, grad_scale: 32.0
2024-09-14 15:35:22,406 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.68 vs. limit=15.0
2024-09-14 15:35:25,105 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.68 vs. limit=15.0
2024-09-14 15:35:42,582 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.810e+02 2.137e+02 2.357e+02 2.706e+02 4.826e+02, threshold=4.713e+02, percent-clipped=1.0
2024-09-14 15:35:46,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=147749.83333333334, ans=0.125
2024-09-14 15:35:55,461 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.04 vs. limit=10.0
2024-09-14 15:36:06,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=147806.5, ans=0.0
2024-09-14 15:36:08,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=147806.5, ans=0.125
2024-09-14 15:36:23,183 INFO [train.py:1198] (1/2) Epoch 9, batch 1050, loss[loss=0.2765, ctc_loss=0.1988, cr_loss=0.3886, over 20968.00 frames. ], tot_loss[loss=0.2757, ctc_loss=0.1951, cr_loss=0.4032, over 4074521.58 frames. ], batch size: 58, lr: 9.55e-03, grad_scale: 32.0
2024-09-14 15:37:38,773 INFO [train.py:1198] (1/2) Epoch 9, batch 1100, loss[loss=0.2467, ctc_loss=0.1728, cr_loss=0.3695, over 21072.00 frames. ], tot_loss[loss=0.2751, ctc_loss=0.1945, cr_loss=0.4026, over 4085024.11 frames. ], batch size: 53, lr: 9.55e-03, grad_scale: 16.0
2024-09-14 15:37:39,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=147976.5, ans=0.125
2024-09-14 15:37:45,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=147976.5, ans=0.0
2024-09-14 15:37:49,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=147976.5, ans=0.125
2024-09-14 15:37:51,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=147976.5, ans=0.125
2024-09-14 15:38:05,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=148004.83333333334, ans=0.125
2024-09-14 15:38:15,094 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.765e+02 2.049e+02 2.233e+02 2.569e+02 3.716e+02, threshold=4.467e+02, percent-clipped=0.0
2024-09-14 15:38:21,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=148033.16666666666, ans=0.0
2024-09-14 15:38:24,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=148061.5, ans=0.125
2024-09-14 15:38:54,262 INFO [train.py:1198] (1/2) Epoch 9, batch 1150, loss[loss=0.295, ctc_loss=0.2086, cr_loss=0.4321, over 20835.00 frames. ], tot_loss[loss=0.2745, ctc_loss=0.194, cr_loss=0.4023, over 4096019.50 frames. ], batch size: 65, lr: 9.54e-03, grad_scale: 16.0
2024-09-14 15:39:08,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=148146.5, ans=0.2
2024-09-14 15:39:15,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=148146.5, ans=0.025
2024-09-14 15:39:24,934 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.40 vs. limit=15.0
2024-09-14 15:40:12,479 INFO [train.py:1198] (1/2) Epoch 9, batch 1200, loss[loss=0.2365, ctc_loss=0.165, cr_loss=0.3575, over 20977.00 frames. ], tot_loss[loss=0.2745, ctc_loss=0.1942, cr_loss=0.4019, over 4102671.43 frames. ], batch size: 51, lr: 9.54e-03, grad_scale: 32.0
2024-09-14 15:40:26,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=148288.16666666666, ans=0.125
2024-09-14 15:40:29,383 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.76 vs. limit=15.0
2024-09-14 15:40:47,762 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.816e+02 2.084e+02 2.241e+02 2.597e+02 3.993e+02, threshold=4.481e+02, percent-clipped=0.0
2024-09-14 15:41:29,692 INFO [train.py:1198] (1/2) Epoch 9, batch 1250, loss[loss=0.2453, ctc_loss=0.1707, cr_loss=0.3729, over 20982.00 frames. ], tot_loss[loss=0.2747, ctc_loss=0.1943, cr_loss=0.4021, over 4108351.12 frames. ], batch size: 52, lr: 9.53e-03, grad_scale: 32.0
2024-09-14 15:41:37,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=148401.5, ans=0.0
2024-09-14 15:41:37,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=148401.5, ans=0.1
2024-09-14 15:42:09,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=148458.16666666666, ans=0.1
2024-09-14 15:42:31,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=148514.83333333334, ans=0.1
2024-09-14 15:42:45,658 INFO [train.py:1198] (1/2) Epoch 9, batch 1300, loss[loss=0.2847, ctc_loss=0.2004, cr_loss=0.4218, over 21016.00 frames. ], tot_loss[loss=0.2762, ctc_loss=0.1955, cr_loss=0.4034, over 4102367.32 frames. ], batch size: 63, lr: 9.53e-03, grad_scale: 32.0
2024-09-14 15:43:21,735 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.752e+02 2.097e+02 2.224e+02 2.451e+02 7.655e+02, threshold=4.447e+02, percent-clipped=2.0
2024-09-14 15:43:26,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=148599.83333333334, ans=0.09899494936611666
2024-09-14 15:43:31,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=148628.16666666666, ans=10.0
2024-09-14 15:44:00,914 INFO [train.py:1198] (1/2) Epoch 9, batch 1350, loss[loss=0.2345, ctc_loss=0.1647, cr_loss=0.3492, over 20305.00 frames. ], tot_loss[loss=0.2753, ctc_loss=0.195, cr_loss=0.4018, over 4101587.35 frames. ], batch size: 45, lr: 9.53e-03, grad_scale: 32.0
2024-09-14 15:45:19,256 INFO [train.py:1198] (1/2) Epoch 9, batch 1400, loss[loss=0.2745, ctc_loss=0.1934, cr_loss=0.4054, over 20970.00 frames. ], tot_loss[loss=0.2747, ctc_loss=0.1943, cr_loss=0.402, over 4112349.70 frames. ], batch size: 58, lr: 9.52e-03, grad_scale: 32.0
2024-09-14 15:45:28,918 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.31 vs. limit=6.0
2024-09-14 15:45:37,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=148854.83333333334, ans=0.125
2024-09-14 15:45:49,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=148883.16666666666, ans=0.1
2024-09-14 15:45:52,797 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.63 vs. limit=12.0
2024-09-14 15:45:53,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=148883.16666666666, ans=0.125
2024-09-14 15:45:56,665 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.919e+02 2.104e+02 2.225e+02 2.434e+02 3.450e+02, threshold=4.450e+02, percent-clipped=0.0
2024-09-14 15:46:07,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=148911.5, ans=0.04949747468305833
2024-09-14 15:46:07,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=148911.5, ans=0.125
2024-09-14 15:46:16,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=148911.5, ans=0.1
2024-09-14 15:46:34,301 INFO [train.py:1198] (1/2) Epoch 9, batch 1450, loss[loss=0.2464, ctc_loss=0.1689, cr_loss=0.3874, over 20934.00 frames. ], tot_loss[loss=0.2746, ctc_loss=0.1943, cr_loss=0.4014, over 4108880.00 frames. ], batch size: 50, lr: 9.52e-03, grad_scale: 16.0
2024-09-14 15:46:34,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=148968.16666666666, ans=0.05
2024-09-14 15:46:37,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=148968.16666666666, ans=0.025
2024-09-14 15:46:46,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=148968.16666666666, ans=0.125
2024-09-14 15:47:53,995 INFO [train.py:1198] (1/2) Epoch 9, batch 1500, loss[loss=0.2851, ctc_loss=0.2024, cr_loss=0.4136, over 21066.00 frames. ], tot_loss[loss=0.2744, ctc_loss=0.1942, cr_loss=0.401, over 4103958.16 frames. ], batch size: 56, lr: 9.51e-03, grad_scale: 16.0
2024-09-14 15:48:12,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=149138.16666666666, ans=0.125
2024-09-14 15:48:18,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=149138.16666666666, ans=0.0
2024-09-14 15:48:31,298 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.780e+02 2.038e+02 2.322e+02 2.723e+02 3.987e+02, threshold=4.644e+02, percent-clipped=0.0
2024-09-14 15:48:36,498 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.88 vs. limit=15.0
2024-09-14 15:48:58,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=149223.16666666666, ans=0.2
2024-09-14 15:49:08,972 INFO [train.py:1198] (1/2) Epoch 9, batch 1550, loss[loss=0.2381, ctc_loss=0.1686, cr_loss=0.3472, over 21053.00 frames. ], tot_loss[loss=0.2754, ctc_loss=0.1951, cr_loss=0.4015, over 4077701.90 frames. ], batch size: 56, lr: 9.51e-03, grad_scale: 16.0
2024-09-14 15:49:30,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=149279.83333333334, ans=0.125
2024-09-14 15:50:15,034 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.81 vs. limit=6.0
2024-09-14 15:50:20,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=149364.83333333334, ans=0.0
2024-09-14 15:50:24,999 INFO [train.py:1198] (1/2) Epoch 9, batch 1600, loss[loss=0.2908, ctc_loss=0.2082, cr_loss=0.4134, over 20846.00 frames. ], tot_loss[loss=0.2744, ctc_loss=0.1943, cr_loss=0.4003, over 4089440.74 frames. ], batch size: 65, lr: 9.50e-03, grad_scale: 32.0
2024-09-14 15:50:31,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=149393.16666666666, ans=0.04949747468305833
2024-09-14 15:51:05,462 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.705e+02 2.144e+02 2.333e+02 2.616e+02 4.738e+02, threshold=4.667e+02, percent-clipped=1.0
2024-09-14 15:51:10,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=149449.83333333334, ans=0.035
2024-09-14 15:51:16,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=149478.16666666666, ans=0.1
2024-09-14 15:51:41,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=149534.83333333334, ans=0.025
2024-09-14 15:51:42,850 INFO [train.py:1198] (1/2) Epoch 9, batch 1650, loss[loss=0.2635, ctc_loss=0.1851, cr_loss=0.3922, over 20781.00 frames. ], tot_loss[loss=0.275, ctc_loss=0.1948, cr_loss=0.4012, over 4087443.30 frames. ], batch size: 56, lr: 9.50e-03, grad_scale: 32.0
2024-09-14 15:52:44,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=149648.16666666666, ans=0.1
2024-09-14 15:53:00,649 INFO [train.py:1198] (1/2) Epoch 9, batch 1700, loss[loss=0.2491, ctc_loss=0.1723, cr_loss=0.3839, over 20986.00 frames. ], tot_loss[loss=0.2748, ctc_loss=0.1946, cr_loss=0.4011, over 4083007.77 frames. ], batch size: 51, lr: 9.49e-03, grad_scale: 32.0
2024-09-14 15:53:08,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=149676.5, ans=0.125
2024-09-14 15:53:25,584 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. limit=6.0
2024-09-14 15:53:38,808 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.785e+02 2.049e+02 2.258e+02 2.531e+02 7.471e+02, threshold=4.516e+02, percent-clipped=2.0
2024-09-14 15:54:00,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=149789.83333333334, ans=0.125
2024-09-14 15:54:16,871 INFO [train.py:1198] (1/2) Epoch 9, batch 1750, loss[loss=0.2834, ctc_loss=0.1946, cr_loss=0.4439, over 20815.00 frames. ], tot_loss[loss=0.2741, ctc_loss=0.1938, cr_loss=0.4014, over 4096880.16 frames. ], batch size: 59, lr: 9.49e-03, grad_scale: 32.0
2024-09-14 15:54:17,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=149818.16666666666, ans=0.125
2024-09-14 15:54:24,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=149818.16666666666, ans=0.125
2024-09-14 15:54:42,928 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.75 vs. limit=15.0
2024-09-14 15:54:49,118 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.39 vs. limit=15.0
2024-09-14 15:54:54,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=149874.83333333334, ans=0.0
2024-09-14 15:54:59,677 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.25 vs. limit=6.0
2024-09-14 15:55:00,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=149903.16666666666, ans=0.2
2024-09-14 15:55:02,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=149903.16666666666, ans=0.125
2024-09-14 15:55:08,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=149903.16666666666, ans=0.015
2024-09-14 15:55:09,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=149903.16666666666, ans=0.0
2024-09-14 15:55:17,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=149931.5, ans=0.0
2024-09-14 15:55:31,807 INFO [train.py:1198] (1/2) Epoch 9, batch 1800, loss[loss=0.2614, ctc_loss=0.1836, cr_loss=0.3894, over 20986.00 frames. ], tot_loss[loss=0.2739, ctc_loss=0.1936, cr_loss=0.4014, over 4097272.18 frames. ], batch size: 52, lr: 9.49e-03, grad_scale: 32.0
2024-09-14 15:55:50,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=149988.16666666666, ans=0.125
2024-09-14 15:55:50,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=149988.16666666666, ans=0.1
2024-09-14 15:56:09,622 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.813e+02 2.064e+02 2.286e+02 2.541e+02 4.423e+02, threshold=4.572e+02, percent-clipped=0.0
2024-09-14 15:56:50,317 INFO [train.py:1198] (1/2) Epoch 9, batch 1850, loss[loss=0.2681, ctc_loss=0.1871, cr_loss=0.4052, over 20968.00 frames. ], tot_loss[loss=0.2734, ctc_loss=0.193, cr_loss=0.4018, over 4096405.03 frames. ], batch size: 58, lr: 9.48e-03, grad_scale: 32.0
2024-09-14 15:57:37,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=150186.5, ans=0.0
2024-09-14 15:57:43,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=150186.5, ans=0.1
2024-09-14 15:57:58,650 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.33 vs. limit=6.0
2024-09-14 15:58:05,575 INFO [train.py:1198] (1/2) Epoch 9, batch 1900, loss[loss=0.2686, ctc_loss=0.1914, cr_loss=0.3858, over 20648.00 frames. ], tot_loss[loss=0.2735, ctc_loss=0.1932, cr_loss=0.4016, over 4100820.88 frames. ], batch size: 66, lr: 9.48e-03, grad_scale: 32.0
2024-09-14 15:58:16,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=150243.16666666666, ans=0.1
2024-09-14 15:58:43,670 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0
2024-09-14 15:58:45,913 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.827e+02 2.091e+02 2.239e+02 2.372e+02 4.186e+02, threshold=4.479e+02, percent-clipped=0.0
2024-09-14 15:59:23,365 INFO [train.py:1198] (1/2) Epoch 9, batch 1950, loss[loss=0.3353, ctc_loss=0.2519, cr_loss=0.4168, over 14014.00 frames. ], tot_loss[loss=0.2737, ctc_loss=0.1933, cr_loss=0.4017, over 4100666.45 frames. ], batch size: 149, lr: 9.47e-03, grad_scale: 32.0
2024-09-14 15:59:25,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=150384.83333333334, ans=0.025
2024-09-14 15:59:42,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=150413.16666666666, ans=0.2
2024-09-14 15:59:55,589 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.66 vs. limit=6.0
2024-09-14 16:00:38,337 INFO [train.py:1198] (1/2) Epoch 9, batch 2000, loss[loss=0.2912, ctc_loss=0.2038, cr_loss=0.4369, over 20974.00 frames. ], tot_loss[loss=0.275, ctc_loss=0.1944, cr_loss=0.403, over 4097432.31 frames. ], batch size: 64, lr: 9.47e-03, grad_scale: 32.0
2024-09-14 16:00:38,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=150526.5, ans=0.04949747468305833
2024-09-14 16:00:58,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=150554.83333333334, ans=0.1
2024-09-14 16:01:00,006 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.47 vs. limit=15.0
2024-09-14 16:01:15,904 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.823e+02 2.064e+02 2.222e+02 2.486e+02 5.486e+02, threshold=4.444e+02, percent-clipped=1.0
2024-09-14 16:01:29,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=150611.5, ans=0.125
2024-09-14 16:01:40,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=150639.83333333334, ans=0.125
2024-09-14 16:01:49,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=150639.83333333334, ans=0.2
2024-09-14 16:01:53,337 INFO [train.py:1198] (1/2) Epoch 9, batch 2050, loss[loss=0.2903, ctc_loss=0.2043, cr_loss=0.4303, over 21026.00 frames. ], tot_loss[loss=0.275, ctc_loss=0.1944, cr_loss=0.4029, over 4101053.61 frames. ], batch size: 63, lr: 9.46e-03, grad_scale: 32.0
2024-09-14 16:02:52,958 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-14 16:02:58,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=150781.5, ans=0.2
2024-09-14 16:03:00,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=150781.5, ans=0.125
2024-09-14 16:03:12,143 INFO [train.py:1198] (1/2) Epoch 9, batch 2100, loss[loss=0.3165, ctc_loss=0.2304, cr_loss=0.4306, over 18475.00 frames. ], tot_loss[loss=0.2754, ctc_loss=0.1947, cr_loss=0.4035, over 4094515.75 frames. ], batch size: 108, lr: 9.46e-03, grad_scale: 32.0
2024-09-14 16:03:35,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=150838.16666666666, ans=0.125
2024-09-14 16:03:49,708 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.728e+02 2.042e+02 2.207e+02 2.407e+02 4.895e+02, threshold=4.414e+02, percent-clipped=1.0
2024-09-14 16:04:30,056 INFO [train.py:1198] (1/2) Epoch 9, batch 2150, loss[loss=0.2436, ctc_loss=0.1727, cr_loss=0.3546, over 20965.00 frames. ], tot_loss[loss=0.2746, ctc_loss=0.1941, cr_loss=0.4025, over 4096374.59 frames. ], batch size: 48, lr: 9.45e-03, grad_scale: 32.0
2024-09-14 16:04:50,255 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.66 vs. limit=6.0
2024-09-14 16:05:12,940 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.73 vs. limit=15.0
2024-09-14 16:05:40,340 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.56 vs. limit=15.0
2024-09-14 16:05:45,662 INFO [train.py:1198] (1/2) Epoch 9, batch 2200, loss[loss=0.2783, ctc_loss=0.1977, cr_loss=0.4033, over 20718.00 frames. ], tot_loss[loss=0.2749, ctc_loss=0.1945, cr_loss=0.4023, over 4092191.96 frames. ], batch size: 71, lr: 9.45e-03, grad_scale: 32.0
2024-09-14 16:05:50,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=151093.16666666666, ans=0.125
2024-09-14 16:05:58,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=151093.16666666666, ans=0.1
2024-09-14 16:06:20,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=151149.83333333334, ans=0.125
2024-09-14 16:06:22,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=151149.83333333334, ans=0.125
2024-09-14 16:06:23,440 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.767e+02 2.119e+02 2.414e+02 2.810e+02 4.816e+02, threshold=4.829e+02, percent-clipped=1.0
2024-09-14 16:06:25,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=151149.83333333334, ans=0.125
2024-09-14 16:06:31,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=151178.16666666666, ans=0.125
2024-09-14 16:06:41,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=151178.16666666666, ans=0.0
2024-09-14 16:06:42,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=151178.16666666666, ans=0.0
2024-09-14 16:07:01,233 INFO [train.py:1198] (1/2) Epoch 9, batch 2250, loss[loss=0.2998, ctc_loss=0.2169, cr_loss=0.4143, over 20082.00 frames. ], tot_loss[loss=0.2749, ctc_loss=0.1944, cr_loss=0.4026, over 4096955.65 frames. ], batch size: 80, lr: 9.45e-03, grad_scale: 32.0
2024-09-14 16:07:17,138 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.23 vs. limit=10.0
2024-09-14 16:07:28,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=151263.16666666666, ans=0.07
2024-09-14 16:07:31,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=151291.5, ans=0.0
2024-09-14 16:07:34,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=151291.5, ans=0.0
2024-09-14 16:07:49,574 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-14 16:08:19,543 INFO [train.py:1198] (1/2) Epoch 9, batch 2300, loss[loss=0.2496, ctc_loss=0.1725, cr_loss=0.3852, over 21068.00 frames. ], tot_loss[loss=0.2742, ctc_loss=0.1939, cr_loss=0.4014, over 4097107.23 frames. ], batch size: 53, lr: 9.44e-03, grad_scale: 32.0
2024-09-14 16:08:52,266 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.56 vs. limit=5.0
2024-09-14 16:08:57,203 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.799e+02 2.084e+02 2.261e+02 2.474e+02 4.816e+02, threshold=4.521e+02, percent-clipped=0.0
2024-09-14 16:09:27,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=151489.83333333334, ans=0.0
2024-09-14 16:09:35,203 INFO [train.py:1198] (1/2) Epoch 9, batch 2350, loss[loss=0.2631, ctc_loss=0.1846, cr_loss=0.3925, over 20984.00 frames. ], tot_loss[loss=0.2732, ctc_loss=0.193, cr_loss=0.4005, over 4098060.83 frames. ], batch size: 55, lr: 9.44e-03, grad_scale: 32.0
2024-09-14 16:10:53,042 INFO [train.py:1198] (1/2) Epoch 9, batch 2400, loss[loss=0.2825, ctc_loss=0.2008, cr_loss=0.4089, over 21028.00 frames. ], tot_loss[loss=0.2735, ctc_loss=0.1933, cr_loss=0.4007, over 4092862.01 frames. ], batch size: 62, lr: 9.43e-03, grad_scale: 32.0
2024-09-14 16:11:08,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=151688.16666666666, ans=0.0
2024-09-14 16:11:12,092 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.65 vs. limit=12.0
2024-09-14 16:11:16,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=151688.16666666666, ans=0.1
2024-09-14 16:11:31,196 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.788e+02 2.051e+02 2.214e+02 2.513e+02 3.869e+02, threshold=4.429e+02, percent-clipped=0.0
2024-09-14 16:11:42,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=151744.83333333334, ans=0.125
2024-09-14 16:11:42,889 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.61 vs. limit=5.0
2024-09-14 16:12:09,102 INFO [train.py:1198] (1/2) Epoch 9, batch 2450, loss[loss=0.2877, ctc_loss=0.2051, cr_loss=0.4132, over 20955.00 frames. ], tot_loss[loss=0.272, ctc_loss=0.1921, cr_loss=0.3994, over 4089036.79 frames. ], batch size: 58, lr: 9.43e-03, grad_scale: 16.0
2024-09-14 16:12:15,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=151801.5, ans=0.125
2024-09-14 16:12:27,719 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=151829.83333333334, ans=0.125
2024-09-14 16:12:29,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=151829.83333333334, ans=0.125
2024-09-14 16:12:39,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=151858.16666666666, ans=0.125
2024-09-14 16:12:52,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=151858.16666666666, ans=0.0
2024-09-14 16:13:27,329 INFO [train.py:1198] (1/2) Epoch 9, batch 2500, loss[loss=0.2846, ctc_loss=0.2031, cr_loss=0.4072, over 19712.00 frames. ], tot_loss[loss=0.2731, ctc_loss=0.1931, cr_loss=0.4001, over 4086357.70 frames. ], batch size: 90, lr: 9.42e-03, grad_scale: 16.0
2024-09-14 16:13:30,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=151943.16666666666, ans=0.05
2024-09-14 16:14:06,083 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.910e+02 2.100e+02 2.274e+02 2.459e+02 4.462e+02, threshold=4.548e+02, percent-clipped=1.0
2024-09-14 16:14:42,359 INFO [train.py:1198] (1/2) Epoch 9, batch 2550, loss[loss=0.2912, ctc_loss=0.2079, cr_loss=0.4161, over 20648.00 frames. ], tot_loss[loss=0.2728, ctc_loss=0.1928, cr_loss=0.4001, over 4096973.82 frames. ], batch size: 66, lr: 9.42e-03, grad_scale: 16.0
2024-09-14 16:14:42,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=152084.83333333334, ans=0.0
2024-09-14 16:15:03,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=152113.16666666666, ans=0.125
2024-09-14 16:15:31,237 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.58 vs. limit=15.0
2024-09-14 16:15:32,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=152169.83333333334, ans=0.125
2024-09-14 16:15:36,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=152169.83333333334, ans=0.125
2024-09-14 16:15:58,644 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.80 vs. limit=15.0
2024-09-14 16:16:01,114 INFO [train.py:1198] (1/2) Epoch 9, batch 2600, loss[loss=0.2711, ctc_loss=0.1909, cr_loss=0.401, over 20825.00 frames. ], tot_loss[loss=0.273, ctc_loss=0.1929, cr_loss=0.4009, over 4096483.15 frames. ], batch size: 59, lr: 9.42e-03, grad_scale: 16.0
2024-09-14 16:16:25,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=152254.83333333334, ans=0.0
2024-09-14 16:16:40,239 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.795e+02 2.052e+02 2.227e+02 2.453e+02 4.284e+02, threshold=4.454e+02, percent-clipped=0.0
2024-09-14 16:16:58,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=152311.5, ans=0.125
2024-09-14 16:17:16,056 INFO [train.py:1198] (1/2) Epoch 9, batch 2650, loss[loss=0.2795, ctc_loss=0.1943, cr_loss=0.4256, over 20879.00 frames. ], tot_loss[loss=0.2732, ctc_loss=0.1929, cr_loss=0.4013, over 4097124.38 frames. ], batch size: 57, lr: 9.41e-03, grad_scale: 16.0
2024-09-14 16:17:32,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=152396.5, ans=0.2
2024-09-14 16:18:28,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=152481.5, ans=0.125
2024-09-14 16:18:31,588 INFO [train.py:1198] (1/2) Epoch 9, batch 2700, loss[loss=0.3051, ctc_loss=0.2227, cr_loss=0.412, over 19591.00 frames. ], tot_loss[loss=0.2737, ctc_loss=0.1934, cr_loss=0.4015, over 4089856.08 frames. ], batch size: 90, lr: 9.41e-03, grad_scale: 16.0
2024-09-14 16:18:39,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=152509.83333333334, ans=0.0
2024-09-14 16:19:13,354 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.844e+02 2.173e+02 2.396e+02 2.767e+02 4.710e+02, threshold=4.792e+02, percent-clipped=1.0
2024-09-14 16:19:24,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=152594.83333333334, ans=0.125
2024-09-14 16:19:49,212 INFO [train.py:1198] (1/2) Epoch 9, batch 2750, loss[loss=0.2257, ctc_loss=0.157, cr_loss=0.3432, over 20938.00 frames. ], tot_loss[loss=0.2747, ctc_loss=0.1943, cr_loss=0.4021, over 4079787.94 frames. ], batch size: 50, lr: 9.40e-03, grad_scale: 16.0
2024-09-14 16:20:14,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=152679.83333333334, ans=0.125
2024-09-14 16:20:26,021 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.02 vs. limit=12.0
2024-09-14 16:20:36,848 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.81 vs. limit=15.0
2024-09-14 16:20:45,537 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.99 vs. limit=15.0
2024-09-14 16:21:04,333 INFO [train.py:1198] (1/2) Epoch 9, batch 2800, loss[loss=0.281, ctc_loss=0.1956, cr_loss=0.4266, over 21078.00 frames. ], tot_loss[loss=0.2749, ctc_loss=0.1945, cr_loss=0.4022, over 4074471.14 frames. ], batch size: 59, lr: 9.40e-03, grad_scale: 32.0
2024-09-14 16:21:09,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=152793.16666666666, ans=0.125
2024-09-14 16:21:36,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=152849.83333333334, ans=0.2
2024-09-14 16:21:46,823 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.777e+02 2.030e+02 2.197e+02 2.409e+02 5.376e+02, threshold=4.393e+02, percent-clipped=1.0
2024-09-14 16:22:14,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=152906.5, ans=0.125
2024-09-14 16:22:23,047 INFO [train.py:1198] (1/2) Epoch 9, batch 2850, loss[loss=0.2832, ctc_loss=0.1984, cr_loss=0.4243, over 20671.00 frames. ], tot_loss[loss=0.2736, ctc_loss=0.1934, cr_loss=0.4009, over 4091189.36 frames. ], batch size: 68, lr: 9.39e-03, grad_scale: 32.0
2024-09-14 16:22:35,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=152934.83333333334, ans=0.0
2024-09-14 16:22:52,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=152991.5, ans=0.09899494936611666
2024-09-14 16:23:25,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=153048.16666666666, ans=0.0
2024-09-14 16:23:34,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=153048.16666666666, ans=0.125
2024-09-14 16:23:39,078 INFO [train.py:1198] (1/2) Epoch 9, batch 2900, loss[loss=0.2288, ctc_loss=0.1562, cr_loss=0.3633, over 20978.00 frames. ], tot_loss[loss=0.274, ctc_loss=0.1936, cr_loss=0.4021, over 4095294.70 frames. ], batch size: 48, lr: 9.39e-03, grad_scale: 32.0
2024-09-14 16:23:42,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=153076.5, ans=0.025
2024-09-14 16:24:18,249 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.771e+02 2.031e+02 2.179e+02 2.346e+02 3.418e+02, threshold=4.358e+02, percent-clipped=0.0
2024-09-14 16:24:26,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=153161.5, ans=0.1
2024-09-14 16:24:36,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=153161.5, ans=0.125
2024-09-14 16:24:55,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=153218.16666666666, ans=0.0
2024-09-14 16:24:56,652 INFO [train.py:1198] (1/2) Epoch 9, batch 2950, loss[loss=0.2895, ctc_loss=0.2033, cr_loss=0.4313, over 21078.00 frames. ], tot_loss[loss=0.2735, ctc_loss=0.1933, cr_loss=0.4012, over 4101180.76 frames. ], batch size: 59, lr: 9.39e-03, grad_scale: 32.0
2024-09-14 16:25:04,952 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.05 vs. limit=10.0
2024-09-14 16:25:12,573 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.26 vs. limit=15.0
2024-09-14 16:25:16,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=153246.5, ans=0.125
2024-09-14 16:25:37,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=153274.83333333334, ans=0.2
2024-09-14 16:26:11,940 INFO [train.py:1198] (1/2) Epoch 9, batch 3000, loss[loss=0.2548, ctc_loss=0.1812, cr_loss=0.3683, over 20957.00 frames. ], tot_loss[loss=0.2728, ctc_loss=0.1926, cr_loss=0.401, over 4105042.96 frames. ], batch size: 58, lr: 9.38e-03, grad_scale: 32.0
2024-09-14 16:26:11,940 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-14 16:26:31,420 INFO [train.py:1230] (1/2) Epoch 9, validation: loss=0.0567, ctc_loss=0.0567, cr_loss=9.377e-15, over 944034.00 frames.
2024-09-14 16:26:31,420 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB
2024-09-14 16:26:50,350 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.02 vs. limit=15.0
2024-09-14 16:26:55,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=153388.16666666666, ans=0.0
2024-09-14 16:27:00,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=153388.16666666666, ans=0.2
2024-09-14 16:27:13,390 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.839e+02 2.098e+02 2.324e+02 2.738e+02 4.478e+02, threshold=4.647e+02, percent-clipped=1.0
2024-09-14 16:27:43,288 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.78 vs. limit=15.0
2024-09-14 16:27:44,936 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.78 vs. limit=15.0
2024-09-14 16:27:50,197 INFO [train.py:1198] (1/2) Epoch 9, batch 3050, loss[loss=0.2568, ctc_loss=0.1827, cr_loss=0.3706, over 20829.00 frames. ], tot_loss[loss=0.274, ctc_loss=0.1936, cr_loss=0.4022, over 4090689.04 frames. ], batch size: 59, lr: 9.38e-03, grad_scale: 32.0
2024-09-14 16:27:59,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=153501.5, ans=0.2
2024-09-14 16:28:08,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=153529.83333333334, ans=0.2
2024-09-14 16:28:28,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=153558.16666666666, ans=0.125
2024-09-14 16:28:44,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=153586.5, ans=0.05
2024-09-14 16:28:52,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=153614.83333333334, ans=0.125
2024-09-14 16:29:05,992 INFO [train.py:1198] (1/2) Epoch 9, batch 3100, loss[loss=0.297, ctc_loss=0.2125, cr_loss=0.4227, over 20372.00 frames. ], tot_loss[loss=0.2737, ctc_loss=0.1933, cr_loss=0.4021, over 4092904.74 frames. ], batch size: 74, lr: 9.37e-03, grad_scale: 32.0
2024-09-14 16:29:25,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=153671.5, ans=0.125
2024-09-14 16:29:33,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=153671.5, ans=0.125
2024-09-14 16:29:39,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=153699.83333333334, ans=0.125
2024-09-14 16:29:43,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=153699.83333333334, ans=0.0
2024-09-14 16:29:45,014 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.675e+02 2.029e+02 2.142e+02 2.385e+02 4.083e+02, threshold=4.283e+02, percent-clipped=0.0
2024-09-14 16:29:45,864 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.82 vs. limit=15.0
2024-09-14 16:30:23,334 INFO [train.py:1198] (1/2) Epoch 9, batch 3150, loss[loss=0.2675, ctc_loss=0.1844, cr_loss=0.4155, over 20955.00 frames. ], tot_loss[loss=0.273, ctc_loss=0.1926, cr_loss=0.402, over 4107769.70 frames. ], batch size: 64, lr: 9.37e-03, grad_scale: 32.0
2024-09-14 16:30:29,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=153784.83333333334, ans=0.2
2024-09-14 16:30:44,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=153813.16666666666, ans=0.0
2024-09-14 16:30:49,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=153813.16666666666, ans=0.125
2024-09-14 16:30:57,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=153841.5, ans=0.2
2024-09-14 16:31:07,546 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-14 16:31:27,388 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.05 vs. limit=15.0
2024-09-14 16:31:38,552 INFO [train.py:1198] (1/2) Epoch 9, batch 3200, loss[loss=0.2829, ctc_loss=0.2022, cr_loss=0.4036, over 20968.00 frames. ], tot_loss[loss=0.2726, ctc_loss=0.1924, cr_loss=0.4013, over 4104918.79 frames. ], batch size: 58, lr: 9.36e-03, grad_scale: 32.0
2024-09-14 16:31:48,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=153926.5, ans=0.125
2024-09-14 16:31:48,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=153926.5, ans=0.125
2024-09-14 16:32:12,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=153983.16666666666, ans=0.0
2024-09-14 16:32:15,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=153983.16666666666, ans=0.0
2024-09-14 16:32:16,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=153983.16666666666, ans=0.125
2024-09-14 16:32:17,885 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.682e+02 2.060e+02 2.182e+02 2.345e+02 3.448e+02, threshold=4.365e+02, percent-clipped=0.0
2024-09-14 16:32:38,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=154011.5, ans=0.0
2024-09-14 16:32:38,438 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.92 vs. limit=15.0
2024-09-14 16:32:57,397 INFO [train.py:1198] (1/2) Epoch 9, batch 3250, loss[loss=0.2932, ctc_loss=0.2063, cr_loss=0.4344, over 20695.00 frames. ], tot_loss[loss=0.2732, ctc_loss=0.1928, cr_loss=0.402, over 4096826.24 frames. ], batch size: 71, lr: 9.36e-03, grad_scale: 16.0
2024-09-14 16:33:12,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=154096.5, ans=0.015
2024-09-14 16:33:20,881 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.87 vs. limit=12.0
2024-09-14 16:33:22,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=154096.5, ans=0.025
2024-09-14 16:33:47,649 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-14 16:33:53,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=154153.16666666666, ans=0.0
2024-09-14 16:33:55,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=154153.16666666666, ans=0.0
2024-09-14 16:34:13,056 INFO [train.py:1198] (1/2) Epoch 9, batch 3300, loss[loss=0.2879, ctc_loss=0.2025, cr_loss=0.4271, over 20975.00 frames. ], tot_loss[loss=0.2736, ctc_loss=0.1932, cr_loss=0.4021, over 4096064.64 frames. ], batch size: 58, lr: 9.36e-03, grad_scale: 16.0
2024-09-14 16:34:16,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=154209.83333333334, ans=0.1
2024-09-14 16:34:16,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=154209.83333333334, ans=0.0
2024-09-14 16:34:29,719 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=154238.16666666666, ans=0.125
2024-09-14 16:34:43,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=154266.5, ans=0.0
2024-09-14 16:34:53,530 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.745e+02 2.132e+02 2.385e+02 2.654e+02 6.296e+02, threshold=4.769e+02, percent-clipped=1.0
2024-09-14 16:35:17,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=154323.16666666666, ans=0.125
2024-09-14 16:35:18,758 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.85 vs. limit=15.0
2024-09-14 16:35:28,293 INFO [train.py:1198] (1/2) Epoch 9, batch 3350, loss[loss=0.2296, ctc_loss=0.1577, cr_loss=0.3594, over 21066.00 frames. ], tot_loss[loss=0.2726, ctc_loss=0.1924, cr_loss=0.4014, over 4090140.16 frames. ], batch size: 53, lr: 9.35e-03, grad_scale: 16.0
2024-09-14 16:36:10,343 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.04 vs. limit=15.0
2024-09-14 16:36:12,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=154408.16666666666, ans=0.125
2024-09-14 16:36:38,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=154464.83333333334, ans=0.1
2024-09-14 16:36:47,141 INFO [train.py:1198] (1/2) Epoch 9, batch 3400, loss[loss=0.2914, ctc_loss=0.2085, cr_loss=0.4143, over 21020.00 frames. ], tot_loss[loss=0.274, ctc_loss=0.1935, cr_loss=0.4025, over 4091983.61 frames. ], batch size: 62, lr: 9.35e-03, grad_scale: 16.0
2024-09-14 16:37:13,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=154521.5, ans=0.0
2024-09-14 16:37:18,195 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.11 vs.
limit=22.5 2024-09-14 16:37:22,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=154549.83333333334, ans=0.125 2024-09-14 16:37:25,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=154549.83333333334, ans=0.125 2024-09-14 16:37:27,820 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.770e+02 2.104e+02 2.223e+02 2.375e+02 5.219e+02, threshold=4.447e+02, percent-clipped=1.0 2024-09-14 16:37:35,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=154578.16666666666, ans=0.125 2024-09-14 16:37:37,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=154578.16666666666, ans=0.125 2024-09-14 16:37:43,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=154578.16666666666, ans=0.0 2024-09-14 16:38:00,344 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.05 vs. limit=15.0 2024-09-14 16:38:02,528 INFO [train.py:1198] (1/2) Epoch 9, batch 3450, loss[loss=0.2494, ctc_loss=0.1744, cr_loss=0.3748, over 20989.00 frames. ], tot_loss[loss=0.2746, ctc_loss=0.194, cr_loss=0.4031, over 4071523.30 frames. ], batch size: 48, lr: 9.34e-03, grad_scale: 16.0 2024-09-14 16:38:19,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=154663.16666666666, ans=0.0 2024-09-14 16:38:29,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=154663.16666666666, ans=0.125 2024-09-14 16:38:39,235 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.15 vs. limit=12.0 2024-09-14 16:38:40,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=154691.5, ans=0.025 2024-09-14 16:38:57,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=154719.83333333334, ans=0.125 2024-09-14 16:39:18,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=154748.16666666666, ans=0.2 2024-09-14 16:39:21,560 INFO [train.py:1198] (1/2) Epoch 9, batch 3500, loss[loss=0.2388, ctc_loss=0.166, cr_loss=0.3638, over 19842.00 frames. ], tot_loss[loss=0.2752, ctc_loss=0.1945, cr_loss=0.4038, over 4075330.38 frames. ], batch size: 44, lr: 9.34e-03, grad_scale: 16.0 2024-09-14 16:40:02,142 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.776e+02 2.092e+02 2.306e+02 2.543e+02 5.355e+02, threshold=4.612e+02, percent-clipped=1.0 2024-09-14 16:40:23,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=154889.83333333334, ans=0.05 2024-09-14 16:40:37,012 INFO [train.py:1198] (1/2) Epoch 9, batch 3550, loss[loss=0.2406, ctc_loss=0.1652, cr_loss=0.3771, over 20773.00 frames. ], tot_loss[loss=0.2744, ctc_loss=0.1939, cr_loss=0.4026, over 4083332.76 frames. 
], batch size: 53, lr: 9.34e-03, grad_scale: 16.0 2024-09-14 16:40:38,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=154918.16666666666, ans=0.2 2024-09-14 16:41:38,294 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 16:41:54,889 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.66 vs. limit=15.0 2024-09-14 16:41:55,696 INFO [train.py:1198] (1/2) Epoch 9, batch 3600, loss[loss=0.2876, ctc_loss=0.2063, cr_loss=0.4064, over 20255.00 frames. ], tot_loss[loss=0.2743, ctc_loss=0.1938, cr_loss=0.4023, over 4091914.85 frames. ], batch size: 74, lr: 9.33e-03, grad_scale: 32.0 2024-09-14 16:42:38,202 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.824e+02 2.137e+02 2.269e+02 2.630e+02 5.003e+02, threshold=4.537e+02, percent-clipped=1.0 2024-09-14 16:42:46,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=155144.83333333334, ans=0.025 2024-09-14 16:42:49,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=155144.83333333334, ans=0.125 2024-09-14 16:42:58,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=155173.16666666666, ans=0.1 2024-09-14 16:43:02,136 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.55 vs. limit=15.0 2024-09-14 16:43:12,301 INFO [train.py:1198] (1/2) Epoch 9, batch 3650, loss[loss=0.327, ctc_loss=0.2323, cr_loss=0.4735, over 20659.00 frames. ], tot_loss[loss=0.2742, ctc_loss=0.1937, cr_loss=0.4023, over 4089060.25 frames. ], batch size: 71, lr: 9.33e-03, grad_scale: 16.0 2024-09-14 16:43:17,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=155201.5, ans=0.125 2024-09-14 16:44:30,829 INFO [train.py:1198] (1/2) Epoch 9, batch 3700, loss[loss=0.3135, ctc_loss=0.2237, cr_loss=0.4489, over 20658.00 frames. ], tot_loss[loss=0.2763, ctc_loss=0.1955, cr_loss=0.4037, over 4072218.23 frames. ], batch size: 66, lr: 9.32e-03, grad_scale: 16.0 2024-09-14 16:44:44,373 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.81 vs. limit=15.0 2024-09-14 16:44:48,556 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.68 vs. 
limit=12.0 2024-09-14 16:45:00,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=155399.83333333334, ans=0.0 2024-09-14 16:45:06,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=155399.83333333334, ans=0.125 2024-09-14 16:45:13,433 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.814e+02 2.066e+02 2.239e+02 2.559e+02 3.668e+02, threshold=4.478e+02, percent-clipped=0.0 2024-09-14 16:45:13,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=155399.83333333334, ans=0.025 2024-09-14 16:45:42,456 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.51 vs. limit=15.0 2024-09-14 16:45:46,256 INFO [train.py:1198] (1/2) Epoch 9, batch 3750, loss[loss=0.2424, ctc_loss=0.168, cr_loss=0.3721, over 20965.00 frames. ], tot_loss[loss=0.2763, ctc_loss=0.1955, cr_loss=0.4039, over 4071547.02 frames. ], batch size: 55, lr: 9.32e-03, grad_scale: 16.0 2024-09-14 16:46:12,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=155513.16666666666, ans=0.0 2024-09-14 16:46:34,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=155569.83333333334, ans=0.125 2024-09-14 16:46:35,314 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.57 vs. limit=12.0 2024-09-14 16:46:43,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=155569.83333333334, ans=0.0 2024-09-14 16:46:49,273 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.03 vs. limit=15.0 2024-09-14 16:47:01,812 INFO [train.py:1198] (1/2) Epoch 9, batch 3800, loss[loss=0.2301, ctc_loss=0.159, cr_loss=0.3557, over 20956.00 frames. ], tot_loss[loss=0.2753, ctc_loss=0.1948, cr_loss=0.4029, over 4069053.26 frames. ], batch size: 51, lr: 9.31e-03, grad_scale: 16.0 2024-09-14 16:47:03,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=155626.5, ans=0.125 2024-09-14 16:47:47,281 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.740e+02 2.153e+02 2.340e+02 2.738e+02 4.041e+02, threshold=4.679e+02, percent-clipped=0.0 2024-09-14 16:47:49,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=155711.5, ans=0.125 2024-09-14 16:48:20,326 INFO [train.py:1198] (1/2) Epoch 9, batch 3850, loss[loss=0.3074, ctc_loss=0.2166, cr_loss=0.4541, over 19543.00 frames. ], tot_loss[loss=0.2754, ctc_loss=0.1947, cr_loss=0.4037, over 4082810.70 frames. 
], batch size: 90, lr: 9.31e-03, grad_scale: 16.0 2024-09-14 16:48:50,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=155824.83333333334, ans=0.1 2024-09-14 16:48:55,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=155824.83333333334, ans=15.0 2024-09-14 16:48:58,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=155824.83333333334, ans=0.0 2024-09-14 16:49:39,406 INFO [train.py:1198] (1/2) Epoch 9, batch 3900, loss[loss=0.2665, ctc_loss=0.1869, cr_loss=0.3981, over 21010.00 frames. ], tot_loss[loss=0.2744, ctc_loss=0.1938, cr_loss=0.4027, over 4098059.12 frames. ], batch size: 61, lr: 9.31e-03, grad_scale: 16.0 2024-09-14 16:49:43,007 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.69 vs. limit=10.0 2024-09-14 16:49:47,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=155909.83333333334, ans=0.07 2024-09-14 16:49:53,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=155938.16666666666, ans=0.125 2024-09-14 16:50:02,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=155938.16666666666, ans=0.025 2024-09-14 16:50:08,936 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.85 vs. limit=22.5 2024-09-14 16:50:17,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=155966.5, ans=0.125 2024-09-14 16:50:17,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=155966.5, ans=0.04949747468305833 2024-09-14 16:50:21,564 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.830e+02 2.086e+02 2.271e+02 2.677e+02 3.517e+02, threshold=4.542e+02, percent-clipped=0.0 2024-09-14 16:50:54,608 INFO [train.py:1198] (1/2) Epoch 9, batch 3950, loss[loss=0.3046, ctc_loss=0.2148, cr_loss=0.4495, over 20676.00 frames. ], tot_loss[loss=0.273, ctc_loss=0.1928, cr_loss=0.4011, over 4095434.78 frames. ], batch size: 68, lr: 9.30e-03, grad_scale: 16.0 2024-09-14 16:51:36,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=156108.16666666666, ans=0.0 2024-09-14 16:51:42,616 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.29 vs. limit=15.0 2024-09-14 16:51:51,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=156136.5, ans=0.125 2024-09-14 16:51:56,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=156164.83333333334, ans=0.04949747468305833 2024-09-14 16:52:09,075 INFO [train.py:1198] (1/2) Epoch 9, batch 4000, loss[loss=0.3212, ctc_loss=0.2342, cr_loss=0.4347, over 19990.00 frames. ], tot_loss[loss=0.2743, ctc_loss=0.1938, cr_loss=0.4025, over 4087945.48 frames. 
], batch size: 80, lr: 9.30e-03, grad_scale: 32.0 2024-09-14 16:52:12,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=156193.16666666666, ans=0.09899494936611666 2024-09-14 16:52:23,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=156221.5, ans=0.1 2024-09-14 16:52:29,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=156221.5, ans=0.0 2024-09-14 16:52:47,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=156249.83333333334, ans=0.125 2024-09-14 16:52:51,643 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.744e+02 2.125e+02 2.308e+02 2.600e+02 3.947e+02, threshold=4.615e+02, percent-clipped=0.0 2024-09-14 16:52:52,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=156249.83333333334, ans=0.0 2024-09-14 16:53:05,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=156278.16666666666, ans=0.2 2024-09-14 16:53:13,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=156306.5, ans=0.125 2024-09-14 16:53:28,037 INFO [train.py:1198] (1/2) Epoch 9, batch 4050, loss[loss=0.2881, ctc_loss=0.2045, cr_loss=0.4184, over 20671.00 frames. ], tot_loss[loss=0.2738, ctc_loss=0.1934, cr_loss=0.4022, over 4085680.42 frames. ], batch size: 68, lr: 9.29e-03, grad_scale: 32.0 2024-09-14 16:53:35,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=156334.83333333334, ans=0.125 2024-09-14 16:53:38,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=156334.83333333334, ans=0.2 2024-09-14 16:54:10,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=156391.5, ans=0.025 2024-09-14 16:54:14,354 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.44 vs. limit=15.0 2024-09-14 16:54:44,146 INFO [train.py:1198] (1/2) Epoch 9, batch 4100, loss[loss=0.2712, ctc_loss=0.1926, cr_loss=0.393, over 20892.00 frames. ], tot_loss[loss=0.2749, ctc_loss=0.1943, cr_loss=0.403, over 4085061.64 frames. 
], batch size: 54, lr: 9.29e-03, grad_scale: 32.0 2024-09-14 16:54:45,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=156476.5, ans=0.025 2024-09-14 16:55:01,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=156504.83333333334, ans=0.125 2024-09-14 16:55:25,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=156533.16666666666, ans=0.025 2024-09-14 16:55:29,724 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.762e+02 2.077e+02 2.209e+02 2.480e+02 3.413e+02, threshold=4.419e+02, percent-clipped=0.0 2024-09-14 16:55:51,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=156589.83333333334, ans=0.025 2024-09-14 16:56:03,189 INFO [train.py:1198] (1/2) Epoch 9, batch 4150, loss[loss=0.292, ctc_loss=0.2061, cr_loss=0.4298, over 20826.00 frames. ], tot_loss[loss=0.2733, ctc_loss=0.193, cr_loss=0.4018, over 4100871.52 frames. ], batch size: 59, lr: 9.29e-03, grad_scale: 32.0 2024-09-14 16:56:05,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=156618.16666666666, ans=0.025 2024-09-14 16:56:30,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=156646.5, ans=0.125 2024-09-14 16:57:10,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=156731.5, ans=0.025 2024-09-14 16:57:18,881 INFO [train.py:1198] (1/2) Epoch 9, batch 4200, loss[loss=0.2969, ctc_loss=0.2148, cr_loss=0.4104, over 20978.00 frames. ], tot_loss[loss=0.273, ctc_loss=0.1928, cr_loss=0.4014, over 4100679.20 frames. ], batch size: 67, lr: 9.28e-03, grad_scale: 32.0 2024-09-14 16:57:32,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=156788.16666666666, ans=0.125 2024-09-14 16:58:00,652 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.94 vs. limit=15.0 2024-09-14 16:58:01,196 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.751e+02 2.160e+02 2.352e+02 2.615e+02 4.061e+02, threshold=4.705e+02, percent-clipped=0.0 2024-09-14 16:58:04,982 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0 2024-09-14 16:58:19,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=156873.16666666666, ans=0.125 2024-09-14 16:58:22,532 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 16:58:34,391 INFO [train.py:1198] (1/2) Epoch 9, batch 4250, loss[loss=0.268, ctc_loss=0.1873, cr_loss=0.4035, over 20765.00 frames. ], tot_loss[loss=0.2749, ctc_loss=0.1943, cr_loss=0.4032, over 4088693.85 frames. ], batch size: 53, lr: 9.28e-03, grad_scale: 32.0 2024-09-14 16:58:54,418 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.95 vs. 
limit=15.0 2024-09-14 16:58:55,351 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=156929.83333333334, ans=0.125 2024-09-14 16:59:12,520 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.95 vs. limit=15.0 2024-09-14 16:59:24,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=156986.5, ans=0.125 2024-09-14 16:59:27,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=156986.5, ans=0.0 2024-09-14 16:59:52,932 INFO [train.py:1198] (1/2) Epoch 9, batch 4300, loss[loss=0.2786, ctc_loss=0.1971, cr_loss=0.4077, over 20882.00 frames. ], tot_loss[loss=0.2737, ctc_loss=0.1932, cr_loss=0.4021, over 4101810.56 frames. ], batch size: 57, lr: 9.27e-03, grad_scale: 32.0 2024-09-14 17:00:04,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=157043.16666666666, ans=0.07 2024-09-14 17:00:09,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=157071.5, ans=0.2 2024-09-14 17:00:35,551 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.746e+02 2.080e+02 2.212e+02 2.497e+02 3.875e+02, threshold=4.425e+02, percent-clipped=0.0 2024-09-14 17:01:12,039 INFO [train.py:1198] (1/2) Epoch 9, batch 4350, loss[loss=0.2285, ctc_loss=0.1593, cr_loss=0.3461, over 20968.00 frames. ], tot_loss[loss=0.2726, ctc_loss=0.1924, cr_loss=0.4006, over 4103568.03 frames. ], batch size: 49, lr: 9.27e-03, grad_scale: 32.0 2024-09-14 17:01:21,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=157184.83333333334, ans=0.1 2024-09-14 17:01:26,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=157213.16666666666, ans=0.125 2024-09-14 17:01:44,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=157241.5, ans=0.125 2024-09-14 17:02:16,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=157298.16666666666, ans=0.125 2024-09-14 17:02:16,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=157298.16666666666, ans=0.125 2024-09-14 17:02:28,190 INFO [train.py:1198] (1/2) Epoch 9, batch 4400, loss[loss=0.2781, ctc_loss=0.1997, cr_loss=0.3918, over 19426.00 frames. ], tot_loss[loss=0.2727, ctc_loss=0.1925, cr_loss=0.4007, over 4102191.80 frames. ], batch size: 90, lr: 9.27e-03, grad_scale: 32.0 2024-09-14 17:02:44,979 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.13 vs. 
limit=15.0 2024-09-14 17:02:55,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=157354.83333333334, ans=0.0 2024-09-14 17:02:59,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=157383.16666666666, ans=0.1 2024-09-14 17:03:02,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=157383.16666666666, ans=0.0 2024-09-14 17:03:10,134 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.662e+02 2.086e+02 2.306e+02 2.587e+02 4.199e+02, threshold=4.612e+02, percent-clipped=0.0 2024-09-14 17:03:19,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=157411.5, ans=15.0 2024-09-14 17:03:43,257 INFO [train.py:1198] (1/2) Epoch 9, batch 4450, loss[loss=0.2892, ctc_loss=0.202, cr_loss=0.4362, over 20966.00 frames. ], tot_loss[loss=0.2721, ctc_loss=0.1921, cr_loss=0.4002, over 4105591.22 frames. ], batch size: 64, lr: 9.26e-03, grad_scale: 32.0 2024-09-14 17:03:46,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=157468.16666666666, ans=0.0 2024-09-14 17:04:03,500 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.84 vs. limit=15.0 2024-09-14 17:04:09,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=157496.5, ans=0.2 2024-09-14 17:04:14,055 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.54 vs. limit=15.0 2024-09-14 17:05:00,927 INFO [train.py:1198] (1/2) Epoch 9, batch 4500, loss[loss=0.2468, ctc_loss=0.1744, cr_loss=0.3625, over 20994.00 frames. ], tot_loss[loss=0.2729, ctc_loss=0.1928, cr_loss=0.4007, over 4088153.73 frames. ], batch size: 55, lr: 9.26e-03, grad_scale: 32.0 2024-09-14 17:05:08,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=157609.83333333334, ans=0.125 2024-09-14 17:05:16,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=157638.16666666666, ans=0.0 2024-09-14 17:05:43,277 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.840e+02 2.115e+02 2.292e+02 2.578e+02 7.609e+02, threshold=4.584e+02, percent-clipped=1.0 2024-09-14 17:05:57,943 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.89 vs. limit=22.5 2024-09-14 17:06:09,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=157723.16666666666, ans=0.0 2024-09-14 17:06:16,636 INFO [train.py:1198] (1/2) Epoch 9, batch 4550, loss[loss=0.2528, ctc_loss=0.176, cr_loss=0.384, over 21019.00 frames. ], tot_loss[loss=0.2725, ctc_loss=0.1925, cr_loss=0.4002, over 4087300.05 frames. ], batch size: 52, lr: 9.25e-03, grad_scale: 32.0 2024-09-14 17:06:42,504 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.77 vs. 
limit=6.0 2024-09-14 17:06:51,701 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.70 vs. limit=15.0 2024-09-14 17:06:54,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=157808.16666666666, ans=0.2 2024-09-14 17:06:57,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=157808.16666666666, ans=0.125 2024-09-14 17:07:03,858 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.04 vs. limit=15.0 2024-09-14 17:07:34,346 INFO [train.py:1198] (1/2) Epoch 9, batch 4600, loss[loss=0.278, ctc_loss=0.1942, cr_loss=0.4191, over 20885.00 frames. ], tot_loss[loss=0.2725, ctc_loss=0.1922, cr_loss=0.4011, over 4092970.57 frames. ], batch size: 54, lr: 9.25e-03, grad_scale: 32.0 2024-09-14 17:07:52,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=157921.5, ans=0.0 2024-09-14 17:08:07,793 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.05 vs. limit=15.0 2024-09-14 17:08:13,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=157949.83333333334, ans=0.05 2024-09-14 17:08:16,502 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.738e+02 2.028e+02 2.220e+02 2.385e+02 3.296e+02, threshold=4.439e+02, percent-clipped=0.0 2024-09-14 17:08:36,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=158006.5, ans=0.2 2024-09-14 17:08:41,023 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.03 vs. limit=15.0 2024-09-14 17:08:49,504 INFO [train.py:1198] (1/2) Epoch 9, batch 4650, loss[loss=0.3027, ctc_loss=0.2151, cr_loss=0.4384, over 20695.00 frames. ], tot_loss[loss=0.2724, ctc_loss=0.1921, cr_loss=0.4011, over 4098793.15 frames. ], batch size: 66, lr: 9.24e-03, grad_scale: 32.0 2024-09-14 17:09:06,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=158063.16666666666, ans=0.125 2024-09-14 17:09:14,951 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.14 vs. limit=10.0 2024-09-14 17:09:15,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=158063.16666666666, ans=0.0 2024-09-14 17:09:21,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=158091.5, ans=0.0 2024-09-14 17:10:08,667 INFO [train.py:1198] (1/2) Epoch 9, batch 4700, loss[loss=0.2232, ctc_loss=0.1564, cr_loss=0.3342, over 19431.00 frames. ], tot_loss[loss=0.2722, ctc_loss=0.1921, cr_loss=0.4003, over 4094487.48 frames. 
], batch size: 43, lr: 9.24e-03, grad_scale: 32.0 2024-09-14 17:10:50,623 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.760e+02 2.080e+02 2.241e+02 2.591e+02 3.565e+02, threshold=4.483e+02, percent-clipped=0.0 2024-09-14 17:10:58,464 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 17:11:00,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=158261.5, ans=0.0 2024-09-14 17:11:13,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=158289.83333333334, ans=0.025 2024-09-14 17:11:22,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=158318.16666666666, ans=0.025 2024-09-14 17:11:23,597 INFO [train.py:1198] (1/2) Epoch 9, batch 4750, loss[loss=0.2707, ctc_loss=0.1903, cr_loss=0.4021, over 21061.00 frames. ], tot_loss[loss=0.2726, ctc_loss=0.1924, cr_loss=0.4007, over 4099781.97 frames. ], batch size: 56, lr: 9.24e-03, grad_scale: 16.0 2024-09-14 17:11:28,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=158318.16666666666, ans=0.125 2024-09-14 17:11:34,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=158318.16666666666, ans=0.0 2024-09-14 17:11:34,858 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.25 vs. limit=15.0 2024-09-14 17:11:43,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=158346.5, ans=0.0 2024-09-14 17:11:43,894 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.58 vs. limit=22.5 2024-09-14 17:11:58,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=158374.83333333334, ans=0.125 2024-09-14 17:11:58,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=158374.83333333334, ans=0.125 2024-09-14 17:12:04,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=158374.83333333334, ans=0.125 2024-09-14 17:12:23,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=158403.16666666666, ans=0.0 2024-09-14 17:12:41,527 INFO [train.py:1198] (1/2) Epoch 9, batch 4800, loss[loss=0.3042, ctc_loss=0.2173, cr_loss=0.4349, over 21073.00 frames. ], tot_loss[loss=0.2728, ctc_loss=0.1925, cr_loss=0.4013, over 4109300.50 frames. 
], batch size: 62, lr: 9.23e-03, grad_scale: 32.0 2024-09-14 17:12:53,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=158459.83333333334, ans=0.125 2024-09-14 17:13:06,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=158488.16666666666, ans=0.0 2024-09-14 17:13:13,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=158516.5, ans=0.0 2024-09-14 17:13:16,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=158516.5, ans=0.125 2024-09-14 17:13:19,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=158516.5, ans=0.125 2024-09-14 17:13:25,545 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.773e+02 2.078e+02 2.292e+02 2.536e+02 3.600e+02, threshold=4.584e+02, percent-clipped=0.0 2024-09-14 17:13:28,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=158544.83333333334, ans=0.125 2024-09-14 17:13:30,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=158544.83333333334, ans=0.04949747468305833 2024-09-14 17:13:45,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=158573.16666666666, ans=10.0 2024-09-14 17:13:57,263 INFO [train.py:1198] (1/2) Epoch 9, batch 4850, loss[loss=0.2697, ctc_loss=0.189, cr_loss=0.404, over 20689.00 frames. ], tot_loss[loss=0.2721, ctc_loss=0.192, cr_loss=0.4007, over 4100636.20 frames. ], batch size: 71, lr: 9.23e-03, grad_scale: 32.0 2024-09-14 17:14:05,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=158601.5, ans=0.125 2024-09-14 17:14:42,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=158686.5, ans=0.02 2024-09-14 17:14:50,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=158686.5, ans=0.125 2024-09-14 17:15:12,778 INFO [train.py:1198] (1/2) Epoch 9, batch 4900, loss[loss=0.2458, ctc_loss=0.1672, cr_loss=0.393, over 20998.00 frames. ], tot_loss[loss=0.2706, ctc_loss=0.1908, cr_loss=0.399, over 4104990.55 frames. ], batch size: 48, lr: 9.22e-03, grad_scale: 32.0 2024-09-14 17:15:26,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=158771.5, ans=0.1 2024-09-14 17:15:55,442 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.774e+02 2.129e+02 2.311e+02 2.603e+02 4.263e+02, threshold=4.621e+02, percent-clipped=0.0 2024-09-14 17:16:29,624 INFO [train.py:1198] (1/2) Epoch 9, batch 4950, loss[loss=0.2711, ctc_loss=0.1914, cr_loss=0.3985, over 21051.00 frames. ], tot_loss[loss=0.271, ctc_loss=0.1912, cr_loss=0.3989, over 4088349.61 frames. 
], batch size: 56, lr: 9.22e-03, grad_scale: 32.0 2024-09-14 17:16:30,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=158884.83333333334, ans=0.125 2024-09-14 17:16:43,810 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.67 vs. limit=12.0 2024-09-14 17:17:29,894 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.56 vs. limit=15.0 2024-09-14 17:17:44,330 INFO [train.py:1198] (1/2) Epoch 9, batch 5000, loss[loss=0.26, ctc_loss=0.1846, cr_loss=0.3769, over 21071.00 frames. ], tot_loss[loss=0.2706, ctc_loss=0.191, cr_loss=0.3981, over 4082771.85 frames. ], batch size: 59, lr: 9.22e-03, grad_scale: 16.0 2024-09-14 17:17:50,978 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.08 vs. limit=15.0 2024-09-14 17:18:08,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=159054.83333333334, ans=0.025 2024-09-14 17:18:28,845 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.794e+02 2.125e+02 2.345e+02 2.595e+02 9.253e+02, threshold=4.690e+02, percent-clipped=1.0 2024-09-14 17:18:58,618 INFO [train.py:1198] (1/2) Epoch 9, batch 5050, loss[loss=0.2483, ctc_loss=0.1704, cr_loss=0.3893, over 20956.00 frames. ], tot_loss[loss=0.2697, ctc_loss=0.1901, cr_loss=0.398, over 4092394.02 frames. ], batch size: 51, lr: 9.21e-03, grad_scale: 16.0 2024-09-14 17:19:01,361 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.35 vs. limit=8.0 2024-09-14 17:19:04,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=159168.16666666666, ans=0.0 2024-09-14 17:19:34,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=159224.83333333334, ans=0.125 2024-09-14 17:19:37,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=159224.83333333334, ans=0.125 2024-09-14 17:19:45,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=159253.16666666666, ans=0.125 2024-09-14 17:20:11,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=159281.5, ans=0.0 2024-09-14 17:20:15,727 INFO [train.py:1198] (1/2) Epoch 9, batch 5100, loss[loss=0.2798, ctc_loss=0.1955, cr_loss=0.4211, over 20675.00 frames. ], tot_loss[loss=0.2724, ctc_loss=0.1922, cr_loss=0.4011, over 4092149.54 frames. 
], batch size: 66, lr: 9.21e-03, grad_scale: 16.0 2024-09-14 17:20:16,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=159309.83333333334, ans=0.05 2024-09-14 17:20:19,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=159309.83333333334, ans=0.04949747468305833 2024-09-14 17:20:25,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=159309.83333333334, ans=0.95 2024-09-14 17:21:00,475 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.800e+02 2.099e+02 2.285e+02 2.574e+02 3.012e+02, threshold=4.570e+02, percent-clipped=0.0 2024-09-14 17:21:29,873 INFO [train.py:1198] (1/2) Epoch 9, batch 5150, loss[loss=0.2766, ctc_loss=0.1962, cr_loss=0.402, over 21074.00 frames. ], tot_loss[loss=0.274, ctc_loss=0.1936, cr_loss=0.4022, over 4078233.87 frames. ], batch size: 59, lr: 9.20e-03, grad_scale: 16.0 2024-09-14 17:22:17,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=159536.5, ans=0.125 2024-09-14 17:22:21,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=159536.5, ans=0.0 2024-09-14 17:22:24,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=159536.5, ans=0.0 2024-09-14 17:22:27,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=159564.83333333334, ans=0.025 2024-09-14 17:22:43,749 INFO [train.py:1198] (1/2) Epoch 9, batch 5200, loss[loss=0.2517, ctc_loss=0.1747, cr_loss=0.385, over 21030.00 frames. ], tot_loss[loss=0.2726, ctc_loss=0.1924, cr_loss=0.4008, over 4081400.64 frames. ], batch size: 56, lr: 9.20e-03, grad_scale: 32.0 2024-09-14 17:22:54,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=159593.16666666666, ans=0.1 2024-09-14 17:23:27,823 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.813e+02 2.078e+02 2.268e+02 2.552e+02 4.758e+02, threshold=4.536e+02, percent-clipped=1.0 2024-09-14 17:23:29,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=159678.16666666666, ans=0.1 2024-09-14 17:23:57,269 INFO [train.py:1198] (1/2) Epoch 9, batch 5250, loss[loss=0.2269, ctc_loss=0.1567, cr_loss=0.3512, over 20912.00 frames. ], tot_loss[loss=0.2725, ctc_loss=0.1923, cr_loss=0.401, over 4081514.56 frames. 
], batch size: 50, lr: 9.20e-03, grad_scale: 32.0 2024-09-14 17:23:59,199 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 17:24:06,791 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 17:24:06,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=159734.83333333334, ans=0.125 2024-09-14 17:24:09,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=159734.83333333334, ans=0.04949747468305833 2024-09-14 17:24:52,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=159819.83333333334, ans=0.125 2024-09-14 17:24:58,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=159848.16666666666, ans=0.0 2024-09-14 17:24:58,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=159848.16666666666, ans=0.125 2024-09-14 17:24:58,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=159848.16666666666, ans=0.5 2024-09-14 17:25:11,241 INFO [train.py:1198] (1/2) Epoch 9, batch 5300, loss[loss=0.2524, ctc_loss=0.1781, cr_loss=0.3713, over 21077.00 frames. ], tot_loss[loss=0.274, ctc_loss=0.1936, cr_loss=0.4019, over 4066850.31 frames. ], batch size: 59, lr: 9.19e-03, grad_scale: 32.0 2024-09-14 17:25:20,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=159876.5, ans=0.125 2024-09-14 17:25:20,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=159876.5, ans=0.0 2024-09-14 17:25:37,464 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.30 vs. limit=15.0 2024-09-14 17:25:38,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=159904.83333333334, ans=0.125 2024-09-14 17:25:55,490 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.747e+02 2.100e+02 2.340e+02 2.644e+02 4.163e+02, threshold=4.681e+02, percent-clipped=0.0 2024-09-14 17:26:18,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=159989.83333333334, ans=0.125 2024-09-14 17:26:27,133 INFO [train.py:1198] (1/2) Epoch 9, batch 5350, loss[loss=0.2779, ctc_loss=0.1964, cr_loss=0.4075, over 21020.00 frames. ], tot_loss[loss=0.2739, ctc_loss=0.1935, cr_loss=0.4016, over 4072999.34 frames. ], batch size: 62, lr: 9.19e-03, grad_scale: 32.0 2024-09-14 17:26:48,424 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 17:27:00,478 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=160074.83333333334, ans=0.0 2024-09-14 17:27:17,604 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.60 vs. 
limit=8.0 2024-09-14 17:27:22,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=160103.16666666666, ans=0.2 2024-09-14 17:27:42,038 INFO [train.py:1198] (1/2) Epoch 9, batch 5400, loss[loss=0.2904, ctc_loss=0.2068, cr_loss=0.4183, over 20165.00 frames. ], tot_loss[loss=0.2726, ctc_loss=0.1927, cr_loss=0.3998, over 4076237.57 frames. ], batch size: 80, lr: 9.18e-03, grad_scale: 32.0 2024-09-14 17:28:13,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=160216.5, ans=0.125 2024-09-14 17:28:16,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=160216.5, ans=0.0 2024-09-14 17:28:23,901 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 17:28:25,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=160244.83333333334, ans=0.125 2024-09-14 17:28:26,569 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.745e+02 2.050e+02 2.236e+02 2.556e+02 3.213e+02, threshold=4.472e+02, percent-clipped=0.0 2024-09-14 17:28:32,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=160244.83333333334, ans=0.125 2024-09-14 17:28:56,238 INFO [train.py:1198] (1/2) Epoch 9, batch 5450, loss[loss=0.269, ctc_loss=0.195, cr_loss=0.3703, over 20997.00 frames. ], tot_loss[loss=0.273, ctc_loss=0.1929, cr_loss=0.4005, over 4083073.90 frames. ], batch size: 55, lr: 9.18e-03, grad_scale: 32.0 2024-09-14 17:28:58,385 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.36 vs. limit=15.0 2024-09-14 17:29:09,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=160301.5, ans=0.1 2024-09-14 17:29:20,033 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.78 vs. limit=15.0 2024-09-14 17:29:25,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=160329.83333333334, ans=0.2 2024-09-14 17:29:49,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=160386.5, ans=0.125 2024-09-14 17:30:12,596 INFO [train.py:1198] (1/2) Epoch 9, batch 5500, loss[loss=0.2868, ctc_loss=0.2024, cr_loss=0.4224, over 20777.00 frames. ], tot_loss[loss=0.2734, ctc_loss=0.1932, cr_loss=0.4012, over 4094235.96 frames. 
], batch size: 56, lr: 9.18e-03, grad_scale: 32.0 2024-09-14 17:30:14,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=160443.16666666666, ans=0.0 2024-09-14 17:30:32,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=160471.5, ans=0.125 2024-09-14 17:30:40,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=160471.5, ans=0.125 2024-09-14 17:30:47,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=160499.83333333334, ans=0.125 2024-09-14 17:30:57,763 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.830e+02 2.094e+02 2.232e+02 2.492e+02 3.807e+02, threshold=4.465e+02, percent-clipped=0.0 2024-09-14 17:31:14,701 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.39 vs. limit=15.0 2024-09-14 17:31:20,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=160556.5, ans=0.05 2024-09-14 17:31:27,368 INFO [train.py:1198] (1/2) Epoch 9, batch 5550, loss[loss=0.2881, ctc_loss=0.2065, cr_loss=0.4078, over 20973.00 frames. ], tot_loss[loss=0.2731, ctc_loss=0.1929, cr_loss=0.4008, over 4100038.28 frames. ], batch size: 64, lr: 9.17e-03, grad_scale: 32.0 2024-09-14 17:31:42,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=160613.16666666666, ans=0.1 2024-09-14 17:32:34,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=160698.16666666666, ans=0.2 2024-09-14 17:32:35,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=160698.16666666666, ans=0.125 2024-09-14 17:32:41,018 INFO [train.py:1198] (1/2) Epoch 9, batch 5600, loss[loss=0.3021, ctc_loss=0.2142, cr_loss=0.4396, over 20858.00 frames. ], tot_loss[loss=0.2749, ctc_loss=0.1942, cr_loss=0.4031, over 4077773.81 frames. ], batch size: 65, lr: 9.17e-03, grad_scale: 32.0 2024-09-14 17:32:50,464 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.07 vs. limit=22.5 2024-09-14 17:32:51,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=160726.5, ans=0.0 2024-09-14 17:33:21,771 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.26 vs. limit=15.0 2024-09-14 17:33:25,331 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.732e+02 2.093e+02 2.245e+02 2.529e+02 3.867e+02, threshold=4.491e+02, percent-clipped=0.0 2024-09-14 17:33:50,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=160839.83333333334, ans=0.0 2024-09-14 17:33:54,938 INFO [train.py:1198] (1/2) Epoch 9, batch 5650, loss[loss=0.349, ctc_loss=0.2577, cr_loss=0.4567, over 14269.00 frames. ], tot_loss[loss=0.2752, ctc_loss=0.1945, cr_loss=0.4034, over 4072712.28 frames. 
], batch size: 149, lr: 9.16e-03, grad_scale: 32.0 2024-09-14 17:34:04,290 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 17:35:11,472 INFO [train.py:1198] (1/2) Epoch 9, batch 5700, loss[loss=0.2759, ctc_loss=0.1941, cr_loss=0.409, over 20944.00 frames. ], tot_loss[loss=0.2746, ctc_loss=0.194, cr_loss=0.4031, over 4077748.43 frames. ], batch size: 64, lr: 9.16e-03, grad_scale: 32.0 2024-09-14 17:35:13,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=161009.83333333334, ans=0.125 2024-09-14 17:35:25,578 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.68 vs. limit=15.0 2024-09-14 17:35:50,622 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.56 vs. limit=15.0 2024-09-14 17:35:55,978 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.774e+02 2.044e+02 2.225e+02 2.584e+02 3.392e+02, threshold=4.450e+02, percent-clipped=0.0 2024-09-14 17:36:11,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=161123.16666666666, ans=0.0 2024-09-14 17:36:15,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=161123.16666666666, ans=0.0 2024-09-14 17:36:25,756 INFO [train.py:1198] (1/2) Epoch 9, batch 5750, loss[loss=0.2627, ctc_loss=0.1847, cr_loss=0.3904, over 20826.00 frames. ], tot_loss[loss=0.2731, ctc_loss=0.1926, cr_loss=0.4021, over 4096919.49 frames. ], batch size: 59, lr: 9.16e-03, grad_scale: 32.0 2024-09-14 17:36:44,597 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.99 vs. limit=12.0 2024-09-14 17:37:13,746 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.36 vs. limit=15.0 2024-09-14 17:37:37,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=161264.83333333334, ans=0.125 2024-09-14 17:37:41,845 INFO [train.py:1198] (1/2) Epoch 9, batch 5800, loss[loss=0.2893, ctc_loss=0.202, cr_loss=0.4368, over 20936.00 frames. ], tot_loss[loss=0.2735, ctc_loss=0.193, cr_loss=0.4024, over 4093274.24 frames. ], batch size: 60, lr: 9.15e-03, grad_scale: 32.0 2024-09-14 17:37:46,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=161293.16666666666, ans=0.125 2024-09-14 17:38:26,537 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.716e+02 2.164e+02 2.328e+02 2.619e+02 4.158e+02, threshold=4.656e+02, percent-clipped=0.0 2024-09-14 17:38:41,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=161406.5, ans=0.1 2024-09-14 17:38:47,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=161406.5, ans=0.0 2024-09-14 17:38:47,774 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.26 vs. 
limit=15.0 2024-09-14 17:38:56,078 INFO [train.py:1198] (1/2) Epoch 9, batch 5850, loss[loss=0.246, ctc_loss=0.1693, cr_loss=0.3834, over 20772.00 frames. ], tot_loss[loss=0.2728, ctc_loss=0.1925, cr_loss=0.4015, over 4103035.25 frames. ], batch size: 56, lr: 9.15e-03, grad_scale: 32.0 2024-09-14 17:39:03,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=161434.83333333334, ans=0.1 2024-09-14 17:39:03,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=161434.83333333334, ans=0.2 2024-09-14 17:40:09,861 INFO [train.py:1198] (1/2) Epoch 9, batch 5900, loss[loss=0.3385, ctc_loss=0.2524, cr_loss=0.4303, over 14076.00 frames. ], tot_loss[loss=0.2735, ctc_loss=0.1931, cr_loss=0.402, over 4085038.26 frames. ], batch size: 149, lr: 9.14e-03, grad_scale: 32.0 2024-09-14 17:40:54,574 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.838e+02 2.098e+02 2.330e+02 2.680e+02 3.765e+02, threshold=4.661e+02, percent-clipped=0.0 2024-09-14 17:41:01,094 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.35 vs. limit=15.0 2024-09-14 17:41:23,537 INFO [train.py:1198] (1/2) Epoch 9, batch 5950, loss[loss=0.2716, ctc_loss=0.1876, cr_loss=0.4198, over 20829.00 frames. ], tot_loss[loss=0.2739, ctc_loss=0.1934, cr_loss=0.4027, over 4084312.85 frames. ], batch size: 59, lr: 9.14e-03, grad_scale: 32.0 2024-09-14 17:41:29,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=161718.16666666666, ans=0.0 2024-09-14 17:41:38,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=161746.5, ans=0.05 2024-09-14 17:42:10,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=161803.16666666666, ans=0.035 2024-09-14 17:42:16,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=161803.16666666666, ans=0.125 2024-09-14 17:42:37,647 INFO [train.py:1198] (1/2) Epoch 9, batch 6000, loss[loss=0.254, ctc_loss=0.1759, cr_loss=0.3902, over 20892.00 frames. ], tot_loss[loss=0.273, ctc_loss=0.1926, cr_loss=0.4018, over 4102187.15 frames. ], batch size: 54, lr: 9.14e-03, grad_scale: 32.0 2024-09-14 17:42:37,648 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-14 17:42:58,466 INFO [train.py:1230] (1/2) Epoch 9, validation: loss=0.05442, ctc_loss=0.05442, cr_loss=9.512e-15, over 944034.00 frames. 2024-09-14 17:42:58,467 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-14 17:43:03,744 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.23 vs. 
limit=22.5 2024-09-14 17:43:10,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=161859.83333333334, ans=0.125 2024-09-14 17:43:28,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=161916.5, ans=0.125 2024-09-14 17:43:42,575 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.861e+02 2.065e+02 2.261e+02 2.727e+02 3.866e+02, threshold=4.521e+02, percent-clipped=0.0 2024-09-14 17:44:13,202 INFO [train.py:1198] (1/2) Epoch 9, batch 6050, loss[loss=0.256, ctc_loss=0.1794, cr_loss=0.3832, over 20855.00 frames. ], tot_loss[loss=0.2732, ctc_loss=0.1929, cr_loss=0.4018, over 4083362.95 frames. ], batch size: 57, lr: 9.13e-03, grad_scale: 32.0 2024-09-14 17:44:43,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=162058.16666666666, ans=0.0 2024-09-14 17:45:15,215 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=162114.83333333334, ans=0.0 2024-09-14 17:45:22,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=162114.83333333334, ans=0.0 2024-09-14 17:45:28,295 INFO [train.py:1198] (1/2) Epoch 9, batch 6100, loss[loss=0.251, ctc_loss=0.1764, cr_loss=0.3729, over 20952.00 frames. ], tot_loss[loss=0.2729, ctc_loss=0.1927, cr_loss=0.4013, over 4074400.74 frames. ], batch size: 58, lr: 9.13e-03, grad_scale: 32.0 2024-09-14 17:46:13,214 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.735e+02 2.083e+02 2.381e+02 2.787e+02 4.151e+02, threshold=4.762e+02, percent-clipped=0.0 2024-09-14 17:46:28,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=162256.5, ans=0.0 2024-09-14 17:46:42,817 INFO [train.py:1198] (1/2) Epoch 9, batch 6150, loss[loss=0.2977, ctc_loss=0.2094, cr_loss=0.4414, over 21066.00 frames. ], tot_loss[loss=0.2729, ctc_loss=0.1926, cr_loss=0.4018, over 4081839.44 frames. ], batch size: 62, lr: 9.12e-03, grad_scale: 32.0 2024-09-14 17:46:52,705 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.26 vs. limit=15.0 2024-09-14 17:47:40,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=162398.16666666666, ans=0.0 2024-09-14 17:47:50,174 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.08 vs. limit=15.0 2024-09-14 17:47:56,905 INFO [train.py:1198] (1/2) Epoch 9, batch 6200, loss[loss=0.2295, ctc_loss=0.1596, cr_loss=0.3498, over 19947.00 frames. ], tot_loss[loss=0.2729, ctc_loss=0.1927, cr_loss=0.4012, over 4072427.65 frames. 
], batch size: 44, lr: 9.12e-03, grad_scale: 32.0 2024-09-14 17:48:04,893 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 17:48:25,719 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=162483.16666666666, ans=0.125 2024-09-14 17:48:39,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=162483.16666666666, ans=0.09899494936611666 2024-09-14 17:48:41,740 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.758e+02 2.049e+02 2.171e+02 2.420e+02 4.623e+02, threshold=4.342e+02, percent-clipped=0.0 2024-09-14 17:48:49,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=162511.5, ans=0.2 2024-09-14 17:48:58,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=162539.83333333334, ans=0.125 2024-09-14 17:49:11,300 INFO [train.py:1198] (1/2) Epoch 9, batch 6250, loss[loss=0.2783, ctc_loss=0.199, cr_loss=0.3964, over 21042.00 frames. ], tot_loss[loss=0.2725, ctc_loss=0.1923, cr_loss=0.4009, over 4058251.48 frames. ], batch size: 62, lr: 9.12e-03, grad_scale: 32.0 2024-09-14 17:49:25,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=162596.5, ans=0.5 2024-09-14 17:49:42,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=162624.83333333334, ans=0.1 2024-09-14 17:50:10,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=162681.5, ans=0.1 2024-09-14 17:50:26,171 INFO [train.py:1198] (1/2) Epoch 9, batch 6300, loss[loss=0.2782, ctc_loss=0.1927, cr_loss=0.4273, over 20961.00 frames. ], tot_loss[loss=0.2735, ctc_loss=0.1932, cr_loss=0.4012, over 4030497.25 frames. ], batch size: 55, lr: 9.11e-03, grad_scale: 32.0 2024-09-14 17:50:26,958 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.71 vs. limit=10.0 2024-09-14 17:50:34,575 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.92 vs. limit=15.0 2024-09-14 17:50:42,971 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.61 vs. limit=15.0 2024-09-14 17:50:54,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=162766.5, ans=0.125 2024-09-14 17:50:59,782 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 17:51:08,792 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.742e+02 2.115e+02 2.309e+02 2.708e+02 4.561e+02, threshold=4.619e+02, percent-clipped=1.0 2024-09-14 17:51:36,043 INFO [train.py:1198] (1/2) Epoch 9, batch 6350, loss[loss=0.2994, ctc_loss=0.2209, cr_loss=0.3923, over 14110.00 frames. ], tot_loss[loss=0.2814, ctc_loss=0.2005, cr_loss=0.4045, over 3870871.87 frames. 
], batch size: 150, lr: 9.11e-03, grad_scale: 32.0 2024-09-14 17:52:16,698 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.86 vs. limit=10.0 2024-09-14 17:52:17,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=162936.5, ans=0.0 2024-09-14 17:53:20,916 INFO [train.py:1198] (1/2) Epoch 10, batch 0, loss[loss=0.2659, ctc_loss=0.1875, cr_loss=0.3919, over 21057.00 frames. ], tot_loss[loss=0.2659, ctc_loss=0.1875, cr_loss=0.3919, over 21057.00 frames. ], batch size: 56, lr: 8.66e-03, grad_scale: 32.0 2024-09-14 17:53:20,917 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-14 17:53:39,080 INFO [train.py:1230] (1/2) Epoch 10, validation: loss=0.05531, ctc_loss=0.05531, cr_loss=9.031e-15, over 944034.00 frames. 2024-09-14 17:53:39,081 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-14 17:53:47,467 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.09 vs. limit=12.0 2024-09-14 17:54:41,650 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.799e+02 2.186e+02 2.441e+02 2.681e+02 3.994e+02, threshold=4.882e+02, percent-clipped=0.0 2024-09-14 17:54:43,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=163078.16666666666, ans=0.125 2024-09-14 17:54:56,164 INFO [train.py:1198] (1/2) Epoch 10, batch 50, loss[loss=0.2846, ctc_loss=0.1997, cr_loss=0.4244, over 20953.00 frames. ], tot_loss[loss=0.2712, ctc_loss=0.1912, cr_loss=0.4, over 918664.07 frames. ], batch size: 58, lr: 8.66e-03, grad_scale: 32.0 2024-09-14 17:55:11,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=163134.83333333334, ans=0.125 2024-09-14 17:55:20,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=163134.83333333334, ans=0.2 2024-09-14 17:55:24,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=163163.16666666666, ans=0.125 2024-09-14 17:55:24,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=163163.16666666666, ans=0.125 2024-09-14 17:55:29,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=163163.16666666666, ans=0.125 2024-09-14 17:55:35,830 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.98 vs. limit=12.0 2024-09-14 17:55:44,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=163191.5, ans=0.125 2024-09-14 17:56:05,597 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=163219.83333333334, ans=0.125 2024-09-14 17:56:11,375 INFO [train.py:1198] (1/2) Epoch 10, batch 100, loss[loss=0.2576, ctc_loss=0.1798, cr_loss=0.3889, over 21048.00 frames. ], tot_loss[loss=0.2738, ctc_loss=0.1931, cr_loss=0.4033, over 1617776.67 frames. 
], batch size: 53, lr: 8.65e-03, grad_scale: 32.0 2024-09-14 17:56:14,992 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.87 vs. limit=22.5 2024-09-14 17:56:26,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=163276.5, ans=0.5 2024-09-14 17:56:29,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=163276.5, ans=0.125 2024-09-14 17:56:44,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=163304.83333333334, ans=0.0 2024-09-14 17:57:11,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=163333.16666666666, ans=0.125 2024-09-14 17:57:14,150 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.773e+02 2.029e+02 2.182e+02 2.362e+02 3.664e+02, threshold=4.363e+02, percent-clipped=0.0 2024-09-14 17:57:29,373 INFO [train.py:1198] (1/2) Epoch 10, batch 150, loss[loss=0.2886, ctc_loss=0.1985, cr_loss=0.4502, over 21009.00 frames. ], tot_loss[loss=0.2744, ctc_loss=0.1935, cr_loss=0.4044, over 2162027.23 frames. ], batch size: 61, lr: 8.65e-03, grad_scale: 32.0 2024-09-14 17:57:31,546 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.08 vs. limit=15.0 2024-09-14 17:57:38,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=163389.83333333334, ans=0.125 2024-09-14 17:57:44,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=163418.16666666666, ans=0.2 2024-09-14 17:58:04,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.whiten.whitening_limit, batch_count=163446.5, ans=15.0 2024-09-14 17:58:18,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=163474.83333333334, ans=0.0 2024-09-14 17:58:42,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=163531.5, ans=0.1 2024-09-14 17:58:43,343 INFO [train.py:1198] (1/2) Epoch 10, batch 200, loss[loss=0.2865, ctc_loss=0.1994, cr_loss=0.4351, over 21019.00 frames. ], tot_loss[loss=0.2757, ctc_loss=0.1945, cr_loss=0.4064, over 2582521.66 frames. ], batch size: 61, lr: 8.64e-03, grad_scale: 32.0 2024-09-14 17:59:17,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=163588.16666666666, ans=0.125 2024-09-14 17:59:20,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=163588.16666666666, ans=0.07 2024-09-14 17:59:41,717 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.34 vs. 
limit=22.5 2024-09-14 17:59:46,413 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.848e+02 2.046e+02 2.222e+02 2.425e+02 6.390e+02, threshold=4.445e+02, percent-clipped=1.0 2024-09-14 18:00:01,419 INFO [train.py:1198] (1/2) Epoch 10, batch 250, loss[loss=0.2292, ctc_loss=0.1588, cr_loss=0.352, over 19807.00 frames. ], tot_loss[loss=0.2754, ctc_loss=0.1943, cr_loss=0.4053, over 2887567.74 frames. ], batch size: 44, lr: 8.64e-03, grad_scale: 32.0 2024-09-14 18:00:01,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=163673.16666666666, ans=0.125 2024-09-14 18:00:03,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=163673.16666666666, ans=0.0 2024-09-14 18:00:09,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=163673.16666666666, ans=0.07 2024-09-14 18:00:24,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=163701.5, ans=0.2 2024-09-14 18:00:28,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=163701.5, ans=0.025 2024-09-14 18:00:37,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=163729.83333333334, ans=0.125 2024-09-14 18:00:37,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=163729.83333333334, ans=0.0 2024-09-14 18:01:17,057 INFO [train.py:1198] (1/2) Epoch 10, batch 300, loss[loss=0.245, ctc_loss=0.1728, cr_loss=0.3611, over 20974.00 frames. ], tot_loss[loss=0.2736, ctc_loss=0.1929, cr_loss=0.4038, over 3153504.85 frames. ], batch size: 49, lr: 8.64e-03, grad_scale: 32.0 2024-09-14 18:01:41,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=163843.16666666666, ans=0.1 2024-09-14 18:01:52,286 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.04 vs. limit=15.0 2024-09-14 18:02:03,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=163899.83333333334, ans=0.07 2024-09-14 18:02:05,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=163899.83333333334, ans=0.1 2024-09-14 18:02:20,283 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.783e+02 2.028e+02 2.219e+02 2.414e+02 4.165e+02, threshold=4.438e+02, percent-clipped=0.0 2024-09-14 18:02:26,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=163928.16666666666, ans=0.0 2024-09-14 18:02:35,248 INFO [train.py:1198] (1/2) Epoch 10, batch 350, loss[loss=0.2197, ctc_loss=0.1523, cr_loss=0.3368, over 21014.00 frames. ], tot_loss[loss=0.2727, ctc_loss=0.1921, cr_loss=0.4029, over 3350910.71 frames. ], batch size: 52, lr: 8.63e-03, grad_scale: 32.0 2024-09-14 18:03:07,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=164013.16666666666, ans=0.0 2024-09-14 18:03:50,619 INFO [train.py:1198] (1/2) Epoch 10, batch 400, loss[loss=0.284, ctc_loss=0.2008, cr_loss=0.4161, over 20843.00 frames. 
], tot_loss[loss=0.2716, ctc_loss=0.1913, cr_loss=0.4019, over 3510074.07 frames. ], batch size: 59, lr: 8.63e-03, grad_scale: 32.0 2024-09-14 18:04:00,557 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.70 vs. limit=15.0 2024-09-14 18:04:14,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=164126.5, ans=0.04949747468305833 2024-09-14 18:04:16,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=164126.5, ans=0.125 2024-09-14 18:04:44,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=164183.16666666666, ans=0.025 2024-09-14 18:04:51,695 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.808e+02 2.079e+02 2.205e+02 2.399e+02 3.065e+02, threshold=4.411e+02, percent-clipped=0.0 2024-09-14 18:04:52,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=164211.5, ans=0.125 2024-09-14 18:04:56,628 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.68 vs. limit=15.0 2024-09-14 18:05:05,063 INFO [train.py:1198] (1/2) Epoch 10, batch 450, loss[loss=0.2913, ctc_loss=0.2067, cr_loss=0.4231, over 20971.00 frames. ], tot_loss[loss=0.2718, ctc_loss=0.1913, cr_loss=0.4027, over 3644506.21 frames. ], batch size: 58, lr: 8.63e-03, grad_scale: 32.0 2024-09-14 18:06:23,035 INFO [train.py:1198] (1/2) Epoch 10, batch 500, loss[loss=0.2596, ctc_loss=0.1837, cr_loss=0.3795, over 21071.00 frames. ], tot_loss[loss=0.2726, ctc_loss=0.1919, cr_loss=0.4036, over 3747072.63 frames. ], batch size: 59, lr: 8.62e-03, grad_scale: 32.0 2024-09-14 18:06:29,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=164381.5, ans=0.125 2024-09-14 18:06:43,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=164409.83333333334, ans=0.125 2024-09-14 18:06:46,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=164409.83333333334, ans=0.05 2024-09-14 18:07:06,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=164438.16666666666, ans=0.0 2024-09-14 18:07:18,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=164466.5, ans=0.0 2024-09-14 18:07:25,040 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.798e+02 2.220e+02 2.404e+02 2.694e+02 3.649e+02, threshold=4.807e+02, percent-clipped=0.0 2024-09-14 18:07:27,215 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=164494.83333333334, ans=15.0 2024-09-14 18:07:38,538 INFO [train.py:1198] (1/2) Epoch 10, batch 550, loss[loss=0.2725, ctc_loss=0.189, cr_loss=0.4175, over 21008.00 frames. ], tot_loss[loss=0.2709, ctc_loss=0.1906, cr_loss=0.4016, over 3825303.54 frames. 
], batch size: 52, lr: 8.62e-03, grad_scale: 32.0 2024-09-14 18:07:38,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=164523.16666666666, ans=0.07 2024-09-14 18:07:43,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=164523.16666666666, ans=0.1 2024-09-14 18:08:15,676 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.23 vs. limit=22.5 2024-09-14 18:08:57,215 INFO [train.py:1198] (1/2) Epoch 10, batch 600, loss[loss=0.2911, ctc_loss=0.2053, cr_loss=0.4287, over 20784.00 frames. ], tot_loss[loss=0.2712, ctc_loss=0.1908, cr_loss=0.4016, over 3882118.53 frames. ], batch size: 56, lr: 8.62e-03, grad_scale: 32.0 2024-09-14 18:09:47,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=164749.83333333334, ans=0.125 2024-09-14 18:10:00,584 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.864e+02 2.151e+02 2.375e+02 2.704e+02 4.252e+02, threshold=4.750e+02, percent-clipped=0.0 2024-09-14 18:10:12,457 INFO [train.py:1198] (1/2) Epoch 10, batch 650, loss[loss=0.2746, ctc_loss=0.193, cr_loss=0.4079, over 20964.00 frames. ], tot_loss[loss=0.2728, ctc_loss=0.1922, cr_loss=0.403, over 3909449.89 frames. ], batch size: 58, lr: 8.61e-03, grad_scale: 16.0 2024-09-14 18:10:18,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=164806.5, ans=0.0 2024-09-14 18:10:20,037 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=164806.5, ans=0.125 2024-09-14 18:10:32,680 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.54 vs. limit=15.0 2024-09-14 18:10:38,985 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.24 vs. limit=15.0 2024-09-14 18:10:51,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=164863.16666666666, ans=0.1 2024-09-14 18:11:02,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=164891.5, ans=0.0 2024-09-14 18:11:30,415 INFO [train.py:1198] (1/2) Epoch 10, batch 700, loss[loss=0.2773, ctc_loss=0.1953, cr_loss=0.41, over 19452.00 frames. ], tot_loss[loss=0.2729, ctc_loss=0.1922, cr_loss=0.4037, over 3941641.72 frames. ], batch size: 90, lr: 8.61e-03, grad_scale: 16.0 2024-09-14 18:11:55,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=164976.5, ans=0.125 2024-09-14 18:12:32,590 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.777e+02 2.095e+02 2.296e+02 2.518e+02 4.159e+02, threshold=4.592e+02, percent-clipped=0.0 2024-09-14 18:12:32,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=165061.5, ans=0.125 2024-09-14 18:12:39,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=165061.5, ans=0.2 2024-09-14 18:12:44,814 INFO [train.py:1198] (1/2) Epoch 10, batch 750, loss[loss=0.2396, ctc_loss=0.1669, cr_loss=0.3633, over 20981.00 frames. 
], tot_loss[loss=0.2736, ctc_loss=0.1928, cr_loss=0.4039, over 3961089.52 frames. ], batch size: 55, lr: 8.60e-03, grad_scale: 16.0 2024-09-14 18:14:01,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=165231.5, ans=0.07 2024-09-14 18:14:02,316 INFO [train.py:1198] (1/2) Epoch 10, batch 800, loss[loss=0.2335, ctc_loss=0.1638, cr_loss=0.3484, over 20878.00 frames. ], tot_loss[loss=0.2714, ctc_loss=0.1911, cr_loss=0.4016, over 3992412.48 frames. ], batch size: 54, lr: 8.60e-03, grad_scale: 32.0 2024-09-14 18:14:22,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=165259.83333333334, ans=0.125 2024-09-14 18:14:34,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=165288.16666666666, ans=0.2 2024-09-14 18:14:52,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=165316.5, ans=0.04949747468305833 2024-09-14 18:15:05,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=165344.83333333334, ans=0.0 2024-09-14 18:15:06,902 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.818e+02 2.048e+02 2.244e+02 2.473e+02 4.643e+02, threshold=4.488e+02, percent-clipped=1.0 2024-09-14 18:15:11,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=165344.83333333334, ans=0.125 2024-09-14 18:15:17,307 INFO [train.py:1198] (1/2) Epoch 10, batch 850, loss[loss=0.2634, ctc_loss=0.1816, cr_loss=0.4088, over 20789.00 frames. ], tot_loss[loss=0.2711, ctc_loss=0.1909, cr_loss=0.4012, over 4011548.64 frames. ], batch size: 56, lr: 8.60e-03, grad_scale: 16.0 2024-09-14 18:15:17,766 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 18:16:03,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=165458.16666666666, ans=0.125 2024-09-14 18:16:11,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=165458.16666666666, ans=0.125 2024-09-14 18:16:23,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=165486.5, ans=0.125 2024-09-14 18:16:35,288 INFO [train.py:1198] (1/2) Epoch 10, batch 900, loss[loss=0.2355, ctc_loss=0.1642, cr_loss=0.3568, over 20767.00 frames. ], tot_loss[loss=0.27, ctc_loss=0.19, cr_loss=0.4, over 4028886.53 frames. ], batch size: 56, lr: 8.59e-03, grad_scale: 16.0 2024-09-14 18:17:05,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=165571.5, ans=0.125 2024-09-14 18:17:39,686 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.749e+02 2.005e+02 2.189e+02 2.470e+02 3.871e+02, threshold=4.378e+02, percent-clipped=0.0 2024-09-14 18:17:50,213 INFO [train.py:1198] (1/2) Epoch 10, batch 950, loss[loss=0.294, ctc_loss=0.2132, cr_loss=0.4042, over 20321.00 frames. ], tot_loss[loss=0.2697, ctc_loss=0.1899, cr_loss=0.3993, over 4037658.67 frames. 
], batch size: 74, lr: 8.59e-03, grad_scale: 16.0 2024-09-14 18:19:07,687 INFO [train.py:1198] (1/2) Epoch 10, batch 1000, loss[loss=0.2794, ctc_loss=0.2005, cr_loss=0.3946, over 20928.00 frames. ], tot_loss[loss=0.2677, ctc_loss=0.1883, cr_loss=0.3971, over 4058600.26 frames. ], batch size: 67, lr: 8.59e-03, grad_scale: 8.0 2024-09-14 18:19:13,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=165798.16666666666, ans=0.125 2024-09-14 18:19:15,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=165798.16666666666, ans=0.0 2024-09-14 18:19:29,389 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=7.13 vs. limit=22.5 2024-09-14 18:19:33,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=165826.5, ans=0.125 2024-09-14 18:19:33,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=165826.5, ans=0.125 2024-09-14 18:20:00,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=165883.16666666666, ans=0.2 2024-09-14 18:20:12,423 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.33 vs. limit=22.5 2024-09-14 18:20:13,329 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.768e+02 2.038e+02 2.163e+02 2.393e+02 4.771e+02, threshold=4.326e+02, percent-clipped=1.0 2024-09-14 18:20:16,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=165911.5, ans=0.0 2024-09-14 18:20:22,290 INFO [train.py:1198] (1/2) Epoch 10, batch 1050, loss[loss=0.2425, ctc_loss=0.1717, cr_loss=0.3543, over 20890.00 frames. ], tot_loss[loss=0.2683, ctc_loss=0.1888, cr_loss=0.3977, over 4068068.38 frames. ], batch size: 54, lr: 8.58e-03, grad_scale: 8.0 2024-09-14 18:20:37,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=165968.16666666666, ans=0.95 2024-09-14 18:20:40,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=165968.16666666666, ans=0.2 2024-09-14 18:20:59,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=165996.5, ans=0.0 2024-09-14 18:21:10,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=166024.83333333334, ans=0.2 2024-09-14 18:21:37,090 INFO [train.py:1198] (1/2) Epoch 10, batch 1100, loss[loss=0.284, ctc_loss=0.196, cr_loss=0.4402, over 20939.00 frames. ], tot_loss[loss=0.2682, ctc_loss=0.1887, cr_loss=0.3978, over 4085840.82 frames. ], batch size: 60, lr: 8.58e-03, grad_scale: 8.0 2024-09-14 18:22:18,443 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.80 vs. limit=22.5 2024-09-14 18:22:25,757 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.47 vs. 
limit=22.5 2024-09-14 18:22:29,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=166166.5, ans=0.1 2024-09-14 18:22:39,218 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.72 vs. limit=15.0 2024-09-14 18:22:41,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=166194.83333333334, ans=0.0 2024-09-14 18:22:45,905 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.759e+02 2.043e+02 2.161e+02 2.398e+02 3.362e+02, threshold=4.323e+02, percent-clipped=0.0 2024-09-14 18:22:49,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=166194.83333333334, ans=0.1 2024-09-14 18:22:54,726 INFO [train.py:1198] (1/2) Epoch 10, batch 1150, loss[loss=0.2931, ctc_loss=0.2096, cr_loss=0.418, over 20648.00 frames. ], tot_loss[loss=0.2693, ctc_loss=0.1895, cr_loss=0.399, over 4084112.23 frames. ], batch size: 66, lr: 8.58e-03, grad_scale: 8.0 2024-09-14 18:23:00,261 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.93 vs. limit=10.0 2024-09-14 18:23:22,434 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.30 vs. limit=6.0 2024-09-14 18:23:40,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=166308.16666666666, ans=0.0 2024-09-14 18:23:44,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=166308.16666666666, ans=0.04949747468305833 2024-09-14 18:24:09,975 INFO [train.py:1198] (1/2) Epoch 10, batch 1200, loss[loss=0.2535, ctc_loss=0.1766, cr_loss=0.3848, over 20808.00 frames. ], tot_loss[loss=0.2683, ctc_loss=0.1888, cr_loss=0.3978, over 4095165.12 frames. ], batch size: 53, lr: 8.57e-03, grad_scale: 16.0 2024-09-14 18:25:19,359 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.893e+02 2.148e+02 2.314e+02 2.586e+02 5.166e+02, threshold=4.628e+02, percent-clipped=1.0 2024-09-14 18:25:19,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=166478.16666666666, ans=0.0 2024-09-14 18:25:25,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=166478.16666666666, ans=0.125 2024-09-14 18:25:28,467 INFO [train.py:1198] (1/2) Epoch 10, batch 1250, loss[loss=0.2459, ctc_loss=0.1742, cr_loss=0.3582, over 20778.00 frames. ], tot_loss[loss=0.2664, ctc_loss=0.1873, cr_loss=0.3956, over 4100878.35 frames. 
], batch size: 53, lr: 8.57e-03, grad_scale: 16.0 2024-09-14 18:25:39,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=166506.5, ans=0.125 2024-09-14 18:25:42,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=166534.83333333334, ans=0.125 2024-09-14 18:26:03,442 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 18:26:30,989 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.11 vs. limit=15.0 2024-09-14 18:26:43,513 INFO [train.py:1198] (1/2) Epoch 10, batch 1300, loss[loss=0.3089, ctc_loss=0.2217, cr_loss=0.4361, over 20854.00 frames. ], tot_loss[loss=0.2676, ctc_loss=0.1881, cr_loss=0.3972, over 4100688.74 frames. ], batch size: 65, lr: 8.56e-03, grad_scale: 16.0 2024-09-14 18:26:47,124 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.86 vs. limit=15.0 2024-09-14 18:27:04,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=166676.5, ans=0.2 2024-09-14 18:27:24,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=166704.83333333334, ans=0.5 2024-09-14 18:27:28,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=166733.16666666666, ans=0.125 2024-09-14 18:27:41,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=166733.16666666666, ans=0.025 2024-09-14 18:27:49,935 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.778e+02 1.999e+02 2.184e+02 2.416e+02 4.144e+02, threshold=4.367e+02, percent-clipped=0.0 2024-09-14 18:27:50,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=166761.5, ans=0.125 2024-09-14 18:27:56,923 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.80 vs. limit=12.0 2024-09-14 18:27:57,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=166789.83333333334, ans=0.025 2024-09-14 18:28:01,702 INFO [train.py:1198] (1/2) Epoch 10, batch 1350, loss[loss=0.2213, ctc_loss=0.1508, cr_loss=0.3524, over 20965.00 frames. ], tot_loss[loss=0.2685, ctc_loss=0.1889, cr_loss=0.3982, over 4098489.30 frames. ], batch size: 50, lr: 8.56e-03, grad_scale: 16.0 2024-09-14 18:28:02,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=166789.83333333334, ans=0.0 2024-09-14 18:28:30,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=166846.5, ans=0.125 2024-09-14 18:29:17,902 INFO [train.py:1198] (1/2) Epoch 10, batch 1400, loss[loss=0.2849, ctc_loss=0.2013, cr_loss=0.4178, over 20953.00 frames. ], tot_loss[loss=0.2668, ctc_loss=0.1876, cr_loss=0.3963, over 4108445.59 frames. 
], batch size: 48, lr: 8.56e-03, grad_scale: 16.0 2024-09-14 18:29:30,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=166931.5, ans=0.1 2024-09-14 18:29:30,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=166931.5, ans=0.0 2024-09-14 18:29:51,703 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.37 vs. limit=15.0 2024-09-14 18:29:51,822 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.08 vs. limit=15.0 2024-09-14 18:30:18,640 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.71 vs. limit=15.0 2024-09-14 18:30:23,841 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.779e+02 2.093e+02 2.269e+02 2.508e+02 4.820e+02, threshold=4.538e+02, percent-clipped=2.0 2024-09-14 18:30:35,908 INFO [train.py:1198] (1/2) Epoch 10, batch 1450, loss[loss=0.3078, ctc_loss=0.2287, cr_loss=0.3954, over 14271.00 frames. ], tot_loss[loss=0.2676, ctc_loss=0.1881, cr_loss=0.3977, over 4111036.17 frames. ], batch size: 152, lr: 8.55e-03, grad_scale: 16.0 2024-09-14 18:30:36,628 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.96 vs. limit=15.0 2024-09-14 18:30:58,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=167101.5, ans=0.0 2024-09-14 18:31:26,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=167158.16666666666, ans=0.07 2024-09-14 18:31:35,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=167186.5, ans=0.025 2024-09-14 18:31:51,560 INFO [train.py:1198] (1/2) Epoch 10, batch 1500, loss[loss=0.2765, ctc_loss=0.1956, cr_loss=0.4046, over 20059.00 frames. ], tot_loss[loss=0.2675, ctc_loss=0.1881, cr_loss=0.3972, over 4114174.19 frames. ], batch size: 80, lr: 8.55e-03, grad_scale: 16.0 2024-09-14 18:32:16,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=167243.16666666666, ans=0.0 2024-09-14 18:32:50,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=167328.16666666666, ans=0.125 2024-09-14 18:32:56,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=167328.16666666666, ans=0.2 2024-09-14 18:32:57,945 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.773e+02 2.026e+02 2.192e+02 2.462e+02 4.090e+02, threshold=4.385e+02, percent-clipped=0.0 2024-09-14 18:32:58,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=167328.16666666666, ans=0.0 2024-09-14 18:33:07,337 INFO [train.py:1198] (1/2) Epoch 10, batch 1550, loss[loss=0.2657, ctc_loss=0.1852, cr_loss=0.4025, over 21058.00 frames. ], tot_loss[loss=0.2674, ctc_loss=0.1879, cr_loss=0.3974, over 4117758.53 frames. 
], batch size: 53, lr: 8.55e-03, grad_scale: 16.0 2024-09-14 18:33:43,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=167413.16666666666, ans=0.0 2024-09-14 18:34:25,387 INFO [train.py:1198] (1/2) Epoch 10, batch 1600, loss[loss=0.269, ctc_loss=0.187, cr_loss=0.4103, over 20964.00 frames. ], tot_loss[loss=0.2664, ctc_loss=0.1872, cr_loss=0.396, over 4112585.79 frames. ], batch size: 58, lr: 8.54e-03, grad_scale: 32.0 2024-09-14 18:34:51,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=167526.5, ans=0.0 2024-09-14 18:34:54,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=167554.83333333334, ans=0.125 2024-09-14 18:35:11,192 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.13 vs. limit=12.0 2024-09-14 18:35:31,597 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.651e+02 2.028e+02 2.194e+02 2.393e+02 3.990e+02, threshold=4.388e+02, percent-clipped=0.0 2024-09-14 18:35:40,745 INFO [train.py:1198] (1/2) Epoch 10, batch 1650, loss[loss=0.2805, ctc_loss=0.1959, cr_loss=0.4234, over 21010.00 frames. ], tot_loss[loss=0.2657, ctc_loss=0.1867, cr_loss=0.395, over 4117009.26 frames. ], batch size: 63, lr: 8.54e-03, grad_scale: 32.0 2024-09-14 18:35:48,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=167639.83333333334, ans=0.0 2024-09-14 18:36:09,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=167696.5, ans=0.0 2024-09-14 18:36:12,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=167696.5, ans=0.125 2024-09-14 18:36:36,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=167724.83333333334, ans=0.0 2024-09-14 18:36:51,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=167753.16666666666, ans=0.1 2024-09-14 18:36:58,638 INFO [train.py:1198] (1/2) Epoch 10, batch 1700, loss[loss=0.2492, ctc_loss=0.1746, cr_loss=0.3726, over 20977.00 frames. ], tot_loss[loss=0.2669, ctc_loss=0.1877, cr_loss=0.3959, over 4100425.89 frames. ], batch size: 51, lr: 8.54e-03, grad_scale: 32.0 2024-09-14 18:37:30,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=167838.16666666666, ans=0.0 2024-09-14 18:37:41,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=167838.16666666666, ans=0.1 2024-09-14 18:38:01,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=167894.83333333334, ans=0.2 2024-09-14 18:38:04,323 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.797e+02 2.085e+02 2.242e+02 2.497e+02 4.815e+02, threshold=4.485e+02, percent-clipped=1.0 2024-09-14 18:38:13,436 INFO [train.py:1198] (1/2) Epoch 10, batch 1750, loss[loss=0.266, ctc_loss=0.1854, cr_loss=0.403, over 21049.00 frames. ], tot_loss[loss=0.267, ctc_loss=0.1878, cr_loss=0.3964, over 4094005.41 frames. 
], batch size: 53, lr: 8.53e-03, grad_scale: 32.0 2024-09-14 18:38:15,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=167923.16666666666, ans=0.1 2024-09-14 18:38:33,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=167951.5, ans=0.2 2024-09-14 18:38:34,104 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.87 vs. limit=15.0 2024-09-14 18:38:36,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=167951.5, ans=0.125 2024-09-14 18:38:56,347 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.42 vs. limit=15.0 2024-09-14 18:39:28,458 INFO [train.py:1198] (1/2) Epoch 10, batch 1800, loss[loss=0.2611, ctc_loss=0.1821, cr_loss=0.3951, over 20874.00 frames. ], tot_loss[loss=0.2671, ctc_loss=0.1877, cr_loss=0.3967, over 4098738.08 frames. ], batch size: 57, lr: 8.53e-03, grad_scale: 32.0 2024-09-14 18:39:50,381 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.77 vs. limit=15.0 2024-09-14 18:40:37,571 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.726e+02 2.059e+02 2.228e+02 2.512e+02 3.783e+02, threshold=4.456e+02, percent-clipped=0.0 2024-09-14 18:40:43,097 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.09 vs. limit=12.0 2024-09-14 18:40:46,475 INFO [train.py:1198] (1/2) Epoch 10, batch 1850, loss[loss=0.3334, ctc_loss=0.2385, cr_loss=0.4747, over 18175.00 frames. ], tot_loss[loss=0.2683, ctc_loss=0.1887, cr_loss=0.398, over 4097333.75 frames. ], batch size: 108, lr: 8.53e-03, grad_scale: 32.0 2024-09-14 18:41:37,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=168291.5, ans=0.025 2024-09-14 18:42:04,725 INFO [train.py:1198] (1/2) Epoch 10, batch 1900, loss[loss=0.2557, ctc_loss=0.1762, cr_loss=0.3975, over 21085.00 frames. ], tot_loss[loss=0.2699, ctc_loss=0.1899, cr_loss=0.3998, over 4090285.63 frames. ], batch size: 59, lr: 8.52e-03, grad_scale: 32.0 2024-09-14 18:42:09,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=168348.16666666666, ans=0.0 2024-09-14 18:42:28,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=168376.5, ans=0.125 2024-09-14 18:42:35,540 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.25 vs. 
limit=15.0 2024-09-14 18:42:38,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=168404.83333333334, ans=0.0 2024-09-14 18:42:39,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=168404.83333333334, ans=0.1 2024-09-14 18:42:41,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=168404.83333333334, ans=0.125 2024-09-14 18:42:45,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=168404.83333333334, ans=0.125 2024-09-14 18:43:10,841 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.829e+02 2.072e+02 2.257e+02 2.592e+02 3.635e+02, threshold=4.515e+02, percent-clipped=0.0 2024-09-14 18:43:14,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=168461.5, ans=0.125 2024-09-14 18:43:17,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=168461.5, ans=0.125 2024-09-14 18:43:19,882 INFO [train.py:1198] (1/2) Epoch 10, batch 1950, loss[loss=0.3249, ctc_loss=0.2453, cr_loss=0.398, over 13889.00 frames. ], tot_loss[loss=0.2681, ctc_loss=0.1885, cr_loss=0.3979, over 4093362.59 frames. ], batch size: 149, lr: 8.52e-03, grad_scale: 32.0 2024-09-14 18:43:29,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=168489.83333333334, ans=0.125 2024-09-14 18:43:37,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=168518.16666666666, ans=0.2 2024-09-14 18:44:19,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=168603.16666666666, ans=0.025 2024-09-14 18:44:34,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=168631.5, ans=0.125 2024-09-14 18:44:35,882 INFO [train.py:1198] (1/2) Epoch 10, batch 2000, loss[loss=0.2793, ctc_loss=0.1971, cr_loss=0.4111, over 20999.00 frames. ], tot_loss[loss=0.2674, ctc_loss=0.1879, cr_loss=0.3976, over 4093164.89 frames. ], batch size: 61, lr: 8.51e-03, grad_scale: 32.0 2024-09-14 18:44:46,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=168631.5, ans=0.0 2024-09-14 18:44:55,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=168659.83333333334, ans=0.2 2024-09-14 18:45:00,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=168659.83333333334, ans=0.125 2024-09-14 18:45:10,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=168688.16666666666, ans=0.2 2024-09-14 18:45:28,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=168716.5, ans=0.2 2024-09-14 18:45:38,678 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.48 vs. 
limit=22.5
2024-09-14 18:45:44,281 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.75 vs. limit=12.0
2024-09-14 18:45:45,029 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.790e+02 2.076e+02 2.278e+02 2.528e+02 4.492e+02, threshold=4.556e+02, percent-clipped=0.0
2024-09-14 18:45:54,344 INFO [train.py:1198] (1/2) Epoch 10, batch 2050, loss[loss=0.2364, ctc_loss=0.1648, cr_loss=0.358, over 21041.00 frames. ], tot_loss[loss=0.2673, ctc_loss=0.1878, cr_loss=0.3972, over 4095206.25 frames. ], batch size: 62, lr: 8.51e-03, grad_scale: 32.0
2024-09-14 18:46:05,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=168773.16666666666, ans=0.07
2024-09-14 18:46:19,419 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.58 vs. limit=22.5
2024-09-14 18:47:09,609 INFO [train.py:1198] (1/2) Epoch 10, batch 2100, loss[loss=0.2842, ctc_loss=0.1974, cr_loss=0.4339, over 19483.00 frames. ], tot_loss[loss=0.2683, ctc_loss=0.1885, cr_loss=0.3991, over 4098441.79 frames. ], batch size: 90, lr: 8.51e-03, grad_scale: 32.0
2024-09-14 18:47:29,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=168943.16666666666, ans=0.1
2024-09-14 18:47:47,566 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.81 vs. limit=12.0
2024-09-14 18:47:53,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=168971.5, ans=0.125
2024-09-14 18:48:11,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=169028.16666666666, ans=0.0
2024-09-14 18:48:18,239 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.833e+02 2.092e+02 2.264e+02 2.617e+02 3.907e+02, threshold=4.528e+02, percent-clipped=0.0
2024-09-14 18:48:27,062 INFO [train.py:1198] (1/2) Epoch 10, batch 2150, loss[loss=0.25, ctc_loss=0.1733, cr_loss=0.3837, over 20780.00 frames. ], tot_loss[loss=0.2685, ctc_loss=0.1887, cr_loss=0.399, over 4103152.24 frames. ], batch size: 56, lr: 8.50e-03, grad_scale: 32.0
2024-09-14 18:48:51,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=169084.83333333334, ans=0.025
2024-09-14 18:49:05,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=169113.16666666666, ans=0.125
2024-09-14 18:49:26,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=169169.83333333334, ans=0.1
2024-09-14 18:49:36,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=169169.83333333334, ans=0.125
2024-09-14 18:49:42,794 INFO [train.py:1198] (1/2) Epoch 10, batch 2200, loss[loss=0.3012, ctc_loss=0.2164, cr_loss=0.4239, over 20863.00 frames. ], tot_loss[loss=0.2691, ctc_loss=0.1891, cr_loss=0.3998, over 4094922.13 frames. ], batch size: 57, lr: 8.50e-03, grad_scale: 32.0
2024-09-14 18:49:52,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=169198.16666666666, ans=0.125
2024-09-14 18:49:55,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=169198.16666666666, ans=0.04949747468305833
2024-09-14 18:50:11,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=169254.83333333334, ans=0.0
2024-09-14 18:50:15,363 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.19 vs. limit=15.0
2024-09-14 18:50:34,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=169283.16666666666, ans=0.1
2024-09-14 18:50:48,874 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.839e+02 2.073e+02 2.344e+02 2.630e+02 4.263e+02, threshold=4.688e+02, percent-clipped=0.0
2024-09-14 18:51:00,464 INFO [train.py:1198] (1/2) Epoch 10, batch 2250, loss[loss=0.2733, ctc_loss=0.1906, cr_loss=0.4135, over 21058.00 frames. ], tot_loss[loss=0.2695, ctc_loss=0.1895, cr_loss=0.4, over 4086243.55 frames. ], batch size: 53, lr: 8.50e-03, grad_scale: 32.0
2024-09-14 18:51:02,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=169339.83333333334, ans=0.5
2024-09-14 18:51:05,746 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.84 vs. limit=12.0
2024-09-14 18:52:13,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=169453.16666666666, ans=0.125
2024-09-14 18:52:16,356 INFO [train.py:1198] (1/2) Epoch 10, batch 2300, loss[loss=0.2944, ctc_loss=0.2067, cr_loss=0.4387, over 20675.00 frames. ], tot_loss[loss=0.2703, ctc_loss=0.1901, cr_loss=0.4009, over 4092183.70 frames. ], batch size: 66, lr: 8.49e-03, grad_scale: 32.0
2024-09-14 18:52:28,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=169481.5, ans=0.0
2024-09-14 18:52:31,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=169509.83333333334, ans=0.2
2024-09-14 18:53:25,446 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.645e+02 2.093e+02 2.291e+02 2.530e+02 4.547e+02, threshold=4.582e+02, percent-clipped=0.0
2024-09-14 18:53:34,716 INFO [train.py:1198] (1/2) Epoch 10, batch 2350, loss[loss=0.256, ctc_loss=0.1777, cr_loss=0.3914, over 20992.00 frames. ], tot_loss[loss=0.2707, ctc_loss=0.1906, cr_loss=0.4007, over 4069808.42 frames. ], batch size: 49, lr: 8.49e-03, grad_scale: 32.0
2024-09-14 18:53:36,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=169623.16666666666, ans=0.125
2024-09-14 18:53:51,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=169651.5, ans=0.05
2024-09-14 18:54:46,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=169736.5, ans=0.025
2024-09-14 18:54:49,741 INFO [train.py:1198] (1/2) Epoch 10, batch 2400, loss[loss=0.2302, ctc_loss=0.163, cr_loss=0.3357, over 20407.00 frames. ], tot_loss[loss=0.2698, ctc_loss=0.1897, cr_loss=0.4002, over 4078252.62 frames. ], batch size: 45, lr: 8.49e-03, grad_scale: 32.0
2024-09-14 18:54:53,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=169764.83333333334, ans=0.0
2024-09-14 18:54:59,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=169764.83333333334, ans=0.125
2024-09-14 18:55:14,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=169793.16666666666, ans=0.025
2024-09-14 18:55:22,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=169821.5, ans=0.0
2024-09-14 18:55:29,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=169821.5, ans=0.125
2024-09-14 18:55:42,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=169849.83333333334, ans=0.125
2024-09-14 18:55:56,280 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.735e+02 2.059e+02 2.222e+02 2.398e+02 4.370e+02, threshold=4.443e+02, percent-clipped=0.0
2024-09-14 18:55:57,294 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.66 vs. limit=15.0
2024-09-14 18:56:05,277 INFO [train.py:1198] (1/2) Epoch 10, batch 2450, loss[loss=0.347, ctc_loss=0.2584, cr_loss=0.4431, over 14205.00 frames. ], tot_loss[loss=0.2686, ctc_loss=0.1888, cr_loss=0.399, over 4089322.23 frames. ], batch size: 149, lr: 8.48e-03, grad_scale: 32.0
2024-09-14 18:56:16,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=169906.5, ans=0.125
2024-09-14 18:56:31,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=169934.83333333334, ans=0.0
2024-09-14 18:56:31,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=169934.83333333334, ans=0.0
2024-09-14 18:56:50,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=169963.16666666666, ans=0.125
2024-09-14 18:57:24,242 INFO [train.py:1198] (1/2) Epoch 10, batch 2500, loss[loss=0.27, ctc_loss=0.1917, cr_loss=0.3915, over 20986.00 frames. ], tot_loss[loss=0.2695, ctc_loss=0.1895, cr_loss=0.4001, over 4092217.19 frames. ], batch size: 55, lr: 8.48e-03, grad_scale: 32.0
2024-09-14 18:57:32,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=170048.16666666666, ans=0.0
2024-09-14 18:58:18,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=170133.16666666666, ans=0.025
2024-09-14 18:58:23,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=170161.5, ans=0.09899494936611666
2024-09-14 18:58:29,995 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.844e+02 2.099e+02 2.284e+02 2.517e+02 7.014e+02, threshold=4.569e+02, percent-clipped=1.0
2024-09-14 18:58:39,171 INFO [train.py:1198] (1/2) Epoch 10, batch 2550, loss[loss=0.2728, ctc_loss=0.1901, cr_loss=0.4133, over 20840.00 frames. ], tot_loss[loss=0.2692, ctc_loss=0.1892, cr_loss=0.3996, over 4089343.24 frames. ], batch size: 65, lr: 8.48e-03, grad_scale: 32.0
2024-09-14 18:59:07,350 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=7.65 vs. limit=22.5
2024-09-14 18:59:32,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=170274.83333333334, ans=0.0
2024-09-14 18:59:57,387 INFO [train.py:1198] (1/2) Epoch 10, batch 2600, loss[loss=0.2897, ctc_loss=0.2062, cr_loss=0.4174, over 20960.00 frames. ], tot_loss[loss=0.2681, ctc_loss=0.1884, cr_loss=0.3984, over 4105291.16 frames. ], batch size: 67, lr: 8.47e-03, grad_scale: 32.0
2024-09-14 18:59:57,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=170331.5, ans=0.0
2024-09-14 19:00:38,203 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=170388.16666666666, ans=0.125
2024-09-14 19:00:42,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=170416.5, ans=0.2
2024-09-14 19:00:49,095 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.11 vs. limit=15.0
2024-09-14 19:00:51,998 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-14 19:01:00,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=170444.83333333334, ans=0.07
2024-09-14 19:01:03,416 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.836e+02 2.025e+02 2.201e+02 2.372e+02 4.038e+02, threshold=4.401e+02, percent-clipped=0.0
2024-09-14 19:01:12,373 INFO [train.py:1198] (1/2) Epoch 10, batch 2650, loss[loss=0.2921, ctc_loss=0.2073, cr_loss=0.4235, over 20856.00 frames. ], tot_loss[loss=0.2686, ctc_loss=0.1887, cr_loss=0.3991, over 4105716.53 frames. ], batch size: 65, lr: 8.47e-03, grad_scale: 32.0
2024-09-14 19:01:29,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=170501.5, ans=0.0
2024-09-14 19:01:35,509 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-14 19:01:36,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=170501.5, ans=0.2
2024-09-14 19:02:09,138 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.86 vs. limit=22.5
2024-09-14 19:02:30,997 INFO [train.py:1198] (1/2) Epoch 10, batch 2700, loss[loss=0.2401, ctc_loss=0.1667, cr_loss=0.367, over 21071.00 frames. ], tot_loss[loss=0.2668, ctc_loss=0.1874, cr_loss=0.3973, over 4113317.36 frames. ], batch size: 53, lr: 8.47e-03, grad_scale: 32.0
2024-09-14 19:02:49,780 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.83 vs. limit=15.0
2024-09-14 19:03:07,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=170671.5, ans=0.025
2024-09-14 19:03:36,838 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.708e+02 2.074e+02 2.227e+02 2.481e+02 4.375e+02, threshold=4.453e+02, percent-clipped=0.0
2024-09-14 19:03:38,651 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=170728.16666666666, ans=0.125
2024-09-14 19:03:45,656 INFO [train.py:1198] (1/2) Epoch 10, batch 2750, loss[loss=0.2648, ctc_loss=0.1868, cr_loss=0.3903, over 21025.00 frames. ], tot_loss[loss=0.2679, ctc_loss=0.1881, cr_loss=0.3987, over 4106155.60 frames. ], batch size: 63, lr: 8.46e-03, grad_scale: 32.0
2024-09-14 19:03:46,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=170756.5, ans=0.125
2024-09-14 19:03:54,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=170756.5, ans=0.05
2024-09-14 19:03:59,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=170784.83333333334, ans=0.0
2024-09-14 19:04:04,094 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-14 19:04:33,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=170841.5, ans=0.125
2024-09-14 19:04:45,300 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.03 vs. limit=15.0
2024-09-14 19:04:47,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=170869.83333333334, ans=0.125
2024-09-14 19:05:04,430 INFO [train.py:1198] (1/2) Epoch 10, batch 2800, loss[loss=0.3301, ctc_loss=0.238, cr_loss=0.4604, over 18285.00 frames. ], tot_loss[loss=0.2678, ctc_loss=0.1881, cr_loss=0.3981, over 4103703.89 frames. ], batch size: 108, lr: 8.46e-03, grad_scale: 32.0
2024-09-14 19:05:13,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=170898.16666666666, ans=0.0
2024-09-14 19:05:37,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=170954.83333333334, ans=0.1
2024-09-14 19:05:44,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=170954.83333333334, ans=0.2
2024-09-14 19:05:59,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=170983.16666666666, ans=0.0
2024-09-14 19:06:07,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=171011.5, ans=0.1
2024-09-14 19:06:10,546 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.760e+02 2.172e+02 2.467e+02 2.780e+02 3.931e+02, threshold=4.934e+02, percent-clipped=0.0
2024-09-14 19:06:18,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=171039.83333333334, ans=0.025
2024-09-14 19:06:19,685 INFO [train.py:1198] (1/2) Epoch 10, batch 2850, loss[loss=0.2852, ctc_loss=0.2011, cr_loss=0.4206, over 20952.00 frames. ], tot_loss[loss=0.2688, ctc_loss=0.1888, cr_loss=0.3999, over 4103205.84 frames. ], batch size: 60, lr: 8.46e-03, grad_scale: 32.0
2024-09-14 19:06:22,304 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.47 vs. limit=10.0
2024-09-14 19:06:48,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=171096.5, ans=0.05
2024-09-14 19:07:15,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=171124.83333333334, ans=0.025
2024-09-14 19:07:22,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=171153.16666666666, ans=0.0
2024-09-14 19:07:23,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=171153.16666666666, ans=0.125
2024-09-14 19:07:30,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=171153.16666666666, ans=0.2
2024-09-14 19:07:35,009 INFO [train.py:1198] (1/2) Epoch 10, batch 2900, loss[loss=0.3449, ctc_loss=0.2596, cr_loss=0.4267, over 14036.00 frames. ], tot_loss[loss=0.268, ctc_loss=0.1882, cr_loss=0.3989, over 4105311.56 frames. ], batch size: 149, lr: 8.45e-03, grad_scale: 32.0
2024-09-14 19:07:52,239 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.86 vs. limit=12.0
2024-09-14 19:08:08,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=171238.16666666666, ans=0.1
2024-09-14 19:08:32,851 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.17 vs. limit=10.0
2024-09-14 19:08:41,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=171294.83333333334, ans=0.125
2024-09-14 19:08:44,175 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.839e+02 2.114e+02 2.226e+02 2.484e+02 4.067e+02, threshold=4.452e+02, percent-clipped=0.0
2024-09-14 19:08:44,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=171294.83333333334, ans=0.1
2024-09-14 19:08:53,120 INFO [train.py:1198] (1/2) Epoch 10, batch 2950, loss[loss=0.2425, ctc_loss=0.1671, cr_loss=0.377, over 20893.00 frames. ], tot_loss[loss=0.2691, ctc_loss=0.1891, cr_loss=0.3995, over 4095508.31 frames. ], batch size: 54, lr: 8.45e-03, grad_scale: 32.0
2024-09-14 19:08:57,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=171323.16666666666, ans=0.125
2024-09-14 19:09:20,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=171351.5, ans=0.125
2024-09-14 19:09:42,233 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.22 vs. limit=22.5
2024-09-14 19:09:44,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=171408.16666666666, ans=0.125
2024-09-14 19:10:08,293 INFO [train.py:1198] (1/2) Epoch 10, batch 3000, loss[loss=0.267, ctc_loss=0.1843, cr_loss=0.4134, over 20283.00 frames. ], tot_loss[loss=0.2691, ctc_loss=0.1891, cr_loss=0.4003, over 4105340.74 frames. ], batch size: 74, lr: 8.45e-03, grad_scale: 64.0
2024-09-14 19:10:08,294 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-14 19:10:38,156 INFO [train.py:1230] (1/2) Epoch 10, validation: loss=0.05315, ctc_loss=0.05315, cr_loss=9.745e-15, over 944034.00 frames.
2024-09-14 19:10:38,157 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB
2024-09-14 19:10:40,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=171464.83333333334, ans=0.125
2024-09-14 19:10:50,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=171464.83333333334, ans=0.125
2024-09-14 19:11:01,448 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.08 vs. limit=15.0
2024-09-14 19:11:16,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=171521.5, ans=0.0
2024-09-14 19:11:19,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=171521.5, ans=15.0
2024-09-14 19:11:44,790 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.777e+02 2.037e+02 2.200e+02 2.527e+02 3.549e+02, threshold=4.400e+02, percent-clipped=0.0
2024-09-14 19:11:53,680 INFO [train.py:1198] (1/2) Epoch 10, batch 3050, loss[loss=0.2898, ctc_loss=0.2044, cr_loss=0.4271, over 20658.00 frames. ], tot_loss[loss=0.2694, ctc_loss=0.1894, cr_loss=0.4002, over 4100571.75 frames. ], batch size: 68, lr: 8.44e-03, grad_scale: 64.0
2024-09-14 19:11:56,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=171606.5, ans=0.125
2024-09-14 19:12:02,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.whiten.whitening_limit, batch_count=171606.5, ans=12.0
2024-09-14 19:12:36,870 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-14 19:12:37,006 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.79 vs. limit=15.0
2024-09-14 19:13:03,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=171719.83333333334, ans=0.125
2024-09-14 19:13:09,529 INFO [train.py:1198] (1/2) Epoch 10, batch 3100, loss[loss=0.2877, ctc_loss=0.2041, cr_loss=0.4177, over 20445.00 frames. ], tot_loss[loss=0.2692, ctc_loss=0.1893, cr_loss=0.3997, over 4092974.05 frames. ], batch size: 74, lr: 8.44e-03, grad_scale: 32.0
2024-09-14 19:14:11,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=171861.5, ans=0.0
2024-09-14 19:14:20,587 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.811e+02 2.074e+02 2.221e+02 2.474e+02 4.054e+02, threshold=4.441e+02, percent-clipped=0.0
2024-09-14 19:14:24,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=171861.5, ans=0.125
2024-09-14 19:14:24,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=171861.5, ans=0.2
2024-09-14 19:14:28,090 INFO [train.py:1198] (1/2) Epoch 10, batch 3150, loss[loss=0.2373, ctc_loss=0.1646, cr_loss=0.3635, over 20994.00 frames. ], tot_loss[loss=0.2699, ctc_loss=0.1897, cr_loss=0.4012, over 4101496.96 frames. ], batch size: 52, lr: 8.44e-03, grad_scale: 32.0
2024-09-14 19:14:51,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=171918.16666666666, ans=0.125
2024-09-14 19:14:52,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=171918.16666666666, ans=0.0
2024-09-14 19:15:14,416 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.50 vs. limit=10.0
2024-09-14 19:15:43,840 INFO [train.py:1198] (1/2) Epoch 10, batch 3200, loss[loss=0.2855, ctc_loss=0.2048, cr_loss=0.4032, over 21017.00 frames. ], tot_loss[loss=0.2699, ctc_loss=0.1898, cr_loss=0.4008, over 4094759.06 frames. ], batch size: 63, lr: 8.43e-03, grad_scale: 32.0
2024-09-14 19:15:56,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=172031.5, ans=0.1
2024-09-14 19:16:15,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=172088.16666666666, ans=0.0
2024-09-14 19:16:23,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=172088.16666666666, ans=0.125
2024-09-14 19:16:30,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=172116.5, ans=0.1
2024-09-14 19:16:35,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=172116.5, ans=0.125
2024-09-14 19:16:48,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=172144.83333333334, ans=0.125
2024-09-14 19:16:54,576 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.738e+02 2.000e+02 2.171e+02 2.341e+02 6.422e+02, threshold=4.341e+02, percent-clipped=1.0
2024-09-14 19:17:01,970 INFO [train.py:1198] (1/2) Epoch 10, batch 3250, loss[loss=0.2886, ctc_loss=0.2057, cr_loss=0.4145, over 20663.00 frames. ], tot_loss[loss=0.2701, ctc_loss=0.1899, cr_loss=0.401, over 4093815.62 frames. ], batch size: 68, lr: 8.43e-03, grad_scale: 32.0
2024-09-14 19:18:12,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=172286.5, ans=0.125
2024-09-14 19:18:16,881 INFO [train.py:1198] (1/2) Epoch 10, batch 3300, loss[loss=0.3187, ctc_loss=0.2345, cr_loss=0.4212, over 14427.00 frames. ], tot_loss[loss=0.2706, ctc_loss=0.1902, cr_loss=0.4018, over 4086996.88 frames. ], batch size: 149, lr: 8.42e-03, grad_scale: 32.0
2024-09-14 19:18:35,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=172343.16666666666, ans=0.125
2024-09-14 19:19:27,015 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.815e+02 2.081e+02 2.217e+02 2.391e+02 3.363e+02, threshold=4.434e+02, percent-clipped=0.0
2024-09-14 19:19:28,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=172428.16666666666, ans=0.0
2024-09-14 19:19:28,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=172428.16666666666, ans=0.0
2024-09-14 19:19:34,469 INFO [train.py:1198] (1/2) Epoch 10, batch 3350, loss[loss=0.332, ctc_loss=0.2403, cr_loss=0.4584, over 20680.00 frames. ], tot_loss[loss=0.2709, ctc_loss=0.1906, cr_loss=0.4015, over 4088167.95 frames. ], batch size: 71, lr: 8.42e-03, grad_scale: 32.0
2024-09-14 19:19:42,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=172456.5, ans=0.1
2024-09-14 19:19:46,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=172456.5, ans=0.0
2024-09-14 19:20:08,493 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.88 vs. limit=10.0
2024-09-14 19:20:18,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=172541.5, ans=0.125
2024-09-14 19:20:24,954 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.00 vs. limit=15.0
2024-09-14 19:20:35,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=172569.83333333334, ans=0.07
2024-09-14 19:20:49,623 INFO [train.py:1198] (1/2) Epoch 10, batch 3400, loss[loss=0.2864, ctc_loss=0.2024, cr_loss=0.4202, over 20936.00 frames. ], tot_loss[loss=0.2696, ctc_loss=0.1896, cr_loss=0.4001, over 4092530.49 frames. ], batch size: 60, lr: 8.42e-03, grad_scale: 32.0
2024-09-14 19:21:20,358 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.36 vs. limit=15.0
2024-09-14 19:21:55,510 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.89 vs. limit=15.0
2024-09-14 19:22:00,339 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.815e+02 2.066e+02 2.255e+02 2.634e+02 4.666e+02, threshold=4.510e+02, percent-clipped=1.0
2024-09-14 19:22:06,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=172739.83333333334, ans=0.125
2024-09-14 19:22:07,985 INFO [train.py:1198] (1/2) Epoch 10, batch 3450, loss[loss=0.2838, ctc_loss=0.2003, cr_loss=0.4176, over 20971.00 frames. ], tot_loss[loss=0.2692, ctc_loss=0.1893, cr_loss=0.3996, over 4090082.39 frames. ], batch size: 58, lr: 8.41e-03, grad_scale: 32.0
2024-09-14 19:22:11,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=172739.83333333334, ans=0.0
2024-09-14 19:22:22,497 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.32 vs. limit=15.0
2024-09-14 19:23:23,211 INFO [train.py:1198] (1/2) Epoch 10, batch 3500, loss[loss=0.2494, ctc_loss=0.1734, cr_loss=0.3795, over 21000.00 frames. ], tot_loss[loss=0.2686, ctc_loss=0.1888, cr_loss=0.3991, over 4099558.28 frames. ], batch size: 52, lr: 8.41e-03, grad_scale: 32.0
2024-09-14 19:23:23,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=172881.5, ans=0.2
2024-09-14 19:24:03,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=172938.16666666666, ans=0.0
2024-09-14 19:24:11,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.whiten.whitening_limit, batch_count=172966.5, ans=12.0
2024-09-14 19:24:31,504 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.810e+02 2.075e+02 2.327e+02 2.771e+02 5.529e+02, threshold=4.654e+02, percent-clipped=2.0
2024-09-14 19:24:39,159 INFO [train.py:1198] (1/2) Epoch 10, batch 3550, loss[loss=0.2679, ctc_loss=0.186, cr_loss=0.4097, over 20870.00 frames. ], tot_loss[loss=0.2678, ctc_loss=0.1882, cr_loss=0.3981, over 4096866.08 frames. ], batch size: 54, lr: 8.41e-03, grad_scale: 32.0
2024-09-14 19:25:12,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=173079.83333333334, ans=0.05
2024-09-14 19:25:48,638 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.55 vs. limit=15.0
2024-09-14 19:25:52,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=173136.5, ans=0.0
2024-09-14 19:25:56,782 INFO [train.py:1198] (1/2) Epoch 10, batch 3600, loss[loss=0.2631, ctc_loss=0.1828, cr_loss=0.4014, over 20957.00 frames. ], tot_loss[loss=0.2689, ctc_loss=0.1891, cr_loss=0.3989, over 4085253.44 frames. ], batch size: 55, lr: 8.40e-03, grad_scale: 32.0
2024-09-14 19:25:57,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=173164.83333333334, ans=0.125
2024-09-14 19:26:01,995 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.76 vs. limit=15.0
2024-09-14 19:26:09,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=173164.83333333334, ans=0.2
2024-09-14 19:26:27,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=173221.5, ans=0.125
2024-09-14 19:26:28,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=173221.5, ans=0.0
2024-09-14 19:26:39,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=173221.5, ans=0.025
2024-09-14 19:27:01,079 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.94 vs. limit=15.0
2024-09-14 19:27:05,174 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.702e+02 2.101e+02 2.294e+02 2.862e+02 4.439e+02, threshold=4.588e+02, percent-clipped=0.0
2024-09-14 19:27:12,639 INFO [train.py:1198] (1/2) Epoch 10, batch 3650, loss[loss=0.286, ctc_loss=0.2026, cr_loss=0.4168, over 20709.00 frames. ], tot_loss[loss=0.2684, ctc_loss=0.1886, cr_loss=0.3989, over 4093627.37 frames. ], batch size: 71, lr: 8.40e-03, grad_scale: 32.0
2024-09-14 19:27:38,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=173334.83333333334, ans=0.0
2024-09-14 19:27:44,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=173363.16666666666, ans=0.125
2024-09-14 19:28:16,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=173419.83333333334, ans=0.125
2024-09-14 19:28:17,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=173419.83333333334, ans=0.125
2024-09-14 19:28:22,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=173419.83333333334, ans=0.125
2024-09-14 19:28:30,688 INFO [train.py:1198] (1/2) Epoch 10, batch 3700, loss[loss=0.3121, ctc_loss=0.2152, cr_loss=0.4843, over 20930.00 frames. ], tot_loss[loss=0.2696, ctc_loss=0.1897, cr_loss=0.3995, over 4080720.08 frames. ], batch size: 64, lr: 8.40e-03, grad_scale: 32.0
2024-09-14 19:28:34,382 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.51 vs. limit=15.0
2024-09-14 19:29:11,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=173504.83333333334, ans=0.125
2024-09-14 19:29:36,163 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.38 vs. limit=22.5
2024-09-14 19:29:38,458 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.772e+02 2.120e+02 2.269e+02 2.497e+02 3.790e+02, threshold=4.538e+02, percent-clipped=0.0
2024-09-14 19:29:43,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=173561.5, ans=0.125
2024-09-14 19:29:45,951 INFO [train.py:1198] (1/2) Epoch 10, batch 3750, loss[loss=0.2493, ctc_loss=0.1718, cr_loss=0.3875, over 20790.00 frames. ], tot_loss[loss=0.2681, ctc_loss=0.1884, cr_loss=0.3981, over 4082568.50 frames. ], batch size: 56, lr: 8.39e-03, grad_scale: 32.0
2024-09-14 19:29:52,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=173589.83333333334, ans=0.1
2024-09-14 19:29:52,613 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.35 vs. limit=15.0
2024-09-14 19:30:13,469 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. limit=6.0
2024-09-14 19:30:26,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=173646.5, ans=0.125
2024-09-14 19:30:40,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=173674.83333333334, ans=0.125
2024-09-14 19:30:45,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=173674.83333333334, ans=0.035
2024-09-14 19:30:59,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=173703.16666666666, ans=0.125
2024-09-14 19:31:02,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=173731.5, ans=0.1
2024-09-14 19:31:03,472 INFO [train.py:1198] (1/2) Epoch 10, batch 3800, loss[loss=0.2384, ctc_loss=0.1686, cr_loss=0.3488, over 20796.00 frames. ], tot_loss[loss=0.2668, ctc_loss=0.1873, cr_loss=0.3974, over 4089899.06 frames. ], batch size: 53, lr: 8.39e-03, grad_scale: 16.0
2024-09-14 19:31:11,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=173731.5, ans=0.125
2024-09-14 19:31:53,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=173816.5, ans=0.125
2024-09-14 19:32:13,356 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.730e+02 2.052e+02 2.220e+02 2.453e+02 4.923e+02, threshold=4.439e+02, percent-clipped=3.0
2024-09-14 19:32:19,440 INFO [train.py:1198] (1/2) Epoch 10, batch 3850, loss[loss=0.3125, ctc_loss=0.2203, cr_loss=0.4608, over 20974.00 frames. ], tot_loss[loss=0.2679, ctc_loss=0.1882, cr_loss=0.3986, over 4083902.09 frames. ], batch size: 67, lr: 8.39e-03, grad_scale: 16.0
2024-09-14 19:33:04,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=173929.83333333334, ans=0.125
2024-09-14 19:33:10,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=173958.16666666666, ans=0.125
2024-09-14 19:33:10,577 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0
2024-09-14 19:33:29,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=173986.5, ans=0.125
2024-09-14 19:33:37,009 INFO [train.py:1198] (1/2) Epoch 10, batch 3900, loss[loss=0.28, ctc_loss=0.1979, cr_loss=0.4102, over 20821.00 frames. ], tot_loss[loss=0.2694, ctc_loss=0.1893, cr_loss=0.4004, over 4069502.37 frames. ], batch size: 59, lr: 8.38e-03, grad_scale: 16.0
2024-09-14 19:33:52,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=174043.16666666666, ans=0.025
2024-09-14 19:34:05,981 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.05 vs. limit=10.0
2024-09-14 19:34:15,273 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.75 vs. limit=12.0
2024-09-14 19:34:46,498 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.779e+02 2.039e+02 2.151e+02 2.340e+02 3.445e+02, threshold=4.302e+02, percent-clipped=0.0
2024-09-14 19:34:52,782 INFO [train.py:1198] (1/2) Epoch 10, batch 3950, loss[loss=0.2333, ctc_loss=0.1633, cr_loss=0.3499, over 20989.00 frames. ], tot_loss[loss=0.2688, ctc_loss=0.1887, cr_loss=0.4005, over 4081367.53 frames. ], batch size: 51, lr: 8.38e-03, grad_scale: 16.0
2024-09-14 19:35:00,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=174156.5, ans=0.1
2024-09-14 19:35:41,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=174241.5, ans=0.2
2024-09-14 19:35:50,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=174241.5, ans=0.0
2024-09-14 19:36:07,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=174269.83333333334, ans=0.1
2024-09-14 19:36:10,421 INFO [train.py:1198] (1/2) Epoch 10, batch 4000, loss[loss=0.317, ctc_loss=0.2243, cr_loss=0.4638, over 20673.00 frames. ], tot_loss[loss=0.269, ctc_loss=0.1889, cr_loss=0.4001, over 4077363.11 frames. ], batch size: 71, lr: 8.38e-03, grad_scale: 32.0
2024-09-14 19:37:20,008 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.811e+02 2.060e+02 2.217e+02 2.405e+02 5.104e+02, threshold=4.433e+02, percent-clipped=1.0
2024-09-14 19:37:20,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=174411.5, ans=0.125
2024-09-14 19:37:26,193 INFO [train.py:1198] (1/2) Epoch 10, batch 4050, loss[loss=0.2733, ctc_loss=0.1937, cr_loss=0.3982, over 20843.00 frames. ], tot_loss[loss=0.2684, ctc_loss=0.1885, cr_loss=0.3998, over 4083002.32 frames. ], batch size: 65, lr: 8.37e-03, grad_scale: 32.0
2024-09-14 19:37:44,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=174468.16666666666, ans=0.2
2024-09-14 19:37:56,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=174496.5, ans=0.1
2024-09-14 19:38:12,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=174524.83333333334, ans=0.125
2024-09-14 19:38:44,737 INFO [train.py:1198] (1/2) Epoch 10, batch 4100, loss[loss=0.2796, ctc_loss=0.1999, cr_loss=0.3986, over 21065.00 frames. ], tot_loss[loss=0.2689, ctc_loss=0.1887, cr_loss=0.401, over 4091019.53 frames. ], batch size: 59, lr: 8.37e-03, grad_scale: 32.0
2024-09-14 19:39:19,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=174638.16666666666, ans=0.125
2024-09-14 19:39:34,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=174666.5, ans=0.2
2024-09-14 19:39:41,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=174666.5, ans=0.1
2024-09-14 19:39:53,745 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.867e+02 2.095e+02 2.241e+02 2.577e+02 5.850e+02, threshold=4.482e+02, percent-clipped=1.0
2024-09-14 19:39:58,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=174723.16666666666, ans=0.0
2024-09-14 19:39:59,821 INFO [train.py:1198] (1/2) Epoch 10, batch 4150, loss[loss=0.2609, ctc_loss=0.1841, cr_loss=0.3842, over 20768.00 frames. ], tot_loss[loss=0.2682, ctc_loss=0.1882, cr_loss=0.4001, over 4093948.92 frames. ], batch size: 56, lr: 8.37e-03, grad_scale: 32.0
2024-09-14 19:40:07,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=174723.16666666666, ans=0.04949747468305833
2024-09-14 19:40:19,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=174751.5, ans=0.2
2024-09-14 19:40:35,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=174779.83333333334, ans=0.1
2024-09-14 19:40:56,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=174808.16666666666, ans=0.125
2024-09-14 19:40:59,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=174836.5, ans=0.125
2024-09-14 19:41:03,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=174836.5, ans=0.1
2024-09-14 19:41:13,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=174864.83333333334, ans=0.125
2024-09-14 19:41:14,626 INFO [train.py:1198] (1/2) Epoch 10, batch 4200, loss[loss=0.2597, ctc_loss=0.1831, cr_loss=0.3831, over 20974.00 frames. ], tot_loss[loss=0.2695, ctc_loss=0.1893, cr_loss=0.4009, over 4080433.20 frames. ], batch size: 58, lr: 8.36e-03, grad_scale: 32.0
2024-09-14 19:41:19,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=174864.83333333334, ans=0.1
2024-09-14 19:41:30,482 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.88 vs. limit=22.5
2024-09-14 19:41:30,671 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.62 vs. limit=15.0
2024-09-14 19:41:31,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=174893.16666666666, ans=0.09899494936611666
2024-09-14 19:41:34,907 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.19 vs. limit=22.5
2024-09-14 19:41:40,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=174893.16666666666, ans=0.125
2024-09-14 19:41:50,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=174921.5, ans=0.125
2024-09-14 19:41:51,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=174921.5, ans=0.125
2024-09-14 19:41:59,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=174921.5, ans=0.125
2024-09-14 19:42:26,428 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.862e+02 2.047e+02 2.187e+02 2.358e+02 3.971e+02, threshold=4.374e+02, percent-clipped=0.0
2024-09-14 19:42:32,498 INFO [train.py:1198] (1/2) Epoch 10, batch 4250, loss[loss=0.2511, ctc_loss=0.1721, cr_loss=0.3947, over 21039.00 frames. ], tot_loss[loss=0.2684, ctc_loss=0.1884, cr_loss=0.4, over 4092728.77 frames. ], batch size: 62, lr: 8.36e-03, grad_scale: 32.0
2024-09-14 19:42:38,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=175006.5, ans=0.025
2024-09-14 19:43:12,351 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.35 vs. limit=12.0
2024-09-14 19:43:48,046 INFO [train.py:1198] (1/2) Epoch 10, batch 4300, loss[loss=0.2485, ctc_loss=0.1706, cr_loss=0.3891, over 20945.00 frames. ], tot_loss[loss=0.2691, ctc_loss=0.1888, cr_loss=0.4014, over 4107442.91 frames. ], batch size: 51, lr: 8.36e-03, grad_scale: 32.0
2024-09-14 19:44:20,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=175204.83333333334, ans=0.125
2024-09-14 19:44:55,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=175261.5, ans=0.0
2024-09-14 19:44:56,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=175261.5, ans=0.0
2024-09-14 19:45:01,036 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.775e+02 2.125e+02 2.354e+02 2.641e+02 3.768e+02, threshold=4.708e+02, percent-clipped=0.0
2024-09-14 19:45:07,097 INFO [train.py:1198] (1/2) Epoch 10, batch 4350, loss[loss=0.2257, ctc_loss=0.1559, cr_loss=0.3487, over 19975.00 frames. ], tot_loss[loss=0.2674, ctc_loss=0.1876, cr_loss=0.3993, over 4100371.56 frames. ], batch size: 44, lr: 8.35e-03, grad_scale: 32.0
2024-09-14 19:45:13,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=175289.83333333334, ans=0.1
2024-09-14 19:45:35,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=175346.5, ans=0.0
2024-09-14 19:45:50,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=175374.83333333334, ans=0.95
2024-09-14 19:46:05,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=175403.16666666666, ans=0.0
2024-09-14 19:46:08,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=175403.16666666666, ans=0.025
2024-09-14 19:46:22,114 INFO [train.py:1198] (1/2) Epoch 10, batch 4400, loss[loss=0.2446, ctc_loss=0.1701, cr_loss=0.3727, over 20980.00 frames. ], tot_loss[loss=0.2667, ctc_loss=0.1871, cr_loss=0.398, over 4106728.25 frames. ], batch size: 55, lr: 8.35e-03, grad_scale: 32.0
2024-09-14 19:46:23,938 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=175431.5, ans=0.125
2024-09-14 19:46:38,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=175459.83333333334, ans=0.1
2024-09-14 19:46:40,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=175459.83333333334, ans=0.125
2024-09-14 19:46:55,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=175488.16666666666, ans=0.125
2024-09-14 19:47:34,446 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.694e+02 1.999e+02 2.156e+02 2.318e+02 3.565e+02, threshold=4.312e+02, percent-clipped=0.0
2024-09-14 19:47:37,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=175544.83333333334, ans=0.05
2024-09-14 19:47:40,434 INFO [train.py:1198] (1/2) Epoch 10, batch 4450, loss[loss=0.3286, ctc_loss=0.2424, cr_loss=0.4308, over 14672.00 frames. ], tot_loss[loss=0.2656, ctc_loss=0.1862, cr_loss=0.3971, over 4111151.52 frames. ], batch size: 152, lr: 8.35e-03, grad_scale: 32.0
2024-09-14 19:47:42,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=175573.16666666666, ans=0.0
2024-09-14 19:47:44,703 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0
2024-09-14 19:47:49,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=175573.16666666666, ans=0.125
2024-09-14 19:48:15,651 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=175629.83333333334, ans=0.025
2024-09-14 19:48:20,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=175629.83333333334, ans=0.95
2024-09-14 19:48:33,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=175658.16666666666, ans=0.0
2024-09-14 19:48:55,960 INFO [train.py:1198] (1/2) Epoch 10, batch 4500, loss[loss=0.2347, ctc_loss=0.163, cr_loss=0.3583, over 20951.00 frames. ], tot_loss[loss=0.2649, ctc_loss=0.1858, cr_loss=0.3956, over 4118256.11 frames. ], batch size: 49, lr: 8.34e-03, grad_scale: 32.0
2024-09-14 19:49:17,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=175743.16666666666, ans=0.025
2024-09-14 19:49:22,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=175743.16666666666, ans=0.015
2024-09-14 19:49:56,582 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=175828.16666666666, ans=0.035
2024-09-14 19:50:08,425 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.704e+02 2.063e+02 2.279e+02 2.542e+02 3.668e+02, threshold=4.557e+02, percent-clipped=0.0
2024-09-14 19:50:14,384 INFO [train.py:1198] (1/2) Epoch 10, batch 4550, loss[loss=0.2663, ctc_loss=0.1856, cr_loss=0.4035, over 20655.00 frames. ], tot_loss[loss=0.265, ctc_loss=0.1858, cr_loss=0.3959, over 4118081.52 frames. ], batch size: 66, lr: 8.34e-03, grad_scale: 32.0
2024-09-14 19:50:28,632 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.80 vs. limit=22.5
2024-09-14 19:50:33,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=175884.83333333334, ans=0.125
2024-09-14 19:50:36,386 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.32 vs. limit=15.0
2024-09-14 19:51:05,102 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.66 vs. limit=6.0
2024-09-14 19:51:29,712 INFO [train.py:1198] (1/2) Epoch 10, batch 4600, loss[loss=0.2919, ctc_loss=0.2056, cr_loss=0.4314, over 18161.00 frames. ], tot_loss[loss=0.2671, ctc_loss=0.1875, cr_loss=0.3979, over 4090458.29 frames. ], batch size: 108, lr: 8.34e-03, grad_scale: 32.0
2024-09-14 19:51:48,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=176026.5, ans=0.2
2024-09-14 19:52:06,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=176054.83333333334, ans=0.125
2024-09-14 19:52:27,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=176083.16666666666, ans=0.0
2024-09-14 19:52:39,036 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.858e+02 2.072e+02 2.240e+02 2.532e+02 4.884e+02, threshold=4.480e+02, percent-clipped=1.0
2024-09-14 19:52:44,905 INFO [train.py:1198] (1/2) Epoch 10, batch 4650, loss[loss=0.2831, ctc_loss=0.2036, cr_loss=0.3974, over 20974.00 frames. ], tot_loss[loss=0.2684, ctc_loss=0.1886, cr_loss=0.399, over 4081795.87 frames. ], batch size: 55, lr: 8.33e-03, grad_scale: 32.0
2024-09-14 19:53:00,787 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.53 vs. limit=15.0
2024-09-14 19:53:08,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=176168.16666666666, ans=0.125
2024-09-14 19:53:42,984 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.10 vs. limit=22.5
2024-09-14 19:53:51,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=176253.16666666666, ans=0.0
2024-09-14 19:53:56,771 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.56 vs. limit=15.0
2024-09-14 19:54:03,612 INFO [train.py:1198] (1/2) Epoch 10, batch 4700, loss[loss=0.2631, ctc_loss=0.1834, cr_loss=0.3984, over 20966.00 frames. ], tot_loss[loss=0.2683, ctc_loss=0.1885, cr_loss=0.3991, over 4090334.20 frames. ], batch size: 58, lr: 8.33e-03, grad_scale: 32.0
2024-09-14 19:54:05,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=176281.5, ans=0.09899494936611666
2024-09-14 19:54:32,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=176338.16666666666, ans=0.04949747468305833
2024-09-14 19:54:38,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=176338.16666666666, ans=0.125
2024-09-14 19:54:45,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=176338.16666666666, ans=0.0
2024-09-14 19:54:57,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=176366.5, ans=0.0
2024-09-14 19:55:12,509 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.823e+02 2.092e+02 2.276e+02 2.485e+02 3.566e+02, threshold=4.552e+02, percent-clipped=0.0
2024-09-14 19:55:18,471 INFO [train.py:1198] (1/2) Epoch 10, batch 4750, loss[loss=0.2475, ctc_loss=0.172, cr_loss=0.3775, over 21045.00 frames. ], tot_loss[loss=0.2677, ctc_loss=0.188, cr_loss=0.3984, over 4098784.98 frames. ], batch size: 56, lr: 8.33e-03, grad_scale: 32.0
2024-09-14 19:56:25,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=176536.5, ans=0.04949747468305833
2024-09-14 19:56:35,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=176564.83333333334, ans=0.125
2024-09-14 19:56:36,965 INFO [train.py:1198] (1/2) Epoch 10, batch 4800, loss[loss=0.2391, ctc_loss=0.1643, cr_loss=0.374, over 20971.00 frames. ], tot_loss[loss=0.2676, ctc_loss=0.1877, cr_loss=0.3993, over 4108725.05 frames. ], batch size: 48, lr: 8.32e-03, grad_scale: 32.0
2024-09-14 19:57:32,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=176649.83333333334, ans=0.0
2024-09-14 19:57:36,369 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.21 vs. limit=12.0
2024-09-14 19:57:37,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=176678.16666666666, ans=0.125
2024-09-14 19:57:47,555 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.809e+02 2.086e+02 2.234e+02 2.429e+02 3.536e+02, threshold=4.468e+02, percent-clipped=0.0
2024-09-14 19:57:51,830 INFO [train.py:1198] (1/2) Epoch 10, batch 4850, loss[loss=0.2899, ctc_loss=0.2024, cr_loss=0.4375, over 20668.00 frames. ], tot_loss[loss=0.2682, ctc_loss=0.1881, cr_loss=0.4003, over 4107704.89 frames. ], batch size: 68, lr: 8.32e-03, grad_scale: 16.0
2024-09-14 19:58:23,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=176763.16666666666, ans=0.1
2024-09-14 19:58:35,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=176791.5, ans=0.125
2024-09-14 19:59:08,751 INFO [train.py:1198] (1/2) Epoch 10, batch 4900, loss[loss=0.2834, ctc_loss=0.2012, cr_loss=0.4107, over 19997.00 frames. ], tot_loss[loss=0.2672, ctc_loss=0.1875, cr_loss=0.3988, over 4099503.49 frames. ], batch size: 80, lr: 8.32e-03, grad_scale: 16.0
2024-09-14 19:59:10,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=176848.16666666666, ans=0.125
2024-09-14 19:59:12,199 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-14 19:59:34,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=176876.5, ans=0.1
2024-09-14 20:00:01,454 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.56 vs. limit=10.0
2024-09-14 20:00:18,455 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.767e+02 2.033e+02 2.185e+02 2.462e+02 3.596e+02, threshold=4.371e+02, percent-clipped=0.0
2024-09-14 20:00:18,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=176961.5, ans=0.0
2024-09-14 20:00:22,892 INFO [train.py:1198] (1/2) Epoch 10, batch 4950, loss[loss=0.228, ctc_loss=0.1543, cr_loss=0.3686, over 21056.00 frames. ], tot_loss[loss=0.268, ctc_loss=0.1881, cr_loss=0.3994, over 4105490.26 frames. ], batch size: 56, lr: 8.31e-03, grad_scale: 16.0
2024-09-14 20:00:38,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=177018.16666666666, ans=0.5
2024-09-14 20:01:00,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=177046.5, ans=0.1
2024-09-14 20:01:10,911 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-14 20:01:15,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=177074.83333333334, ans=0.025
2024-09-14 20:01:37,661 INFO [train.py:1198] (1/2) Epoch 10, batch 5000, loss[loss=0.2289, ctc_loss=0.1573, cr_loss=0.3577, over 20954.00 frames. ], tot_loss[loss=0.2677, ctc_loss=0.1879, cr_loss=0.3993, over 4105768.84 frames. ], batch size: 50, lr: 8.31e-03, grad_scale: 16.0
2024-09-14 20:01:52,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=177159.83333333334, ans=0.125
2024-09-14 20:02:01,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=177159.83333333334, ans=0.1
2024-09-14 20:02:07,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=177188.16666666666, ans=0.0
2024-09-14 20:02:19,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=177188.16666666666, ans=0.1
2024-09-14 20:02:26,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=177216.5, ans=0.0
2024-09-14 20:02:29,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=177216.5, ans=0.125
2024-09-14 20:02:44,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=177244.83333333334, ans=0.0
2024-09-14 20:02:47,129 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.771e+02 2.092e+02 2.222e+02 2.420e+02 3.214e+02, threshold=4.444e+02, percent-clipped=0.0
2024-09-14 20:02:51,623 INFO [train.py:1198] (1/2) Epoch 10, batch 5050, loss[loss=0.2766, ctc_loss=0.1912, cr_loss=0.427, over 20875.00 frames. ], tot_loss[loss=0.2679, ctc_loss=0.1881, cr_loss=0.3991, over 4105476.35 frames. ], batch size: 57, lr: 8.31e-03, grad_scale: 16.0
2024-09-14 20:03:37,062 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.98 vs. limit=6.0
2024-09-14 20:04:09,035 INFO [train.py:1198] (1/2) Epoch 10, batch 5100, loss[loss=0.2333, ctc_loss=0.1601, cr_loss=0.3658, over 20953.00 frames. ], tot_loss[loss=0.2684, ctc_loss=0.1886, cr_loss=0.3993, over 4100010.44 frames. ], batch size: 49, lr: 8.30e-03, grad_scale: 16.0
2024-09-14 20:04:28,713 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.75 vs. limit=15.0
2024-09-14 20:05:18,026 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.732e+02 2.017e+02 2.190e+02 2.387e+02 4.016e+02, threshold=4.380e+02, percent-clipped=0.0
2024-09-14 20:05:21,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=177556.5, ans=0.1
2024-09-14 20:05:22,440 INFO [train.py:1198] (1/2) Epoch 10, batch 5150, loss[loss=0.3271, ctc_loss=0.249, cr_loss=0.3906, over 14547.00 frames. ], tot_loss[loss=0.2682, ctc_loss=0.1884, cr_loss=0.3991, over 4093724.52 frames. ], batch size: 150, lr: 8.30e-03, grad_scale: 16.0
2024-09-14 20:05:28,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=177556.5, ans=0.2
2024-09-14 20:05:38,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=177584.83333333334, ans=0.125
2024-09-14 20:05:57,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=177613.16666666666, ans=0.0
2024-09-14 20:05:59,932 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.54 vs. limit=15.0
2024-09-14 20:06:12,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=177641.5, ans=0.125
2024-09-14 20:06:14,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=177641.5, ans=0.0
2024-09-14 20:06:36,104 INFO [train.py:1198] (1/2) Epoch 10, batch 5200, loss[loss=0.2409, ctc_loss=0.1647, cr_loss=0.381, over 20881.00 frames. ], tot_loss[loss=0.2671, ctc_loss=0.1873, cr_loss=0.3987, over 4106659.49 frames. ], batch size: 54, lr: 8.30e-03, grad_scale: 32.0
2024-09-14 20:06:37,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=177698.16666666666, ans=0.2
2024-09-14 20:06:45,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=177698.16666666666, ans=0.125
2024-09-14 20:07:00,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=177726.5, ans=0.125
2024-09-14 20:07:09,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=177754.83333333334, ans=0.2
2024-09-14 20:07:16,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=177754.83333333334, ans=0.0
2024-09-14 20:07:19,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=177783.16666666666, ans=0.125
2024-09-14 20:07:46,189 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.818e+02 2.083e+02 2.247e+02 2.554e+02 8.086e+02, threshold=4.494e+02, percent-clipped=1.0
2024-09-14 20:07:50,584 INFO [train.py:1198] (1/2) Epoch 10, batch 5250, loss[loss=0.2602, ctc_loss=0.1819, cr_loss=0.3918, over 20981.00 frames. ], tot_loss[loss=0.2664, ctc_loss=0.1869, cr_loss=0.3976, over 4116026.44 frames. ], batch size: 58, lr: 8.29e-03, grad_scale: 32.0
2024-09-14 20:08:12,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=177868.16666666666, ans=0.125
2024-09-14 20:08:34,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=177924.83333333334, ans=0.025
2024-09-14 20:08:47,848 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.89 vs.
limit=15.0 2024-09-14 20:09:03,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=177953.16666666666, ans=0.125 2024-09-14 20:09:07,483 INFO [train.py:1198] (1/2) Epoch 10, batch 5300, loss[loss=0.2641, ctc_loss=0.1821, cr_loss=0.4104, over 20838.00 frames. ], tot_loss[loss=0.2659, ctc_loss=0.1864, cr_loss=0.3972, over 4116047.66 frames. ], batch size: 65, lr: 8.29e-03, grad_scale: 32.0 2024-09-14 20:09:50,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=178066.5, ans=0.2 2024-09-14 20:09:59,591 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 20:10:03,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=178066.5, ans=0.05 2024-09-14 20:10:16,692 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.741e+02 2.084e+02 2.210e+02 2.533e+02 5.203e+02, threshold=4.421e+02, percent-clipped=3.0 2024-09-14 20:10:17,640 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.67 vs. limit=15.0 2024-09-14 20:10:21,097 INFO [train.py:1198] (1/2) Epoch 10, batch 5350, loss[loss=0.3087, ctc_loss=0.226, cr_loss=0.4139, over 18111.00 frames. ], tot_loss[loss=0.2657, ctc_loss=0.1863, cr_loss=0.3969, over 4107558.89 frames. ], batch size: 108, lr: 8.29e-03, grad_scale: 32.0 2024-09-14 20:10:36,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=178151.5, ans=0.125 2024-09-14 20:10:40,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=178151.5, ans=0.1 2024-09-14 20:10:54,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=178179.83333333334, ans=0.1 2024-09-14 20:10:59,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=178179.83333333334, ans=0.125 2024-09-14 20:10:59,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=178179.83333333334, ans=0.125 2024-09-14 20:11:02,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=178179.83333333334, ans=0.2 2024-09-14 20:11:05,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=178208.16666666666, ans=0.0 2024-09-14 20:11:11,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=178208.16666666666, ans=0.125 2024-09-14 20:11:34,724 INFO [train.py:1198] (1/2) Epoch 10, batch 5400, loss[loss=0.2928, ctc_loss=0.2075, cr_loss=0.4263, over 19572.00 frames. ], tot_loss[loss=0.2672, ctc_loss=0.1876, cr_loss=0.3981, over 4103646.02 frames. 
], batch size: 90, lr: 8.29e-03, grad_scale: 32.0 2024-09-14 20:11:42,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=178264.83333333334, ans=0.125 2024-09-14 20:11:51,296 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.90 vs. limit=10.0 2024-09-14 20:12:00,029 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.09 vs. limit=12.0 2024-09-14 20:12:19,218 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.53 vs. limit=15.0 2024-09-14 20:12:32,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=178378.16666666666, ans=0.125 2024-09-14 20:12:45,829 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.811e+02 2.079e+02 2.240e+02 2.521e+02 4.779e+02, threshold=4.480e+02, percent-clipped=1.0 2024-09-14 20:12:50,510 INFO [train.py:1198] (1/2) Epoch 10, batch 5450, loss[loss=0.2275, ctc_loss=0.1571, cr_loss=0.3523, over 20324.00 frames. ], tot_loss[loss=0.2665, ctc_loss=0.1869, cr_loss=0.398, over 4115451.85 frames. ], batch size: 45, lr: 8.28e-03, grad_scale: 32.0 2024-09-14 20:12:50,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=178406.5, ans=0.1 2024-09-14 20:13:27,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=178463.16666666666, ans=0.125 2024-09-14 20:13:33,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=178491.5, ans=0.125 2024-09-14 20:13:43,014 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.28 vs. limit=6.0 2024-09-14 20:13:45,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=178491.5, ans=0.1 2024-09-14 20:14:04,253 INFO [train.py:1198] (1/2) Epoch 10, batch 5500, loss[loss=0.2412, ctc_loss=0.1682, cr_loss=0.3647, over 20985.00 frames. ], tot_loss[loss=0.2653, ctc_loss=0.1859, cr_loss=0.3971, over 4114786.90 frames. ], batch size: 50, lr: 8.28e-03, grad_scale: 32.0 2024-09-14 20:14:47,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=178633.16666666666, ans=0.0 2024-09-14 20:14:56,503 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.09 vs. 
limit=15.0 2024-09-14 20:14:59,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=178633.16666666666, ans=0.0 2024-09-14 20:14:59,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=178633.16666666666, ans=0.2 2024-09-14 20:15:11,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=178661.5, ans=0.0 2024-09-14 20:15:13,557 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.799e+02 2.098e+02 2.272e+02 2.564e+02 4.106e+02, threshold=4.543e+02, percent-clipped=0.0 2024-09-14 20:15:17,936 INFO [train.py:1198] (1/2) Epoch 10, batch 5550, loss[loss=0.2929, ctc_loss=0.2046, cr_loss=0.4419, over 20999.00 frames. ], tot_loss[loss=0.2649, ctc_loss=0.1855, cr_loss=0.3965, over 4117852.04 frames. ], batch size: 64, lr: 8.28e-03, grad_scale: 32.0 2024-09-14 20:15:52,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=178746.5, ans=0.95 2024-09-14 20:16:02,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=178774.83333333334, ans=0.0 2024-09-14 20:16:11,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=178774.83333333334, ans=0.0 2024-09-14 20:16:31,716 INFO [train.py:1198] (1/2) Epoch 10, batch 5600, loss[loss=0.2282, ctc_loss=0.1539, cr_loss=0.371, over 20895.00 frames. ], tot_loss[loss=0.2661, ctc_loss=0.1865, cr_loss=0.3978, over 4114055.52 frames. ], batch size: 54, lr: 8.27e-03, grad_scale: 32.0 2024-09-14 20:16:42,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=178831.5, ans=0.125 2024-09-14 20:16:46,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=178859.83333333334, ans=0.025 2024-09-14 20:17:39,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=178944.83333333334, ans=0.1 2024-09-14 20:17:43,573 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.823e+02 2.079e+02 2.263e+02 2.565e+02 3.644e+02, threshold=4.526e+02, percent-clipped=0.0 2024-09-14 20:17:45,613 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.80 vs. limit=15.0 2024-09-14 20:17:47,763 INFO [train.py:1198] (1/2) Epoch 10, batch 5650, loss[loss=0.2309, ctc_loss=0.1614, cr_loss=0.3474, over 20994.00 frames. ], tot_loss[loss=0.2653, ctc_loss=0.1861, cr_loss=0.3963, over 4101977.39 frames. 
], batch size: 48, lr: 8.27e-03, grad_scale: 32.0 2024-09-14 20:18:01,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=179001.5, ans=0.125 2024-09-14 20:18:12,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=179001.5, ans=0.07 2024-09-14 20:18:29,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=179029.83333333334, ans=0.125 2024-09-14 20:18:55,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=179086.5, ans=0.125 2024-09-14 20:19:02,602 INFO [train.py:1198] (1/2) Epoch 10, batch 5700, loss[loss=0.2753, ctc_loss=0.1954, cr_loss=0.3992, over 20978.00 frames. ], tot_loss[loss=0.2652, ctc_loss=0.1859, cr_loss=0.3966, over 4106235.80 frames. ], batch size: 58, lr: 8.27e-03, grad_scale: 32.0 2024-09-14 20:19:45,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=179199.83333333334, ans=0.125 2024-09-14 20:20:00,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=179228.16666666666, ans=0.1 2024-09-14 20:20:11,766 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.712e+02 2.122e+02 2.353e+02 2.799e+02 5.192e+02, threshold=4.706e+02, percent-clipped=1.0 2024-09-14 20:20:12,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=179228.16666666666, ans=10.0 2024-09-14 20:20:16,082 INFO [train.py:1198] (1/2) Epoch 10, batch 5750, loss[loss=0.274, ctc_loss=0.1922, cr_loss=0.4093, over 20794.00 frames. ], tot_loss[loss=0.266, ctc_loss=0.1866, cr_loss=0.3969, over 4089808.60 frames. ], batch size: 56, lr: 8.26e-03, grad_scale: 32.0 2024-09-14 20:20:28,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=179256.5, ans=0.2 2024-09-14 20:21:25,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=179369.83333333334, ans=0.2 2024-09-14 20:21:32,547 INFO [train.py:1198] (1/2) Epoch 10, batch 5800, loss[loss=0.3072, ctc_loss=0.219, cr_loss=0.441, over 19965.00 frames. ], tot_loss[loss=0.2657, ctc_loss=0.1863, cr_loss=0.3971, over 4094235.83 frames. ], batch size: 80, lr: 8.26e-03, grad_scale: 32.0 2024-09-14 20:21:32,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=179398.16666666666, ans=0.0 2024-09-14 20:22:14,583 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.12 vs. limit=15.0 2024-09-14 20:22:18,865 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.84 vs. 
limit=22.5 2024-09-14 20:22:24,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=179483.16666666666, ans=0.1 2024-09-14 20:22:41,952 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.731e+02 2.068e+02 2.266e+02 2.623e+02 4.460e+02, threshold=4.532e+02, percent-clipped=0.0 2024-09-14 20:22:46,285 INFO [train.py:1198] (1/2) Epoch 10, batch 5850, loss[loss=0.2633, ctc_loss=0.1822, cr_loss=0.4056, over 20776.00 frames. ], tot_loss[loss=0.2654, ctc_loss=0.1859, cr_loss=0.3972, over 4099577.53 frames. ], batch size: 56, lr: 8.26e-03, grad_scale: 32.0 2024-09-14 20:22:48,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.whiten.whitening_limit, batch_count=179539.83333333334, ans=12.0 2024-09-14 20:23:26,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=179596.5, ans=0.125 2024-09-14 20:24:00,192 INFO [train.py:1198] (1/2) Epoch 10, batch 5900, loss[loss=0.2482, ctc_loss=0.1732, cr_loss=0.375, over 21044.00 frames. ], tot_loss[loss=0.2674, ctc_loss=0.1877, cr_loss=0.3989, over 4095512.23 frames. ], batch size: 53, lr: 8.25e-03, grad_scale: 16.0 2024-09-14 20:24:22,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=179709.83333333334, ans=0.0 2024-09-14 20:24:45,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=179766.5, ans=0.07 2024-09-14 20:24:53,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=179766.5, ans=0.125 2024-09-14 20:24:53,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=179766.5, ans=0.125 2024-09-14 20:25:11,467 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.709e+02 2.043e+02 2.284e+02 2.438e+02 2.924e+02, threshold=4.568e+02, percent-clipped=0.0 2024-09-14 20:25:14,485 INFO [train.py:1198] (1/2) Epoch 10, batch 5950, loss[loss=0.2854, ctc_loss=0.2001, cr_loss=0.4265, over 20764.00 frames. ], tot_loss[loss=0.2682, ctc_loss=0.1883, cr_loss=0.3993, over 4088719.02 frames. ], batch size: 71, lr: 8.25e-03, grad_scale: 16.0 2024-09-14 20:25:35,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=179851.5, ans=0.125 2024-09-14 20:25:49,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=179879.83333333334, ans=0.125 2024-09-14 20:25:52,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=179879.83333333334, ans=0.0 2024-09-14 20:26:02,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=179908.16666666666, ans=0.2 2024-09-14 20:26:04,337 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.78 vs. limit=22.5 2024-09-14 20:26:30,396 INFO [train.py:1198] (1/2) Epoch 10, batch 6000, loss[loss=0.2486, ctc_loss=0.1707, cr_loss=0.3898, over 21035.00 frames. ], tot_loss[loss=0.2675, ctc_loss=0.1878, cr_loss=0.3985, over 4082210.86 frames. 
], batch size: 56, lr: 8.25e-03, grad_scale: 32.0 2024-09-14 20:26:30,397 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-14 20:26:56,315 INFO [train.py:1230] (1/2) Epoch 10, validation: loss=0.05303, ctc_loss=0.05303, cr_loss=9.606e-15, over 944034.00 frames. 2024-09-14 20:26:56,316 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-14 20:27:38,399 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=180021.5, ans=0.0 2024-09-14 20:27:58,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=180078.16666666666, ans=0.125 2024-09-14 20:28:08,656 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.852e+02 2.111e+02 2.253e+02 2.632e+02 5.421e+02, threshold=4.505e+02, percent-clipped=2.0 2024-09-14 20:28:11,710 INFO [train.py:1198] (1/2) Epoch 10, batch 6050, loss[loss=0.3154, ctc_loss=0.2283, cr_loss=0.4354, over 18283.00 frames. ], tot_loss[loss=0.2668, ctc_loss=0.1872, cr_loss=0.3979, over 4089782.57 frames. ], batch size: 108, lr: 8.24e-03, grad_scale: 32.0 2024-09-14 20:28:23,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=180106.5, ans=0.0 2024-09-14 20:28:59,249 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.24 vs. limit=15.0 2024-09-14 20:29:04,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=180191.5, ans=0.0 2024-09-14 20:29:04,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=180191.5, ans=0.1 2024-09-14 20:29:07,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=180191.5, ans=0.125 2024-09-14 20:29:26,339 INFO [train.py:1198] (1/2) Epoch 10, batch 6100, loss[loss=0.3184, ctc_loss=0.2237, cr_loss=0.4735, over 20944.00 frames. ], tot_loss[loss=0.2679, ctc_loss=0.188, cr_loss=0.3995, over 4084492.35 frames. ], batch size: 64, lr: 8.24e-03, grad_scale: 32.0 2024-09-14 20:29:30,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=180248.16666666666, ans=0.1 2024-09-14 20:29:58,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=180304.83333333334, ans=0.025 2024-09-14 20:30:04,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=180304.83333333334, ans=0.1 2024-09-14 20:30:36,932 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.794e+02 2.080e+02 2.247e+02 2.428e+02 5.980e+02, threshold=4.494e+02, percent-clipped=2.0 2024-09-14 20:30:39,839 INFO [train.py:1198] (1/2) Epoch 10, batch 6150, loss[loss=0.3055, ctc_loss=0.2159, cr_loss=0.448, over 20961.00 frames. ], tot_loss[loss=0.2675, ctc_loss=0.1876, cr_loss=0.3993, over 4085608.55 frames. 
], batch size: 64, lr: 8.24e-03, grad_scale: 32.0 2024-09-14 20:30:48,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=180389.83333333334, ans=0.125 2024-09-14 20:31:02,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=180418.16666666666, ans=15.0 2024-09-14 20:31:26,640 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.83 vs. limit=15.0 2024-09-14 20:31:42,885 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.31 vs. limit=15.0 2024-09-14 20:31:52,623 INFO [train.py:1198] (1/2) Epoch 10, batch 6200, loss[loss=0.2287, ctc_loss=0.1553, cr_loss=0.3669, over 20967.00 frames. ], tot_loss[loss=0.265, ctc_loss=0.1856, cr_loss=0.3969, over 4082618.60 frames. ], batch size: 49, lr: 8.23e-03, grad_scale: 32.0 2024-09-14 20:31:58,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=180531.5, ans=0.2 2024-09-14 20:32:00,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=180531.5, ans=0.0 2024-09-14 20:32:23,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=180588.16666666666, ans=0.1 2024-09-14 20:32:58,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=180644.83333333334, ans=0.125 2024-09-14 20:33:02,644 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.740e+02 2.041e+02 2.240e+02 2.498e+02 3.680e+02, threshold=4.480e+02, percent-clipped=0.0 2024-09-14 20:33:05,562 INFO [train.py:1198] (1/2) Epoch 10, batch 6250, loss[loss=0.3412, ctc_loss=0.2551, cr_loss=0.4303, over 13781.00 frames. ], tot_loss[loss=0.2635, ctc_loss=0.1845, cr_loss=0.3948, over 4091860.17 frames. ], batch size: 149, lr: 8.23e-03, grad_scale: 32.0 2024-09-14 20:33:31,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=180701.5, ans=0.0 2024-09-14 20:33:34,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=180729.83333333334, ans=0.2 2024-09-14 20:34:18,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=180814.83333333334, ans=0.2 2024-09-14 20:34:19,407 INFO [train.py:1198] (1/2) Epoch 10, batch 6300, loss[loss=0.2441, ctc_loss=0.1667, cr_loss=0.3868, over 20977.00 frames. ], tot_loss[loss=0.2647, ctc_loss=0.1858, cr_loss=0.3946, over 4045362.58 frames. 
], batch size: 50, lr: 8.23e-03, grad_scale: 32.0 2024-09-14 20:34:36,037 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 20:34:38,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=180843.16666666666, ans=0.2 2024-09-14 20:34:48,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=180871.5, ans=0.125 2024-09-14 20:34:49,885 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.66 vs. limit=15.0 2024-09-14 20:35:28,530 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.802e+02 2.177e+02 2.468e+02 2.812e+02 4.954e+02, threshold=4.937e+02, percent-clipped=1.0 2024-09-14 20:35:31,308 INFO [train.py:1198] (1/2) Epoch 10, batch 6350, loss[loss=0.3131, ctc_loss=0.2292, cr_loss=0.4194, over 14369.00 frames. ], tot_loss[loss=0.2699, ctc_loss=0.1908, cr_loss=0.3956, over 3874902.54 frames. ], batch size: 150, lr: 8.22e-03, grad_scale: 32.0 2024-09-14 20:35:36,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=180956.5, ans=0.0 2024-09-14 20:35:49,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=180984.83333333334, ans=0.125 2024-09-14 20:35:59,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=181013.16666666666, ans=0.125 2024-09-14 20:37:16,758 INFO [train.py:1198] (1/2) Epoch 11, batch 0, loss[loss=0.3061, ctc_loss=0.2199, cr_loss=0.4307, over 18357.00 frames. ], tot_loss[loss=0.3061, ctc_loss=0.2199, cr_loss=0.4307, over 18357.00 frames. ], batch size: 108, lr: 7.85e-03, grad_scale: 32.0 2024-09-14 20:37:16,759 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-14 20:37:35,259 INFO [train.py:1230] (1/2) Epoch 11, validation: loss=0.05334, ctc_loss=0.05334, cr_loss=9.322e-15, over 944034.00 frames. 
2024-09-14 20:37:35,260 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-14 20:37:41,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=181069.83333333334, ans=0.125 2024-09-14 20:38:04,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=181126.5, ans=0.125 2024-09-14 20:38:07,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=181126.5, ans=0.0 2024-09-14 20:38:17,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=181126.5, ans=0.2 2024-09-14 20:38:19,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=181154.83333333334, ans=0.0 2024-09-14 20:38:26,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=181154.83333333334, ans=0.1 2024-09-14 20:38:28,168 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 20:38:35,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=181183.16666666666, ans=0.0 2024-09-14 20:38:50,006 INFO [train.py:1198] (1/2) Epoch 11, batch 50, loss[loss=0.2956, ctc_loss=0.2074, cr_loss=0.4411, over 20014.00 frames. ], tot_loss[loss=0.2656, ctc_loss=0.1862, cr_loss=0.3973, over 916375.43 frames. ], batch size: 80, lr: 7.85e-03, grad_scale: 32.0 2024-09-14 20:38:59,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=181211.5, ans=0.025 2024-09-14 20:39:00,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=181211.5, ans=0.2 2024-09-14 20:39:02,234 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.817e+02 2.126e+02 2.351e+02 2.668e+02 3.477e+02, threshold=4.702e+02, percent-clipped=0.0 2024-09-14 20:39:16,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=181239.83333333334, ans=0.09899494936611666 2024-09-14 20:39:27,123 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.44 vs. limit=15.0 2024-09-14 20:39:52,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=181324.83333333334, ans=0.025 2024-09-14 20:39:52,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=181324.83333333334, ans=0.1 2024-09-14 20:40:06,214 INFO [train.py:1198] (1/2) Epoch 11, batch 100, loss[loss=0.2721, ctc_loss=0.1876, cr_loss=0.4224, over 21079.00 frames. ], tot_loss[loss=0.2641, ctc_loss=0.1844, cr_loss=0.3987, over 1630089.92 frames. 
], batch size: 59, lr: 7.84e-03, grad_scale: 32.0 2024-09-14 20:40:33,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=181381.5, ans=0.2 2024-09-14 20:40:45,645 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.08 vs. limit=15.0 2024-09-14 20:41:22,953 INFO [train.py:1198] (1/2) Epoch 11, batch 150, loss[loss=0.2678, ctc_loss=0.1861, cr_loss=0.4083, over 20649.00 frames. ], tot_loss[loss=0.2649, ctc_loss=0.185, cr_loss=0.3994, over 2190431.59 frames. ], batch size: 68, lr: 7.84e-03, grad_scale: 32.0 2024-09-14 20:41:29,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=181494.83333333334, ans=0.2 2024-09-14 20:41:35,091 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.776e+02 2.029e+02 2.298e+02 2.608e+02 3.542e+02, threshold=4.596e+02, percent-clipped=0.0 2024-09-14 20:41:37,533 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.39 vs. limit=15.0 2024-09-14 20:41:52,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=181551.5, ans=0.0 2024-09-14 20:42:19,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=181579.83333333334, ans=0.025 2024-09-14 20:42:41,855 INFO [train.py:1198] (1/2) Epoch 11, batch 200, loss[loss=0.2895, ctc_loss=0.2033, cr_loss=0.431, over 20847.00 frames. ], tot_loss[loss=0.2639, ctc_loss=0.1844, cr_loss=0.3976, over 2620645.70 frames. ], batch size: 59, lr: 7.84e-03, grad_scale: 32.0 2024-09-14 20:43:25,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=181721.5, ans=0.125 2024-09-14 20:43:35,800 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.30 vs. limit=22.5 2024-09-14 20:43:45,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=181749.83333333334, ans=0.125 2024-09-14 20:43:57,320 INFO [train.py:1198] (1/2) Epoch 11, batch 250, loss[loss=0.2832, ctc_loss=0.1968, cr_loss=0.4318, over 21014.00 frames. ], tot_loss[loss=0.2638, ctc_loss=0.1843, cr_loss=0.3977, over 2954371.11 frames. ], batch size: 61, lr: 7.83e-03, grad_scale: 32.0 2024-09-14 20:44:09,112 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.736e+02 2.090e+02 2.207e+02 2.442e+02 3.598e+02, threshold=4.413e+02, percent-clipped=0.0 2024-09-14 20:44:15,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=181806.5, ans=0.125 2024-09-14 20:44:23,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=181806.5, ans=0.0 2024-09-14 20:44:29,358 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.14 vs. 
limit=15.0 2024-09-14 20:44:39,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=181834.83333333334, ans=0.125 2024-09-14 20:45:12,614 INFO [train.py:1198] (1/2) Epoch 11, batch 300, loss[loss=0.2935, ctc_loss=0.2026, cr_loss=0.4546, over 20678.00 frames. ], tot_loss[loss=0.2653, ctc_loss=0.1854, cr_loss=0.3995, over 3213374.94 frames. ], batch size: 68, lr: 7.83e-03, grad_scale: 32.0 2024-09-14 20:45:26,848 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.08 vs. limit=22.5 2024-09-14 20:45:53,476 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.52 vs. limit=15.0 2024-09-14 20:46:03,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=182004.83333333334, ans=0.1 2024-09-14 20:46:27,325 INFO [train.py:1198] (1/2) Epoch 11, batch 350, loss[loss=0.2838, ctc_loss=0.1959, cr_loss=0.4392, over 20959.00 frames. ], tot_loss[loss=0.267, ctc_loss=0.1869, cr_loss=0.4003, over 3387689.07 frames. ], batch size: 58, lr: 7.83e-03, grad_scale: 32.0 2024-09-14 20:46:35,973 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.38 vs. limit=15.0 2024-09-14 20:46:42,572 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.768e+02 2.073e+02 2.252e+02 2.506e+02 3.676e+02, threshold=4.504e+02, percent-clipped=0.0 2024-09-14 20:46:51,843 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 20:47:02,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=182118.16666666666, ans=0.125 2024-09-14 20:47:49,339 INFO [train.py:1198] (1/2) Epoch 11, batch 400, loss[loss=0.3235, ctc_loss=0.2338, cr_loss=0.4487, over 18521.00 frames. ], tot_loss[loss=0.2666, ctc_loss=0.1866, cr_loss=0.3998, over 3554114.49 frames. ], batch size: 108, lr: 7.82e-03, grad_scale: 32.0 2024-09-14 20:48:01,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=182203.16666666666, ans=0.0 2024-09-14 20:48:02,147 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.06 vs. limit=22.5 2024-09-14 20:48:30,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=182259.83333333334, ans=0.125 2024-09-14 20:48:48,939 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.67 vs. limit=10.0 2024-09-14 20:48:55,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=182316.5, ans=0.0 2024-09-14 20:49:04,484 INFO [train.py:1198] (1/2) Epoch 11, batch 450, loss[loss=0.2702, ctc_loss=0.1859, cr_loss=0.4215, over 20858.00 frames. ], tot_loss[loss=0.2664, ctc_loss=0.1865, cr_loss=0.3993, over 3679048.49 frames. 
], batch size: 65, lr: 7.82e-03, grad_scale: 32.0 2024-09-14 20:49:16,231 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.795e+02 2.061e+02 2.238e+02 2.492e+02 3.942e+02, threshold=4.476e+02, percent-clipped=0.0 2024-09-14 20:49:35,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=182401.5, ans=0.0 2024-09-14 20:49:38,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=182401.5, ans=0.1 2024-09-14 20:50:10,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=182458.16666666666, ans=0.1 2024-09-14 20:50:19,471 INFO [train.py:1198] (1/2) Epoch 11, batch 500, loss[loss=0.2825, ctc_loss=0.1973, cr_loss=0.4257, over 20641.00 frames. ], tot_loss[loss=0.2657, ctc_loss=0.1861, cr_loss=0.3981, over 3770356.40 frames. ], batch size: 66, lr: 7.82e-03, grad_scale: 32.0 2024-09-14 20:50:25,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=182486.5, ans=0.125 2024-09-14 20:50:39,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=182514.83333333334, ans=0.125 2024-09-14 20:51:09,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=182571.5, ans=0.125 2024-09-14 20:51:34,515 INFO [train.py:1198] (1/2) Epoch 11, batch 550, loss[loss=0.2203, ctc_loss=0.152, cr_loss=0.3412, over 21014.00 frames. ], tot_loss[loss=0.2662, ctc_loss=0.1864, cr_loss=0.3989, over 3835354.58 frames. ], batch size: 52, lr: 7.82e-03, grad_scale: 32.0 2024-09-14 20:51:46,579 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.794e+02 2.110e+02 2.255e+02 2.525e+02 4.249e+02, threshold=4.510e+02, percent-clipped=0.0 2024-09-14 20:51:52,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=182656.5, ans=0.5 2024-09-14 20:51:57,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=182656.5, ans=0.2 2024-09-14 20:51:58,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=182656.5, ans=0.0 2024-09-14 20:52:12,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=182684.83333333334, ans=0.125 2024-09-14 20:52:18,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=182713.16666666666, ans=0.025 2024-09-14 20:52:43,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=182741.5, ans=0.0 2024-09-14 20:52:52,307 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.95 vs. limit=6.0 2024-09-14 20:52:53,125 INFO [train.py:1198] (1/2) Epoch 11, batch 600, loss[loss=0.2801, ctc_loss=0.1945, cr_loss=0.428, over 21021.00 frames. ], tot_loss[loss=0.264, ctc_loss=0.1847, cr_loss=0.3965, over 3906114.28 frames. 
], batch size: 63, lr: 7.81e-03, grad_scale: 32.0 2024-09-14 20:52:56,566 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 20:53:43,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=182854.83333333334, ans=0.2 2024-09-14 20:54:11,225 INFO [train.py:1198] (1/2) Epoch 11, batch 650, loss[loss=0.2938, ctc_loss=0.2076, cr_loss=0.4311, over 20880.00 frames. ], tot_loss[loss=0.2641, ctc_loss=0.1848, cr_loss=0.3965, over 3941083.30 frames. ], batch size: 57, lr: 7.81e-03, grad_scale: 32.0 2024-09-14 20:54:23,085 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.688e+02 1.966e+02 2.097e+02 2.244e+02 3.204e+02, threshold=4.194e+02, percent-clipped=0.0 2024-09-14 20:55:01,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=182996.5, ans=0.025 2024-09-14 20:55:01,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=182996.5, ans=0.125 2024-09-14 20:55:26,190 INFO [train.py:1198] (1/2) Epoch 11, batch 700, loss[loss=0.2606, ctc_loss=0.1773, cr_loss=0.4161, over 20840.00 frames. ], tot_loss[loss=0.2644, ctc_loss=0.1851, cr_loss=0.3968, over 3982300.82 frames. ], batch size: 65, lr: 7.81e-03, grad_scale: 32.0 2024-09-14 20:55:52,463 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.03 vs. limit=10.0 2024-09-14 20:56:17,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=183138.16666666666, ans=0.2 2024-09-14 20:56:38,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=183166.5, ans=0.1 2024-09-14 20:56:41,059 INFO [train.py:1198] (1/2) Epoch 11, batch 750, loss[loss=0.3083, ctc_loss=0.2188, cr_loss=0.4472, over 20600.00 frames. ], tot_loss[loss=0.2657, ctc_loss=0.1861, cr_loss=0.398, over 3995423.17 frames. ], batch size: 75, lr: 7.80e-03, grad_scale: 32.0 2024-09-14 20:56:49,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=183194.83333333334, ans=0.07 2024-09-14 20:56:53,223 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.808e+02 2.035e+02 2.206e+02 2.455e+02 4.066e+02, threshold=4.412e+02, percent-clipped=0.0 2024-09-14 20:57:10,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=183251.5, ans=0.125 2024-09-14 20:57:13,275 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 20:57:56,933 INFO [train.py:1198] (1/2) Epoch 11, batch 800, loss[loss=0.2663, ctc_loss=0.1854, cr_loss=0.4049, over 20959.00 frames. ], tot_loss[loss=0.265, ctc_loss=0.1855, cr_loss=0.3974, over 4008370.61 frames. ], batch size: 64, lr: 7.80e-03, grad_scale: 32.0 2024-09-14 20:58:05,123 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.63 vs. limit=12.0 2024-09-14 20:58:23,573 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.82 vs. 
limit=22.5 2024-09-14 20:58:45,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=183421.5, ans=0.125 2024-09-14 20:58:53,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=183421.5, ans=0.125 2024-09-14 20:59:18,246 INFO [train.py:1198] (1/2) Epoch 11, batch 850, loss[loss=0.1985, ctc_loss=0.1329, cr_loss=0.3279, over 20961.00 frames. ], tot_loss[loss=0.265, ctc_loss=0.1855, cr_loss=0.3976, over 4026674.01 frames. ], batch size: 48, lr: 7.80e-03, grad_scale: 32.0 2024-09-14 20:59:30,347 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.758e+02 2.078e+02 2.290e+02 2.606e+02 4.624e+02, threshold=4.579e+02, percent-clipped=1.0 2024-09-14 20:59:50,498 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.66 vs. limit=15.0 2024-09-14 21:00:02,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=183563.16666666666, ans=0.125 2024-09-14 21:00:32,114 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.26 vs. limit=15.0 2024-09-14 21:00:34,178 INFO [train.py:1198] (1/2) Epoch 11, batch 900, loss[loss=0.2888, ctc_loss=0.2092, cr_loss=0.3985, over 20912.00 frames. ], tot_loss[loss=0.2642, ctc_loss=0.185, cr_loss=0.3961, over 4030313.45 frames. ], batch size: 60, lr: 7.80e-03, grad_scale: 32.0 2024-09-14 21:00:36,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=183619.83333333334, ans=0.0 2024-09-14 21:00:37,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=183619.83333333334, ans=0.0 2024-09-14 21:00:54,914 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.39 vs. limit=15.0 2024-09-14 21:01:29,331 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 21:01:49,881 INFO [train.py:1198] (1/2) Epoch 11, batch 950, loss[loss=0.2696, ctc_loss=0.1854, cr_loss=0.4213, over 21007.00 frames. ], tot_loss[loss=0.2646, ctc_loss=0.1852, cr_loss=0.397, over 4053019.43 frames. 
], batch size: 61, lr: 7.79e-03, grad_scale: 32.0 2024-09-14 21:02:01,778 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.832e+02 2.043e+02 2.313e+02 2.471e+02 3.545e+02, threshold=4.626e+02, percent-clipped=0.0 2024-09-14 21:02:10,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=183789.83333333334, ans=0.2 2024-09-14 21:02:18,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=183818.16666666666, ans=0.125 2024-09-14 21:02:22,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=183818.16666666666, ans=0.125 2024-09-14 21:03:00,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=183874.83333333334, ans=0.5 2024-09-14 21:03:04,853 INFO [train.py:1198] (1/2) Epoch 11, batch 1000, loss[loss=0.2769, ctc_loss=0.1935, cr_loss=0.4172, over 20956.00 frames. ], tot_loss[loss=0.2655, ctc_loss=0.1858, cr_loss=0.3986, over 4062616.59 frames. ], batch size: 55, lr: 7.79e-03, grad_scale: 32.0 2024-09-14 21:03:30,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=183931.5, ans=0.125 2024-09-14 21:04:22,958 INFO [train.py:1198] (1/2) Epoch 11, batch 1050, loss[loss=0.2543, ctc_loss=0.1789, cr_loss=0.3772, over 20971.00 frames. ], tot_loss[loss=0.2659, ctc_loss=0.1861, cr_loss=0.399, over 4064037.39 frames. ], batch size: 58, lr: 7.79e-03, grad_scale: 32.0 2024-09-14 21:04:34,942 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.816e+02 2.069e+02 2.217e+02 2.437e+02 3.461e+02, threshold=4.435e+02, percent-clipped=0.0 2024-09-14 21:05:01,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=184101.5, ans=0.025 2024-09-14 21:05:22,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=184129.83333333334, ans=0.0 2024-09-14 21:05:25,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=184158.16666666666, ans=0.04949747468305833 2024-09-14 21:05:37,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=184158.16666666666, ans=0.125 2024-09-14 21:05:41,964 INFO [train.py:1198] (1/2) Epoch 11, batch 1100, loss[loss=0.2514, ctc_loss=0.1782, cr_loss=0.3662, over 21057.00 frames. ], tot_loss[loss=0.2651, ctc_loss=0.1856, cr_loss=0.3976, over 4065639.16 frames. 
], batch size: 56, lr: 7.78e-03, grad_scale: 32.0 2024-09-14 21:05:43,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=184186.5, ans=0.125 2024-09-14 21:05:54,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=184186.5, ans=0.0 2024-09-14 21:05:58,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=184214.83333333334, ans=0.1 2024-09-14 21:06:44,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=184299.83333333334, ans=0.1 2024-09-14 21:06:48,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=184299.83333333334, ans=0.0 2024-09-14 21:06:57,293 INFO [train.py:1198] (1/2) Epoch 11, batch 1150, loss[loss=0.2866, ctc_loss=0.2035, cr_loss=0.4155, over 19353.00 frames. ], tot_loss[loss=0.2652, ctc_loss=0.1857, cr_loss=0.3978, over 4080952.12 frames. ], batch size: 90, lr: 7.78e-03, grad_scale: 32.0 2024-09-14 21:07:09,169 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.729e+02 2.033e+02 2.198e+02 2.586e+02 3.807e+02, threshold=4.395e+02, percent-clipped=0.0 2024-09-14 21:07:33,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=184384.83333333334, ans=0.125 2024-09-14 21:07:35,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=184384.83333333334, ans=0.025 2024-09-14 21:07:35,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=184384.83333333334, ans=0.04949747468305833 2024-09-14 21:07:42,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=184413.16666666666, ans=0.125 2024-09-14 21:07:43,357 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.45 vs. limit=22.5 2024-09-14 21:08:12,653 INFO [train.py:1198] (1/2) Epoch 11, batch 1200, loss[loss=0.2567, ctc_loss=0.1783, cr_loss=0.392, over 20681.00 frames. ], tot_loss[loss=0.2643, ctc_loss=0.185, cr_loss=0.3967, over 4087177.34 frames. ], batch size: 71, lr: 7.78e-03, grad_scale: 32.0 2024-09-14 21:08:43,382 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=7.28 vs. limit=22.5 2024-09-14 21:09:07,983 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.87 vs. limit=10.0 2024-09-14 21:09:30,731 INFO [train.py:1198] (1/2) Epoch 11, batch 1250, loss[loss=0.2311, ctc_loss=0.1583, cr_loss=0.3636, over 20965.00 frames. ], tot_loss[loss=0.2637, ctc_loss=0.1845, cr_loss=0.396, over 4092830.80 frames. 
], batch size: 51, lr: 7.77e-03, grad_scale: 32.0 2024-09-14 21:09:42,743 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.712e+02 2.028e+02 2.201e+02 2.379e+02 3.643e+02, threshold=4.402e+02, percent-clipped=0.0 2024-09-14 21:09:46,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=184639.83333333334, ans=0.5 2024-09-14 21:10:11,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=184668.16666666666, ans=0.0 2024-09-14 21:10:23,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=184696.5, ans=0.125 2024-09-14 21:10:46,719 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=184724.83333333334, ans=0.025 2024-09-14 21:10:49,397 INFO [train.py:1198] (1/2) Epoch 11, batch 1300, loss[loss=0.2356, ctc_loss=0.1648, cr_loss=0.3537, over 20972.00 frames. ], tot_loss[loss=0.2627, ctc_loss=0.1837, cr_loss=0.395, over 4106477.51 frames. ], batch size: 50, lr: 7.77e-03, grad_scale: 32.0 2024-09-14 21:11:07,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=184781.5, ans=0.09899494936611666 2024-09-14 21:11:09,819 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.28 vs. limit=15.0 2024-09-14 21:11:15,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=184781.5, ans=0.125 2024-09-14 21:11:59,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=184866.5, ans=0.125 2024-09-14 21:12:04,060 INFO [train.py:1198] (1/2) Epoch 11, batch 1350, loss[loss=0.2339, ctc_loss=0.1595, cr_loss=0.3721, over 20978.00 frames. ], tot_loss[loss=0.263, ctc_loss=0.1839, cr_loss=0.3956, over 4109821.71 frames. ], batch size: 52, lr: 7.77e-03, grad_scale: 32.0 2024-09-14 21:12:06,573 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.43 vs. limit=15.0 2024-09-14 21:12:16,027 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.702e+02 2.009e+02 2.122e+02 2.250e+02 3.486e+02, threshold=4.244e+02, percent-clipped=0.0 2024-09-14 21:12:25,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=184923.16666666666, ans=0.1 2024-09-14 21:12:40,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=184951.5, ans=0.0 2024-09-14 21:12:47,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=184979.83333333334, ans=0.0 2024-09-14 21:12:52,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=184979.83333333334, ans=0.0 2024-09-14 21:12:56,984 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.08 vs. 
2024-09-14 21:12:56,984 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.08 vs. limit=22.5
2024-09-14 21:12:58,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=184979.83333333334, ans=0.1
2024-09-14 21:13:14,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=185008.16666666666, ans=0.2
2024-09-14 21:13:18,280 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.16 vs. limit=12.0
2024-09-14 21:13:19,032 INFO [train.py:1198] (1/2) Epoch 11, batch 1400, loss[loss=0.263, ctc_loss=0.1817, cr_loss=0.4064, over 20774.00 frames. ], tot_loss[loss=0.2621, ctc_loss=0.1832, cr_loss=0.3945, over 4099821.37 frames. ], batch size: 56, lr: 7.77e-03, grad_scale: 16.0
2024-09-14 21:13:29,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=185036.5, ans=0.125
2024-09-14 21:13:56,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=185093.16666666666, ans=0.125
2024-09-14 21:14:00,133 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.04 vs. limit=15.0
2024-09-14 21:14:10,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=185121.5, ans=0.125
2024-09-14 21:14:31,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=185149.83333333334, ans=0.0
2024-09-14 21:14:32,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=185178.16666666666, ans=10.0
2024-09-14 21:14:33,896 INFO [train.py:1198] (1/2) Epoch 11, batch 1450, loss[loss=0.2729, ctc_loss=0.1901, cr_loss=0.4142, over 20970.00 frames. ], tot_loss[loss=0.2625, ctc_loss=0.1836, cr_loss=0.3945, over 4096192.54 frames. ], batch size: 58, lr: 7.76e-03, grad_scale: 16.0
2024-09-14 21:14:37,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=185178.16666666666, ans=0.1
2024-09-14 21:14:41,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=185178.16666666666, ans=0.1
2024-09-14 21:14:47,625 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.872e+02 2.054e+02 2.210e+02 2.365e+02 5.285e+02, threshold=4.419e+02, percent-clipped=2.0
2024-09-14 21:15:19,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=185234.83333333334, ans=0.1
2024-09-14 21:15:31,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=185263.16666666666, ans=0.125
2024-09-14 21:15:35,287 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.48 vs. limit=15.0
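Each ScheduledFloat entry above reports a hyperparameter value ("ans") looked up from a schedule keyed on batch_count, which is how dropout rates, skip rates, and balancer probabilities drift over training. A minimal sketch of a piecewise-linear schedule of that shape follows; the breakpoints below are invented for illustration and are not the recipe's actual schedules.

def scheduled_float(batch_count, schedule=((0.0, 0.3), (8000.0, 0.1), (16000.0, 0.0))):
    """Piecewise-linear interpolation of a scalar hyperparameter over batch_count."""
    points = sorted(schedule)
    if batch_count <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if batch_count <= x1:
            t = (batch_count - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    # past the last breakpoint the value is held constant
    return points[-1][1]

# e.g. scheduled_float(184186.5) -> 0.0 for this illustrative schedule; the
# logged "ans" values come from each parameter's own schedule.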
2024-09-14 21:15:52,462 INFO [train.py:1198] (1/2) Epoch 11, batch 1500, loss[loss=0.2618, ctc_loss=0.18, cr_loss=0.4091, over 20965.00 frames. ], tot_loss[loss=0.2639, ctc_loss=0.1848, cr_loss=0.3957, over 4092763.84 frames. ], batch size: 55, lr: 7.76e-03, grad_scale: 16.0
2024-09-14 21:15:53,258 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.38 vs. limit=15.0
2024-09-14 21:16:09,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=185348.16666666666, ans=0.125
2024-09-14 21:16:37,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=185376.5, ans=0.025
2024-09-14 21:16:48,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=185404.83333333334, ans=0.0
2024-09-14 21:17:06,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=185433.16666666666, ans=0.125
2024-09-14 21:17:10,454 INFO [train.py:1198] (1/2) Epoch 11, batch 1550, loss[loss=0.2698, ctc_loss=0.1911, cr_loss=0.3937, over 20977.00 frames. ], tot_loss[loss=0.2637, ctc_loss=0.1845, cr_loss=0.3961, over 4102841.17 frames. ], batch size: 58, lr: 7.76e-03, grad_scale: 16.0
2024-09-14 21:17:13,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=185461.5, ans=0.05
2024-09-14 21:17:23,950 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.858e+02 2.056e+02 2.230e+02 2.418e+02 4.179e+02, threshold=4.460e+02, percent-clipped=0.0
2024-09-14 21:17:41,116 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.99 vs. limit=22.5
2024-09-14 21:18:02,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=185546.5, ans=0.025
2024-09-14 21:18:11,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=185574.83333333334, ans=0.2
2024-09-14 21:18:26,082 INFO [train.py:1198] (1/2) Epoch 11, batch 1600, loss[loss=0.2384, ctc_loss=0.1648, cr_loss=0.3677, over 20848.00 frames. ], tot_loss[loss=0.262, ctc_loss=0.1831, cr_loss=0.3943, over 4109379.67 frames. ], batch size: 57, lr: 7.75e-03, grad_scale: 32.0
2024-09-14 21:18:32,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=185603.16666666666, ans=0.0
2024-09-14 21:18:50,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=185631.5, ans=0.1
2024-09-14 21:19:04,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=185659.83333333334, ans=0.015
2024-09-14 21:19:17,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=185688.16666666666, ans=0.04949747468305833
2024-09-14 21:19:29,781 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.65 vs. limit=15.0
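The Whitening entries above are logged when a module's measured metric approaches or exceeds its limit (as with metric=15.65 vs. limit=15.0 just above), where the metric quantifies how far a group of activations is from having a white (identity-like) covariance. One way to compute such a metric is sketched here: the ratio of the mean squared eigenvalue of the feature covariance to the squared mean eigenvalue, which is 1.0 for perfectly whitened features and grows as the spectrum spreads. This is an illustration under that assumption, not necessarily scaling.py's exact formula.

import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
    """x: (num_frames, num_channels). Returns >= 1.0; equals 1.0 only when each
    group's covariance is a multiple of the identity (fully 'white')."""
    num_frames, num_channels = x.shape
    g = num_channels // num_groups
    x = x.reshape(num_frames, num_groups, g).permute(1, 0, 2)  # (groups, frames, g)
    cov = x.transpose(1, 2) @ x / num_frames                   # per-group covariance
    tr = cov.diagonal(dim1=1, dim2=2).sum(dim=-1)              # sum of eigenvalues
    tr_sq = (cov * cov).sum(dim=(1, 2))                        # sum of squared eigenvalues
    metric = (tr_sq / g) / ((tr / g) ** 2 + 1e-20)
    return metric.mean().item()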
2024-09-14 21:19:41,301 INFO [train.py:1198] (1/2) Epoch 11, batch 1650, loss[loss=0.2498, ctc_loss=0.1737, cr_loss=0.3801, over 20884.00 frames. ], tot_loss[loss=0.2633, ctc_loss=0.1843, cr_loss=0.395, over 4092593.10 frames. ], batch size: 54, lr: 7.75e-03, grad_scale: 32.0
2024-09-14 21:19:55,034 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.867e+02 2.099e+02 2.332e+02 2.734e+02 4.290e+02, threshold=4.664e+02, percent-clipped=0.0
2024-09-14 21:20:01,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=185773.16666666666, ans=0.025
2024-09-14 21:20:23,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=185801.5, ans=0.125
2024-09-14 21:20:31,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=185829.83333333334, ans=0.0
2024-09-14 21:20:59,780 INFO [train.py:1198] (1/2) Epoch 11, batch 1700, loss[loss=0.2206, ctc_loss=0.1505, cr_loss=0.3503, over 19902.00 frames. ], tot_loss[loss=0.2629, ctc_loss=0.1839, cr_loss=0.3948, over 4092902.74 frames. ], batch size: 44, lr: 7.75e-03, grad_scale: 32.0
2024-09-14 21:21:49,003 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.06 vs. limit=10.0
2024-09-14 21:22:18,078 INFO [train.py:1198] (1/2) Epoch 11, batch 1750, loss[loss=0.2719, ctc_loss=0.1878, cr_loss=0.4202, over 20974.00 frames. ], tot_loss[loss=0.2641, ctc_loss=0.1847, cr_loss=0.397, over 4106204.22 frames. ], batch size: 58, lr: 7.75e-03, grad_scale: 32.0
2024-09-14 21:22:31,580 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.787e+02 2.062e+02 2.241e+02 2.492e+02 4.229e+02, threshold=4.481e+02, percent-clipped=0.0
2024-09-14 21:22:42,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=186056.5, ans=0.125
2024-09-14 21:23:00,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=186084.83333333334, ans=0.0
2024-09-14 21:23:03,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=186113.16666666666, ans=0.0
2024-09-14 21:23:26,001 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.88 vs. limit=12.0
2024-09-14 21:23:32,880 INFO [train.py:1198] (1/2) Epoch 11, batch 1800, loss[loss=0.2492, ctc_loss=0.1734, cr_loss=0.3788, over 20982.00 frames. ], tot_loss[loss=0.264, ctc_loss=0.1846, cr_loss=0.3968, over 4094630.79 frames. ], batch size: 55, lr: 7.74e-03, grad_scale: 16.0
2024-09-14 21:23:59,258 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.95 vs. limit=22.5
2024-09-14 21:24:04,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=186226.5, ans=0.2
2024-09-14 21:24:20,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=186254.83333333334, ans=0.1
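The grad_scale value in the entries above moves between 32.0 and 16.0 (it drops at batch 1800 and recovers by batch 1600 of the next stretch). With fp16 training, that pattern is what a dynamic loss scaler produces: the scale is halved when a step overflows and grown back after a run of stable steps. A hedged sketch of the mechanism using PyTorch's stock GradScaler follows; the recipe may implement its own scaler, so this only illustrates the behavior, and loss_fn is a placeholder.

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=200)

def training_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model, batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)      # skipped internally if inf/nan grads are found
    scaler.update()             # halves the scale on overflow, grows it later
    return scaler.get_scale()   # the kind of value logged as grad_scale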
2024-09-14 21:24:47,554 INFO [train.py:1198] (1/2) Epoch 11, batch 1850, loss[loss=0.2978, ctc_loss=0.2128, cr_loss=0.425, over 19402.00 frames. ], tot_loss[loss=0.2652, ctc_loss=0.1857, cr_loss=0.3979, over 4096257.70 frames. ], batch size: 90, lr: 7.74e-03, grad_scale: 16.0
2024-09-14 21:24:55,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=186311.5, ans=0.125
2024-09-14 21:25:02,701 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.821e+02 2.057e+02 2.167e+02 2.409e+02 4.086e+02, threshold=4.333e+02, percent-clipped=0.0
2024-09-14 21:25:09,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=186339.83333333334, ans=0.0
2024-09-14 21:25:15,716 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.42 vs. limit=15.0
2024-09-14 21:25:51,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=186424.83333333334, ans=0.125
2024-09-14 21:26:03,136 INFO [train.py:1198] (1/2) Epoch 11, batch 1900, loss[loss=0.2801, ctc_loss=0.1928, cr_loss=0.4361, over 20123.00 frames. ], tot_loss[loss=0.2631, ctc_loss=0.1839, cr_loss=0.3962, over 4103976.43 frames. ], batch size: 80, lr: 7.74e-03, grad_scale: 16.0
2024-09-14 21:27:21,409 INFO [train.py:1198] (1/2) Epoch 11, batch 1950, loss[loss=0.3015, ctc_loss=0.2151, cr_loss=0.4323, over 18152.00 frames. ], tot_loss[loss=0.2624, ctc_loss=0.1833, cr_loss=0.3959, over 4112960.82 frames. ], batch size: 108, lr: 7.73e-03, grad_scale: 16.0
2024-09-14 21:27:29,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=186594.83333333334, ans=0.0
2024-09-14 21:27:30,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=186594.83333333334, ans=0.1
2024-09-14 21:27:36,464 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.762e+02 2.006e+02 2.164e+02 2.327e+02 3.353e+02, threshold=4.328e+02, percent-clipped=0.0
2024-09-14 21:28:39,551 INFO [train.py:1198] (1/2) Epoch 11, batch 2000, loss[loss=0.2679, ctc_loss=0.1839, cr_loss=0.4203, over 20862.00 frames. ], tot_loss[loss=0.2615, ctc_loss=0.1826, cr_loss=0.3943, over 4111976.43 frames. ], batch size: 65, lr: 7.73e-03, grad_scale: 32.0
2024-09-14 21:28:44,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=186736.5, ans=0.125
2024-09-14 21:28:44,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=186736.5, ans=0.2
2024-09-14 21:29:55,186 INFO [train.py:1198] (1/2) Epoch 11, batch 2050, loss[loss=0.2503, ctc_loss=0.1718, cr_loss=0.3927, over 21013.00 frames. ], tot_loss[loss=0.2608, ctc_loss=0.1821, cr_loss=0.3936, over 4115212.19 frames. ], batch size: 61, lr: 7.73e-03, grad_scale: 32.0
2024-09-14 21:30:07,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=186878.16666666666, ans=0.125
2024-09-14 21:30:10,170 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.783e+02 2.062e+02 2.212e+02 2.510e+02 4.514e+02, threshold=4.424e+02, percent-clipped=1.0
2024-09-14 21:30:22,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=186906.5, ans=0.2
2024-09-14 21:31:10,055 INFO [train.py:1198] (1/2) Epoch 11, batch 2100, loss[loss=0.2438, ctc_loss=0.1698, cr_loss=0.3703, over 20973.00 frames. ], tot_loss[loss=0.2618, ctc_loss=0.1828, cr_loss=0.3952, over 4120564.68 frames. ], batch size: 58, lr: 7.72e-03, grad_scale: 32.0
2024-09-14 21:31:14,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=187019.83333333334, ans=0.2
2024-09-14 21:31:28,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=187048.16666666666, ans=0.2
2024-09-14 21:31:35,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=187048.16666666666, ans=0.07
2024-09-14 21:31:37,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=187048.16666666666, ans=0.1
2024-09-14 21:31:46,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=187076.5, ans=0.125
2024-09-14 21:31:58,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=187104.83333333334, ans=0.0
2024-09-14 21:32:01,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=187104.83333333334, ans=0.125
2024-09-14 21:32:28,251 INFO [train.py:1198] (1/2) Epoch 11, batch 2150, loss[loss=0.2739, ctc_loss=0.1903, cr_loss=0.4178, over 20875.00 frames. ], tot_loss[loss=0.2605, ctc_loss=0.1817, cr_loss=0.3936, over 4124312.78 frames. ], batch size: 54, lr: 7.72e-03, grad_scale: 32.0
2024-09-14 21:32:33,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=187161.5, ans=0.2
2024-09-14 21:32:43,106 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.734e+02 2.021e+02 2.178e+02 2.400e+02 3.234e+02, threshold=4.357e+02, percent-clipped=0.0
2024-09-14 21:33:06,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=187218.16666666666, ans=0.125
2024-09-14 21:33:40,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=187274.83333333334, ans=0.05
2024-09-14 21:33:46,144 INFO [train.py:1198] (1/2) Epoch 11, batch 2200, loss[loss=0.2477, ctc_loss=0.1708, cr_loss=0.3846, over 21049.00 frames. ], tot_loss[loss=0.2606, ctc_loss=0.1819, cr_loss=0.3932, over 4111667.40 frames. ], batch size: 53, lr: 7.72e-03, grad_scale: 32.0
2024-09-14 21:34:03,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.whiten.whitening_limit, batch_count=187331.5, ans=12.0
2024-09-14 21:34:17,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=187359.83333333334, ans=0.2
2024-09-14 21:34:25,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=187359.83333333334, ans=0.125
2024-09-14 21:34:25,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=187359.83333333334, ans=0.125
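Alongside the per-batch loss[...] values, the tot_loss[... over ~4.1M frames] figures above behave like a frame-weighted running average: the frame total stays roughly constant from batch to batch, which suggests a decayed or windowed accumulation rather than a full-epoch sum. A small sketch of plain frame-weighted accumulation of that sort follows; the class name is illustrative and the real tracker likely also decays old batches.

class RunningFrameLoss:
    """Accumulates loss weighted by frame count, like the tot_loss[...] entries."""

    def __init__(self):
        self.weighted_loss = 0.0
        self.frames = 0.0

    def update(self, batch_loss, num_frames):
        # weight each batch's average loss by how many frames it covered
        self.weighted_loss += batch_loss * num_frames
        self.frames += num_frames

    @property
    def value(self):
        return self.weighted_loss / max(self.frames, 1.0)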
2024-09-14 21:35:00,837 INFO [train.py:1198] (1/2) Epoch 11, batch 2250, loss[loss=0.3138, ctc_loss=0.2223, cr_loss=0.4576, over 20859.00 frames. ], tot_loss[loss=0.2611, ctc_loss=0.1823, cr_loss=0.3939, over 4107465.72 frames. ], batch size: 65, lr: 7.72e-03, grad_scale: 32.0
2024-09-14 21:35:15,866 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.779e+02 2.089e+02 2.236e+02 2.465e+02 4.073e+02, threshold=4.472e+02, percent-clipped=0.0
2024-09-14 21:35:34,749 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.45 vs. limit=10.0
2024-09-14 21:36:10,705 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.57 vs. limit=12.0
2024-09-14 21:36:16,019 INFO [train.py:1198] (1/2) Epoch 11, batch 2300, loss[loss=0.2231, ctc_loss=0.1528, cr_loss=0.3515, over 20992.00 frames. ], tot_loss[loss=0.2614, ctc_loss=0.1826, cr_loss=0.3941, over 4106083.19 frames. ], batch size: 49, lr: 7.71e-03, grad_scale: 32.0
2024-09-14 21:36:19,566 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.46 vs. limit=15.0
2024-09-14 21:36:23,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=187586.5, ans=0.0
2024-09-14 21:36:50,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=187643.16666666666, ans=0.2
2024-09-14 21:36:51,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=187643.16666666666, ans=0.1
2024-09-14 21:37:30,636 INFO [train.py:1198] (1/2) Epoch 11, batch 2350, loss[loss=0.28, ctc_loss=0.199, cr_loss=0.4053, over 20637.00 frames. ], tot_loss[loss=0.2629, ctc_loss=0.1837, cr_loss=0.3959, over 4110009.26 frames. ], batch size: 66, lr: 7.71e-03, grad_scale: 16.0
2024-09-14 21:37:37,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=187728.16666666666, ans=0.05
2024-09-14 21:37:47,226 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.747e+02 2.075e+02 2.284e+02 2.653e+02 4.046e+02, threshold=4.567e+02, percent-clipped=0.0
2024-09-14 21:38:14,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=187784.83333333334, ans=0.125
2024-09-14 21:38:26,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=187813.16666666666, ans=0.0
2024-09-14 21:38:39,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=187841.5, ans=0.125
2024-09-14 21:38:41,787 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.70 vs. limit=15.0
2024-09-14 21:38:48,557 INFO [train.py:1198] (1/2) Epoch 11, batch 2400, loss[loss=0.2133, ctc_loss=0.1463, cr_loss=0.3352, over 20966.00 frames. ], tot_loss[loss=0.2637, ctc_loss=0.1844, cr_loss=0.3965, over 4108796.65 frames. ], batch size: 48, lr: 7.71e-03, grad_scale: 32.0
2024-09-14 21:39:03,946 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.22 vs. limit=15.0
2024-09-14 21:39:32,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=187926.5, ans=0.1
2024-09-14 21:40:01,372 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.06 vs. limit=15.0
2024-09-14 21:40:06,670 INFO [train.py:1198] (1/2) Epoch 11, batch 2450, loss[loss=0.2571, ctc_loss=0.1765, cr_loss=0.4032, over 20922.00 frames. ], tot_loss[loss=0.2643, ctc_loss=0.1848, cr_loss=0.3977, over 4111973.05 frames. ], batch size: 60, lr: 7.70e-03, grad_scale: 32.0
2024-09-14 21:40:23,125 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.704e+02 1.984e+02 2.150e+02 2.403e+02 5.578e+02, threshold=4.300e+02, percent-clipped=2.0
2024-09-14 21:40:29,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=188039.83333333334, ans=0.125
2024-09-14 21:40:30,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=188039.83333333334, ans=0.125
2024-09-14 21:40:32,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=188039.83333333334, ans=0.125
2024-09-14 21:40:33,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=188039.83333333334, ans=0.025
2024-09-14 21:41:10,679 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.03 vs. limit=6.0
2024-09-14 21:41:21,418 INFO [train.py:1198] (1/2) Epoch 11, batch 2500, loss[loss=0.2152, ctc_loss=0.1468, cr_loss=0.3419, over 20936.00 frames. ], tot_loss[loss=0.2645, ctc_loss=0.1851, cr_loss=0.3973, over 4107865.63 frames. ], batch size: 49, lr: 7.70e-03, grad_scale: 32.0
2024-09-14 21:41:29,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=188153.16666666666, ans=0.0
2024-09-14 21:41:37,529 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.50 vs. limit=22.5
2024-09-14 21:41:41,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=188181.5, ans=0.07
2024-09-14 21:41:51,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=188209.83333333334, ans=0.125
2024-09-14 21:41:56,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=188209.83333333334, ans=0.0
2024-09-14 21:42:08,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=188238.16666666666, ans=0.0
2024-09-14 21:42:23,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=188266.5, ans=0.125
2024-09-14 21:42:26,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=188266.5, ans=0.125
2024-09-14 21:42:29,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=188266.5, ans=0.125
2024-09-14 21:42:36,493 INFO [train.py:1198] (1/2) Epoch 11, batch 2550, loss[loss=0.2417, ctc_loss=0.1688, cr_loss=0.3645, over 20888.00 frames. ], tot_loss[loss=0.2642, ctc_loss=0.1849, cr_loss=0.3968, over 4098097.64 frames. ], batch size: 54, lr: 7.70e-03, grad_scale: 32.0
2024-09-14 21:42:52,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=188323.16666666666, ans=0.1
2024-09-14 21:42:53,257 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.780e+02 2.038e+02 2.245e+02 2.484e+02 5.799e+02, threshold=4.490e+02, percent-clipped=2.0
2024-09-14 21:43:01,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=188323.16666666666, ans=0.125
2024-09-14 21:43:10,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=188351.5, ans=0.125
2024-09-14 21:43:16,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=188351.5, ans=0.2
2024-09-14 21:43:17,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=188351.5, ans=0.07
2024-09-14 21:43:27,569 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.01 vs. limit=10.0
2024-09-14 21:43:54,771 INFO [train.py:1198] (1/2) Epoch 11, batch 2600, loss[loss=0.3012, ctc_loss=0.2119, cr_loss=0.4464, over 20849.00 frames. ], tot_loss[loss=0.2646, ctc_loss=0.1851, cr_loss=0.3974, over 4105140.82 frames. ], batch size: 65, lr: 7.70e-03, grad_scale: 32.0
2024-09-14 21:44:27,285 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.10 vs. limit=15.0
2024-09-14 21:45:02,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=188549.83333333334, ans=0.125
2024-09-14 21:45:12,322 INFO [train.py:1198] (1/2) Epoch 11, batch 2650, loss[loss=0.2795, ctc_loss=0.1999, cr_loss=0.398, over 20675.00 frames. ], tot_loss[loss=0.2647, ctc_loss=0.1852, cr_loss=0.3975, over 4108577.02 frames. ], batch size: 66, lr: 7.69e-03, grad_scale: 32.0
2024-09-14 21:45:28,924 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.785e+02 2.023e+02 2.156e+02 2.344e+02 3.861e+02, threshold=4.313e+02, percent-clipped=0.0
2024-09-14 21:45:35,887 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.40 vs. limit=15.0
2024-09-14 21:46:27,963 INFO [train.py:1198] (1/2) Epoch 11, batch 2700, loss[loss=0.2795, ctc_loss=0.1951, cr_loss=0.4221, over 20950.00 frames. ], tot_loss[loss=0.2651, ctc_loss=0.1855, cr_loss=0.3979, over 4107148.74 frames. ], batch size: 60, lr: 7.69e-03, grad_scale: 32.0
2024-09-14 21:46:31,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=188719.83333333334, ans=0.125
2024-09-14 21:47:08,871 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.14 vs. limit=10.0
2024-09-14 21:47:25,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=188804.83333333334, ans=0.125
2024-09-14 21:47:42,714 INFO [train.py:1198] (1/2) Epoch 11, batch 2750, loss[loss=0.2722, ctc_loss=0.1901, cr_loss=0.4105, over 20667.00 frames. ], tot_loss[loss=0.2643, ctc_loss=0.1849, cr_loss=0.3972, over 4106295.99 frames. ], batch size: 68, lr: 7.69e-03, grad_scale: 32.0
2024-09-14 21:47:47,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=188861.5, ans=0.1
2024-09-14 21:47:59,284 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.736e+02 2.035e+02 2.175e+02 2.452e+02 3.657e+02, threshold=4.349e+02, percent-clipped=0.0
2024-09-14 21:48:56,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=189003.16666666666, ans=0.1
2024-09-14 21:48:57,611 INFO [train.py:1198] (1/2) Epoch 11, batch 2800, loss[loss=0.2344, ctc_loss=0.1638, cr_loss=0.3528, over 20994.00 frames. ], tot_loss[loss=0.2635, ctc_loss=0.1842, cr_loss=0.3965, over 4104549.47 frames. ], batch size: 51, lr: 7.68e-03, grad_scale: 32.0
2024-09-14 21:49:10,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=189003.16666666666, ans=0.125
2024-09-14 21:49:54,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=189088.16666666666, ans=0.0
2024-09-14 21:50:08,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=189116.5, ans=0.05
2024-09-14 21:50:11,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=189116.5, ans=0.015
2024-09-14 21:50:13,579 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.36 vs. limit=15.0
2024-09-14 21:50:15,633 INFO [train.py:1198] (1/2) Epoch 11, batch 2850, loss[loss=0.3045, ctc_loss=0.2098, cr_loss=0.4737, over 20871.00 frames. ], tot_loss[loss=0.2651, ctc_loss=0.1854, cr_loss=0.3986, over 4097937.68 frames. ], batch size: 65, lr: 7.68e-03, grad_scale: 16.0
2024-09-14 21:50:22,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=189144.83333333334, ans=0.125
2024-09-14 21:50:36,732 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.753e+02 2.063e+02 2.238e+02 2.511e+02 4.868e+02, threshold=4.476e+02, percent-clipped=1.0
2024-09-14 21:51:05,637 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=189229.83333333334, ans=0.025
2024-09-14 21:51:10,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=189229.83333333334, ans=0.1
2024-09-14 21:51:33,995 INFO [train.py:1198] (1/2) Epoch 11, batch 2900, loss[loss=0.2692, ctc_loss=0.1895, cr_loss=0.3985, over 20931.00 frames. ], tot_loss[loss=0.2628, ctc_loss=0.1837, cr_loss=0.3957, over 4103362.05 frames. ], batch size: 60, lr: 7.68e-03, grad_scale: 16.0
2024-09-14 21:51:43,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=189286.5, ans=0.125
2024-09-14 21:51:55,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=189314.83333333334, ans=0.125
2024-09-14 21:51:55,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=189314.83333333334, ans=0.0
2024-09-14 21:51:55,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=189314.83333333334, ans=0.125
2024-09-14 21:51:55,904 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.31 vs. limit=15.0
2024-09-14 21:51:56,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=189314.83333333334, ans=0.125
2024-09-14 21:52:22,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=189371.5, ans=0.125
2024-09-14 21:52:26,404 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.84 vs. limit=15.0
2024-09-14 21:52:49,443 INFO [train.py:1198] (1/2) Epoch 11, batch 2950, loss[loss=0.2323, ctc_loss=0.1557, cr_loss=0.3827, over 20925.00 frames. ], tot_loss[loss=0.2617, ctc_loss=0.1827, cr_loss=0.3951, over 4111375.59 frames. ], batch size: 50, lr: 7.68e-03, grad_scale: 16.0
2024-09-14 21:52:51,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=189428.16666666666, ans=0.125
2024-09-14 21:53:06,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=189456.5, ans=0.125
2024-09-14 21:53:07,521 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.755e+02 2.053e+02 2.227e+02 2.462e+02 3.658e+02, threshold=4.453e+02, percent-clipped=0.0
2024-09-14 21:53:09,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=189456.5, ans=0.025
2024-09-14 21:53:17,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=189456.5, ans=0.125
2024-09-14 21:53:33,618 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-14 21:53:38,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=189513.16666666666, ans=0.125
2024-09-14 21:54:05,339 INFO [train.py:1198] (1/2) Epoch 11, batch 3000, loss[loss=0.2454, ctc_loss=0.1671, cr_loss=0.3915, over 20986.00 frames. ], tot_loss[loss=0.2611, ctc_loss=0.1822, cr_loss=0.3942, over 4118023.40 frames. ], batch size: 52, lr: 7.67e-03, grad_scale: 16.0
2024-09-14 21:54:05,340 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-14 21:54:25,342 INFO [train.py:1230] (1/2) Epoch 11, validation: loss=0.05216, ctc_loss=0.05216, cr_loss=9.981e-15, over 944034.00 frames.
2024-09-14 21:54:25,343 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB
2024-09-14 21:54:56,537 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.22 vs. limit=22.5
2024-09-14 21:55:11,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=189626.5, ans=15.0
2024-09-14 21:55:44,555 INFO [train.py:1198] (1/2) Epoch 11, batch 3050, loss[loss=0.2561, ctc_loss=0.1784, cr_loss=0.3886, over 20948.00 frames. ], tot_loss[loss=0.2602, ctc_loss=0.1815, cr_loss=0.3939, over 4124319.89 frames. ], batch size: 60, lr: 7.67e-03, grad_scale: 16.0
2024-09-14 21:56:05,129 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.769e+02 2.057e+02 2.201e+02 2.435e+02 3.335e+02, threshold=4.403e+02, percent-clipped=0.0
2024-09-14 21:56:11,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=189739.83333333334, ans=0.0
2024-09-14 21:56:11,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=189739.83333333334, ans=0.0
2024-09-14 21:56:58,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=189824.83333333334, ans=0.125
2024-09-14 21:57:01,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=189853.16666666666, ans=0.125
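The "Computing validation loss" entries above interleave a periodic dev-set pass with training (here at batch 3000), reporting a frame-weighted validation loss and the peak GPU memory. A hedged sketch of that step follows; the function signature and the shape of loss_fn are assumptions for illustration, not train.py's actual interface.

import torch

def compute_validation_loss(model, valid_loader, loss_fn, device="cuda"):
    """Frame-weighted average loss over the dev set, computed without gradients."""
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_loader:
            loss, num_frames = loss_fn(model, batch, device)
            tot_loss += loss.item() * num_frames   # weight by frames, as logged
            tot_frames += num_frames
    model.train()
    # the kind of figure logged as "Maximum memory allocated so far is ...MB"
    mem_mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
    return tot_loss / max(tot_frames, 1.0), mem_mb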
2024-09-14 21:57:02,455 INFO [train.py:1198] (1/2) Epoch 11, batch 3100, loss[loss=0.273, ctc_loss=0.1875, cr_loss=0.4273, over 20789.00 frames. ], tot_loss[loss=0.2597, ctc_loss=0.181, cr_loss=0.3935, over 4119628.12 frames. ], batch size: 56, lr: 7.67e-03, grad_scale: 16.0
2024-09-14 21:57:03,309 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.57 vs. limit=12.0
2024-09-14 21:57:11,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=189853.16666666666, ans=0.0
2024-09-14 21:57:30,294 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-14 21:57:30,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=189881.5, ans=0.1
2024-09-14 21:57:52,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=189938.16666666666, ans=10.0
2024-09-14 21:58:13,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=189966.5, ans=0.05
2024-09-14 21:58:18,055 INFO [train.py:1198] (1/2) Epoch 11, batch 3150, loss[loss=0.2074, ctc_loss=0.1423, cr_loss=0.3253, over 20924.00 frames. ], tot_loss[loss=0.2607, ctc_loss=0.1818, cr_loss=0.3944, over 4112905.72 frames. ], batch size: 49, lr: 7.66e-03, grad_scale: 16.0
2024-09-14 21:58:36,127 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.789e+02 2.063e+02 2.220e+02 2.457e+02 5.443e+02, threshold=4.441e+02, percent-clipped=2.0
2024-09-14 21:58:38,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=190023.16666666666, ans=0.1
2024-09-14 21:59:12,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=190079.83333333334, ans=0.125
2024-09-14 21:59:17,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=190108.16666666666, ans=0.0
2024-09-14 21:59:29,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=190108.16666666666, ans=0.1
2024-09-14 21:59:33,603 INFO [train.py:1198] (1/2) Epoch 11, batch 3200, loss[loss=0.2218, ctc_loss=0.1523, cr_loss=0.3472, over 21005.00 frames. ], tot_loss[loss=0.2612, ctc_loss=0.1824, cr_loss=0.394, over 4081396.58 frames. ], batch size: 52, lr: 7.66e-03, grad_scale: 32.0
2024-09-14 21:59:34,263 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.60 vs. limit=22.5
2024-09-14 21:59:54,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=190164.83333333334, ans=0.0
2024-09-14 21:59:55,160 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.80 vs. limit=15.0
2024-09-14 22:00:51,841 INFO [train.py:1198] (1/2) Epoch 11, batch 3250, loss[loss=0.2788, ctc_loss=0.1965, cr_loss=0.4118, over 20065.00 frames. ], tot_loss[loss=0.2611, ctc_loss=0.1823, cr_loss=0.3942, over 4083645.51 frames. ], batch size: 80, lr: 7.66e-03, grad_scale: 32.0
2024-09-14 22:01:03,128 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.70 vs. limit=15.0
2024-09-14 22:01:10,014 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.686e+02 1.992e+02 2.124e+02 2.302e+02 3.184e+02, threshold=4.248e+02, percent-clipped=0.0
2024-09-14 22:01:29,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=190334.83333333334, ans=0.0
2024-09-14 22:01:57,448 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.93 vs. limit=15.0
2024-09-14 22:02:09,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=190419.83333333334, ans=0.1
2024-09-14 22:02:10,272 INFO [train.py:1198] (1/2) Epoch 11, batch 3300, loss[loss=0.2915, ctc_loss=0.2046, cr_loss=0.4343, over 20933.00 frames. ], tot_loss[loss=0.2631, ctc_loss=0.184, cr_loss=0.3955, over 4071784.69 frames. ], batch size: 60, lr: 7.66e-03, grad_scale: 32.0
2024-09-14 22:02:12,920 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.24 vs. limit=22.5
2024-09-14 22:02:22,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=190419.83333333334, ans=0.125
2024-09-14 22:02:51,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=190476.5, ans=0.04949747468305833
2024-09-14 22:03:05,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=190504.83333333334, ans=0.125
2024-09-14 22:03:06,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=190504.83333333334, ans=0.1
2024-09-14 22:03:22,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=190533.16666666666, ans=0.125
2024-09-14 22:03:26,681 INFO [train.py:1198] (1/2) Epoch 11, batch 3350, loss[loss=0.2786, ctc_loss=0.1948, cr_loss=0.4191, over 21029.00 frames. ], tot_loss[loss=0.263, ctc_loss=0.184, cr_loss=0.3954, over 4083758.19 frames. ], batch size: 62, lr: 7.65e-03, grad_scale: 32.0
2024-09-14 22:03:44,496 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.839e+02 2.145e+02 2.309e+02 2.665e+02 4.687e+02, threshold=4.618e+02, percent-clipped=1.0
2024-09-14 22:04:00,499 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.31 vs. limit=15.0
2024-09-14 22:04:30,285 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.18 vs. limit=15.0
2024-09-14 22:04:41,917 INFO [train.py:1198] (1/2) Epoch 11, batch 3400, loss[loss=0.2793, ctc_loss=0.1945, cr_loss=0.4241, over 21035.00 frames. ], tot_loss[loss=0.2628, ctc_loss=0.1836, cr_loss=0.3963, over 4096360.62 frames. ], batch size: 63, lr: 7.65e-03, grad_scale: 32.0
2024-09-14 22:04:42,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=190703.16666666666, ans=0.04949747468305833
2024-09-14 22:05:12,828 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.98 vs. limit=15.0
2024-09-14 22:05:16,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=190759.83333333334, ans=0.1
2024-09-14 22:05:21,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=190759.83333333334, ans=0.125
2024-09-14 22:05:27,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=190788.16666666666, ans=0.125
2024-09-14 22:05:47,174 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.41 vs. limit=15.0
2024-09-14 22:05:56,893 INFO [train.py:1198] (1/2) Epoch 11, batch 3450, loss[loss=0.2604, ctc_loss=0.1829, cr_loss=0.3872, over 20642.00 frames. ], tot_loss[loss=0.2637, ctc_loss=0.1843, cr_loss=0.3971, over 4094364.03 frames. ], batch size: 71, lr: 7.65e-03, grad_scale: 32.0
2024-09-14 22:06:17,701 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.757e+02 2.076e+02 2.245e+02 2.566e+02 4.508e+02, threshold=4.491e+02, percent-clipped=0.0
2024-09-14 22:06:22,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=190873.16666666666, ans=0.0
2024-09-14 22:06:28,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=190901.5, ans=0.025
2024-09-14 22:06:36,420 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-14 22:06:47,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=190929.83333333334, ans=0.1
2024-09-14 22:07:18,378 INFO [train.py:1198] (1/2) Epoch 11, batch 3500, loss[loss=0.2681, ctc_loss=0.1852, cr_loss=0.4145, over 20760.00 frames. ], tot_loss[loss=0.2635, ctc_loss=0.1842, cr_loss=0.3966, over 4092117.35 frames. ], batch size: 56, lr: 7.65e-03, grad_scale: 32.0
2024-09-14 22:07:27,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=190986.5, ans=0.125
2024-09-14 22:07:53,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=191043.16666666666, ans=0.2
2024-09-14 22:08:19,771 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.88 vs. limit=15.0
2024-09-14 22:08:25,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=191099.83333333334, ans=0.1
2024-09-14 22:08:33,677 INFO [train.py:1198] (1/2) Epoch 11, batch 3550, loss[loss=0.2763, ctc_loss=0.1913, cr_loss=0.4252, over 20679.00 frames. ], tot_loss[loss=0.2637, ctc_loss=0.1842, cr_loss=0.3975, over 4100045.83 frames. ], batch size: 71, lr: 7.64e-03, grad_scale: 32.0
2024-09-14 22:08:33,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=191128.16666666666, ans=0.125
2024-09-14 22:08:51,876 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.781e+02 2.034e+02 2.207e+02 2.412e+02 4.031e+02, threshold=4.415e+02, percent-clipped=0.0
2024-09-14 22:08:55,780 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.36 vs. limit=15.0
2024-09-14 22:09:14,111 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.78 vs. limit=15.0
2024-09-14 22:09:27,118 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.38 vs. limit=22.5
2024-09-14 22:09:34,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=191241.5, ans=0.025
2024-09-14 22:09:44,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=191241.5, ans=0.95
2024-09-14 22:09:49,204 INFO [train.py:1198] (1/2) Epoch 11, batch 3600, loss[loss=0.2732, ctc_loss=0.1889, cr_loss=0.4214, over 21084.00 frames. ], tot_loss[loss=0.2641, ctc_loss=0.1847, cr_loss=0.397, over 4099298.87 frames. ], batch size: 59, lr: 7.64e-03, grad_scale: 32.0
2024-09-14 22:09:55,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=191269.83333333334, ans=0.125
2024-09-14 22:10:07,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=191298.16666666666, ans=0.125
2024-09-14 22:10:13,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=191298.16666666666, ans=0.025
2024-09-14 22:10:15,285 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.82 vs. limit=15.0
2024-09-14 22:10:22,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=191326.5, ans=0.125
2024-09-14 22:10:22,651 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.65 vs. limit=15.0
2024-09-14 22:11:04,266 INFO [train.py:1198] (1/2) Epoch 11, batch 3650, loss[loss=0.2435, ctc_loss=0.168, cr_loss=0.3778, over 20970.00 frames. ], tot_loss[loss=0.2638, ctc_loss=0.1844, cr_loss=0.397, over 4113892.62 frames. ], batch size: 51, lr: 7.64e-03, grad_scale: 32.0
2024-09-14 22:11:15,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=191411.5, ans=0.1
2024-09-14 22:11:22,479 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.793e+02 2.121e+02 2.378e+02 2.732e+02 5.615e+02, threshold=4.755e+02, percent-clipped=3.0
2024-09-14 22:11:44,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=191468.16666666666, ans=0.1
2024-09-14 22:11:55,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=191496.5, ans=0.0
2024-09-14 22:12:17,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=191524.83333333334, ans=0.125
2024-09-14 22:12:23,406 INFO [train.py:1198] (1/2) Epoch 11, batch 3700, loss[loss=0.292, ctc_loss=0.2076, cr_loss=0.4222, over 21019.00 frames. ], tot_loss[loss=0.2633, ctc_loss=0.1841, cr_loss=0.396, over 4089292.71 frames. ], batch size: 61, lr: 7.63e-03, grad_scale: 32.0
2024-09-14 22:12:52,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=191609.83333333334, ans=0.2
2024-09-14 22:13:20,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=191638.16666666666, ans=0.0
2024-09-14 22:13:20,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=191638.16666666666, ans=0.125
2024-09-14 22:13:22,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=191638.16666666666, ans=0.2
2024-09-14 22:13:33,356 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.40 vs. limit=15.0
2024-09-14 22:13:41,892 INFO [train.py:1198] (1/2) Epoch 11, batch 3750, loss[loss=0.2537, ctc_loss=0.178, cr_loss=0.3787, over 20990.00 frames. ], tot_loss[loss=0.2642, ctc_loss=0.1848, cr_loss=0.3971, over 4083214.25 frames. ], batch size: 55, lr: 7.63e-03, grad_scale: 32.0
2024-09-14 22:13:49,962 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-14 22:14:00,188 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.790e+02 2.133e+02 2.343e+02 2.728e+02 4.551e+02, threshold=4.686e+02, percent-clipped=0.0
2024-09-14 22:14:06,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=191723.16666666666, ans=0.125
2024-09-14 22:14:13,051 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.33 vs. limit=15.0
2024-09-14 22:14:26,621 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.69 vs. limit=10.0
2024-09-14 22:14:47,859 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.11 vs. limit=22.5
2024-09-14 22:14:57,197 INFO [train.py:1198] (1/2) Epoch 11, batch 3800, loss[loss=0.243, ctc_loss=0.1707, cr_loss=0.3617, over 21062.00 frames. ], tot_loss[loss=0.2646, ctc_loss=0.185, cr_loss=0.3978, over 4082722.33 frames. ], batch size: 53, lr: 7.63e-03, grad_scale: 32.0
2024-09-14 22:15:14,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=191864.83333333334, ans=0.125
2024-09-14 22:15:26,930 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.23 vs. limit=15.0
2024-09-14 22:16:12,637 INFO [train.py:1198] (1/2) Epoch 11, batch 3850, loss[loss=0.2264, ctc_loss=0.1534, cr_loss=0.365, over 21003.00 frames. ], tot_loss[loss=0.2637, ctc_loss=0.1842, cr_loss=0.3974, over 4088222.43 frames. ], batch size: 48, lr: 7.63e-03, grad_scale: 32.0
2024-09-14 22:16:14,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=191978.16666666666, ans=0.0
2024-09-14 22:16:16,431 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.57 vs. limit=6.0
2024-09-14 22:16:30,503 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.837e+02 2.082e+02 2.386e+02 2.720e+02 5.191e+02, threshold=4.772e+02, percent-clipped=1.0
2024-09-14 22:16:33,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=192006.5, ans=0.0
2024-09-14 22:16:35,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=192006.5, ans=0.125
2024-09-14 22:16:35,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=192006.5, ans=0.0
2024-09-14 22:16:51,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=192034.83333333334, ans=0.0
2024-09-14 22:17:16,679 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.00 vs. limit=22.5
2024-09-14 22:17:29,982 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.85 vs. limit=15.0
2024-09-14 22:17:30,769 INFO [train.py:1198] (1/2) Epoch 11, batch 3900, loss[loss=0.2374, ctc_loss=0.1628, cr_loss=0.3728, over 20889.00 frames. ], tot_loss[loss=0.2633, ctc_loss=0.1839, cr_loss=0.3969, over 4095345.80 frames. ], batch size: 57, lr: 7.62e-03, grad_scale: 32.0
2024-09-14 22:18:08,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=192176.5, ans=0.0
2024-09-14 22:18:18,439 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.73 vs. limit=22.5
2024-09-14 22:18:20,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=192204.83333333334, ans=0.125
2024-09-14 22:18:25,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=192204.83333333334, ans=0.0
2024-09-14 22:18:49,266 INFO [train.py:1198] (1/2) Epoch 11, batch 3950, loss[loss=0.2337, ctc_loss=0.1597, cr_loss=0.3698, over 21040.00 frames. ], tot_loss[loss=0.2632, ctc_loss=0.1839, cr_loss=0.3967, over 4104420.24 frames. ], batch size: 56, lr: 7.62e-03, grad_scale: 32.0
2024-09-14 22:19:07,079 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.695e+02 2.031e+02 2.161e+02 2.445e+02 3.958e+02, threshold=4.322e+02, percent-clipped=0.0
2024-09-14 22:19:09,316 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.28 vs. limit=15.0
2024-09-14 22:19:54,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=192374.83333333334, ans=0.025
2024-09-14 22:20:05,152 INFO [train.py:1198] (1/2) Epoch 11, batch 4000, loss[loss=0.3033, ctc_loss=0.2178, cr_loss=0.4272, over 21024.00 frames. ], tot_loss[loss=0.2637, ctc_loss=0.1842, cr_loss=0.3976, over 4106349.63 frames. ], batch size: 61, lr: 7.62e-03, grad_scale: 32.0
2024-09-14 22:20:18,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=192431.5, ans=0.125
2024-09-14 22:20:28,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=192431.5, ans=22.5
2024-09-14 22:21:07,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=192516.5, ans=0.125
2024-09-14 22:21:07,290 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=7.50 vs. limit=22.5
2024-09-14 22:21:19,989 INFO [train.py:1198] (1/2) Epoch 11, batch 4050, loss[loss=0.2416, ctc_loss=0.1691, cr_loss=0.3624, over 21009.00 frames. ], tot_loss[loss=0.2631, ctc_loss=0.1838, cr_loss=0.3965, over 4111868.81 frames. ], batch size: 63, lr: 7.61e-03, grad_scale: 32.0
2024-09-14 22:21:23,660 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.38 vs. limit=22.5
2024-09-14 22:21:32,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=192544.83333333334, ans=0.1
2024-09-14 22:21:38,006 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.826e+02 2.075e+02 2.181e+02 2.376e+02 3.176e+02, threshold=4.362e+02, percent-clipped=0.0
2024-09-14 22:21:50,966 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.52 vs. limit=15.0
2024-09-14 22:22:32,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=192658.16666666666, ans=0.125
2024-09-14 22:22:36,845 INFO [train.py:1198] (1/2) Epoch 11, batch 4100, loss[loss=0.2551, ctc_loss=0.1824, cr_loss=0.3635, over 20906.00 frames. ], tot_loss[loss=0.2616, ctc_loss=0.1827, cr_loss=0.3949, over 4120199.45 frames. ], batch size: 57, lr: 7.61e-03, grad_scale: 32.0
2024-09-14 22:23:05,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=192743.16666666666, ans=0.125
2024-09-14 22:23:20,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=192743.16666666666, ans=0.2
2024-09-14 22:23:49,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=192799.83333333334, ans=0.025
2024-09-14 22:23:54,809 INFO [train.py:1198] (1/2) Epoch 11, batch 4150, loss[loss=0.2709, ctc_loss=0.188, cr_loss=0.4148, over 21036.00 frames. ], tot_loss[loss=0.2615, ctc_loss=0.1825, cr_loss=0.395, over 4122055.91 frames. ], batch size: 62, lr: 7.61e-03, grad_scale: 32.0
2024-09-14 22:24:12,951 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.777e+02 2.063e+02 2.243e+02 2.473e+02 4.794e+02, threshold=4.485e+02, percent-clipped=1.0
2024-09-14 22:24:28,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=192884.83333333334, ans=0.125
2024-09-14 22:25:13,701 INFO [train.py:1198] (1/2) Epoch 11, batch 4200, loss[loss=0.2646, ctc_loss=0.1854, cr_loss=0.3956, over 20794.00 frames. ], tot_loss[loss=0.2603, ctc_loss=0.1816, cr_loss=0.3935, over 4121184.25 frames. ], batch size: 53, lr: 7.61e-03, grad_scale: 32.0
2024-09-14 22:25:18,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=192969.83333333334, ans=0.125
2024-09-14 22:25:27,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=192998.16666666666, ans=0.5
2024-09-14 22:25:30,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=192998.16666666666, ans=0.0
2024-09-14 22:25:51,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=193026.5, ans=0.0
2024-09-14 22:25:59,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=193054.83333333334, ans=0.0
2024-09-14 22:26:09,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=193054.83333333334, ans=0.125
2024-09-14 22:26:28,301 INFO [train.py:1198] (1/2) Epoch 11, batch 4250, loss[loss=0.2901, ctc_loss=0.2019, cr_loss=0.4407, over 20068.00 frames. ], tot_loss[loss=0.2611, ctc_loss=0.1821, cr_loss=0.3949, over 4112545.52 frames. ], batch size: 80, lr: 7.60e-03, grad_scale: 16.0
2024-09-14 22:26:47,939 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.808e+02 2.033e+02 2.225e+02 2.442e+02 5.640e+02, threshold=4.450e+02, percent-clipped=1.0
2024-09-14 22:27:22,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=193196.5, ans=0.125
2024-09-14 22:27:43,891 INFO [train.py:1198] (1/2) Epoch 11, batch 4300, loss[loss=0.238, ctc_loss=0.1642, cr_loss=0.3689, over 19978.00 frames. ], tot_loss[loss=0.2614, ctc_loss=0.1824, cr_loss=0.3951, over 4116614.54 frames. ], batch size: 44, lr: 7.60e-03, grad_scale: 16.0
2024-09-14 22:28:36,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=193338.16666666666, ans=0.125
2024-09-14 22:28:50,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=193366.5, ans=0.5
2024-09-14 22:28:56,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=193366.5, ans=0.125
2024-09-14 22:28:59,146 INFO [train.py:1198] (1/2) Epoch 11, batch 4350, loss[loss=0.2633, ctc_loss=0.1826, cr_loss=0.4038, over 20679.00 frames. ], tot_loss[loss=0.2622, ctc_loss=0.183, cr_loss=0.396, over 4109109.69 frames. ], batch size: 71, lr: 7.60e-03, grad_scale: 16.0
2024-09-14 22:29:20,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=193423.16666666666, ans=0.125
2024-09-14 22:29:21,636 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.735e+02 2.059e+02 2.250e+02 2.557e+02 4.118e+02, threshold=4.500e+02, percent-clipped=0.0
2024-09-14 22:29:40,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=193451.5, ans=0.07
2024-09-14 22:29:52,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=193479.83333333334, ans=10.0
2024-09-14 22:30:20,480 INFO [train.py:1198] (1/2) Epoch 11, batch 4400, loss[loss=0.2627, ctc_loss=0.1796, cr_loss=0.4153, over 20772.00 frames. ], tot_loss[loss=0.2617, ctc_loss=0.1825, cr_loss=0.3958, over 4120471.67 frames. ], batch size: 53, lr: 7.60e-03, grad_scale: 32.0
2024-09-14 22:31:36,549 INFO [train.py:1198] (1/2) Epoch 11, batch 4450, loss[loss=0.2715, ctc_loss=0.1901, cr_loss=0.407, over 20887.00 frames. ], tot_loss[loss=0.2621, ctc_loss=0.1828, cr_loss=0.3963, over 4121243.77 frames. ], batch size: 57, lr: 7.59e-03, grad_scale: 32.0
2024-09-14 22:31:55,939 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.728e+02 2.076e+02 2.170e+02 2.410e+02 3.388e+02, threshold=4.341e+02, percent-clipped=0.0
2024-09-14 22:32:08,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=193734.83333333334, ans=0.0
2024-09-14 22:32:11,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=193734.83333333334, ans=0.0
2024-09-14 22:32:15,790 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.36 vs. limit=15.0
2024-09-14 22:32:51,468 INFO [train.py:1198] (1/2) Epoch 11, batch 4500, loss[loss=0.2925, ctc_loss=0.2066, cr_loss=0.4295, over 20675.00 frames. ], tot_loss[loss=0.263, ctc_loss=0.1836, cr_loss=0.397, over 4107578.21 frames. ], batch size: 66, lr: 7.59e-03, grad_scale: 32.0
2024-09-14 22:32:51,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=193819.83333333334, ans=0.95
2024-09-14 22:32:54,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=193819.83333333334, ans=0.0
2024-09-14 22:33:14,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=193848.16666666666, ans=0.125
2024-09-14 22:33:19,129 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.09 vs. limit=15.0
2024-09-14 22:33:33,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=193876.5, ans=0.2
2024-09-14 22:34:06,554 INFO [train.py:1198] (1/2) Epoch 11, batch 4550, loss[loss=0.2518, ctc_loss=0.1728, cr_loss=0.395, over 21073.00 frames. ], tot_loss[loss=0.2637, ctc_loss=0.1842, cr_loss=0.3978, over 4099333.02 frames. ], batch size: 56, lr: 7.59e-03, grad_scale: 32.0
2024-09-14 22:34:26,152 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.791e+02 2.021e+02 2.156e+02 2.337e+02 5.025e+02, threshold=4.313e+02, percent-clipped=1.0
2024-09-14 22:35:24,163 INFO [train.py:1198] (1/2) Epoch 11, batch 4600, loss[loss=0.2519, ctc_loss=0.1762, cr_loss=0.3781, over 20904.00 frames. ], tot_loss[loss=0.2633, ctc_loss=0.1838, cr_loss=0.3976, over 4098791.18 frames. ], batch size: 54, lr: 7.58e-03, grad_scale: 32.0
2024-09-14 22:35:43,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=194131.5, ans=0.0
2024-09-14 22:36:14,192 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.56 vs. limit=22.5
2024-09-14 22:36:41,921 INFO [train.py:1198] (1/2) Epoch 11, batch 4650, loss[loss=0.272, ctc_loss=0.1884, cr_loss=0.4184, over 20779.00 frames. ], tot_loss[loss=0.2641, ctc_loss=0.1844, cr_loss=0.3983, over 4093841.25 frames. ], batch size: 53, lr: 7.58e-03, grad_scale: 32.0
2024-09-14 22:36:52,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=194244.83333333334, ans=0.1
2024-09-14 22:37:01,569 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.748e+02 2.092e+02 2.261e+02 2.510e+02 4.190e+02, threshold=4.522e+02, percent-clipped=0.0
2024-09-14 22:37:05,429 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.90 vs. limit=22.5
2024-09-14 22:37:56,999 INFO [train.py:1198] (1/2) Epoch 11, batch 4700, loss[loss=0.3369, ctc_loss=0.2508, cr_loss=0.4307, over 13940.00 frames. ], tot_loss[loss=0.2625, ctc_loss=0.1833, cr_loss=0.3961, over 4089209.42 frames. ], batch size: 150, lr: 7.58e-03, grad_scale: 32.0
2024-09-14 22:38:49,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=194471.5, ans=0.025
2024-09-14 22:39:02,217 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.05 vs. limit=12.0
2024-09-14 22:39:12,308 INFO [train.py:1198] (1/2) Epoch 11, batch 4750, loss[loss=0.2455, ctc_loss=0.1666, cr_loss=0.3949, over 20785.00 frames. ], tot_loss[loss=0.2616, ctc_loss=0.1826, cr_loss=0.3949, over 4097653.59 frames. ], batch size: 56, lr: 7.58e-03, grad_scale: 32.0
2024-09-14 22:39:14,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=194528.16666666666, ans=0.95
2024-09-14 22:39:20,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=194528.16666666666, ans=0.125
2024-09-14 22:39:31,989 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.861e+02 2.095e+02 2.360e+02 2.671e+02 3.932e+02, threshold=4.721e+02, percent-clipped=0.0
2024-09-14 22:39:49,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=194584.83333333334, ans=0.125
2024-09-14 22:39:59,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=194613.16666666666, ans=0.1
2024-09-14 22:40:31,069 INFO [train.py:1198] (1/2) Epoch 11, batch 4800, loss[loss=0.2837, ctc_loss=0.1991, cr_loss=0.4227, over 20671.00 frames. ], tot_loss[loss=0.2615, ctc_loss=0.1825, cr_loss=0.3946, over 4089509.01 frames. ], batch size: 66, lr: 7.57e-03, grad_scale: 32.0
2024-09-14 22:40:37,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=194669.83333333334, ans=0.125
2024-09-14 22:40:38,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=194669.83333333334, ans=0.125
2024-09-14 22:40:51,282 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.25 vs. limit=10.0
2024-09-14 22:40:55,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=194698.16666666666, ans=0.2
2024-09-14 22:41:08,380 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.93 vs. limit=15.0
2024-09-14 22:41:44,552 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.58 vs. limit=15.0
2024-09-14 22:41:49,686 INFO [train.py:1198] (1/2) Epoch 11, batch 4850, loss[loss=0.235, ctc_loss=0.1621, cr_loss=0.3645, over 20861.00 frames. ], tot_loss[loss=0.2615, ctc_loss=0.1826, cr_loss=0.3946, over 4073027.38 frames. ], batch size: 57, lr: 7.57e-03, grad_scale: 32.0
2024-09-14 22:42:09,508 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.719e+02 2.035e+02 2.137e+02 2.381e+02 4.678e+02, threshold=4.274e+02, percent-clipped=0.0
2024-09-14 22:42:44,893 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.61 vs. limit=15.0
2024-09-14 22:43:05,483 INFO [train.py:1198] (1/2) Epoch 11, batch 4900, loss[loss=0.2395, ctc_loss=0.1649, cr_loss=0.3731, over 21081.00 frames. ], tot_loss[loss=0.2614, ctc_loss=0.1824, cr_loss=0.395, over 4083913.12 frames. ], batch size: 59, lr: 7.57e-03, grad_scale: 32.0
2024-09-14 22:43:34,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=195009.83333333334, ans=0.1
2024-09-14 22:43:35,953 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.77 vs. limit=15.0
2024-09-14 22:44:02,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=195038.16666666666, ans=0.125
2024-09-14 22:44:06,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=195066.5, ans=0.0
2024-09-14 22:44:07,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=195066.5, ans=0.125
2024-09-14 22:44:20,301 INFO [train.py:1198] (1/2) Epoch 11, batch 4950, loss[loss=0.2692, ctc_loss=0.1861, cr_loss=0.4155, over 21026.00 frames. ], tot_loss[loss=0.2605, ctc_loss=0.1817, cr_loss=0.3941, over 4095271.25 frames. ], batch size: 63, lr: 7.57e-03, grad_scale: 32.0
2024-09-14 22:44:34,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=195123.16666666666, ans=0.1
2024-09-14 22:44:39,792 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.854e+02 2.117e+02 2.305e+02 2.605e+02 3.873e+02, threshold=4.611e+02, percent-clipped=0.0
2024-09-14 22:45:16,793 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.93 vs. limit=6.0
2024-09-14 22:45:22,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=195208.16666666666, ans=0.0
2024-09-14 22:45:29,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=195208.16666666666, ans=0.2
2024-09-14 22:45:34,741 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.54 vs. limit=10.0
2024-09-14 22:45:35,052 INFO [train.py:1198] (1/2) Epoch 11, batch 5000, loss[loss=0.2276, ctc_loss=0.1587, cr_loss=0.3444, over 21073.00 frames. ], tot_loss[loss=0.2589, ctc_loss=0.1804, cr_loss=0.3927, over 4106339.02 frames. ], batch size: 53, lr: 7.56e-03, grad_scale: 32.0
2024-09-14 22:46:20,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=195321.5, ans=10.0
2024-09-14 22:46:29,414 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.97 vs. limit=15.0
2024-09-14 22:46:49,002 INFO [train.py:1198] (1/2) Epoch 11, batch 5050, loss[loss=0.2238, ctc_loss=0.155, cr_loss=0.344, over 20941.00 frames. ], tot_loss[loss=0.2593, ctc_loss=0.1807, cr_loss=0.393, over 4103247.37 frames. ], batch size: 49, lr: 7.56e-03, grad_scale: 32.0
2024-09-14 22:47:05,081 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.37 vs. limit=15.0
2024-09-14 22:47:08,293 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.750e+02 2.103e+02 2.237e+02 2.505e+02 5.290e+02, threshold=4.475e+02, percent-clipped=1.0
2024-09-14 22:47:40,447 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.59 vs. limit=15.0
2024-09-14 22:48:06,014 INFO [train.py:1198] (1/2) Epoch 11, batch 5100, loss[loss=0.2673, ctc_loss=0.1857, cr_loss=0.4081, over 21020.00 frames. ], tot_loss[loss=0.2579, ctc_loss=0.1795, cr_loss=0.3919, over 4110537.02 frames. ], batch size: 63, lr: 7.56e-03, grad_scale: 32.0
2024-09-14 22:49:08,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=195633.16666666666, ans=0.0
2024-09-14 22:49:19,270 INFO [train.py:1198] (1/2) Epoch 11, batch 5150, loss[loss=0.2792, ctc_loss=0.198, cr_loss=0.406, over 20946.00 frames. ], tot_loss[loss=0.2597, ctc_loss=0.1811, cr_loss=0.3933, over 4111328.04 frames. ], batch size: 60, lr: 7.55e-03, grad_scale: 32.0
2024-09-14 22:49:36,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=195689.83333333334, ans=0.125
2024-09-14 22:49:38,850 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.845e+02 2.067e+02 2.222e+02 2.445e+02 7.397e+02, threshold=4.443e+02, percent-clipped=1.0
2024-09-14 22:49:44,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=195689.83333333334, ans=0.1
2024-09-14 22:50:01,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=195718.16666666666, ans=0.125
2024-09-14 22:50:11,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=195746.5, ans=0.04949747468305833
2024-09-14 22:50:30,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=195774.83333333334, ans=0.125
2024-09-14 22:50:36,769 INFO [train.py:1198] (1/2) Epoch 11, batch 5200, loss[loss=0.3426, ctc_loss=0.2572, cr_loss=0.4272, over 14106.00 frames. ], tot_loss[loss=0.2607, ctc_loss=0.1821, cr_loss=0.3932, over 4082950.62 frames. ], batch size: 150, lr: 7.55e-03, grad_scale: 32.0
2024-09-14 22:51:40,792 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=195916.5, ans=0.1
2024-09-14 22:51:50,782 INFO [train.py:1198] (1/2) Epoch 11, batch 5250, loss[loss=0.2561, ctc_loss=0.177, cr_loss=0.3957, over 21042.00 frames. ], tot_loss[loss=0.2612, ctc_loss=0.1824, cr_loss=0.3939, over 4086356.13 frames. ], batch size: 56, lr: 7.55e-03, grad_scale: 16.0
2024-09-14 22:52:11,368 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.849e+02 2.049e+02 2.194e+02 2.440e+02 5.103e+02, threshold=4.387e+02, percent-clipped=1.0
2024-09-14 22:52:22,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=196001.5, ans=0.125
2024-09-14 22:52:29,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=196001.5, ans=0.2
2024-09-14 22:52:31,766 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.42 vs. limit=22.5
2024-09-14 22:53:05,857 INFO [train.py:1198] (1/2) Epoch 11, batch 5300, loss[loss=0.2359, ctc_loss=0.1641, cr_loss=0.3587, over 20945.00 frames. ], tot_loss[loss=0.2618, ctc_loss=0.1828, cr_loss=0.395, over 4094122.24 frames. ], batch size: 48, lr: 7.55e-03, grad_scale: 16.0
2024-09-14 22:53:11,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=196086.5, ans=0.1
2024-09-14 22:53:25,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=196114.83333333334, ans=0.125
2024-09-14 22:53:26,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=196114.83333333334, ans=0.0
2024-09-14 22:53:29,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=196114.83333333334, ans=0.025
2024-09-14 22:53:37,617 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.07 vs. limit=22.5
2024-09-14 22:54:08,586 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.64 vs. limit=10.0
2024-09-14 22:54:19,790 INFO [train.py:1198] (1/2) Epoch 11, batch 5350, loss[loss=0.255, ctc_loss=0.1763, cr_loss=0.3935, over 20962.00 frames. ], tot_loss[loss=0.2623, ctc_loss=0.183, cr_loss=0.3963, over 4099696.41 frames. ], batch size: 55, lr: 7.54e-03, grad_scale: 16.0
2024-09-14 22:54:20,273 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-14 22:54:27,588 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.36 vs. limit=22.5
2024-09-14 22:54:27,720 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.73 vs. limit=15.0
2024-09-14 22:54:29,022 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.28 vs. limit=12.0
2024-09-14 22:54:40,132 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.814e+02 2.012e+02 2.142e+02 2.285e+02 4.157e+02, threshold=4.283e+02, percent-clipped=0.0
2024-09-14 22:54:41,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=196256.5, ans=0.125
2024-09-14 22:55:26,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=196341.5, ans=0.0
2024-09-14 22:55:28,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=196341.5, ans=0.125
2024-09-14 22:55:33,562 INFO [train.py:1198] (1/2) Epoch 11, batch 5400, loss[loss=0.262, ctc_loss=0.1821, cr_loss=0.3994, over 20891.00 frames. ], tot_loss[loss=0.2628, ctc_loss=0.1834, cr_loss=0.3968, over 4102631.85 frames. ], batch size: 57, lr: 7.54e-03, grad_scale: 16.0
2024-09-14 22:56:47,786 INFO [train.py:1198] (1/2) Epoch 11, batch 5450, loss[loss=0.2955, ctc_loss=0.2098, cr_loss=0.4286, over 18595.00 frames. ], tot_loss[loss=0.2639, ctc_loss=0.1842, cr_loss=0.3981, over 4097911.63 frames. ], batch size: 108, lr: 7.54e-03, grad_scale: 16.0
2024-09-14 22:56:52,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=196511.5, ans=0.125
2024-09-14 22:56:52,885 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.71 vs. limit=15.0
2024-09-14 22:57:01,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=196539.83333333334, ans=0.125
2024-09-14 22:57:11,381 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.833e+02 2.050e+02 2.185e+02 2.424e+02 3.814e+02, threshold=4.370e+02, percent-clipped=0.0
2024-09-14 22:58:04,564 INFO [train.py:1198] (1/2) Epoch 11, batch 5500, loss[loss=0.2405, ctc_loss=0.1659, cr_loss=0.3728, over 20859.00 frames. ], tot_loss[loss=0.264, ctc_loss=0.1844, cr_loss=0.3983, over 4082516.29 frames. ], batch size: 57, lr: 7.54e-03, grad_scale: 16.0
2024-09-14 22:58:05,193 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.48 vs. limit=15.0
2024-09-14 22:58:40,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=196709.83333333334, ans=0.0
2024-09-14 22:59:08,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=196766.5, ans=22.5
2024-09-14 22:59:15,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=196766.5, ans=0.125
2024-09-14 22:59:21,254 INFO [train.py:1198] (1/2) Epoch 11, batch 5550, loss[loss=0.2764, ctc_loss=0.1919, cr_loss=0.422, over 20936.00 frames. ], tot_loss[loss=0.2627, ctc_loss=0.1833, cr_loss=0.3971, over 4087937.75 frames. ], batch size: 64, lr: 7.53e-03, grad_scale: 16.0
2024-09-14 22:59:26,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=196794.83333333334, ans=0.09899494936611666
2024-09-14 22:59:41,979 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.827e+02 2.051e+02 2.210e+02 2.430e+02 4.499e+02, threshold=4.420e+02, percent-clipped=1.0
2024-09-14 22:59:44,308 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0
2024-09-14 22:59:54,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=196851.5, ans=0.0
2024-09-14 23:00:05,415 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.58 vs. limit=6.0
2024-09-14 23:00:10,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=196879.83333333334, ans=0.025
2024-09-14 23:00:16,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=196879.83333333334, ans=0.125
2024-09-14 23:00:19,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=196908.16666666666, ans=0.125
2024-09-14 23:00:25,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=196908.16666666666, ans=0.125
2024-09-14 23:00:32,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=196908.16666666666, ans=0.0
2024-09-14 23:00:35,302 INFO [train.py:1198] (1/2) Epoch 11, batch 5600, loss[loss=0.2831, ctc_loss=0.2069, cr_loss=0.381, over 18086.00 frames. ], tot_loss[loss=0.263, ctc_loss=0.1836, cr_loss=0.3972, over 4091128.69 frames. ], batch size: 108, lr: 7.53e-03, grad_scale: 32.0
2024-09-14 23:00:44,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=196936.5, ans=0.0
2024-09-14 23:00:50,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=196964.83333333334, ans=0.025
2024-09-14 23:00:52,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=196964.83333333334, ans=0.035
2024-09-14 23:00:53,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=196964.83333333334, ans=0.2
2024-09-14 23:01:01,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=196964.83333333334, ans=0.2
2024-09-14 23:01:26,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=197021.5, ans=0.2
2024-09-14 23:01:49,567 INFO [train.py:1198] (1/2) Epoch 11, batch 5650, loss[loss=0.2723, ctc_loss=0.1871, cr_loss=0.4259, over 20968.00 frames. ], tot_loss[loss=0.2623, ctc_loss=0.1831, cr_loss=0.396, over 4085822.50 frames. ], batch size: 58, lr: 7.53e-03, grad_scale: 16.0
2024-09-14 23:01:52,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=197078.16666666666, ans=0.1
2024-09-14 23:02:11,753 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.693e+02 2.041e+02 2.164e+02 2.431e+02 3.749e+02, threshold=4.329e+02, percent-clipped=0.0
2024-09-14 23:02:12,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=197106.5, ans=0.125
2024-09-14 23:02:14,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=197106.5, ans=0.125
2024-09-14 23:02:25,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=197134.83333333334, ans=0.125
2024-09-14 23:02:44,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=197163.16666666666, ans=0.0
2024-09-14 23:02:49,351 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=197191.5, ans=0.2
2024-09-14 23:02:52,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=197191.5, ans=0.04949747468305833
2024-09-14 23:02:54,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=197191.5, ans=0.1
2024-09-14 23:03:04,343 INFO [train.py:1198] (1/2) Epoch 11, batch 5700, loss[loss=0.2685, ctc_loss=0.1872, cr_loss=0.4067, over 20968.00 frames. ], tot_loss[loss=0.2622, ctc_loss=0.183, cr_loss=0.3961, over 4091286.39 frames. ], batch size: 64, lr: 7.52e-03, grad_scale: 16.0
2024-09-14 23:03:28,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=197248.16666666666, ans=0.0
2024-09-14 23:03:53,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=197304.83333333334, ans=0.0
2024-09-14 23:03:59,952 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=7.26 vs. limit=22.5
2024-09-14 23:04:11,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=197333.16666666666, ans=0.125
2024-09-14 23:04:18,501 INFO [train.py:1198] (1/2) Epoch 11, batch 5750, loss[loss=0.3324, ctc_loss=0.2424, cr_loss=0.45, over 18508.00 frames. ], tot_loss[loss=0.2629, ctc_loss=0.1835, cr_loss=0.3966, over 4084863.94 frames. ], batch size: 108, lr: 7.52e-03, grad_scale: 16.0
2024-09-14 23:04:29,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=197361.5, ans=0.0
2024-09-14 23:04:40,707 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.787e+02 2.036e+02 2.196e+02 2.468e+02 8.245e+02, threshold=4.392e+02, percent-clipped=1.0
2024-09-14 23:04:41,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=197389.83333333334, ans=0.95
2024-09-14 23:04:46,177 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.45 vs. limit=15.0
2024-09-14 23:05:01,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=197446.5, ans=0.125
2024-09-14 23:05:23,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=197474.83333333334, ans=0.125
2024-09-14 23:05:32,588 INFO [train.py:1198] (1/2) Epoch 11, batch 5800, loss[loss=0.3072, ctc_loss=0.221, cr_loss=0.431, over 18278.00 frames. ], tot_loss[loss=0.2631, ctc_loss=0.1837, cr_loss=0.3967, over 4087979.93 frames. ], batch size: 108, lr: 7.52e-03, grad_scale: 16.0
2024-09-14 23:05:48,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=197531.5, ans=10.0
2024-09-14 23:05:51,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=197531.5, ans=0.0
2024-09-14 23:06:34,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=197616.5, ans=0.025
2024-09-14 23:06:48,726 INFO [train.py:1198] (1/2) Epoch 11, batch 5850, loss[loss=0.2833, ctc_loss=0.1993, cr_loss=0.4205, over 19627.00 frames. ], tot_loss[loss=0.2635, ctc_loss=0.1841, cr_loss=0.397, over 4079420.78 frames. ], batch size: 90, lr: 7.52e-03, grad_scale: 16.0
2024-09-14 23:06:56,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=197644.83333333334, ans=0.0
2024-09-14 23:07:09,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=197673.16666666666, ans=0.1
2024-09-14 23:07:10,751 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.812e+02 2.049e+02 2.261e+02 2.520e+02 4.441e+02, threshold=4.521e+02, percent-clipped=1.0
2024-09-14 23:07:23,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=197701.5, ans=0.1
2024-09-14 23:07:41,148 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=197729.83333333334, ans=0.0
2024-09-14 23:07:41,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=197729.83333333334, ans=0.125
2024-09-14 23:07:46,430 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.61 vs. limit=15.0
2024-09-14 23:07:59,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=197758.16666666666, ans=0.2
2024-09-14 23:08:05,398 INFO [train.py:1198] (1/2) Epoch 11, batch 5900, loss[loss=0.2641, ctc_loss=0.1865, cr_loss=0.388, over 21006.00 frames. ], tot_loss[loss=0.263, ctc_loss=0.1838, cr_loss=0.396, over 4078663.15 frames. ], batch size: 63, lr: 7.51e-03, grad_scale: 16.0
2024-09-14 23:08:08,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=197786.5, ans=0.125
2024-09-14 23:08:22,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=197814.83333333334, ans=0.0
2024-09-14 23:08:26,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=197814.83333333334, ans=0.1
2024-09-14 23:08:54,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=197871.5, ans=10.0
2024-09-14 23:09:00,651 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=197871.5, ans=0.125
2024-09-14 23:09:12,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=197899.83333333334, ans=0.2
2024-09-14 23:09:13,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=197899.83333333334, ans=0.125
2024-09-14 23:09:18,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=197928.16666666666, ans=0.1
2024-09-14 23:09:19,535 INFO [train.py:1198] (1/2) Epoch 11, batch 5950, loss[loss=0.2736, ctc_loss=0.1897, cr_loss=0.4196, over 20867.00 frames. ], tot_loss[loss=0.2628, ctc_loss=0.1837, cr_loss=0.3954, over 4076242.02 frames. ], batch size: 57, lr: 7.51e-03, grad_scale: 16.0
2024-09-14 23:09:27,509 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0
2024-09-14 23:09:41,488 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.739e+02 2.187e+02 2.368e+02 2.660e+02 5.289e+02, threshold=4.737e+02, percent-clipped=1.0
2024-09-14 23:09:46,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=197956.5, ans=0.025
2024-09-14 23:10:27,651 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=198041.5, ans=0.1
2024-09-14 23:10:33,211 INFO [train.py:1198] (1/2) Epoch 11, batch 6000, loss[loss=0.2443, ctc_loss=0.1705, cr_loss=0.3689, over 20928.00 frames. ], tot_loss[loss=0.2611, ctc_loss=0.1824, cr_loss=0.3934, over 4088041.98 frames. ], batch size: 49, lr: 7.51e-03, grad_scale: 32.0
2024-09-14 23:10:33,211 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-14 23:10:54,095 INFO [train.py:1230] (1/2) Epoch 11, validation: loss=0.05224, ctc_loss=0.05224, cr_loss=9.702e-15, over 944034.00 frames.
2024-09-14 23:10:54,095 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB
2024-09-14 23:12:07,828 INFO [train.py:1198] (1/2) Epoch 11, batch 6050, loss[loss=0.2632, ctc_loss=0.1809, cr_loss=0.4112, over 20776.00 frames. ], tot_loss[loss=0.2602, ctc_loss=0.1817, cr_loss=0.3928, over 4093576.51 frames. ], batch size: 56, lr: 7.51e-03, grad_scale: 32.0
2024-09-14 23:12:09,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=198211.5, ans=0.2
2024-09-14 23:12:30,011 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.761e+02 2.042e+02 2.233e+02 2.421e+02 4.039e+02, threshold=4.466e+02, percent-clipped=0.0
2024-09-14 23:12:50,755 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.55 vs. limit=22.5
2024-09-14 23:13:22,620 INFO [train.py:1198] (1/2) Epoch 11, batch 6100, loss[loss=0.3125, ctc_loss=0.2204, cr_loss=0.4605, over 19468.00 frames. ], tot_loss[loss=0.2607, ctc_loss=0.182, cr_loss=0.3935, over 4096009.26 frames. ], batch size: 90, lr: 7.50e-03, grad_scale: 32.0
2024-09-14 23:13:33,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=198353.16666666666, ans=0.0
2024-09-14 23:14:14,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=198438.16666666666, ans=0.125
2024-09-14 23:14:31,164 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.45 vs. limit=6.0
2024-09-14 23:14:33,777 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-14 23:14:37,852 INFO [train.py:1198] (1/2) Epoch 11, batch 6150, loss[loss=0.321, ctc_loss=0.2297, cr_loss=0.4566, over 18112.00 frames. ], tot_loss[loss=0.2618, ctc_loss=0.183, cr_loss=0.3942, over 4072876.19 frames. ], batch size: 108, lr: 7.50e-03, grad_scale: 32.0
2024-09-14 23:14:56,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=198523.16666666666, ans=0.2
2024-09-14 23:15:00,100 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.730e+02 2.029e+02 2.171e+02 2.361e+02 3.596e+02, threshold=4.341e+02, percent-clipped=0.0
2024-09-14 23:15:03,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=198523.16666666666, ans=0.05
2024-09-14 23:15:07,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=198551.5, ans=0.2
2024-09-14 23:15:53,426 INFO [train.py:1198] (1/2) Epoch 11, batch 6200, loss[loss=0.3213, ctc_loss=0.2387, cr_loss=0.4128, over 14061.00 frames. ], tot_loss[loss=0.2611, ctc_loss=0.1824, cr_loss=0.3933, over 4065955.03 frames. ], batch size: 150, lr: 7.50e-03, grad_scale: 32.0
2024-09-14 23:16:14,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=198664.83333333334, ans=0.125
2024-09-14 23:16:42,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=198721.5, ans=0.125
2024-09-14 23:17:06,034 INFO [train.py:1198] (1/2) Epoch 11, batch 6250, loss[loss=0.3224, ctc_loss=0.2382, cr_loss=0.4211, over 14114.00 frames. ], tot_loss[loss=0.2631, ctc_loss=0.1841, cr_loss=0.3954, over 4032821.60 frames. ], batch size: 149, lr: 7.50e-03, grad_scale: 32.0
2024-09-14 23:17:06,824 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.87 vs. limit=15.0
2024-09-14 23:17:09,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=198778.16666666666, ans=0.0
2024-09-14 23:17:18,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=198778.16666666666, ans=0.5
2024-09-14 23:17:27,612 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.786e+02 2.085e+02 2.323e+02 2.630e+02 3.401e+02, threshold=4.647e+02, percent-clipped=0.0
2024-09-14 23:17:54,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=198863.16666666666, ans=0.025
2024-09-14 23:18:18,744 INFO [train.py:1198] (1/2) Epoch 11, batch 6300, loss[loss=0.2527, ctc_loss=0.175, cr_loss=0.3885, over 20982.00 frames. ], tot_loss[loss=0.2665, ctc_loss=0.1867, cr_loss=0.399, over 4014944.51 frames. ], batch size: 55, lr: 7.49e-03, grad_scale: 32.0
2024-09-14 23:18:34,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=198948.16666666666, ans=0.1
2024-09-14 23:18:43,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=198948.16666666666, ans=0.125
2024-09-14 23:18:45,400 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. limit=6.0
2024-09-14 23:19:06,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.whiten.whitening_limit, batch_count=199004.83333333334, ans=15.0
2024-09-14 23:19:11,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=199004.83333333334, ans=0.125
2024-09-14 23:19:17,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=199033.16666666666, ans=10.0
2024-09-14 23:19:28,448 INFO [train.py:1198] (1/2) Epoch 11, batch 6350, loss[loss=0.3166, ctc_loss=0.2374, cr_loss=0.3963, over 13893.00 frames. ], tot_loss[loss=0.2749, ctc_loss=0.1947, cr_loss=0.4012, over 3784898.80 frames. ], batch size: 149, lr: 7.49e-03, grad_scale: 32.0
2024-09-14 23:19:47,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=199089.83333333334, ans=0.0
2024-09-14 23:19:49,898 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.927e+02 2.274e+02 2.540e+02 2.740e+02 3.993e+02, threshold=5.080e+02, percent-clipped=0.0
2024-09-14 23:20:12,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=199146.5, ans=0.125
2024-09-14 23:20:14,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=199146.5, ans=0.1
2024-09-14 23:21:14,915 INFO [train.py:1198] (1/2) Epoch 12, batch 0, loss[loss=0.2226, ctc_loss=0.1517, cr_loss=0.3546, over 20966.00 frames. ], tot_loss[loss=0.2226, ctc_loss=0.1517, cr_loss=0.3546, over 20966.00 frames. ], batch size: 49, lr: 7.17e-03, grad_scale: 32.0
2024-09-14 23:21:14,915 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-14 23:21:21,475 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6300, 4.2953, 4.3044, 4.4358], device='cuda:1')
2024-09-14 23:21:29,871 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.9686, 3.9106, 3.9110, 3.5965], device='cuda:1')
2024-09-14 23:21:33,501 INFO [train.py:1230] (1/2) Epoch 12, validation: loss=0.0521, ctc_loss=0.0521, cr_loss=1e-14, over 944034.00 frames.
2024-09-14 23:21:33,502 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB
2024-09-14 23:21:33,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=199177.66666666666, ans=0.0
2024-09-14 23:22:10,205 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-14 23:22:12,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=199234.33333333334, ans=0.125
2024-09-14 23:22:51,657 INFO [train.py:1198] (1/2) Epoch 12, batch 50, loss[loss=0.2674, ctc_loss=0.1821, cr_loss=0.4262, over 21079.00 frames. ], tot_loss[loss=0.2618, ctc_loss=0.1828, cr_loss=0.3951, over 913915.02 frames. ], batch size: 59, lr: 7.17e-03, grad_scale: 32.0
2024-09-14 23:23:10,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=199347.66666666666, ans=0.1
2024-09-14 23:23:13,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=199347.66666666666, ans=0.125
2024-09-14 23:23:27,780 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.734e+02 2.054e+02 2.268e+02 2.482e+02 4.845e+02, threshold=4.536e+02, percent-clipped=0.0
2024-09-14 23:23:46,396 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.90 vs. limit=22.5
2024-09-14 23:23:59,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=199432.66666666666, ans=0.125
2024-09-14 23:24:05,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=199461.0, ans=0.2
2024-09-14 23:24:06,556 INFO [train.py:1198] (1/2) Epoch 12, batch 100, loss[loss=0.2278, ctc_loss=0.1552, cr_loss=0.3628, over 19904.00 frames. ], tot_loss[loss=0.2636, ctc_loss=0.1843, cr_loss=0.3968, over 1605678.36 frames. ], batch size: 44, lr: 7.17e-03, grad_scale: 32.0
2024-09-14 23:24:10,666 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.07 vs. limit=15.0
2024-09-14 23:24:25,007 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.41 vs. limit=12.0
2024-09-14 23:24:32,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=199489.33333333334, ans=0.025
2024-09-14 23:25:24,606 INFO [train.py:1198] (1/2) Epoch 12, batch 150, loss[loss=0.2508, ctc_loss=0.1721, cr_loss=0.3932, over 21023.00 frames. ], tot_loss[loss=0.2631, ctc_loss=0.1836, cr_loss=0.3972, over 2152658.12 frames. ], batch size: 63, lr: 7.17e-03, grad_scale: 32.0
2024-09-14 23:25:50,895 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.16 vs. limit=15.0
2024-09-14 23:26:00,581 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.665e+02 2.051e+02 2.164e+02 2.318e+02 3.095e+02, threshold=4.327e+02, percent-clipped=0.0
2024-09-14 23:26:11,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=199687.66666666666, ans=0.0
2024-09-14 23:26:18,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=199687.66666666666, ans=0.025
2024-09-14 23:26:32,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=199716.0, ans=0.2
2024-09-14 23:26:39,416 INFO [train.py:1198] (1/2) Epoch 12, batch 200, loss[loss=0.2304, ctc_loss=0.1547, cr_loss=0.3784, over 20997.00 frames. ], tot_loss[loss=0.2609, ctc_loss=0.1823, cr_loss=0.3932, over 2570535.28 frames. ], batch size: 61, lr: 7.16e-03, grad_scale: 32.0
2024-09-14 23:27:55,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=199857.66666666666, ans=0.05
2024-09-14 23:27:58,192 INFO [train.py:1198] (1/2) Epoch 12, batch 250, loss[loss=0.2604, ctc_loss=0.1848, cr_loss=0.3783, over 20704.00 frames. ], tot_loss[loss=0.2611, ctc_loss=0.1825, cr_loss=0.3929, over 2895977.39 frames. ], batch size: 71, lr: 7.16e-03, grad_scale: 32.0
2024-09-14 23:27:58,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=199886.0, ans=0.0
2024-09-14 23:28:34,831 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.766e+02 2.052e+02 2.123e+02 2.278e+02 3.665e+02, threshold=4.245e+02, percent-clipped=0.0
2024-09-14 23:28:41,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=199942.66666666666, ans=0.05
2024-09-14 23:28:51,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=199971.0, ans=0.2
2024-09-14 23:29:07,134 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.08 vs. limit=15.0
2024-09-14 23:29:09,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=199999.33333333334, ans=0.2
2024-09-14 23:29:14,131 INFO [train.py:1198] (1/2) Epoch 12, batch 300, loss[loss=0.2635, ctc_loss=0.179, cr_loss=0.4222, over 20897.00 frames. ], tot_loss[loss=0.2609, ctc_loss=0.1821, cr_loss=0.3937, over 3158413.73 frames. ], batch size: 54, lr: 7.16e-03, grad_scale: 32.0
2024-09-14 23:30:18,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=200141.0, ans=0.2
2024-09-14 23:30:24,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=200141.0, ans=0.2
2024-09-14 23:30:33,250 INFO [train.py:1198] (1/2) Epoch 12, batch 350, loss[loss=0.2707, ctc_loss=0.1856, cr_loss=0.4256, over 20934.00 frames. ], tot_loss[loss=0.26, ctc_loss=0.1812, cr_loss=0.3939, over 3370283.85 frames. ], batch size: 60, lr: 7.16e-03, grad_scale: 32.0
2024-09-14 23:30:41,248 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.25 vs. limit=15.0
2024-09-14 23:30:52,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=200197.66666666666, ans=0.125
2024-09-14 23:30:59,232 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.87 vs. limit=15.0
2024-09-14 23:31:05,086 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.63 vs. limit=12.0
2024-09-14 23:31:08,882 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.801e+02 2.048e+02 2.277e+02 2.512e+02 3.323e+02, threshold=4.553e+02, percent-clipped=0.0
2024-09-14 23:31:30,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=200254.33333333334, ans=0.125
2024-09-14 23:31:33,545 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.95 vs. limit=6.0
2024-09-14 23:31:34,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=200282.66666666666, ans=0.125
2024-09-14 23:31:41,210 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.02 vs. limit=15.0
2024-09-14 23:31:47,888 INFO [train.py:1198] (1/2) Epoch 12, batch 400, loss[loss=0.3305, ctc_loss=0.2454, cr_loss=0.4255, over 14712.00 frames. ], tot_loss[loss=0.261, ctc_loss=0.1819, cr_loss=0.3956, over 3527913.94 frames. ], batch size: 150, lr: 7.15e-03, grad_scale: 32.0
2024-09-14 23:33:02,969 INFO [train.py:1198] (1/2) Epoch 12, batch 450, loss[loss=0.2314, ctc_loss=0.1575, cr_loss=0.3699, over 20960.00 frames. ], tot_loss[loss=0.2608, ctc_loss=0.1816, cr_loss=0.3956, over 3652654.54 frames. ], batch size: 51, lr: 7.15e-03, grad_scale: 32.0
2024-09-14 23:33:41,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=200509.33333333334, ans=0.0
2024-09-14 23:33:42,257 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.877e+02 2.085e+02 2.237e+02 2.396e+02 3.420e+02, threshold=4.475e+02, percent-clipped=0.0
2024-09-14 23:34:21,393 INFO [train.py:1198] (1/2) Epoch 12, batch 500, loss[loss=0.2516, ctc_loss=0.1731, cr_loss=0.3927, over 20908.00 frames. ], tot_loss[loss=0.2605, ctc_loss=0.1814, cr_loss=0.3954, over 3750499.06 frames. ], batch size: 54, lr: 7.15e-03, grad_scale: 32.0
2024-09-14 23:34:30,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=200594.33333333334, ans=0.125
2024-09-14 23:34:37,287 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.23 vs. limit=15.0
2024-09-14 23:34:38,671 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.80 vs. limit=15.0
2024-09-14 23:34:48,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=200622.66666666666, ans=0.125
2024-09-14 23:35:36,066 INFO [train.py:1198] (1/2) Epoch 12, batch 550, loss[loss=0.2753, ctc_loss=0.1949, cr_loss=0.402, over 20839.00 frames. ], tot_loss[loss=0.2602, ctc_loss=0.1811, cr_loss=0.3955, over 3827832.30 frames. ], batch size: 65, lr: 7.15e-03, grad_scale: 32.0
2024-09-14 23:36:11,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=200792.66666666666, ans=0.125
2024-09-14 23:36:15,276 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.733e+02 2.011e+02 2.141e+02 2.400e+02 3.199e+02, threshold=4.282e+02, percent-clipped=0.0
2024-09-14 23:36:54,619 INFO [train.py:1198] (1/2) Epoch 12, batch 600, loss[loss=0.3021, ctc_loss=0.2105, cr_loss=0.4584, over 20261.00 frames. ], tot_loss[loss=0.2596, ctc_loss=0.1805, cr_loss=0.3953, over 3896114.09 frames. ], batch size: 74, lr: 7.14e-03, grad_scale: 32.0
2024-09-14 23:37:11,870 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=7.12 vs. limit=22.5
2024-09-14 23:37:43,669 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.98 vs. limit=10.0
2024-09-14 23:38:06,629 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.59 vs. limit=22.5
2024-09-14 23:38:09,283 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.61 vs. limit=15.0
2024-09-14 23:38:10,065 INFO [train.py:1198] (1/2) Epoch 12, batch 650, loss[loss=0.2777, ctc_loss=0.1954, cr_loss=0.4117, over 20381.00 frames. ], tot_loss[loss=0.2593, ctc_loss=0.1804, cr_loss=0.3945, over 3943535.41 frames. ], batch size: 74, lr: 7.14e-03, grad_scale: 32.0
2024-09-14 23:38:36,244 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-14 23:38:46,439 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.718e+02 2.028e+02 2.221e+02 2.579e+02 3.998e+02, threshold=4.442e+02, percent-clipped=0.0
2024-09-14 23:38:48,489 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.49 vs. limit=15.0
2024-09-14 23:39:20,456 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.93 vs. limit=12.0
2024-09-14 23:39:28,729 INFO [train.py:1198] (1/2) Epoch 12, batch 700, loss[loss=0.241, ctc_loss=0.1687, cr_loss=0.3616, over 19934.00 frames. ], tot_loss[loss=0.2606, ctc_loss=0.1815, cr_loss=0.3952, over 3969043.15 frames. ], batch size: 44, lr: 7.14e-03, grad_scale: 32.0
2024-09-14 23:39:38,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=201161.0, ans=0.125
2024-09-14 23:39:48,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=201189.33333333334, ans=0.0
2024-09-14 23:40:13,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=201246.0, ans=0.125
2024-09-14 23:40:26,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=201246.0, ans=0.1
2024-09-14 23:40:44,474 INFO [train.py:1198] (1/2) Epoch 12, batch 750, loss[loss=0.2411, ctc_loss=0.1633, cr_loss=0.3889, over 20988.00 frames. ], tot_loss[loss=0.2601, ctc_loss=0.1811, cr_loss=0.3949, over 4000486.55 frames. ], batch size: 48, lr: 7.14e-03, grad_scale: 32.0
2024-09-14 23:41:03,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=201331.0, ans=0.2
2024-09-14 23:41:07,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=201331.0, ans=0.125
2024-09-14 23:41:07,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=201331.0, ans=0.125
2024-09-14 23:41:20,640 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.771e+02 2.117e+02 2.254e+02 2.486e+02 3.486e+02, threshold=4.508e+02, percent-clipped=0.0
2024-09-14 23:41:58,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=201416.0, ans=0.125
2024-09-14 23:42:02,815 INFO [train.py:1198] (1/2) Epoch 12, batch 800, loss[loss=0.2792, ctc_loss=0.1926, cr_loss=0.4331, over 20820.00 frames. ], tot_loss[loss=0.2606, ctc_loss=0.1815, cr_loss=0.3956, over 4022523.42 frames. ], batch size: 59, lr: 7.13e-03, grad_scale: 32.0
2024-09-14 23:42:13,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=201444.33333333334, ans=0.2
2024-09-14 23:42:27,808 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.03 vs. limit=15.0
2024-09-14 23:42:48,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=201529.33333333334, ans=0.2
2024-09-14 23:43:07,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=201557.66666666666, ans=0.0
2024-09-14 23:43:09,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=201557.66666666666, ans=0.125
2024-09-14 23:43:17,831 INFO [train.py:1198] (1/2) Epoch 12, batch 850, loss[loss=0.2625, ctc_loss=0.1839, cr_loss=0.393, over 21087.00 frames. ], tot_loss[loss=0.2597, ctc_loss=0.1807, cr_loss=0.3948, over 4048735.01 frames. ], batch size: 59, lr: 7.13e-03, grad_scale: 32.0
2024-09-14 23:43:53,614 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.05 vs. limit=12.0
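
The scaling.py:214 lines record ScheduledFloat values: module constants such as dropout probabilities, skip rates and balancer probabilities that are functions of the training batch count, printed as ans=... at the current batch_count. A minimal sketch of such a piecewise-linear schedule; the function name scheduled_float and the breakpoints are made up for illustration, since the real schedules are defined per module in the model code:

    # Minimal sketch of a piecewise-linear float schedule like the
    # ScheduledFloat values logged above. Breakpoints are illustrative.
    def scheduled_float(batch_count, points):
        """points: (batch_count, value) pairs, sorted by batch_count."""
        if batch_count <= points[0][0]:
            return points[0][1]
        if batch_count >= points[-1][0]:
            return points[-1][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if x0 <= batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # A skip rate decaying from 0.2 to 0.0 over the first 4000 batches has
    # long since reached its final value by batch_count ~ 201000:
    print(scheduled_float(201161.0, [(0.0, 0.2), (4000.0, 0.0)]))  # 0.0
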
2024-09-14 23:43:55,538 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.727e+02 2.028e+02 2.126e+02 2.326e+02 4.010e+02, threshold=4.252e+02, percent-clipped=0.0
2024-09-14 23:44:03,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=201671.0, ans=0.1
2024-09-14 23:44:08,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=201671.0, ans=0.125
2024-09-14 23:44:15,812 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.77 vs. limit=15.0
2024-09-14 23:44:17,673 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.86 vs. limit=15.0
2024-09-14 23:44:32,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=201727.66666666666, ans=0.125
2024-09-14 23:44:33,270 INFO [train.py:1198] (1/2) Epoch 12, batch 900, loss[loss=0.2314, ctc_loss=0.1579, cr_loss=0.3675, over 20955.00 frames. ], tot_loss[loss=0.2602, ctc_loss=0.1811, cr_loss=0.3956, over 4070208.61 frames. ], batch size: 50, lr: 7.13e-03, grad_scale: 16.0
2024-09-14 23:44:38,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=201727.66666666666, ans=0.125
2024-09-14 23:44:39,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=201727.66666666666, ans=0.0
2024-09-14 23:44:45,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=201727.66666666666, ans=0.125
2024-09-14 23:44:53,577 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.56 vs. limit=15.0
2024-09-14 23:44:59,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=201756.0, ans=0.125
2024-09-14 23:45:04,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=201756.0, ans=0.125
2024-09-14 23:45:13,600 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.94 vs. limit=12.0
2024-09-14 23:45:34,764 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.38 vs. limit=15.0
2024-09-14 23:45:51,685 INFO [train.py:1198] (1/2) Epoch 12, batch 950, loss[loss=0.2994, ctc_loss=0.2103, cr_loss=0.4457, over 20046.00 frames. ], tot_loss[loss=0.2588, ctc_loss=0.18, cr_loss=0.3936, over 4081448.09 frames. ], batch size: 80, lr: 7.13e-03, grad_scale: 16.0
2024-09-14 23:46:16,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=201897.66666666666, ans=0.125
2024-09-14 23:46:16,487 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.58 vs. limit=12.0
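
The Whitening lines compare a per-module metric against a limit: the metric grows as the covariance of a module's output drifts away from a multiple of the identity (away from "white"), and the module is nudged back when the limit is exceeded. One plausible way to compute such a metric is the eigenvalue ratio below; this is an assumption for illustration, not a transcription of scaling.py, and whitening_metric is a hypothetical name:

    import torch

    # Sketch of a whiteness metric in the spirit of the Whitening log lines.
    # For covariance eigenvalues l_i, E[l^2]/E[l]^2 >= 1, with equality only
    # when all eigenvalues are equal (a perfectly white output).
    def whitening_metric(x):
        """x: (num_frames, num_channels) module outputs."""
        x = x - x.mean(dim=0)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return (eigs ** 2).mean() / eigs.mean() ** 2

    # Per-channel scaling makes the covariance anisotropic, raising the metric.
    x = torch.randn(2000, 256) * torch.linspace(0.5, 2.0, 256)
    print(whitening_metric(x))  # compared against limits like 12.0 or 15.0
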
2024-09-14 23:46:29,255 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.831e+02 2.059e+02 2.168e+02 2.366e+02 3.474e+02, threshold=4.336e+02, percent-clipped=0.0
2024-09-14 23:46:43,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=201954.33333333334, ans=0.125
2024-09-14 23:46:58,526 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.54 vs. limit=12.0
2024-09-14 23:47:06,723 INFO [train.py:1198] (1/2) Epoch 12, batch 1000, loss[loss=0.3156, ctc_loss=0.2199, cr_loss=0.4782, over 20952.00 frames. ], tot_loss[loss=0.2595, ctc_loss=0.1806, cr_loss=0.3948, over 4089136.83 frames. ], batch size: 64, lr: 7.12e-03, grad_scale: 16.0
2024-09-14 23:47:17,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=202011.0, ans=0.125
2024-09-14 23:48:10,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=202124.33333333334, ans=0.1
2024-09-14 23:48:24,971 INFO [train.py:1198] (1/2) Epoch 12, batch 1050, loss[loss=0.2669, ctc_loss=0.1868, cr_loss=0.4003, over 20832.00 frames. ], tot_loss[loss=0.2592, ctc_loss=0.1803, cr_loss=0.3947, over 4104438.02 frames. ], batch size: 59, lr: 7.12e-03, grad_scale: 16.0
2024-09-14 23:48:46,532 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.00 vs. limit=15.0
2024-09-14 23:49:02,328 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.727e+02 2.055e+02 2.178e+02 2.434e+02 4.992e+02, threshold=4.356e+02, percent-clipped=1.0
2024-09-14 23:49:39,785 INFO [train.py:1198] (1/2) Epoch 12, batch 1100, loss[loss=0.2579, ctc_loss=0.1761, cr_loss=0.4094, over 20847.00 frames. ], tot_loss[loss=0.2592, ctc_loss=0.1804, cr_loss=0.3939, over 4103411.70 frames. ], batch size: 59, lr: 7.12e-03, grad_scale: 16.0
2024-09-14 23:49:52,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=202294.33333333334, ans=0.0
2024-09-14 23:50:17,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=202351.0, ans=0.125
2024-09-14 23:50:57,992 INFO [train.py:1198] (1/2) Epoch 12, batch 1150, loss[loss=0.2601, ctc_loss=0.1766, cr_loss=0.4177, over 20796.00 frames. ], tot_loss[loss=0.2588, ctc_loss=0.18, cr_loss=0.3939, over 4101122.91 frames. ], batch size: 53, lr: 7.12e-03, grad_scale: 16.0
2024-09-14 23:50:58,809 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.32 vs.
limit=15.0 2024-09-14 23:51:17,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=202464.33333333334, ans=0.05 2024-09-14 23:51:23,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=202464.33333333334, ans=0.1 2024-09-14 23:51:31,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=202492.66666666666, ans=0.025 2024-09-14 23:51:35,470 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.850e+02 2.071e+02 2.230e+02 2.568e+02 3.703e+02, threshold=4.459e+02, percent-clipped=0.0 2024-09-14 23:51:38,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=202492.66666666666, ans=0.0 2024-09-14 23:51:46,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=202521.0, ans=0.1 2024-09-14 23:51:51,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=202521.0, ans=0.125 2024-09-14 23:52:11,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=202549.33333333334, ans=0.125 2024-09-14 23:52:13,615 INFO [train.py:1198] (1/2) Epoch 12, batch 1200, loss[loss=0.2894, ctc_loss=0.2058, cr_loss=0.4182, over 18270.00 frames. ], tot_loss[loss=0.2581, ctc_loss=0.1796, cr_loss=0.3925, over 4107166.86 frames. ], batch size: 108, lr: 7.11e-03, grad_scale: 32.0 2024-09-14 23:52:26,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=202577.66666666666, ans=0.1 2024-09-14 23:52:29,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=202606.0, ans=0.05 2024-09-14 23:53:32,084 INFO [train.py:1198] (1/2) Epoch 12, batch 1250, loss[loss=0.2661, ctc_loss=0.188, cr_loss=0.3904, over 21022.00 frames. ], tot_loss[loss=0.2595, ctc_loss=0.1807, cr_loss=0.394, over 4102379.01 frames. ], batch size: 62, lr: 7.11e-03, grad_scale: 32.0 2024-09-14 23:54:04,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=202776.0, ans=0.125 2024-09-14 23:54:10,035 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.770e+02 2.054e+02 2.172e+02 2.318e+02 4.479e+02, threshold=4.343e+02, percent-clipped=1.0 2024-09-14 23:54:39,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=202832.66666666666, ans=0.0 2024-09-14 23:54:47,852 INFO [train.py:1198] (1/2) Epoch 12, batch 1300, loss[loss=0.2445, ctc_loss=0.1687, cr_loss=0.379, over 20980.00 frames. ], tot_loss[loss=0.2616, ctc_loss=0.1824, cr_loss=0.396, over 4090942.09 frames. ], batch size: 52, lr: 7.11e-03, grad_scale: 32.0 2024-09-14 23:55:25,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=202917.66666666666, ans=0.125 2024-09-14 23:55:27,780 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.40 vs. 
limit=6.0 2024-09-14 23:56:03,977 INFO [train.py:1198] (1/2) Epoch 12, batch 1350, loss[loss=0.213, ctc_loss=0.1467, cr_loss=0.3315, over 19896.00 frames. ], tot_loss[loss=0.2601, ctc_loss=0.1812, cr_loss=0.3945, over 4096930.61 frames. ], batch size: 44, lr: 7.11e-03, grad_scale: 32.0 2024-09-14 23:56:15,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=203002.66666666666, ans=0.0 2024-09-14 23:56:44,804 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.802e+02 2.112e+02 2.313e+02 2.522e+02 3.662e+02, threshold=4.625e+02, percent-clipped=0.0 2024-09-14 23:56:53,097 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.52 vs. limit=15.0 2024-09-14 23:57:22,840 INFO [train.py:1198] (1/2) Epoch 12, batch 1400, loss[loss=0.2371, ctc_loss=0.1643, cr_loss=0.3639, over 21067.00 frames. ], tot_loss[loss=0.2604, ctc_loss=0.1814, cr_loss=0.3952, over 4099828.69 frames. ], batch size: 53, lr: 7.10e-03, grad_scale: 32.0 2024-09-14 23:57:42,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=203172.66666666666, ans=0.1 2024-09-14 23:57:55,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=203201.0, ans=0.125 2024-09-14 23:58:38,415 INFO [train.py:1198] (1/2) Epoch 12, batch 1450, loss[loss=0.2857, ctc_loss=0.1959, cr_loss=0.4491, over 20867.00 frames. ], tot_loss[loss=0.2606, ctc_loss=0.1815, cr_loss=0.3952, over 4100060.85 frames. ], batch size: 57, lr: 7.10e-03, grad_scale: 32.0 2024-09-14 23:59:13,675 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.94 vs. limit=15.0 2024-09-14 23:59:19,276 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.809e+02 2.029e+02 2.186e+02 2.387e+02 3.305e+02, threshold=4.372e+02, percent-clipped=0.0 2024-09-14 23:59:57,351 INFO [train.py:1198] (1/2) Epoch 12, batch 1500, loss[loss=0.2753, ctc_loss=0.1915, cr_loss=0.419, over 20645.00 frames. ], tot_loss[loss=0.2615, ctc_loss=0.1823, cr_loss=0.3963, over 4100325.83 frames. ], batch size: 68, lr: 7.10e-03, grad_scale: 32.0 2024-09-15 00:00:40,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=203512.66666666666, ans=0.125 2024-09-15 00:00:49,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=203512.66666666666, ans=0.125 2024-09-15 00:00:49,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=203512.66666666666, ans=0.025 2024-09-15 00:01:12,181 INFO [train.py:1198] (1/2) Epoch 12, batch 1550, loss[loss=0.2411, ctc_loss=0.1646, cr_loss=0.3821, over 20760.00 frames. ], tot_loss[loss=0.261, ctc_loss=0.1819, cr_loss=0.3956, over 4100260.23 frames. 
], batch size: 56, lr: 7.10e-03, grad_scale: 32.0 2024-09-15 00:01:15,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=203569.33333333334, ans=0.5 2024-09-15 00:01:28,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=203597.66666666666, ans=0.125 2024-09-15 00:01:34,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=203597.66666666666, ans=0.025 2024-09-15 00:01:34,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=203597.66666666666, ans=0.0 2024-09-15 00:01:36,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=203597.66666666666, ans=0.125 2024-09-15 00:01:42,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=203626.0, ans=0.125 2024-09-15 00:01:44,211 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.02 vs. limit=15.0 2024-09-15 00:01:49,471 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.756e+02 2.133e+02 2.264e+02 2.612e+02 3.617e+02, threshold=4.528e+02, percent-clipped=0.0 2024-09-15 00:02:25,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=203682.66666666666, ans=0.125 2024-09-15 00:02:30,966 INFO [train.py:1198] (1/2) Epoch 12, batch 1600, loss[loss=0.2198, ctc_loss=0.1476, cr_loss=0.3606, over 20957.00 frames. ], tot_loss[loss=0.26, ctc_loss=0.181, cr_loss=0.395, over 4106405.48 frames. ], batch size: 49, lr: 7.09e-03, grad_scale: 32.0 2024-09-15 00:02:32,929 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 00:02:58,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=203739.33333333334, ans=0.0 2024-09-15 00:03:30,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=203824.33333333334, ans=0.0 2024-09-15 00:03:31,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=203824.33333333334, ans=0.1 2024-09-15 00:03:45,916 INFO [train.py:1198] (1/2) Epoch 12, batch 1650, loss[loss=0.2349, ctc_loss=0.1572, cr_loss=0.3884, over 20976.00 frames. ], tot_loss[loss=0.2603, ctc_loss=0.1811, cr_loss=0.3959, over 4109078.05 frames. ], batch size: 64, lr: 7.09e-03, grad_scale: 32.0 2024-09-15 00:03:49,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=203852.66666666666, ans=0.125 2024-09-15 00:04:20,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=203909.33333333334, ans=0.0 2024-09-15 00:04:21,049 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.58 vs. 
limit=15.0 2024-09-15 00:04:23,526 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.799e+02 2.121e+02 2.258e+02 2.449e+02 4.454e+02, threshold=4.517e+02, percent-clipped=0.0 2024-09-15 00:04:40,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=203937.66666666666, ans=0.1 2024-09-15 00:05:04,406 INFO [train.py:1198] (1/2) Epoch 12, batch 1700, loss[loss=0.2639, ctc_loss=0.1832, cr_loss=0.4033, over 19387.00 frames. ], tot_loss[loss=0.2605, ctc_loss=0.1812, cr_loss=0.3964, over 4113447.05 frames. ], batch size: 90, lr: 7.09e-03, grad_scale: 32.0 2024-09-15 00:05:16,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=203994.33333333334, ans=0.1 2024-09-15 00:05:33,594 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.07 vs. limit=10.0 2024-09-15 00:05:36,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=204051.0, ans=0.1 2024-09-15 00:05:45,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=204051.0, ans=0.125 2024-09-15 00:05:55,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=204079.33333333334, ans=0.125 2024-09-15 00:05:57,226 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.15 vs. limit=12.0 2024-09-15 00:06:07,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=204107.66666666666, ans=0.125 2024-09-15 00:06:07,373 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.59 vs. limit=12.0 2024-09-15 00:06:20,318 INFO [train.py:1198] (1/2) Epoch 12, batch 1750, loss[loss=0.2414, ctc_loss=0.1684, cr_loss=0.3649, over 21078.00 frames. ], tot_loss[loss=0.2603, ctc_loss=0.1811, cr_loss=0.3958, over 4114271.37 frames. ], batch size: 56, lr: 7.09e-03, grad_scale: 32.0 2024-09-15 00:06:23,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=204136.0, ans=0.025 2024-09-15 00:06:46,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=204164.33333333334, ans=0.0 2024-09-15 00:06:57,908 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.894e+02 2.094e+02 2.312e+02 2.586e+02 3.910e+02, threshold=4.625e+02, percent-clipped=0.0 2024-09-15 00:07:13,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=204221.0, ans=0.2 2024-09-15 00:07:38,292 INFO [train.py:1198] (1/2) Epoch 12, batch 1800, loss[loss=0.2249, ctc_loss=0.1552, cr_loss=0.3481, over 21065.00 frames. ], tot_loss[loss=0.2605, ctc_loss=0.1813, cr_loss=0.3961, over 4117429.16 frames. 
], batch size: 53, lr: 7.08e-03, grad_scale: 32.0 2024-09-15 00:07:56,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=204306.0, ans=0.125 2024-09-15 00:08:15,331 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.94 vs. limit=10.0 2024-09-15 00:08:35,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=204362.66666666666, ans=0.5 2024-09-15 00:08:46,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=204391.0, ans=0.125 2024-09-15 00:08:48,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=204391.0, ans=0.0 2024-09-15 00:08:53,807 INFO [train.py:1198] (1/2) Epoch 12, batch 1850, loss[loss=0.2759, ctc_loss=0.1917, cr_loss=0.4212, over 20682.00 frames. ], tot_loss[loss=0.26, ctc_loss=0.1811, cr_loss=0.3945, over 4101254.63 frames. ], batch size: 71, lr: 7.08e-03, grad_scale: 32.0 2024-09-15 00:09:26,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=204476.0, ans=0.2 2024-09-15 00:09:31,925 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.805e+02 2.050e+02 2.188e+02 2.385e+02 3.212e+02, threshold=4.377e+02, percent-clipped=0.0 2024-09-15 00:09:39,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=204504.33333333334, ans=0.025 2024-09-15 00:09:52,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=204504.33333333334, ans=0.125 2024-09-15 00:09:52,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=204504.33333333334, ans=0.04949747468305833 2024-09-15 00:10:07,740 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.55 vs. limit=15.0 2024-09-15 00:10:10,081 INFO [train.py:1198] (1/2) Epoch 12, batch 1900, loss[loss=0.2534, ctc_loss=0.1761, cr_loss=0.3865, over 20974.00 frames. ], tot_loss[loss=0.259, ctc_loss=0.1803, cr_loss=0.3935, over 4110706.84 frames. ], batch size: 58, lr: 7.08e-03, grad_scale: 32.0 2024-09-15 00:10:18,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=204561.0, ans=0.125 2024-09-15 00:10:33,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=204589.33333333334, ans=0.2 2024-09-15 00:10:53,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=204617.66666666666, ans=0.035 2024-09-15 00:11:03,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=204646.0, ans=0.0 2024-09-15 00:11:14,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=204674.33333333334, ans=0.125 2024-09-15 00:11:29,586 INFO [train.py:1198] (1/2) Epoch 12, batch 1950, loss[loss=0.2689, ctc_loss=0.1865, cr_loss=0.4122, over 20881.00 frames. 
], tot_loss[loss=0.2595, ctc_loss=0.1806, cr_loss=0.3945, over 4112011.65 frames. ], batch size: 57, lr: 7.08e-03, grad_scale: 32.0 2024-09-15 00:11:38,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=204702.66666666666, ans=0.125 2024-09-15 00:12:07,528 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.713e+02 2.027e+02 2.133e+02 2.345e+02 3.140e+02, threshold=4.266e+02, percent-clipped=0.0 2024-09-15 00:12:17,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=204787.66666666666, ans=0.0 2024-09-15 00:12:20,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=204787.66666666666, ans=0.1 2024-09-15 00:12:44,724 INFO [train.py:1198] (1/2) Epoch 12, batch 2000, loss[loss=0.2065, ctc_loss=0.1371, cr_loss=0.3471, over 20975.00 frames. ], tot_loss[loss=0.2601, ctc_loss=0.1809, cr_loss=0.3959, over 4118570.48 frames. ], batch size: 51, lr: 7.08e-03, grad_scale: 32.0 2024-09-15 00:13:16,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=204901.0, ans=0.5 2024-09-15 00:13:34,351 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=204929.33333333334, ans=0.125 2024-09-15 00:13:57,327 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.87 vs. limit=22.5 2024-09-15 00:14:02,586 INFO [train.py:1198] (1/2) Epoch 12, batch 2050, loss[loss=0.2974, ctc_loss=0.2114, cr_loss=0.43, over 20629.00 frames. ], tot_loss[loss=0.2609, ctc_loss=0.1815, cr_loss=0.397, over 4123490.39 frames. ], batch size: 66, lr: 7.07e-03, grad_scale: 32.0 2024-09-15 00:14:27,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=205014.33333333334, ans=0.125 2024-09-15 00:14:39,887 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.936e+02 2.109e+02 2.263e+02 2.516e+02 3.982e+02, threshold=4.525e+02, percent-clipped=0.0 2024-09-15 00:15:17,500 INFO [train.py:1198] (1/2) Epoch 12, batch 2100, loss[loss=0.2228, ctc_loss=0.1547, cr_loss=0.3409, over 19897.00 frames. ], tot_loss[loss=0.2607, ctc_loss=0.1815, cr_loss=0.3961, over 4115407.91 frames. ], batch size: 44, lr: 7.07e-03, grad_scale: 32.0 2024-09-15 00:15:34,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=205156.0, ans=0.125 2024-09-15 00:15:53,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=205184.33333333334, ans=0.0 2024-09-15 00:16:16,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=205212.66666666666, ans=0.2 2024-09-15 00:16:20,928 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.50 vs. limit=15.0 2024-09-15 00:16:35,125 INFO [train.py:1198] (1/2) Epoch 12, batch 2150, loss[loss=0.2634, ctc_loss=0.1802, cr_loss=0.416, over 20829.00 frames. ], tot_loss[loss=0.2603, ctc_loss=0.1813, cr_loss=0.3951, over 4114977.61 frames. 
], batch size: 59, lr: 7.07e-03, grad_scale: 32.0 2024-09-15 00:16:41,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=205269.33333333334, ans=0.2 2024-09-15 00:16:52,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=205297.66666666666, ans=0.2 2024-09-15 00:17:12,272 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.771e+02 2.124e+02 2.316e+02 2.764e+02 4.368e+02, threshold=4.631e+02, percent-clipped=0.0 2024-09-15 00:17:12,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=205326.0, ans=0.1 2024-09-15 00:17:49,759 INFO [train.py:1198] (1/2) Epoch 12, batch 2200, loss[loss=0.2491, ctc_loss=0.1738, cr_loss=0.3766, over 20860.00 frames. ], tot_loss[loss=0.2604, ctc_loss=0.1813, cr_loss=0.3954, over 4104869.52 frames. ], batch size: 57, lr: 7.07e-03, grad_scale: 32.0 2024-09-15 00:18:04,489 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.92 vs. limit=22.5 2024-09-15 00:18:21,638 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.19 vs. limit=8.0 2024-09-15 00:19:03,198 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 00:19:08,734 INFO [train.py:1198] (1/2) Epoch 12, batch 2250, loss[loss=0.2459, ctc_loss=0.1711, cr_loss=0.3741, over 20987.00 frames. ], tot_loss[loss=0.2613, ctc_loss=0.1821, cr_loss=0.3963, over 4095783.81 frames. ], batch size: 58, lr: 7.06e-03, grad_scale: 32.0 2024-09-15 00:19:22,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=205581.0, ans=0.125 2024-09-15 00:19:46,626 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.721e+02 2.049e+02 2.205e+02 2.394e+02 3.210e+02, threshold=4.410e+02, percent-clipped=0.0 2024-09-15 00:19:53,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=205637.66666666666, ans=0.125 2024-09-15 00:19:54,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=205637.66666666666, ans=0.0 2024-09-15 00:19:57,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=205637.66666666666, ans=0.125 2024-09-15 00:20:24,624 INFO [train.py:1198] (1/2) Epoch 12, batch 2300, loss[loss=0.2847, ctc_loss=0.2012, cr_loss=0.4173, over 20644.00 frames. ], tot_loss[loss=0.2602, ctc_loss=0.1812, cr_loss=0.3953, over 4093166.14 frames. ], batch size: 68, lr: 7.06e-03, grad_scale: 32.0 2024-09-15 00:20:38,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=205722.66666666666, ans=0.025 2024-09-15 00:20:46,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=205722.66666666666, ans=0.125 2024-09-15 00:21:39,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=205807.66666666666, ans=0.0 2024-09-15 00:21:43,040 INFO [train.py:1198] (1/2) Epoch 12, batch 2350, loss[loss=0.2627, ctc_loss=0.1823, cr_loss=0.4018, over 20831.00 frames. 
], tot_loss[loss=0.2605, ctc_loss=0.1815, cr_loss=0.3951, over 4098809.69 frames. ], batch size: 59, lr: 7.06e-03, grad_scale: 32.0 2024-09-15 00:22:20,848 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.806e+02 2.085e+02 2.225e+02 2.542e+02 4.079e+02, threshold=4.450e+02, percent-clipped=0.0 2024-09-15 00:22:55,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=205949.33333333334, ans=0.1 2024-09-15 00:22:57,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=205977.66666666666, ans=0.0 2024-09-15 00:22:58,592 INFO [train.py:1198] (1/2) Epoch 12, batch 2400, loss[loss=0.2589, ctc_loss=0.1787, cr_loss=0.4015, over 20973.00 frames. ], tot_loss[loss=0.2608, ctc_loss=0.1817, cr_loss=0.3955, over 4088418.96 frames. ], batch size: 58, lr: 7.06e-03, grad_scale: 32.0 2024-09-15 00:23:29,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=206034.33333333334, ans=0.025 2024-09-15 00:23:30,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=206034.33333333334, ans=0.035 2024-09-15 00:24:14,623 INFO [train.py:1198] (1/2) Epoch 12, batch 2450, loss[loss=0.2575, ctc_loss=0.1799, cr_loss=0.3878, over 20765.00 frames. ], tot_loss[loss=0.2595, ctc_loss=0.1806, cr_loss=0.3945, over 4097872.06 frames. ], batch size: 53, lr: 7.05e-03, grad_scale: 32.0 2024-09-15 00:24:16,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=206119.33333333334, ans=0.2 2024-09-15 00:24:23,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=206119.33333333334, ans=0.2 2024-09-15 00:24:31,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=206147.66666666666, ans=0.0 2024-09-15 00:24:42,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=206147.66666666666, ans=0.07 2024-09-15 00:24:49,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=206176.0, ans=0.125 2024-09-15 00:24:55,291 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.807e+02 2.068e+02 2.363e+02 2.661e+02 5.748e+02, threshold=4.727e+02, percent-clipped=1.0 2024-09-15 00:25:08,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=206204.33333333334, ans=0.125 2024-09-15 00:25:31,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=206261.0, ans=0.2 2024-09-15 00:25:32,786 INFO [train.py:1198] (1/2) Epoch 12, batch 2500, loss[loss=0.2481, ctc_loss=0.1718, cr_loss=0.3816, over 21028.00 frames. ], tot_loss[loss=0.2596, ctc_loss=0.1807, cr_loss=0.3946, over 4081806.73 frames. ], batch size: 62, lr: 7.05e-03, grad_scale: 32.0 2024-09-15 00:26:01,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=206317.66666666666, ans=0.0 2024-09-15 00:26:28,358 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=7.60 vs. 
limit=22.5 2024-09-15 00:26:44,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=206374.33333333334, ans=0.1 2024-09-15 00:26:48,438 INFO [train.py:1198] (1/2) Epoch 12, batch 2550, loss[loss=0.2509, ctc_loss=0.1729, cr_loss=0.3896, over 20927.00 frames. ], tot_loss[loss=0.26, ctc_loss=0.1812, cr_loss=0.3941, over 4059399.71 frames. ], batch size: 60, lr: 7.05e-03, grad_scale: 32.0 2024-09-15 00:26:56,610 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. limit=6.0 2024-09-15 00:27:11,977 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.85 vs. limit=22.5 2024-09-15 00:27:27,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=206459.33333333334, ans=0.2 2024-09-15 00:27:29,058 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.718e+02 2.032e+02 2.192e+02 2.408e+02 3.885e+02, threshold=4.384e+02, percent-clipped=0.0 2024-09-15 00:27:39,016 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.78 vs. limit=6.0 2024-09-15 00:27:49,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=206487.66666666666, ans=0.025 2024-09-15 00:28:07,144 INFO [train.py:1198] (1/2) Epoch 12, batch 2600, loss[loss=0.237, ctc_loss=0.1661, cr_loss=0.3545, over 20974.00 frames. ], tot_loss[loss=0.2591, ctc_loss=0.1804, cr_loss=0.3935, over 4071925.83 frames. ], batch size: 51, lr: 7.05e-03, grad_scale: 32.0 2024-09-15 00:28:26,199 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.16 vs. limit=15.0 2024-09-15 00:28:27,768 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.66 vs. limit=15.0 2024-09-15 00:28:30,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=206572.66666666666, ans=0.0 2024-09-15 00:28:30,720 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=58.56 vs. limit=15.0 2024-09-15 00:28:33,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=206572.66666666666, ans=0.125 2024-09-15 00:28:48,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=206601.0, ans=0.125 2024-09-15 00:28:48,797 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.54 vs. limit=15.0 2024-09-15 00:29:23,305 INFO [train.py:1198] (1/2) Epoch 12, batch 2650, loss[loss=0.2873, ctc_loss=0.2018, cr_loss=0.4277, over 20871.00 frames. ], tot_loss[loss=0.2595, ctc_loss=0.1807, cr_loss=0.3937, over 4068435.02 frames. ], batch size: 65, lr: 7.04e-03, grad_scale: 32.0 2024-09-15 00:29:53,010 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.61 vs. 
limit=15.0 2024-09-15 00:30:01,255 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.717e+02 1.967e+02 2.078e+02 2.242e+02 3.678e+02, threshold=4.157e+02, percent-clipped=0.0 2024-09-15 00:30:26,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=206799.33333333334, ans=0.1 2024-09-15 00:30:41,644 INFO [train.py:1198] (1/2) Epoch 12, batch 2700, loss[loss=0.2683, ctc_loss=0.1832, cr_loss=0.4258, over 20274.00 frames. ], tot_loss[loss=0.2602, ctc_loss=0.1813, cr_loss=0.3943, over 4068432.91 frames. ], batch size: 74, lr: 7.04e-03, grad_scale: 32.0 2024-09-15 00:30:54,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=206827.66666666666, ans=0.0 2024-09-15 00:30:54,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=206827.66666666666, ans=0.0 2024-09-15 00:31:10,658 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.10 vs. limit=15.0 2024-09-15 00:31:25,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=206912.66666666666, ans=0.0 2024-09-15 00:31:32,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=206912.66666666666, ans=0.0 2024-09-15 00:31:40,976 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.38 vs. limit=10.0 2024-09-15 00:31:55,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=206969.33333333334, ans=0.1 2024-09-15 00:31:56,012 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.77 vs. limit=22.5 2024-09-15 00:31:56,749 INFO [train.py:1198] (1/2) Epoch 12, batch 2750, loss[loss=0.2754, ctc_loss=0.1895, cr_loss=0.4295, over 20859.00 frames. ], tot_loss[loss=0.2597, ctc_loss=0.1809, cr_loss=0.3942, over 4066731.20 frames. ], batch size: 65, lr: 7.04e-03, grad_scale: 32.0 2024-09-15 00:32:07,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=206969.33333333334, ans=10.0 2024-09-15 00:32:22,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=206997.66666666666, ans=0.125 2024-09-15 00:32:28,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=207026.0, ans=0.09899494936611666 2024-09-15 00:32:34,472 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.887e+02 2.050e+02 2.149e+02 2.309e+02 3.718e+02, threshold=4.299e+02, percent-clipped=0.0 2024-09-15 00:33:15,745 INFO [train.py:1198] (1/2) Epoch 12, batch 2800, loss[loss=0.2491, ctc_loss=0.1744, cr_loss=0.3731, over 21004.00 frames. ], tot_loss[loss=0.2586, ctc_loss=0.1799, cr_loss=0.3935, over 4081628.77 frames. 
], batch size: 63, lr: 7.04e-03, grad_scale: 32.0
2024-09-15 00:33:19,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=207111.0, ans=0.5
2024-09-15 00:33:26,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=207111.0, ans=0.1
2024-09-15 00:33:31,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=207139.33333333334, ans=0.0
2024-09-15 00:33:38,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=207139.33333333334, ans=0.125
2024-09-15 00:33:56,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=207167.66666666666, ans=0.0
2024-09-15 00:34:02,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=207196.0, ans=0.0
2024-09-15 00:34:16,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=207224.33333333334, ans=0.2
2024-09-15 00:34:22,623 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-15 00:34:31,465 INFO [train.py:1198] (1/2) Epoch 12, batch 2850, loss[loss=0.2208, ctc_loss=0.1535, cr_loss=0.3364, over 20961.00 frames. ], tot_loss[loss=0.2583, ctc_loss=0.1797, cr_loss=0.3927, over 4085005.73 frames. ], batch size: 49, lr: 7.03e-03, grad_scale: 32.0
2024-09-15 00:34:49,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=207281.0, ans=0.125
2024-09-15 00:34:54,975 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.51 vs. limit=15.0
2024-09-15 00:35:08,976 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.860e+02 2.052e+02 2.191e+02 2.360e+02 5.826e+02, threshold=4.382e+02, percent-clipped=1.0
2024-09-15 00:35:10,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=207309.33333333334, ans=0.0
2024-09-15 00:35:45,880 INFO [train.py:1198] (1/2) Epoch 12, batch 2900, loss[loss=0.3042, ctc_loss=0.2209, cr_loss=0.4165, over 18160.00 frames. ], tot_loss[loss=0.2596, ctc_loss=0.1808, cr_loss=0.3941, over 4084364.22 frames. ], batch size: 108, lr: 7.03e-03, grad_scale: 64.0
2024-09-15 00:36:45,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=207479.33333333334, ans=0.125
2024-09-15 00:36:49,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=207507.66666666666, ans=0.125
2024-09-15 00:37:04,492 INFO [train.py:1198] (1/2) Epoch 12, batch 2950, loss[loss=0.2258, ctc_loss=0.1543, cr_loss=0.3575, over 20960.00 frames. ], tot_loss[loss=0.2592, ctc_loss=0.1804, cr_loss=0.3941, over 4092332.73 frames. ], batch size: 50, lr: 7.03e-03, grad_scale: 64.0
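
The grad_scale field is the dynamic loss-scaling factor used with fp16 mixed precision: it is halved when a step overflows (32.0 dropping to 16.0 near batch 900 earlier in this epoch) and doubled again after a long run of stable steps (reaching 64.0 at batch 2900 above). A sketch of the mechanism using PyTorch's stock GradScaler; the hyperparameters shown are the PyTorch defaults, not values taken from this run, and train_step/compute_loss are illustrative names:

    import torch
    from torch.cuda.amp import GradScaler, autocast

    # Sketch of the dynamic loss scaling behind the logged grad_scale values.
    scaler = GradScaler(init_scale=65536.0, growth_factor=2.0,
                        backoff_factor=0.5, growth_interval=2000)

    def train_step(model, optimizer, batch, compute_loss):
        optimizer.zero_grad()
        with autocast():
            loss = compute_loss(model, batch)   # hypothetical loss helper
        scaler.scale(loss).backward()  # backprop on the scaled loss
        scaler.step(optimizer)         # unscales grads; skips step on inf/nan
        scaler.update()                # halve on overflow, grow when stable
        return loss.detach(), scaler.get_scale()
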
2024-09-15 00:37:42,346 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.759e+02 2.060e+02 2.201e+02 2.426e+02 3.999e+02, threshold=4.403e+02, percent-clipped=0.0
2024-09-15 00:37:49,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=207621.0, ans=0.125
2024-09-15 00:37:56,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=207621.0, ans=0.125
2024-09-15 00:38:19,901 INFO [train.py:1198] (1/2) Epoch 12, batch 3000, loss[loss=0.2582, ctc_loss=0.177, cr_loss=0.4061, over 20996.00 frames. ], tot_loss[loss=0.258, ctc_loss=0.1794, cr_loss=0.3931, over 4101764.15 frames. ], batch size: 63, lr: 7.03e-03, grad_scale: 64.0
2024-09-15 00:38:19,902 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-15 00:38:48,199 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.8699, 5.8269, 5.8401, 5.3182], device='cuda:1')
2024-09-15 00:38:50,666 INFO [train.py:1230] (1/2) Epoch 12, validation: loss=0.04992, ctc_loss=0.04992, cr_loss=1.002e-14, over 944034.00 frames.
2024-09-15 00:38:50,667 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB
2024-09-15 00:38:51,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=207677.66666666666, ans=0.125
2024-09-15 00:39:39,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=207762.66666666666, ans=0.125
2024-09-15 00:39:42,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=207762.66666666666, ans=0.0
2024-09-15 00:39:58,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=207791.0, ans=0.2
2024-09-15 00:39:58,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=207791.0, ans=0.1
2024-09-15 00:40:05,813 INFO [train.py:1198] (1/2) Epoch 12, batch 3050, loss[loss=0.2502, ctc_loss=0.1733, cr_loss=0.3843, over 21051.00 frames. ], tot_loss[loss=0.257, ctc_loss=0.1786, cr_loss=0.392, over 4111360.92 frames. ], batch size: 62, lr: 7.02e-03, grad_scale: 32.0
2024-09-15 00:40:44,811 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.734e+02 2.018e+02 2.174e+02 2.362e+02 3.569e+02, threshold=4.348e+02, percent-clipped=0.0
2024-09-15 00:41:03,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=207904.33333333334, ans=0.025
2024-09-15 00:41:20,721 INFO [train.py:1198] (1/2) Epoch 12, batch 3100, loss[loss=0.2632, ctc_loss=0.1889, cr_loss=0.3714, over 21025.00 frames. ], tot_loss[loss=0.2576, ctc_loss=0.1791, cr_loss=0.3923, over 4115254.67 frames. ], batch size: 62, lr: 7.02e-03, grad_scale: 32.0
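
During the validation pass above, zipformer.py prints an attn_weights_entropy tensor with what looks like one value per attention head, a quick check that no head's attention distribution has collapsed onto single frames. A sketch of how such a per-head entropy can be computed; the exact averaging over batch and positions in zipformer.py is not shown in this excerpt, so the aggregation below is an assumption, and the function name merely echoes the log field:

    import torch

    # Sketch: entropy of attention weights, one value per head. High entropy
    # means attention is spread over many source frames; near zero means a
    # head attends to single frames.
    def attn_weights_entropy(attn, eps=1.0e-20):
        """attn: (num_heads, batch, tgt_len, src_len), rows sum to 1."""
        entropy = -(attn * (attn + eps).log()).sum(dim=-1)
        return entropy.mean(dim=(1, 2))  # average over batch and positions

    attn = torch.softmax(torch.randn(4, 2, 300, 300), dim=-1)
    print(attn_weights_entropy(attn))  # 4 values, one per head, as in the log
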
2024-09-15 00:41:31,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=207961.0, ans=0.125
2024-09-15 00:41:51,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=207989.33333333334, ans=0.0
2024-09-15 00:42:31,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=208074.33333333334, ans=0.09899494936611666
2024-09-15 00:42:39,349 INFO [train.py:1198] (1/2) Epoch 12, batch 3150, loss[loss=0.2575, ctc_loss=0.1786, cr_loss=0.3942, over 21011.00 frames. ], tot_loss[loss=0.2579, ctc_loss=0.1794, cr_loss=0.3928, over 4115661.09 frames. ], batch size: 63, lr: 7.02e-03, grad_scale: 32.0
2024-09-15 00:42:44,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=208102.66666666666, ans=0.1
2024-09-15 00:42:47,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=208102.66666666666, ans=0.125
2024-09-15 00:42:47,577 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.09 vs. limit=15.0
2024-09-15 00:43:18,476 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.829e+02 2.077e+02 2.209e+02 2.364e+02 5.147e+02, threshold=4.419e+02, percent-clipped=2.0
2024-09-15 00:43:32,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=208187.66666666666, ans=0.0
2024-09-15 00:43:44,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=208216.0, ans=0.025
2024-09-15 00:43:54,941 INFO [train.py:1198] (1/2) Epoch 12, batch 3200, loss[loss=0.2581, ctc_loss=0.1783, cr_loss=0.3987, over 20986.00 frames. ], tot_loss[loss=0.2579, ctc_loss=0.1794, cr_loss=0.3922, over 4099870.80 frames. ], batch size: 61, lr: 7.02e-03, grad_scale: 32.0
2024-09-15 00:44:49,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=208329.33333333334, ans=0.125
2024-09-15 00:45:13,007 INFO [train.py:1198] (1/2) Epoch 12, batch 3250, loss[loss=0.2776, ctc_loss=0.1914, cr_loss=0.4309, over 20641.00 frames. ], tot_loss[loss=0.258, ctc_loss=0.1794, cr_loss=0.3929, over 4099514.94 frames.
], batch size: 66, lr: 7.02e-03, grad_scale: 32.0 2024-09-15 00:45:13,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=208386.0, ans=0.035 2024-09-15 00:45:45,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=208442.66666666666, ans=0.0 2024-09-15 00:45:52,316 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.777e+02 2.024e+02 2.214e+02 2.411e+02 3.904e+02, threshold=4.428e+02, percent-clipped=0.0 2024-09-15 00:45:55,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=208442.66666666666, ans=0.125 2024-09-15 00:46:03,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=208471.0, ans=0.0 2024-09-15 00:46:28,701 INFO [train.py:1198] (1/2) Epoch 12, batch 3300, loss[loss=0.2928, ctc_loss=0.2072, cr_loss=0.4279, over 20629.00 frames. ], tot_loss[loss=0.2575, ctc_loss=0.1792, cr_loss=0.392, over 4104144.97 frames. ], batch size: 66, lr: 7.01e-03, grad_scale: 32.0 2024-09-15 00:47:10,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=208584.33333333334, ans=0.2 2024-09-15 00:47:13,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=208612.66666666666, ans=0.025 2024-09-15 00:47:40,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=208641.0, ans=0.125 2024-09-15 00:47:47,255 INFO [train.py:1198] (1/2) Epoch 12, batch 3350, loss[loss=0.242, ctc_loss=0.1668, cr_loss=0.376, over 20965.00 frames. ], tot_loss[loss=0.2575, ctc_loss=0.1792, cr_loss=0.3918, over 4103783.05 frames. ], batch size: 55, lr: 7.01e-03, grad_scale: 32.0 2024-09-15 00:47:50,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=208669.33333333334, ans=0.125 2024-09-15 00:48:17,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=208726.0, ans=0.0 2024-09-15 00:48:26,484 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.783e+02 2.089e+02 2.360e+02 2.670e+02 4.104e+02, threshold=4.720e+02, percent-clipped=0.0 2024-09-15 00:48:29,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=208726.0, ans=0.2 2024-09-15 00:48:38,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=208754.33333333334, ans=0.0 2024-09-15 00:48:55,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=208782.66666666666, ans=0.0 2024-09-15 00:49:02,684 INFO [train.py:1198] (1/2) Epoch 12, batch 3400, loss[loss=0.2745, ctc_loss=0.1902, cr_loss=0.4215, over 21031.00 frames. ], tot_loss[loss=0.2568, ctc_loss=0.1785, cr_loss=0.3916, over 4112808.25 frames. 
], batch size: 62, lr: 7.01e-03, grad_scale: 32.0 2024-09-15 00:49:02,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=208811.0, ans=0.125 2024-09-15 00:49:09,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=208811.0, ans=0.0 2024-09-15 00:49:33,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=208867.66666666666, ans=0.1 2024-09-15 00:50:20,382 INFO [train.py:1198] (1/2) Epoch 12, batch 3450, loss[loss=0.2495, ctc_loss=0.1742, cr_loss=0.3768, over 21041.00 frames. ], tot_loss[loss=0.2574, ctc_loss=0.1791, cr_loss=0.3916, over 4093053.20 frames. ], batch size: 62, lr: 7.01e-03, grad_scale: 32.0 2024-09-15 00:50:28,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=208952.66666666666, ans=0.2 2024-09-15 00:50:59,504 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.730e+02 2.051e+02 2.199e+02 2.429e+02 3.668e+02, threshold=4.398e+02, percent-clipped=0.0 2024-09-15 00:51:22,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=209066.0, ans=0.015 2024-09-15 00:51:35,722 INFO [train.py:1198] (1/2) Epoch 12, batch 3500, loss[loss=0.2887, ctc_loss=0.2042, cr_loss=0.4225, over 20039.00 frames. ], tot_loss[loss=0.2582, ctc_loss=0.1795, cr_loss=0.3931, over 4098449.18 frames. ], batch size: 80, lr: 7.00e-03, grad_scale: 32.0 2024-09-15 00:51:58,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=209122.66666666666, ans=0.125 2024-09-15 00:52:00,331 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 00:52:08,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=209151.0, ans=0.125 2024-09-15 00:52:40,071 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.65 vs. limit=15.0 2024-09-15 00:52:51,410 INFO [train.py:1198] (1/2) Epoch 12, batch 3550, loss[loss=0.2304, ctc_loss=0.1592, cr_loss=0.3562, over 19877.00 frames. ], tot_loss[loss=0.2584, ctc_loss=0.1797, cr_loss=0.3934, over 4085487.22 frames. ], batch size: 44, lr: 7.00e-03, grad_scale: 32.0 2024-09-15 00:53:33,417 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.856e+02 2.060e+02 2.201e+02 2.397e+02 3.812e+02, threshold=4.402e+02, percent-clipped=0.0 2024-09-15 00:53:59,067 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.12 vs. limit=15.0 2024-09-15 00:54:00,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=209349.33333333334, ans=0.09899494936611666 2024-09-15 00:54:01,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=209349.33333333334, ans=0.025 2024-09-15 00:54:08,885 INFO [train.py:1198] (1/2) Epoch 12, batch 3600, loss[loss=0.3139, ctc_loss=0.2283, cr_loss=0.4281, over 14189.00 frames. ], tot_loss[loss=0.2592, ctc_loss=0.1804, cr_loss=0.3942, over 4075567.82 frames. 
], batch size: 149, lr: 7.00e-03, grad_scale: 32.0 2024-09-15 00:54:12,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=209377.66666666666, ans=0.125 2024-09-15 00:54:28,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=209406.0, ans=0.2 2024-09-15 00:55:24,050 INFO [train.py:1198] (1/2) Epoch 12, batch 3650, loss[loss=0.2702, ctc_loss=0.1927, cr_loss=0.3875, over 20205.00 frames. ], tot_loss[loss=0.2588, ctc_loss=0.1801, cr_loss=0.3935, over 4075986.03 frames. ], batch size: 74, lr: 7.00e-03, grad_scale: 32.0 2024-09-15 00:55:29,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=209519.33333333334, ans=0.07 2024-09-15 00:55:39,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=209547.66666666666, ans=0.0 2024-09-15 00:56:06,625 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.703e+02 2.062e+02 2.222e+02 2.439e+02 4.103e+02, threshold=4.445e+02, percent-clipped=0.0 2024-09-15 00:56:42,766 INFO [train.py:1198] (1/2) Epoch 12, batch 3700, loss[loss=0.271, ctc_loss=0.1892, cr_loss=0.4088, over 20822.00 frames. ], tot_loss[loss=0.2601, ctc_loss=0.181, cr_loss=0.3951, over 4093710.59 frames. ], batch size: 59, lr: 6.99e-03, grad_scale: 32.0 2024-09-15 00:56:49,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=209661.0, ans=0.2 2024-09-15 00:57:14,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=209717.66666666666, ans=0.1 2024-09-15 00:57:22,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=209717.66666666666, ans=0.125 2024-09-15 00:57:49,923 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.00 vs. limit=12.0 2024-09-15 00:57:53,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=209774.33333333334, ans=0.05 2024-09-15 00:57:57,963 INFO [train.py:1198] (1/2) Epoch 12, batch 3750, loss[loss=0.2862, ctc_loss=0.2025, cr_loss=0.4186, over 21045.00 frames. ], tot_loss[loss=0.2589, ctc_loss=0.18, cr_loss=0.3943, over 4102394.66 frames. ], batch size: 62, lr: 6.99e-03, grad_scale: 32.0 2024-09-15 00:58:32,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=209859.33333333334, ans=0.125 2024-09-15 00:58:36,888 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.784e+02 2.054e+02 2.173e+02 2.361e+02 3.984e+02, threshold=4.345e+02, percent-clipped=0.0 2024-09-15 00:58:51,295 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.31 vs. 
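The recurring optim.py warnings list five grad-norm statistics (reading them as min, 25th, 50th, 75th percentile and max over a recent window) followed by a clipping threshold, and in every record the threshold equals Clipping_scale times the printed median, e.g. 2.0 * 2.222e+02 = 4.444e+02, matching the threshold=4.445e+02 above up to rounding. A rough sketch of such a scheme; the window size and the single-global-norm simplification are assumptions, not how icefall's ScaledAdam is actually organized:

from collections import deque
from statistics import median

class QuartileClipper:
    # Clip gradients against a threshold derived from recent grad norms.
    def __init__(self, clipping_scale: float = 2.0, window: int = 1000):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)

    def clip_factor(self, grad_norm: float) -> float:
        self.norms.append(grad_norm)
        # threshold = clipping_scale * median of recent norms, which
        # reproduces the printed "threshold=" values in the log.
        threshold = self.clipping_scale * median(self.norms)
        return min(1.0, threshold / max(grad_norm, 1e-20))

clipper = QuartileClipper()
for norm in (177.7, 202.4, 221.4, 241.1, 390.4):
    factor = clipper.clip_factor(norm)  # multiply grads by this factor
# Final step: threshold = 2.0 * 221.4 = 442.8 > 390.4, so factor = 1.0,
# consistent with percent-clipped=0.0 in the records above.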
limit=15.0 2024-09-15 00:59:03,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=209916.0, ans=0.5 2024-09-15 00:59:11,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=209916.0, ans=0.125 2024-09-15 00:59:13,851 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.97 vs. limit=15.0 2024-09-15 00:59:15,887 INFO [train.py:1198] (1/2) Epoch 12, batch 3800, loss[loss=0.2634, ctc_loss=0.1839, cr_loss=0.3977, over 20925.00 frames. ], tot_loss[loss=0.2602, ctc_loss=0.1812, cr_loss=0.3951, over 4091238.27 frames. ], batch size: 60, lr: 6.99e-03, grad_scale: 32.0 2024-09-15 00:59:36,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=209972.66666666666, ans=0.0 2024-09-15 01:00:02,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=210029.33333333334, ans=0.125 2024-09-15 01:00:26,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=210057.66666666666, ans=0.05 2024-09-15 01:00:31,090 INFO [train.py:1198] (1/2) Epoch 12, batch 3850, loss[loss=0.321, ctc_loss=0.2405, cr_loss=0.4025, over 14734.00 frames. ], tot_loss[loss=0.2602, ctc_loss=0.1811, cr_loss=0.3951, over 4089945.44 frames. ], batch size: 149, lr: 6.99e-03, grad_scale: 32.0 2024-09-15 01:00:37,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=210086.0, ans=0.1 2024-09-15 01:00:43,825 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 01:00:55,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=210114.33333333334, ans=0.125 2024-09-15 01:01:11,785 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.859e+02 2.136e+02 2.330e+02 2.610e+02 4.569e+02, threshold=4.661e+02, percent-clipped=3.0 2024-09-15 01:01:31,984 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.32 vs. limit=12.0 2024-09-15 01:01:43,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=210199.33333333334, ans=0.0 2024-09-15 01:01:49,199 INFO [train.py:1198] (1/2) Epoch 12, batch 3900, loss[loss=0.2484, ctc_loss=0.1721, cr_loss=0.3813, over 20961.00 frames. ], tot_loss[loss=0.2597, ctc_loss=0.1808, cr_loss=0.3946, over 4082788.88 frames. 
], batch size: 48, lr: 6.98e-03, grad_scale: 16.0 2024-09-15 01:01:49,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=210227.66666666666, ans=0.125 2024-09-15 01:02:36,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=210312.66666666666, ans=0.2 2024-09-15 01:02:39,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=210312.66666666666, ans=0.0 2024-09-15 01:03:04,603 INFO [train.py:1198] (1/2) Epoch 12, batch 3950, loss[loss=0.2781, ctc_loss=0.1968, cr_loss=0.4069, over 20306.00 frames. ], tot_loss[loss=0.2591, ctc_loss=0.1803, cr_loss=0.3939, over 4086591.46 frames. ], batch size: 74, lr: 6.98e-03, grad_scale: 16.0 2024-09-15 01:03:20,737 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.72 vs. limit=15.0 2024-09-15 01:03:44,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=210426.0, ans=0.0 2024-09-15 01:03:45,537 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.798e+02 2.013e+02 2.157e+02 2.298e+02 3.651e+02, threshold=4.313e+02, percent-clipped=0.0 2024-09-15 01:04:04,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=210482.66666666666, ans=0.0 2024-09-15 01:04:20,199 INFO [train.py:1198] (1/2) Epoch 12, batch 4000, loss[loss=0.2279, ctc_loss=0.157, cr_loss=0.3545, over 20958.00 frames. ], tot_loss[loss=0.2575, ctc_loss=0.179, cr_loss=0.3923, over 4095291.11 frames. ], batch size: 48, lr: 6.98e-03, grad_scale: 32.0 2024-09-15 01:04:27,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=210511.0, ans=0.0 2024-09-15 01:04:33,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=210511.0, ans=0.0 2024-09-15 01:04:45,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=210539.33333333334, ans=0.125 2024-09-15 01:05:08,669 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.45 vs. limit=22.5 2024-09-15 01:05:16,472 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.41 vs. limit=15.0 2024-09-15 01:05:18,230 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.18 vs. limit=6.0 2024-09-15 01:05:38,392 INFO [train.py:1198] (1/2) Epoch 12, batch 4050, loss[loss=0.2533, ctc_loss=0.1772, cr_loss=0.3803, over 20644.00 frames. ], tot_loss[loss=0.2577, ctc_loss=0.1791, cr_loss=0.393, over 4091797.22 frames. 
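Note grad_scale dipping from 32.0 to 16.0 at batch 3900 and recovering to 32.0 by batch 4000 (it later reaches 64.0 around batch 5900). That is ordinary dynamic loss scaling under AMP: the scale is halved whenever a step produces inf/nan fp16 gradients and doubled again after a run of clean steps. A generic sketch with torch.cuda.amp; the factors and interval shown are PyTorch defaults, and the model/criterion wiring is illustrative, not this recipe's code:

import torch
from torch.cuda.amp import GradScaler, autocast

scaler = GradScaler(growth_factor=2.0, backoff_factor=0.5,
                    growth_interval=2000)  # PyTorch defaults

def training_step(model, optimizer, inputs, targets, criterion):
    optimizer.zero_grad(set_to_none=True)
    with autocast(dtype=torch.float16):
        loss = criterion(model(inputs), targets)
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # step is skipped on inf/nan; scale is halved
    scaler.update()         # scale doubles after growth_interval clean steps
    return loss.detach()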
], batch size: 66, lr: 6.98e-03, grad_scale: 32.0 2024-09-15 01:05:46,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=210652.66666666666, ans=0.2 2024-09-15 01:05:54,071 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 01:06:00,270 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.57 vs. limit=15.0 2024-09-15 01:06:02,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=210681.0, ans=0.0 2024-09-15 01:06:19,062 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.721e+02 2.126e+02 2.320e+02 2.555e+02 4.216e+02, threshold=4.639e+02, percent-clipped=0.0 2024-09-15 01:06:40,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=210766.0, ans=0.125 2024-09-15 01:06:53,699 INFO [train.py:1198] (1/2) Epoch 12, batch 4100, loss[loss=0.2606, ctc_loss=0.1819, cr_loss=0.3935, over 21024.00 frames. ], tot_loss[loss=0.2584, ctc_loss=0.1796, cr_loss=0.394, over 4086113.34 frames. ], batch size: 61, lr: 6.98e-03, grad_scale: 32.0 2024-09-15 01:06:58,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=210794.33333333334, ans=0.125 2024-09-15 01:07:00,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=210794.33333333334, ans=0.1 2024-09-15 01:07:02,232 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.93 vs. limit=12.0 2024-09-15 01:07:03,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=210794.33333333334, ans=0.0 2024-09-15 01:07:27,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=210851.0, ans=0.0 2024-09-15 01:07:45,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=210879.33333333334, ans=0.125 2024-09-15 01:07:48,994 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.93 vs. limit=15.0 2024-09-15 01:08:12,256 INFO [train.py:1198] (1/2) Epoch 12, batch 4150, loss[loss=0.2825, ctc_loss=0.1966, cr_loss=0.4293, over 20626.00 frames. ], tot_loss[loss=0.2584, ctc_loss=0.1795, cr_loss=0.3942, over 4091835.98 frames. ], batch size: 68, lr: 6.97e-03, grad_scale: 32.0 2024-09-15 01:08:27,943 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.18 vs. 
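The many ScheduledFloat records report module hyperparameters (skip rates, balancer probabilities, dropout values and so on) that are functions of batch_count. A plausible minimal reimplementation as a piecewise-linear schedule; the breakpoints below are invented for illustration, and the real schedules live in zipformer's scaling.py:

from bisect import bisect_right

class ScheduledFloat:
    # A float that varies piecewise-linearly with the training batch count.
    def __init__(self, *points):
        self.xs = [x for x, _ in points]
        self.ys = [y for _, y in points]

    def value(self, batch_count: float) -> float:
        i = bisect_right(self.xs, batch_count)
        if i == 0:
            return self.ys[0]
        if i == len(self.xs):
            return self.ys[-1]
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# Invented breakpoints: skip_rate 0.5 at batch 0, 0.035 from batch 20000 on.
skip_rate = ScheduledFloat((0.0, 0.5), (20000.0, 0.035))
print(skip_rate.value(208386.0))  # 0.035, cf. the bypass.skip_rate record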
limit=15.0 2024-09-15 01:08:42,639 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 01:08:42,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=210992.66666666666, ans=0.125 2024-09-15 01:08:52,744 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.742e+02 2.058e+02 2.176e+02 2.478e+02 4.157e+02, threshold=4.351e+02, percent-clipped=0.0 2024-09-15 01:08:53,357 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.75 vs. limit=10.0 2024-09-15 01:08:53,537 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.01 vs. limit=12.0 2024-09-15 01:09:18,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=211049.33333333334, ans=0.125 2024-09-15 01:09:20,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=211049.33333333334, ans=0.125 2024-09-15 01:09:27,784 INFO [train.py:1198] (1/2) Epoch 12, batch 4200, loss[loss=0.2363, ctc_loss=0.1597, cr_loss=0.3831, over 20955.00 frames. ], tot_loss[loss=0.2572, ctc_loss=0.1786, cr_loss=0.393, over 4098987.96 frames. ], batch size: 50, lr: 6.97e-03, grad_scale: 32.0 2024-09-15 01:09:53,691 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=211106.0, ans=0.0 2024-09-15 01:10:02,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=211134.33333333334, ans=0.125 2024-09-15 01:10:28,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=211162.66666666666, ans=0.0 2024-09-15 01:10:46,260 INFO [train.py:1198] (1/2) Epoch 12, batch 4250, loss[loss=0.2591, ctc_loss=0.1782, cr_loss=0.4042, over 20793.00 frames. ], tot_loss[loss=0.2569, ctc_loss=0.1783, cr_loss=0.3931, over 4102939.57 frames. ], batch size: 53, lr: 6.97e-03, grad_scale: 32.0 2024-09-15 01:10:51,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=211219.33333333334, ans=0.0 2024-09-15 01:10:57,771 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.68 vs. limit=15.0 2024-09-15 01:11:04,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=211247.66666666666, ans=0.125 2024-09-15 01:11:10,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=211247.66666666666, ans=0.0 2024-09-15 01:11:18,709 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.99 vs. 
limit=10.0 2024-09-15 01:11:26,964 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.765e+02 2.071e+02 2.226e+02 2.559e+02 4.154e+02, threshold=4.451e+02, percent-clipped=0.0 2024-09-15 01:12:00,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=211361.0, ans=0.125 2024-09-15 01:12:01,430 INFO [train.py:1198] (1/2) Epoch 12, batch 4300, loss[loss=0.2683, ctc_loss=0.1871, cr_loss=0.4058, over 21075.00 frames. ], tot_loss[loss=0.2559, ctc_loss=0.1775, cr_loss=0.3921, over 4113648.53 frames. ], batch size: 59, lr: 6.97e-03, grad_scale: 32.0 2024-09-15 01:13:12,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=211474.33333333334, ans=0.1 2024-09-15 01:13:15,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=211474.33333333334, ans=0.025 2024-09-15 01:13:16,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=211474.33333333334, ans=0.0 2024-09-15 01:13:19,453 INFO [train.py:1198] (1/2) Epoch 12, batch 4350, loss[loss=0.3006, ctc_loss=0.2168, cr_loss=0.4193, over 18105.00 frames. ], tot_loss[loss=0.2564, ctc_loss=0.1778, cr_loss=0.3933, over 4117317.88 frames. ], batch size: 108, lr: 6.96e-03, grad_scale: 32.0 2024-09-15 01:13:56,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=211559.33333333334, ans=0.05 2024-09-15 01:14:00,289 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.718e+02 2.054e+02 2.156e+02 2.364e+02 4.364e+02, threshold=4.311e+02, percent-clipped=0.0 2024-09-15 01:14:17,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=211587.66666666666, ans=0.0 2024-09-15 01:14:21,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=211616.0, ans=0.025 2024-09-15 01:14:35,011 INFO [train.py:1198] (1/2) Epoch 12, batch 4400, loss[loss=0.2616, ctc_loss=0.1836, cr_loss=0.3902, over 21027.00 frames. ], tot_loss[loss=0.2575, ctc_loss=0.1787, cr_loss=0.394, over 4112704.42 frames. ], batch size: 62, lr: 6.96e-03, grad_scale: 32.0 2024-09-15 01:14:38,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=211644.33333333334, ans=0.0 2024-09-15 01:15:04,560 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.43 vs. limit=15.0 2024-09-15 01:15:44,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=211757.66666666666, ans=15.0 2024-09-15 01:15:53,461 INFO [train.py:1198] (1/2) Epoch 12, batch 4450, loss[loss=0.2852, ctc_loss=0.2021, cr_loss=0.4156, over 19435.00 frames. ], tot_loss[loss=0.2581, ctc_loss=0.1792, cr_loss=0.3942, over 4111814.67 frames. ], batch size: 90, lr: 6.96e-03, grad_scale: 32.0 2024-09-15 01:15:58,586 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.48 vs. 
limit=6.0 2024-09-15 01:16:21,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=211814.33333333334, ans=0.125 2024-09-15 01:16:26,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=211842.66666666666, ans=0.0 2024-09-15 01:16:28,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=211842.66666666666, ans=0.0 2024-09-15 01:16:31,535 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=211842.66666666666, ans=0.1 2024-09-15 01:16:34,062 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.808e+02 2.005e+02 2.145e+02 2.285e+02 3.180e+02, threshold=4.291e+02, percent-clipped=0.0 2024-09-15 01:16:37,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=211871.0, ans=0.0 2024-09-15 01:17:07,450 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.12 vs. limit=10.0 2024-09-15 01:17:08,039 INFO [train.py:1198] (1/2) Epoch 12, batch 4500, loss[loss=0.2626, ctc_loss=0.1903, cr_loss=0.3616, over 21073.00 frames. ], tot_loss[loss=0.2579, ctc_loss=0.1793, cr_loss=0.3934, over 4102911.99 frames. ], batch size: 56, lr: 6.96e-03, grad_scale: 32.0 2024-09-15 01:17:08,809 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.63 vs. limit=15.0 2024-09-15 01:17:25,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=211956.0, ans=0.0 2024-09-15 01:17:32,535 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=211956.0, ans=0.125 2024-09-15 01:17:50,817 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.71 vs. limit=22.5 2024-09-15 01:17:59,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=212012.66666666666, ans=0.2 2024-09-15 01:18:19,693 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.29 vs. limit=22.5 2024-09-15 01:18:23,279 INFO [train.py:1198] (1/2) Epoch 12, batch 4550, loss[loss=0.2557, ctc_loss=0.1759, cr_loss=0.3988, over 20997.00 frames. ], tot_loss[loss=0.258, ctc_loss=0.1793, cr_loss=0.3935, over 4106316.90 frames. 
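The Whitening records compare a per-module statistic against a limit (the limit can itself be a ScheduledFloat, see the whitening_limit record above), presumably applying a corrective penalty only when metric exceeds limit. As a loose illustration only, one anisotropy measure with the right behaviour is mean(lambda^2)/mean(lambda)^2 over the eigenvalues lambda of the feature covariance: it is 1.0 for perfectly white features and grows as the covariance departs from a multiple of the identity. The formula is our assumption for exposition, not scaling.py's actual metric:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    # x: (num_frames, num_channels); channels split into num_groups,
    # mirroring records such as whiten_keys with num_groups=4.
    n, c = x.shape
    x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
    x = x - x.mean(dim=1, keepdim=True)
    cov = x.transpose(1, 2) @ x / n       # (groups, c/g, c/g) covariances
    eigs = torch.linalg.eigvalsh(cov)     # eigenvalues, per group
    # 1.0 when cov is a multiple of the identity; larger when anisotropic.
    return (eigs ** 2).mean() / eigs.mean() ** 2

print(whitening_metric(torch.randn(1000, 256)))  # ~1.0 for white noise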
], batch size: 63, lr: 6.95e-03, grad_scale: 32.0 2024-09-15 01:18:23,691 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=212069.33333333334, ans=0.025 2024-09-15 01:18:26,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=212069.33333333334, ans=0.125 2024-09-15 01:18:51,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=212097.66666666666, ans=0.1 2024-09-15 01:19:06,083 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.738e+02 2.143e+02 2.292e+02 2.537e+02 7.495e+02, threshold=4.584e+02, percent-clipped=1.0 2024-09-15 01:19:07,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=212126.0, ans=0.05 2024-09-15 01:19:40,388 INFO [train.py:1198] (1/2) Epoch 12, batch 4600, loss[loss=0.2611, ctc_loss=0.1801, cr_loss=0.4049, over 21012.00 frames. ], tot_loss[loss=0.2582, ctc_loss=0.1794, cr_loss=0.3939, over 4101463.57 frames. ], batch size: 63, lr: 6.95e-03, grad_scale: 32.0 2024-09-15 01:19:45,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=212211.0, ans=0.125 2024-09-15 01:19:58,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=212239.33333333334, ans=0.125 2024-09-15 01:20:02,071 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.60 vs. limit=22.5 2024-09-15 01:20:12,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=212267.66666666666, ans=0.125 2024-09-15 01:20:56,194 INFO [train.py:1198] (1/2) Epoch 12, batch 4650, loss[loss=0.3116, ctc_loss=0.2295, cr_loss=0.4106, over 14478.00 frames. ], tot_loss[loss=0.2578, ctc_loss=0.179, cr_loss=0.3941, over 4097982.05 frames. ], batch size: 152, lr: 6.95e-03, grad_scale: 32.0 2024-09-15 01:21:39,666 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.826e+02 2.028e+02 2.193e+02 2.342e+02 4.357e+02, threshold=4.386e+02, percent-clipped=0.0 2024-09-15 01:21:46,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=212437.66666666666, ans=0.1 2024-09-15 01:21:55,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=212437.66666666666, ans=0.0 2024-09-15 01:22:07,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=212466.0, ans=0.0 2024-09-15 01:22:13,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=212494.33333333334, ans=0.125 2024-09-15 01:22:14,367 INFO [train.py:1198] (1/2) Epoch 12, batch 4700, loss[loss=0.2914, ctc_loss=0.2024, cr_loss=0.4446, over 20866.00 frames. ], tot_loss[loss=0.2586, ctc_loss=0.1795, cr_loss=0.3954, over 4108443.35 frames. ], batch size: 65, lr: 6.95e-03, grad_scale: 32.0 2024-09-15 01:23:30,214 INFO [train.py:1198] (1/2) Epoch 12, batch 4750, loss[loss=0.2612, ctc_loss=0.1816, cr_loss=0.3982, over 20977.00 frames. 
], tot_loss[loss=0.2578, ctc_loss=0.1791, cr_loss=0.3938, over 4108958.44 frames. ], batch size: 58, lr: 6.95e-03, grad_scale: 32.0 2024-09-15 01:23:39,943 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.32 vs. limit=15.0 2024-09-15 01:24:13,604 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.749e+02 2.026e+02 2.151e+02 2.358e+02 3.126e+02, threshold=4.302e+02, percent-clipped=0.0 2024-09-15 01:24:24,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=212721.0, ans=0.1 2024-09-15 01:24:48,172 INFO [train.py:1198] (1/2) Epoch 12, batch 4800, loss[loss=0.2236, ctc_loss=0.1507, cr_loss=0.3644, over 20967.00 frames. ], tot_loss[loss=0.2581, ctc_loss=0.1794, cr_loss=0.3935, over 4092680.47 frames. ], batch size: 48, lr: 6.94e-03, grad_scale: 32.0 2024-09-15 01:25:35,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=212862.66666666666, ans=0.1 2024-09-15 01:26:03,864 INFO [train.py:1198] (1/2) Epoch 12, batch 4850, loss[loss=0.2529, ctc_loss=0.1738, cr_loss=0.3956, over 20976.00 frames. ], tot_loss[loss=0.2558, ctc_loss=0.1776, cr_loss=0.3909, over 4093073.64 frames. ], batch size: 50, lr: 6.94e-03, grad_scale: 32.0 2024-09-15 01:26:44,716 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.835e+02 2.032e+02 2.158e+02 2.366e+02 7.708e+02, threshold=4.315e+02, percent-clipped=1.0 2024-09-15 01:27:18,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=213032.66666666666, ans=0.035 2024-09-15 01:27:22,418 INFO [train.py:1198] (1/2) Epoch 12, batch 4900, loss[loss=0.2624, ctc_loss=0.1849, cr_loss=0.3873, over 20665.00 frames. ], tot_loss[loss=0.2568, ctc_loss=0.1784, cr_loss=0.3919, over 4088194.12 frames. ], batch size: 71, lr: 6.94e-03, grad_scale: 32.0 2024-09-15 01:27:28,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=213061.0, ans=10.0 2024-09-15 01:27:30,461 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.68 vs. limit=22.5 2024-09-15 01:27:55,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=213117.66666666666, ans=0.125 2024-09-15 01:28:01,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=213117.66666666666, ans=0.125 2024-09-15 01:28:18,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=213146.0, ans=0.0 2024-09-15 01:28:37,190 INFO [train.py:1198] (1/2) Epoch 12, batch 4950, loss[loss=0.3062, ctc_loss=0.2161, cr_loss=0.4506, over 20339.00 frames. ], tot_loss[loss=0.257, ctc_loss=0.1786, cr_loss=0.3924, over 4093794.34 frames. 
], batch size: 74, lr: 6.94e-03, grad_scale: 32.0 2024-09-15 01:28:55,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=213231.0, ans=0.05 2024-09-15 01:29:17,643 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.765e+02 2.095e+02 2.327e+02 2.579e+02 5.653e+02, threshold=4.653e+02, percent-clipped=2.0 2024-09-15 01:29:21,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=213287.66666666666, ans=0.1 2024-09-15 01:29:26,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=213287.66666666666, ans=0.125 2024-09-15 01:29:51,742 INFO [train.py:1198] (1/2) Epoch 12, batch 5000, loss[loss=0.2664, ctc_loss=0.1857, cr_loss=0.4038, over 21055.00 frames. ], tot_loss[loss=0.2557, ctc_loss=0.1775, cr_loss=0.3908, over 4107233.48 frames. ], batch size: 59, lr: 6.93e-03, grad_scale: 32.0 2024-09-15 01:30:07,082 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.91 vs. limit=22.5 2024-09-15 01:30:15,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=213372.66666666666, ans=0.125 2024-09-15 01:30:30,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=213401.0, ans=0.125 2024-09-15 01:30:52,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=213457.66666666666, ans=0.1 2024-09-15 01:31:05,374 INFO [train.py:1198] (1/2) Epoch 12, batch 5050, loss[loss=0.2773, ctc_loss=0.197, cr_loss=0.4012, over 21051.00 frames. ], tot_loss[loss=0.257, ctc_loss=0.1786, cr_loss=0.3921, over 4096659.47 frames. ], batch size: 62, lr: 6.93e-03, grad_scale: 32.0 2024-09-15 01:31:05,984 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.83 vs. limit=15.0 2024-09-15 01:31:44,530 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.821e+02 2.109e+02 2.283e+02 2.545e+02 3.226e+02, threshold=4.566e+02, percent-clipped=0.0 2024-09-15 01:31:49,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=213571.0, ans=0.2 2024-09-15 01:31:56,864 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.40 vs. limit=22.5 2024-09-15 01:32:01,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=213571.0, ans=0.125 2024-09-15 01:32:21,524 INFO [train.py:1198] (1/2) Epoch 12, batch 5100, loss[loss=0.2226, ctc_loss=0.1556, cr_loss=0.3348, over 20943.00 frames. ], tot_loss[loss=0.2583, ctc_loss=0.1796, cr_loss=0.3934, over 4092525.07 frames. ], batch size: 49, lr: 6.93e-03, grad_scale: 32.0 2024-09-15 01:33:05,459 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.93 vs. 
limit=6.0 2024-09-15 01:33:07,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=213712.66666666666, ans=0.125 2024-09-15 01:33:08,172 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.42 vs. limit=15.0 2024-09-15 01:33:19,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=213741.0, ans=0.0 2024-09-15 01:33:23,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=213741.0, ans=0.2 2024-09-15 01:33:35,652 INFO [train.py:1198] (1/2) Epoch 12, batch 5150, loss[loss=0.2085, ctc_loss=0.1408, cr_loss=0.3385, over 20973.00 frames. ], tot_loss[loss=0.2584, ctc_loss=0.1796, cr_loss=0.3937, over 4084525.75 frames. ], batch size: 50, lr: 6.93e-03, grad_scale: 32.0 2024-09-15 01:33:49,391 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=213797.66666666666, ans=0.95 2024-09-15 01:33:52,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=213797.66666666666, ans=0.0 2024-09-15 01:34:15,294 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.849e+02 2.041e+02 2.271e+02 2.498e+02 3.657e+02, threshold=4.543e+02, percent-clipped=0.0 2024-09-15 01:34:31,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=213854.33333333334, ans=0.025 2024-09-15 01:34:36,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=213882.66666666666, ans=0.125 2024-09-15 01:34:49,008 INFO [train.py:1198] (1/2) Epoch 12, batch 5200, loss[loss=0.2425, ctc_loss=0.1662, cr_loss=0.3819, over 20789.00 frames. ], tot_loss[loss=0.2587, ctc_loss=0.1799, cr_loss=0.3938, over 4082254.48 frames. ], batch size: 56, lr: 6.93e-03, grad_scale: 32.0 2024-09-15 01:34:55,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=213911.0, ans=0.1 2024-09-15 01:34:55,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=213911.0, ans=0.0 2024-09-15 01:34:59,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=213911.0, ans=0.125 2024-09-15 01:35:11,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=213939.33333333334, ans=0.125 2024-09-15 01:36:02,133 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.72 vs. limit=10.0 2024-09-15 01:36:02,994 INFO [train.py:1198] (1/2) Epoch 12, batch 5250, loss[loss=0.2644, ctc_loss=0.1812, cr_loss=0.4157, over 20977.00 frames. ], tot_loss[loss=0.258, ctc_loss=0.1793, cr_loss=0.3934, over 4081434.42 frames. 
], batch size: 55, lr: 6.92e-03, grad_scale: 32.0 2024-09-15 01:36:31,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=214109.33333333334, ans=0.0 2024-09-15 01:36:33,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=214109.33333333334, ans=0.125 2024-09-15 01:36:43,289 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.744e+02 2.036e+02 2.149e+02 2.371e+02 5.610e+02, threshold=4.298e+02, percent-clipped=1.0 2024-09-15 01:36:45,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=214109.33333333334, ans=0.025 2024-09-15 01:36:55,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=214137.66666666666, ans=0.125 2024-09-15 01:37:20,006 INFO [train.py:1198] (1/2) Epoch 12, batch 5300, loss[loss=0.2746, ctc_loss=0.1963, cr_loss=0.3918, over 20860.00 frames. ], tot_loss[loss=0.2582, ctc_loss=0.1795, cr_loss=0.3935, over 4082861.33 frames. ], batch size: 65, lr: 6.92e-03, grad_scale: 32.0 2024-09-15 01:37:33,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=214222.66666666666, ans=0.125 2024-09-15 01:37:36,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=214222.66666666666, ans=0.125 2024-09-15 01:37:45,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=214222.66666666666, ans=0.125 2024-09-15 01:38:06,597 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=214279.33333333334, ans=0.125 2024-09-15 01:38:15,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=214279.33333333334, ans=0.09899494936611666 2024-09-15 01:38:34,841 INFO [train.py:1198] (1/2) Epoch 12, batch 5350, loss[loss=0.1961, ctc_loss=0.1313, cr_loss=0.3239, over 20938.00 frames. ], tot_loss[loss=0.2578, ctc_loss=0.1793, cr_loss=0.3928, over 4085544.93 frames. ], batch size: 49, lr: 6.92e-03, grad_scale: 32.0 2024-09-15 01:39:03,985 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.59 vs. limit=22.5 2024-09-15 01:39:04,196 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.54 vs. limit=15.0 2024-09-15 01:39:15,274 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.794e+02 2.055e+02 2.211e+02 2.434e+02 3.498e+02, threshold=4.423e+02, percent-clipped=0.0 2024-09-15 01:39:23,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=214421.0, ans=0.0 2024-09-15 01:39:33,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=214449.33333333334, ans=0.125 2024-09-15 01:39:40,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=214449.33333333334, ans=0.125 2024-09-15 01:39:49,614 INFO [train.py:1198] (1/2) Epoch 12, batch 5400, loss[loss=0.246, ctc_loss=0.1691, cr_loss=0.3844, over 20775.00 frames. 
], tot_loss[loss=0.2573, ctc_loss=0.1789, cr_loss=0.3917, over 4087626.08 frames. ], batch size: 53, lr: 6.92e-03, grad_scale: 32.0 2024-09-15 01:39:54,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=214477.66666666666, ans=0.1 2024-09-15 01:39:55,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=214477.66666666666, ans=0.0 2024-09-15 01:40:56,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=214591.0, ans=0.0 2024-09-15 01:41:03,588 INFO [train.py:1198] (1/2) Epoch 12, batch 5450, loss[loss=0.2801, ctc_loss=0.196, cr_loss=0.4209, over 20684.00 frames. ], tot_loss[loss=0.2571, ctc_loss=0.1786, cr_loss=0.3924, over 4100079.29 frames. ], batch size: 66, lr: 6.91e-03, grad_scale: 32.0 2024-09-15 01:41:06,872 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=214619.33333333334, ans=0.125 2024-09-15 01:41:26,306 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.67 vs. limit=15.0 2024-09-15 01:41:39,428 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.19 vs. limit=22.5 2024-09-15 01:41:45,828 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.733e+02 2.039e+02 2.124e+02 2.309e+02 4.463e+02, threshold=4.249e+02, percent-clipped=1.0 2024-09-15 01:42:19,657 INFO [train.py:1198] (1/2) Epoch 12, batch 5500, loss[loss=0.2737, ctc_loss=0.1924, cr_loss=0.4064, over 19288.00 frames. ], tot_loss[loss=0.2581, ctc_loss=0.1793, cr_loss=0.394, over 4101818.37 frames. ], batch size: 90, lr: 6.91e-03, grad_scale: 32.0 2024-09-15 01:42:32,416 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.14 vs. limit=15.0 2024-09-15 01:42:41,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=214789.33333333334, ans=0.1 2024-09-15 01:43:13,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=214846.0, ans=0.125 2024-09-15 01:43:22,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=214874.33333333334, ans=0.1 2024-09-15 01:43:33,845 INFO [train.py:1198] (1/2) Epoch 12, batch 5550, loss[loss=0.3355, ctc_loss=0.2482, cr_loss=0.4364, over 14628.00 frames. ], tot_loss[loss=0.2587, ctc_loss=0.1798, cr_loss=0.3945, over 4103854.83 frames. ], batch size: 150, lr: 6.91e-03, grad_scale: 32.0 2024-09-15 01:43:34,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=214902.66666666666, ans=0.2 2024-09-15 01:44:13,412 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.815e+02 2.034e+02 2.125e+02 2.381e+02 4.036e+02, threshold=4.249e+02, percent-clipped=0.0 2024-09-15 01:44:47,887 INFO [train.py:1198] (1/2) Epoch 12, batch 5600, loss[loss=0.3297, ctc_loss=0.2468, cr_loss=0.4148, over 14600.00 frames. ], tot_loss[loss=0.257, ctc_loss=0.1785, cr_loss=0.3924, over 4097573.64 frames. 
], batch size: 150, lr: 6.91e-03, grad_scale: 32.0 2024-09-15 01:44:49,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=215044.33333333334, ans=0.0 2024-09-15 01:45:31,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=215129.33333333334, ans=0.1 2024-09-15 01:46:03,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=215186.0, ans=0.125 2024-09-15 01:46:03,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=215186.0, ans=0.0 2024-09-15 01:46:04,707 INFO [train.py:1198] (1/2) Epoch 12, batch 5650, loss[loss=0.2495, ctc_loss=0.1748, cr_loss=0.3733, over 20946.00 frames. ], tot_loss[loss=0.258, ctc_loss=0.1794, cr_loss=0.3931, over 4083884.65 frames. ], batch size: 60, lr: 6.90e-03, grad_scale: 32.0 2024-09-15 01:46:07,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=215186.0, ans=0.125 2024-09-15 01:46:09,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=215186.0, ans=0.0 2024-09-15 01:46:44,279 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.827e+02 2.090e+02 2.337e+02 2.625e+02 3.498e+02, threshold=4.675e+02, percent-clipped=0.0 2024-09-15 01:46:48,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=215271.0, ans=0.0 2024-09-15 01:47:18,391 INFO [train.py:1198] (1/2) Epoch 12, batch 5700, loss[loss=0.265, ctc_loss=0.1855, cr_loss=0.3975, over 20318.00 frames. ], tot_loss[loss=0.2581, ctc_loss=0.1795, cr_loss=0.393, over 4082763.36 frames. ], batch size: 74, lr: 6.90e-03, grad_scale: 32.0 2024-09-15 01:47:36,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=215356.0, ans=0.125 2024-09-15 01:47:39,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=215356.0, ans=0.2 2024-09-15 01:47:58,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=215384.33333333334, ans=0.0 2024-09-15 01:48:33,669 INFO [train.py:1198] (1/2) Epoch 12, batch 5750, loss[loss=0.2508, ctc_loss=0.173, cr_loss=0.3888, over 20933.00 frames. ], tot_loss[loss=0.2578, ctc_loss=0.1792, cr_loss=0.3929, over 4089015.38 frames. ], batch size: 60, lr: 6.90e-03, grad_scale: 32.0 2024-09-15 01:49:06,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=215526.0, ans=0.1 2024-09-15 01:49:14,022 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.759e+02 2.077e+02 2.189e+02 2.401e+02 3.549e+02, threshold=4.378e+02, percent-clipped=0.0 2024-09-15 01:49:14,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=215526.0, ans=0.0 2024-09-15 01:49:17,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=215554.33333333334, ans=0.125 2024-09-15 01:49:50,551 INFO [train.py:1198] (1/2) Epoch 12, batch 5800, loss[loss=0.2702, ctc_loss=0.1933, cr_loss=0.3846, over 20374.00 frames. 
], tot_loss[loss=0.2566, ctc_loss=0.1782, cr_loss=0.3919, over 4097361.78 frames. ], batch size: 74, lr: 6.90e-03, grad_scale: 32.0 2024-09-15 01:50:10,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=215639.33333333334, ans=0.125 2024-09-15 01:50:11,646 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 01:50:11,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=215639.33333333334, ans=0.125 2024-09-15 01:50:55,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=215724.33333333334, ans=0.0 2024-09-15 01:50:58,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=215724.33333333334, ans=0.125 2024-09-15 01:51:04,347 INFO [train.py:1198] (1/2) Epoch 12, batch 5850, loss[loss=0.2652, ctc_loss=0.1777, cr_loss=0.4375, over 20365.00 frames. ], tot_loss[loss=0.2582, ctc_loss=0.1795, cr_loss=0.3935, over 4088982.85 frames. ], batch size: 45, lr: 6.90e-03, grad_scale: 32.0 2024-09-15 01:51:06,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=215752.66666666666, ans=0.0 2024-09-15 01:51:09,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=215752.66666666666, ans=0.2 2024-09-15 01:51:10,856 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 01:51:38,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=215809.33333333334, ans=0.0 2024-09-15 01:51:43,614 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.838e+02 2.132e+02 2.269e+02 2.517e+02 4.802e+02, threshold=4.538e+02, percent-clipped=1.0 2024-09-15 01:51:45,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=215809.33333333334, ans=0.2 2024-09-15 01:51:46,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=215837.66666666666, ans=0.05 2024-09-15 01:52:00,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=215837.66666666666, ans=0.125 2024-09-15 01:52:16,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=215894.33333333334, ans=0.0 2024-09-15 01:52:17,220 INFO [train.py:1198] (1/2) Epoch 12, batch 5900, loss[loss=0.2687, ctc_loss=0.1856, cr_loss=0.4156, over 20951.00 frames. ], tot_loss[loss=0.2575, ctc_loss=0.179, cr_loss=0.3924, over 4091463.43 frames. ], batch size: 64, lr: 6.89e-03, grad_scale: 64.0 2024-09-15 01:52:42,714 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.76 vs. 
limit=10.0 2024-09-15 01:52:52,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=215951.0, ans=0.1 2024-09-15 01:53:14,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=216007.66666666666, ans=0.0 2024-09-15 01:53:19,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=216007.66666666666, ans=0.125 2024-09-15 01:53:30,861 INFO [train.py:1198] (1/2) Epoch 12, batch 5950, loss[loss=0.2958, ctc_loss=0.21, cr_loss=0.429, over 20861.00 frames. ], tot_loss[loss=0.2595, ctc_loss=0.1806, cr_loss=0.3943, over 4086299.29 frames. ], batch size: 65, lr: 6.89e-03, grad_scale: 64.0 2024-09-15 01:53:55,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=216064.33333333334, ans=0.125 2024-09-15 01:53:59,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=216092.66666666666, ans=0.025 2024-09-15 01:54:06,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=216092.66666666666, ans=0.5 2024-09-15 01:54:07,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=216092.66666666666, ans=0.125 2024-09-15 01:54:13,177 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.691e+02 2.098e+02 2.217e+02 2.366e+02 3.588e+02, threshold=4.434e+02, percent-clipped=0.0 2024-09-15 01:54:46,933 INFO [train.py:1198] (1/2) Epoch 12, batch 6000, loss[loss=0.2472, ctc_loss=0.1691, cr_loss=0.3908, over 20928.00 frames. ], tot_loss[loss=0.2591, ctc_loss=0.1802, cr_loss=0.3945, over 4093280.72 frames. ], batch size: 50, lr: 6.89e-03, grad_scale: 64.0 2024-09-15 01:54:46,933 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-15 01:55:14,065 INFO [train.py:1230] (1/2) Epoch 12, validation: loss=0.04862, ctc_loss=0.04862, cr_loss=9.819e-15, over 944034.00 frames. 2024-09-15 01:55:14,065 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-15 01:55:14,764 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.01 vs. 
limit=12.0 2024-09-15 01:55:23,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=216177.66666666666, ans=0.1 2024-09-15 01:55:29,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=216206.0, ans=0.07 2024-09-15 01:55:30,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=216206.0, ans=0.0 2024-09-15 01:55:52,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=216234.33333333334, ans=0.125 2024-09-15 01:56:09,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=216262.66666666666, ans=0.125 2024-09-15 01:56:15,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=216291.0, ans=0.0 2024-09-15 01:56:29,448 INFO [train.py:1198] (1/2) Epoch 12, batch 6050, loss[loss=0.2557, ctc_loss=0.1772, cr_loss=0.3925, over 21078.00 frames. ], tot_loss[loss=0.2586, ctc_loss=0.1798, cr_loss=0.3937, over 4090971.99 frames. ], batch size: 59, lr: 6.89e-03, grad_scale: 64.0 2024-09-15 01:56:31,749 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.61 vs. limit=12.0 2024-09-15 01:56:49,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=216347.66666666666, ans=0.025 2024-09-15 01:57:09,791 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.728e+02 2.018e+02 2.157e+02 2.427e+02 3.919e+02, threshold=4.313e+02, percent-clipped=0.0 2024-09-15 01:57:30,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=216432.66666666666, ans=0.0 2024-09-15 01:57:45,240 INFO [train.py:1198] (1/2) Epoch 12, batch 6100, loss[loss=0.2126, ctc_loss=0.1437, cr_loss=0.3445, over 20962.00 frames. ], tot_loss[loss=0.2588, ctc_loss=0.1799, cr_loss=0.3943, over 4093725.36 frames. ], batch size: 50, lr: 6.88e-03, grad_scale: 64.0 2024-09-15 01:57:50,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=216461.0, ans=0.0 2024-09-15 01:58:09,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=216489.33333333334, ans=0.0 2024-09-15 01:58:12,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=216489.33333333334, ans=0.0 2024-09-15 01:58:42,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=216546.0, ans=0.125 2024-09-15 01:58:56,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=216574.33333333334, ans=0.0 2024-09-15 01:58:59,372 INFO [train.py:1198] (1/2) Epoch 12, batch 6150, loss[loss=0.2592, ctc_loss=0.1799, cr_loss=0.3962, over 21026.00 frames. ], tot_loss[loss=0.2591, ctc_loss=0.1803, cr_loss=0.3941, over 4077592.26 frames. 
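In the validation records (above, and again at the epoch-13 boundary below), cr_loss collapses to ~1e-14 and loss equals ctc_loss. That is expected: the consistency term compares the CTC posteriors of two differently time-masked copies of each utterance, and without masking at validation the two copies coincide. A self-contained sketch of such a consistency loss as a symmetric KL divergence; treating the CR term as symmetric KL is our reading of CR-CTC, and the shapes are illustrative:

import torch
import torch.nn.functional as F

def cr_loss(log_probs_a: torch.Tensor,
            log_probs_b: torch.Tensor) -> torch.Tensor:
    # Symmetric KL between the CTC posteriors of two masked views.
    kl_ab = F.kl_div(log_probs_a, log_probs_b, log_target=True,
                     reduction="batchmean")
    kl_ba = F.kl_div(log_probs_b, log_probs_a, log_target=True,
                     reduction="batchmean")
    return 0.5 * (kl_ab + kl_ba)

x = torch.randn(4, 100, 500).log_softmax(dim=-1)  # (batch, time, vocab)
print(cr_loss(x, x))  # ~0: identical views, as at validation time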
], batch size: 63, lr: 6.88e-03, grad_scale: 64.0 2024-09-15 01:59:18,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=216631.0, ans=0.1 2024-09-15 01:59:39,109 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.744e+02 2.020e+02 2.125e+02 2.318e+02 3.120e+02, threshold=4.251e+02, percent-clipped=0.0 2024-09-15 02:00:12,972 INFO [train.py:1198] (1/2) Epoch 12, batch 6200, loss[loss=0.245, ctc_loss=0.1692, cr_loss=0.379, over 21063.00 frames. ], tot_loss[loss=0.2578, ctc_loss=0.1793, cr_loss=0.3922, over 4065574.64 frames. ], batch size: 56, lr: 6.88e-03, grad_scale: 64.0 2024-09-15 02:00:20,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=216744.33333333334, ans=0.125 2024-09-15 02:01:19,353 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.39 vs. limit=6.0 2024-09-15 02:01:21,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=216857.66666666666, ans=0.125 2024-09-15 02:01:25,668 INFO [train.py:1198] (1/2) Epoch 12, batch 6250, loss[loss=0.3129, ctc_loss=0.2341, cr_loss=0.3937, over 14337.00 frames. ], tot_loss[loss=0.2578, ctc_loss=0.1794, cr_loss=0.392, over 4055911.99 frames. ], batch size: 150, lr: 6.88e-03, grad_scale: 32.0 2024-09-15 02:01:27,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=216886.0, ans=0.025 2024-09-15 02:01:36,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=216886.0, ans=0.025 2024-09-15 02:02:02,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=216942.66666666666, ans=0.125 2024-09-15 02:02:07,918 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.748e+02 2.049e+02 2.202e+02 2.453e+02 4.543e+02, threshold=4.404e+02, percent-clipped=2.0 2024-09-15 02:02:25,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=216999.33333333334, ans=0.125 2024-09-15 02:02:39,728 INFO [train.py:1198] (1/2) Epoch 12, batch 6300, loss[loss=0.3247, ctc_loss=0.2358, cr_loss=0.4445, over 14121.00 frames. ], tot_loss[loss=0.2605, ctc_loss=0.1815, cr_loss=0.3946, over 4022481.22 frames. ], batch size: 150, lr: 6.88e-03, grad_scale: 32.0 2024-09-15 02:02:43,410 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.38 vs. limit=15.0 2024-09-15 02:02:50,684 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=9.30 vs. limit=15.0 2024-09-15 02:03:19,994 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.11 vs. 
limit=15.0 2024-09-15 02:03:21,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=217112.66666666666, ans=0.0 2024-09-15 02:03:22,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=217112.66666666666, ans=0.0 2024-09-15 02:03:28,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=217112.66666666666, ans=0.125 2024-09-15 02:03:31,386 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.01 vs. limit=10.0 2024-09-15 02:03:42,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=217141.0, ans=0.1 2024-09-15 02:03:50,271 INFO [train.py:1198] (1/2) Epoch 12, batch 6350, loss[loss=0.304, ctc_loss=0.2225, cr_loss=0.4078, over 14500.00 frames. ], tot_loss[loss=0.2663, ctc_loss=0.187, cr_loss=0.3966, over 3847792.64 frames. ], batch size: 149, lr: 6.87e-03, grad_scale: 32.0 2024-09-15 02:04:06,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=217197.66666666666, ans=0.1 2024-09-15 02:04:20,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=217226.0, ans=0.125 2024-09-15 02:04:30,114 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.859e+02 2.369e+02 2.580e+02 2.709e+02 3.659e+02, threshold=5.161e+02, percent-clipped=0.0 2024-09-15 02:04:34,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=217254.33333333334, ans=0.0 2024-09-15 02:05:36,638 INFO [train.py:1198] (1/2) Epoch 13, batch 0, loss[loss=0.3046, ctc_loss=0.2149, cr_loss=0.4485, over 20863.00 frames. ], tot_loss[loss=0.3046, ctc_loss=0.2149, cr_loss=0.4485, over 20863.00 frames. ], batch size: 65, lr: 6.60e-03, grad_scale: 32.0 2024-09-15 02:05:36,638 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-15 02:05:51,857 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.4051, 2.6904, 3.1030, 2.2787], device='cuda:1') 2024-09-15 02:05:55,019 INFO [train.py:1230] (1/2) Epoch 13, validation: loss=0.05013, ctc_loss=0.05013, cr_loss=9.52e-15, over 944034.00 frames. 2024-09-15 02:05:55,020 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-15 02:05:56,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=217285.5, ans=0.0 2024-09-15 02:06:01,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=217285.5, ans=0.125 2024-09-15 02:06:13,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=217313.83333333334, ans=0.0 2024-09-15 02:06:49,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=217370.5, ans=10.0 2024-09-15 02:07:09,689 INFO [train.py:1198] (1/2) Epoch 13, batch 50, loss[loss=0.263, ctc_loss=0.1827, cr_loss=0.4016, over 21034.00 frames. 
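In the epoch 13 validation record above, cr_loss collapses to about 1e-14 while it hovers near 0.39 during training, which fits a consistency term that only exists when two augmented views of each utterance are forwarded (validation runs a single pass, so the term vanishes). A generic sketch of such a consistency loss, written as symmetric KL between the two views' per-frame posteriors; the actual CR-CTC term in icefall may differ in detail:

```python
import torch
import torch.nn.functional as F

def consistency_loss(logits_a: torch.Tensor, logits_b: torch.Tensor) -> torch.Tensor:
    """Symmetric KL between frame posteriors of two augmented views.

    logits_*: (batch, time, vocab) outputs of two forward passes over
    differently masked copies of the same utterances. Illustrative only.
    """
    log_p_a = F.log_softmax(logits_a, dim=-1)
    log_p_b = F.log_softmax(logits_b, dim=-1)
    # KL in both directions, averaged; each term is >= 0 and is 0 iff
    # the two views produce identical posteriors.
    kl_1 = F.kl_div(log_p_a, log_p_b, reduction="batchmean", log_target=True)
    kl_2 = F.kl_div(log_p_b, log_p_a, reduction="batchmean", log_target=True)
    return 0.5 * (kl_1 + kl_2)
```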
], tot_loss[loss=0.2612, ctc_loss=0.1816, cr_loss=0.3982, over 931072.43 frames. ], batch size: 63, lr: 6.60e-03, grad_scale: 32.0 2024-09-15 02:07:13,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=217427.16666666666, ans=0.125 2024-09-15 02:07:38,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=217483.83333333334, ans=0.125 2024-09-15 02:07:41,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=217483.83333333334, ans=0.0 2024-09-15 02:07:41,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=217483.83333333334, ans=0.125 2024-09-15 02:07:45,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=217483.83333333334, ans=0.1 2024-09-15 02:07:57,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=217512.16666666666, ans=0.125 2024-09-15 02:08:05,092 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.694e+02 2.120e+02 2.301e+02 2.578e+02 3.916e+02, threshold=4.602e+02, percent-clipped=0.0 2024-09-15 02:08:23,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=217568.83333333334, ans=0.1 2024-09-15 02:08:24,526 INFO [train.py:1198] (1/2) Epoch 13, batch 100, loss[loss=0.2529, ctc_loss=0.1729, cr_loss=0.3998, over 21078.00 frames. ], tot_loss[loss=0.2583, ctc_loss=0.1795, cr_loss=0.3941, over 1629891.63 frames. ], batch size: 59, lr: 6.60e-03, grad_scale: 32.0 2024-09-15 02:08:41,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=217597.16666666666, ans=0.125 2024-09-15 02:08:58,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=217625.5, ans=0.125 2024-09-15 02:09:13,620 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 02:09:30,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=217682.16666666666, ans=0.1 2024-09-15 02:09:45,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=217710.5, ans=0.125 2024-09-15 02:09:46,712 INFO [train.py:1198] (1/2) Epoch 13, batch 150, loss[loss=0.2527, ctc_loss=0.178, cr_loss=0.3737, over 21041.00 frames. ], tot_loss[loss=0.2589, ctc_loss=0.1799, cr_loss=0.395, over 2166908.99 frames. 
], batch size: 56, lr: 6.60e-03, grad_scale: 32.0 2024-09-15 02:09:46,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=217710.5, ans=0.125 2024-09-15 02:10:00,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=217738.83333333334, ans=0.1 2024-09-15 02:10:41,698 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.724e+02 1.973e+02 2.130e+02 2.293e+02 4.231e+02, threshold=4.260e+02, percent-clipped=0.0 2024-09-15 02:10:42,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=217795.5, ans=0.125 2024-09-15 02:11:01,240 INFO [train.py:1198] (1/2) Epoch 13, batch 200, loss[loss=0.2801, ctc_loss=0.1954, cr_loss=0.4233, over 20976.00 frames. ], tot_loss[loss=0.2582, ctc_loss=0.1791, cr_loss=0.3954, over 2602482.40 frames. ], batch size: 64, lr: 6.59e-03, grad_scale: 32.0 2024-09-15 02:11:21,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=217880.5, ans=0.125 2024-09-15 02:11:37,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=217908.83333333334, ans=0.0 2024-09-15 02:11:40,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=217908.83333333334, ans=0.125 2024-09-15 02:12:16,878 INFO [train.py:1198] (1/2) Epoch 13, batch 250, loss[loss=0.3142, ctc_loss=0.2309, cr_loss=0.4166, over 13861.00 frames. ], tot_loss[loss=0.2559, ctc_loss=0.1774, cr_loss=0.3926, over 2931070.91 frames. ], batch size: 149, lr: 6.59e-03, grad_scale: 32.0 2024-09-15 02:12:33,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=218022.16666666666, ans=0.0 2024-09-15 02:12:39,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=218022.16666666666, ans=0.1 2024-09-15 02:12:41,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=218022.16666666666, ans=0.0 2024-09-15 02:12:47,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=218050.5, ans=0.025 2024-09-15 02:12:52,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=218050.5, ans=0.0 2024-09-15 02:13:09,352 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.69 vs. limit=22.5 2024-09-15 02:13:13,011 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.703e+02 2.014e+02 2.202e+02 2.449e+02 3.252e+02, threshold=4.405e+02, percent-clipped=0.0 2024-09-15 02:13:13,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=218078.83333333334, ans=0.125 2024-09-15 02:13:19,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=218107.16666666666, ans=0.025 2024-09-15 02:13:32,330 INFO [train.py:1198] (1/2) Epoch 13, batch 300, loss[loss=0.2189, ctc_loss=0.151, cr_loss=0.3394, over 20982.00 frames. 
], tot_loss[loss=0.256, ctc_loss=0.1778, cr_loss=0.3911, over 3171281.41 frames. ], batch size: 51, lr: 6.59e-03, grad_scale: 32.0 2024-09-15 02:14:11,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=218192.16666666666, ans=0.0 2024-09-15 02:14:32,016 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.04 vs. limit=22.5 2024-09-15 02:14:50,746 INFO [train.py:1198] (1/2) Epoch 13, batch 350, loss[loss=0.2543, ctc_loss=0.1748, cr_loss=0.3975, over 21026.00 frames. ], tot_loss[loss=0.2554, ctc_loss=0.1773, cr_loss=0.3908, over 3388102.02 frames. ], batch size: 63, lr: 6.59e-03, grad_scale: 32.0 2024-09-15 02:15:51,060 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.828e+02 2.038e+02 2.174e+02 2.334e+02 3.174e+02, threshold=4.349e+02, percent-clipped=0.0 2024-09-15 02:16:09,281 INFO [train.py:1198] (1/2) Epoch 13, batch 400, loss[loss=0.2461, ctc_loss=0.1717, cr_loss=0.3718, over 21024.00 frames. ], tot_loss[loss=0.2562, ctc_loss=0.1778, cr_loss=0.3923, over 3550922.87 frames. ], batch size: 61, lr: 6.59e-03, grad_scale: 32.0 2024-09-15 02:16:11,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=218418.83333333334, ans=0.0 2024-09-15 02:16:43,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=218475.5, ans=0.0 2024-09-15 02:16:54,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=218503.83333333334, ans=0.125 2024-09-15 02:17:12,829 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.88 vs. limit=22.5 2024-09-15 02:17:21,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=218532.16666666666, ans=0.125 2024-09-15 02:17:24,416 INFO [train.py:1198] (1/2) Epoch 13, batch 450, loss[loss=0.2798, ctc_loss=0.1967, cr_loss=0.4153, over 21003.00 frames. ], tot_loss[loss=0.2569, ctc_loss=0.1783, cr_loss=0.393, over 3681708.25 frames. ], batch size: 61, lr: 6.58e-03, grad_scale: 32.0 2024-09-15 02:17:25,172 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.48 vs. limit=10.0 2024-09-15 02:17:27,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=218560.5, ans=0.125 2024-09-15 02:17:37,249 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.00 vs. limit=15.0 2024-09-15 02:18:11,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=218645.5, ans=0.125 2024-09-15 02:18:21,448 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.701e+02 2.023e+02 2.176e+02 2.401e+02 3.872e+02, threshold=4.352e+02, percent-clipped=0.0 2024-09-15 02:18:39,148 INFO [train.py:1198] (1/2) Epoch 13, batch 500, loss[loss=0.2468, ctc_loss=0.1694, cr_loss=0.387, over 20776.00 frames. ], tot_loss[loss=0.2577, ctc_loss=0.1789, cr_loss=0.3941, over 3765066.36 frames. 
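The ScheduledFloat lines that dominate this log print the current value (ans) of module hyperparameters (skip rates, balancer probabilities, dropout) at a given batch_count; in icefall's scaling.py these are schedules rather than constants. A minimal re-implementation of the idea as piecewise-linear interpolation between (batch_count, value) breakpoints; the class name and the example breakpoints are illustrative, not icefall's:

```python
import bisect

class ScheduledFloatSketch:
    """A float that depends on training progress: linearly interpolates
    between sorted (batch_count, value) breakpoints, clamping at the ends."""

    def __init__(self, *points):
        self.points = sorted(points)

    def value(self, batch_count: float) -> float:
        xs = [x for x, _ in self.points]
        i = bisect.bisect_right(xs, batch_count)
        if i == 0:
            return self.points[0][1]
        if i == len(self.points):
            return self.points[-1][1]
        (x0, y0), (x1, y1) = self.points[i - 1], self.points[i]
        t = (batch_count - x0) / (x1 - x0)
        return y0 + t * (y1 - y0)

# E.g. a skip rate decaying from 0.5 to 0.0 over the first 20k batches and
# staying there, consistent with the ans=0.0 skip_rate values logged here at
# batch_count around 218k (breakpoints invented for illustration).
skip_rate = ScheduledFloatSketch((0.0, 0.5), (20000.0, 0.0))
assert skip_rate.value(218192.0) == 0.0
```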
], batch size: 56, lr: 6.58e-03, grad_scale: 32.0 2024-09-15 02:19:27,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=218787.16666666666, ans=0.0 2024-09-15 02:19:41,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=218815.5, ans=0.0 2024-09-15 02:19:49,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=218815.5, ans=0.025 2024-09-15 02:19:54,065 INFO [train.py:1198] (1/2) Epoch 13, batch 550, loss[loss=0.2558, ctc_loss=0.1792, cr_loss=0.383, over 21052.00 frames. ], tot_loss[loss=0.2572, ctc_loss=0.1785, cr_loss=0.3934, over 3841099.27 frames. ], batch size: 56, lr: 6.58e-03, grad_scale: 32.0 2024-09-15 02:20:11,392 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.78 vs. limit=15.0 2024-09-15 02:20:12,535 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=218872.16666666666, ans=0.125 2024-09-15 02:20:26,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=218900.5, ans=0.1 2024-09-15 02:20:53,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=218928.83333333334, ans=0.1 2024-09-15 02:20:54,955 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.802e+02 2.048e+02 2.205e+02 2.451e+02 4.095e+02, threshold=4.409e+02, percent-clipped=0.0 2024-09-15 02:21:15,692 INFO [train.py:1198] (1/2) Epoch 13, batch 600, loss[loss=0.3033, ctc_loss=0.2142, cr_loss=0.4453, over 20029.00 frames. ], tot_loss[loss=0.2563, ctc_loss=0.1779, cr_loss=0.3923, over 3898261.94 frames. ], batch size: 80, lr: 6.58e-03, grad_scale: 32.0 2024-09-15 02:21:44,636 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 02:21:59,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=219070.5, ans=0.025 2024-09-15 02:22:01,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=219070.5, ans=0.1 2024-09-15 02:22:04,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=219070.5, ans=0.0 2024-09-15 02:22:15,321 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.83 vs. limit=15.0 2024-09-15 02:22:18,437 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.36 vs. 
limit=22.5 2024-09-15 02:22:24,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=219098.83333333334, ans=0.0 2024-09-15 02:22:25,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=219098.83333333334, ans=0.1 2024-09-15 02:22:28,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=219098.83333333334, ans=0.125 2024-09-15 02:22:31,258 INFO [train.py:1198] (1/2) Epoch 13, batch 650, loss[loss=0.286, ctc_loss=0.2003, cr_loss=0.4281, over 19424.00 frames. ], tot_loss[loss=0.2558, ctc_loss=0.1775, cr_loss=0.3911, over 3942342.53 frames. ], batch size: 90, lr: 6.58e-03, grad_scale: 32.0 2024-09-15 02:22:31,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=219127.16666666666, ans=0.125 2024-09-15 02:22:36,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=219127.16666666666, ans=0.04949747468305833 2024-09-15 02:22:53,048 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 02:22:58,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=219155.5, ans=0.125 2024-09-15 02:23:05,450 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.83 vs. limit=12.0 2024-09-15 02:23:06,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=219183.83333333334, ans=0.125 2024-09-15 02:23:28,496 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.775e+02 2.137e+02 2.272e+02 2.536e+02 3.616e+02, threshold=4.544e+02, percent-clipped=0.0 2024-09-15 02:23:46,391 INFO [train.py:1198] (1/2) Epoch 13, batch 700, loss[loss=0.2652, ctc_loss=0.1879, cr_loss=0.3868, over 20360.00 frames. ], tot_loss[loss=0.2552, ctc_loss=0.1772, cr_loss=0.3903, over 3971292.49 frames. ], batch size: 74, lr: 6.57e-03, grad_scale: 32.0 2024-09-15 02:24:06,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=219297.16666666666, ans=0.125 2024-09-15 02:24:15,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=219325.5, ans=0.1 2024-09-15 02:24:25,968 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 02:25:01,834 INFO [train.py:1198] (1/2) Epoch 13, batch 750, loss[loss=0.2521, ctc_loss=0.1758, cr_loss=0.3815, over 21068.00 frames. ], tot_loss[loss=0.2548, ctc_loss=0.1767, cr_loss=0.3904, over 4007200.40 frames. ], batch size: 56, lr: 6.57e-03, grad_scale: 32.0 2024-09-15 02:25:59,067 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.753e+02 2.047e+02 2.137e+02 2.327e+02 3.350e+02, threshold=4.275e+02, percent-clipped=0.0 2024-09-15 02:26:15,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=219523.83333333334, ans=0.2 2024-09-15 02:26:19,806 INFO [train.py:1198] (1/2) Epoch 13, batch 800, loss[loss=0.2471, ctc_loss=0.1715, cr_loss=0.3781, over 20657.00 frames. 
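Each Clipping_scale=2.0 warning prints five gradient-norm statistics (they read as min / 25% / 50% / 75% / max over a window of recent steps) plus a threshold that consistently equals 2.0 times the middle value: in the warning just above, 2.272e+02 * 2.0 = 4.544e+02. A sketch of that bookkeeping, assuming the threshold really is clipping_scale times the median; the exact rule lives in icefall's optim.py:

```python
import torch

def clip_with_adaptive_threshold(parameters, recent_norms, clipping_scale=2.0):
    """Clip gradients at clipping_scale * median of recently seen grad norms.

    parameters: model parameters whose .grad is already populated.
    recent_norms: list of float gradient norms from recent steps.
    """
    norms = torch.tensor(recent_norms)
    q = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * float(q[2])  # 2.0 * median, as in the log
    total_norm = torch.nn.utils.clip_grad_norm_(parameters, threshold)
    clipped = float(total_norm) > threshold   # feeds the percent-clipped stat
    return q, threshold, clipped
```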
], tot_loss[loss=0.255, ctc_loss=0.1769, cr_loss=0.3905, over 4025566.20 frames. ], batch size: 68, lr: 6.57e-03, grad_scale: 32.0 2024-09-15 02:26:32,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=219552.16666666666, ans=0.125 2024-09-15 02:26:36,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=219580.5, ans=0.125 2024-09-15 02:26:41,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=219580.5, ans=0.0 2024-09-15 02:26:56,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=219608.83333333334, ans=0.025 2024-09-15 02:27:02,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=219608.83333333334, ans=0.1 2024-09-15 02:27:36,955 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.78 vs. limit=15.0 2024-09-15 02:27:37,815 INFO [train.py:1198] (1/2) Epoch 13, batch 850, loss[loss=0.2967, ctc_loss=0.2088, cr_loss=0.4394, over 21070.00 frames. ], tot_loss[loss=0.2546, ctc_loss=0.1766, cr_loss=0.3899, over 4039696.42 frames. ], batch size: 59, lr: 6.57e-03, grad_scale: 32.0 2024-09-15 02:27:38,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=219693.83333333334, ans=0.125 2024-09-15 02:27:42,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=219693.83333333334, ans=0.125 2024-09-15 02:27:56,579 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.24 vs. limit=15.0 2024-09-15 02:28:31,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=219778.83333333334, ans=0.07 2024-09-15 02:28:31,662 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.22 vs. limit=15.0 2024-09-15 02:28:35,324 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.796e+02 2.051e+02 2.169e+02 2.415e+02 3.935e+02, threshold=4.339e+02, percent-clipped=0.0 2024-09-15 02:28:53,237 INFO [train.py:1198] (1/2) Epoch 13, batch 900, loss[loss=0.2008, ctc_loss=0.139, cr_loss=0.3092, over 20977.00 frames. ], tot_loss[loss=0.255, ctc_loss=0.1768, cr_loss=0.3907, over 4065674.33 frames. ], batch size: 49, lr: 6.57e-03, grad_scale: 32.0 2024-09-15 02:29:04,034 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 02:29:24,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=219892.16666666666, ans=0.0 2024-09-15 02:30:02,739 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.23 vs. limit=12.0 2024-09-15 02:30:09,299 INFO [train.py:1198] (1/2) Epoch 13, batch 950, loss[loss=0.2576, ctc_loss=0.1789, cr_loss=0.3935, over 20933.00 frames. 
], tot_loss[loss=0.2553, ctc_loss=0.177, cr_loss=0.3914, over 4077758.78 frames. ], batch size: 60, lr: 6.56e-03, grad_scale: 32.0 2024-09-15 02:31:04,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=220062.16666666666, ans=0.2 2024-09-15 02:31:07,049 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.784e+02 2.053e+02 2.160e+02 2.321e+02 3.157e+02, threshold=4.319e+02, percent-clipped=0.0 2024-09-15 02:31:19,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=220090.5, ans=0.025 2024-09-15 02:31:24,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=220118.83333333334, ans=0.125 2024-09-15 02:31:25,282 INFO [train.py:1198] (1/2) Epoch 13, batch 1000, loss[loss=0.2141, ctc_loss=0.1431, cr_loss=0.355, over 20881.00 frames. ], tot_loss[loss=0.2546, ctc_loss=0.1764, cr_loss=0.3908, over 4087099.12 frames. ], batch size: 54, lr: 6.56e-03, grad_scale: 32.0 2024-09-15 02:31:31,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=220118.83333333334, ans=0.0 2024-09-15 02:32:09,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=220175.5, ans=0.025 2024-09-15 02:32:14,267 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 02:32:21,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=220203.83333333334, ans=0.2 2024-09-15 02:32:27,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=220232.16666666666, ans=0.0 2024-09-15 02:32:46,956 INFO [train.py:1198] (1/2) Epoch 13, batch 1050, loss[loss=0.2803, ctc_loss=0.195, cr_loss=0.4266, over 20850.00 frames. ], tot_loss[loss=0.2542, ctc_loss=0.1761, cr_loss=0.3906, over 4084786.18 frames. ], batch size: 65, lr: 6.56e-03, grad_scale: 32.0 2024-09-15 02:32:47,340 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 02:33:14,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=220288.83333333334, ans=0.0 2024-09-15 02:33:26,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=220317.16666666666, ans=0.025 2024-09-15 02:33:29,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=220317.16666666666, ans=0.125 2024-09-15 02:33:43,599 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.681e+02 2.044e+02 2.167e+02 2.351e+02 4.531e+02, threshold=4.334e+02, percent-clipped=1.0 2024-09-15 02:34:01,644 INFO [train.py:1198] (1/2) Epoch 13, batch 1100, loss[loss=0.2422, ctc_loss=0.1668, cr_loss=0.3769, over 20970.00 frames. ], tot_loss[loss=0.2554, ctc_loss=0.177, cr_loss=0.3917, over 4082616.48 frames. ], batch size: 58, lr: 6.56e-03, grad_scale: 32.0 2024-09-15 02:34:06,758 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.41 vs. 
limit=15.0 2024-09-15 02:34:09,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=220402.16666666666, ans=0.125 2024-09-15 02:34:13,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=220402.16666666666, ans=0.125 2024-09-15 02:34:36,909 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 02:35:05,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=220515.5, ans=0.125 2024-09-15 02:35:16,917 INFO [train.py:1198] (1/2) Epoch 13, batch 1150, loss[loss=0.3242, ctc_loss=0.2301, cr_loss=0.4704, over 18561.00 frames. ], tot_loss[loss=0.256, ctc_loss=0.1775, cr_loss=0.3927, over 4088683.63 frames. ], batch size: 108, lr: 6.55e-03, grad_scale: 32.0 2024-09-15 02:35:47,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=220600.5, ans=0.0 2024-09-15 02:35:54,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=220600.5, ans=0.0 2024-09-15 02:36:05,101 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 02:36:14,103 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.759e+02 2.085e+02 2.221e+02 2.392e+02 4.086e+02, threshold=4.443e+02, percent-clipped=0.0 2024-09-15 02:36:32,146 INFO [train.py:1198] (1/2) Epoch 13, batch 1200, loss[loss=0.2173, ctc_loss=0.1512, cr_loss=0.3306, over 20969.00 frames. ], tot_loss[loss=0.2554, ctc_loss=0.1771, cr_loss=0.3918, over 4089318.58 frames. ], batch size: 48, lr: 6.55e-03, grad_scale: 32.0 2024-09-15 02:36:32,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=220685.5, ans=0.125 2024-09-15 02:36:44,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=220685.5, ans=0.09899494936611666 2024-09-15 02:36:48,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=220713.83333333334, ans=0.1 2024-09-15 02:37:02,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=220742.16666666666, ans=0.125 2024-09-15 02:37:25,597 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=220770.5, ans=0.125 2024-09-15 02:37:49,113 INFO [train.py:1198] (1/2) Epoch 13, batch 1250, loss[loss=0.2344, ctc_loss=0.1629, cr_loss=0.3578, over 20951.00 frames. ], tot_loss[loss=0.256, ctc_loss=0.1776, cr_loss=0.392, over 4089807.46 frames. ], batch size: 49, lr: 6.55e-03, grad_scale: 32.0 2024-09-15 02:38:21,217 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.74 vs. 
limit=15.0 2024-09-15 02:38:29,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=220883.83333333334, ans=0.5 2024-09-15 02:38:35,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=220912.16666666666, ans=0.125 2024-09-15 02:38:48,894 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.737e+02 2.061e+02 2.280e+02 2.497e+02 2.985e+02, threshold=4.559e+02, percent-clipped=0.0 2024-09-15 02:39:01,586 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 02:39:04,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=220940.5, ans=0.125 2024-09-15 02:39:07,304 INFO [train.py:1198] (1/2) Epoch 13, batch 1300, loss[loss=0.2149, ctc_loss=0.1445, cr_loss=0.3521, over 21020.00 frames. ], tot_loss[loss=0.2562, ctc_loss=0.1777, cr_loss=0.3921, over 4088916.59 frames. ], batch size: 52, lr: 6.55e-03, grad_scale: 32.0 2024-09-15 02:39:11,332 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.71 vs. limit=6.0 2024-09-15 02:40:22,367 INFO [train.py:1198] (1/2) Epoch 13, batch 1350, loss[loss=0.2342, ctc_loss=0.1606, cr_loss=0.3682, over 21005.00 frames. ], tot_loss[loss=0.2571, ctc_loss=0.1786, cr_loss=0.3929, over 4077614.70 frames. ], batch size: 61, lr: 6.55e-03, grad_scale: 32.0 2024-09-15 02:40:53,120 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.62 vs. limit=15.0 2024-09-15 02:41:19,071 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.794e+02 2.070e+02 2.240e+02 2.532e+02 4.939e+02, threshold=4.479e+02, percent-clipped=1.0 2024-09-15 02:41:26,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=221223.83333333334, ans=0.05 2024-09-15 02:41:29,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=221223.83333333334, ans=0.1 2024-09-15 02:41:32,978 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.45 vs. limit=15.0 2024-09-15 02:41:37,055 INFO [train.py:1198] (1/2) Epoch 13, batch 1400, loss[loss=0.2507, ctc_loss=0.1739, cr_loss=0.3843, over 21069.00 frames. ], tot_loss[loss=0.2582, ctc_loss=0.1794, cr_loss=0.3942, over 4081039.42 frames. ], batch size: 59, lr: 6.54e-03, grad_scale: 32.0 2024-09-15 02:41:37,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=221252.16666666666, ans=0.1 2024-09-15 02:42:25,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=221337.16666666666, ans=0.125 2024-09-15 02:42:41,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=221365.5, ans=0.015 2024-09-15 02:42:52,128 INFO [train.py:1198] (1/2) Epoch 13, batch 1450, loss[loss=0.2464, ctc_loss=0.1685, cr_loss=0.3896, over 20957.00 frames. 
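The Whitening lines compare a per-module statistic (metric) against a limit; the whitening modules only intervene when activations drift too far from an identity-like channel covariance. One plausible definition of such a statistic (an assumption; the real definition is in icefall's scaling.py) is the ratio of the mean squared eigenvalue of the covariance to its squared mean eigenvalue, which is exactly 1.0 for perfectly white features and grows as the spectrum becomes lopsided:

```python
import torch

def whiteness_metric(x: torch.Tensor) -> float:
    """x: (frames, channels) activations. Returns 1.0 iff the channel
    covariance is a multiple of the identity, larger otherwise."""
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    d = cov.shape[0]
    # The sum of squared eigenvalues equals the squared Frobenius norm of a
    # symmetric matrix, so no eigendecomposition is needed.
    return float(d * (cov * cov).sum() / (cov.diagonal().sum() ** 2))
```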
], tot_loss[loss=0.2598, ctc_loss=0.1805, cr_loss=0.3966, over 4087704.15 frames. ], batch size: 50, lr: 6.54e-03, grad_scale: 32.0 2024-09-15 02:43:00,283 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.68 vs. limit=10.0 2024-09-15 02:43:09,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=221422.16666666666, ans=0.125 2024-09-15 02:43:52,903 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.738e+02 1.976e+02 2.115e+02 2.299e+02 3.468e+02, threshold=4.231e+02, percent-clipped=0.0 2024-09-15 02:44:12,480 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.08 vs. limit=15.0 2024-09-15 02:44:13,248 INFO [train.py:1198] (1/2) Epoch 13, batch 1500, loss[loss=0.2774, ctc_loss=0.1977, cr_loss=0.3987, over 18245.00 frames. ], tot_loss[loss=0.2587, ctc_loss=0.1797, cr_loss=0.3949, over 4087385.12 frames. ], batch size: 108, lr: 6.54e-03, grad_scale: 32.0 2024-09-15 02:44:22,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=221535.5, ans=0.0 2024-09-15 02:44:39,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=221563.83333333334, ans=0.0 2024-09-15 02:45:15,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=221648.83333333334, ans=0.0 2024-09-15 02:45:28,831 INFO [train.py:1198] (1/2) Epoch 13, batch 1550, loss[loss=0.2881, ctc_loss=0.2021, cr_loss=0.4298, over 20654.00 frames. ], tot_loss[loss=0.258, ctc_loss=0.179, cr_loss=0.3949, over 4088776.57 frames. ], batch size: 66, lr: 6.54e-03, grad_scale: 32.0 2024-09-15 02:45:54,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=221705.5, ans=0.125 2024-09-15 02:46:25,536 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.750e+02 2.025e+02 2.167e+02 2.353e+02 3.428e+02, threshold=4.333e+02, percent-clipped=0.0 2024-09-15 02:46:43,864 INFO [train.py:1198] (1/2) Epoch 13, batch 1600, loss[loss=0.2513, ctc_loss=0.1746, cr_loss=0.3835, over 21010.00 frames. ], tot_loss[loss=0.2575, ctc_loss=0.1786, cr_loss=0.3947, over 4104549.92 frames. ], batch size: 63, lr: 6.54e-03, grad_scale: 32.0 2024-09-15 02:46:48,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=221818.83333333334, ans=0.125 2024-09-15 02:46:56,504 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 02:47:06,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=221847.16666666666, ans=0.125 2024-09-15 02:47:37,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=221903.83333333334, ans=15.0 2024-09-15 02:47:59,147 INFO [train.py:1198] (1/2) Epoch 13, batch 1650, loss[loss=0.2725, ctc_loss=0.1918, cr_loss=0.4034, over 19428.00 frames. ], tot_loss[loss=0.2558, ctc_loss=0.1773, cr_loss=0.3924, over 4116518.91 frames. 
], batch size: 90, lr: 6.53e-03, grad_scale: 32.0 2024-09-15 02:48:06,131 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.53 vs. limit=12.0 2024-09-15 02:48:07,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=221960.5, ans=0.125 2024-09-15 02:48:20,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=221988.83333333334, ans=0.0 2024-09-15 02:48:59,198 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.796e+02 2.061e+02 2.203e+02 2.439e+02 3.566e+02, threshold=4.407e+02, percent-clipped=0.0 2024-09-15 02:49:17,370 INFO [train.py:1198] (1/2) Epoch 13, batch 1700, loss[loss=0.2712, ctc_loss=0.1894, cr_loss=0.4087, over 19556.00 frames. ], tot_loss[loss=0.256, ctc_loss=0.1775, cr_loss=0.3924, over 4104655.51 frames. ], batch size: 90, lr: 6.53e-03, grad_scale: 32.0 2024-09-15 02:49:38,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=222130.5, ans=0.025 2024-09-15 02:49:38,597 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=222130.5, ans=0.0 2024-09-15 02:49:43,413 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.78 vs. limit=22.5 2024-09-15 02:50:34,698 INFO [train.py:1198] (1/2) Epoch 13, batch 1750, loss[loss=0.2896, ctc_loss=0.1992, cr_loss=0.4518, over 20715.00 frames. ], tot_loss[loss=0.2569, ctc_loss=0.1784, cr_loss=0.3928, over 4092338.77 frames. ], batch size: 68, lr: 6.53e-03, grad_scale: 16.0 2024-09-15 02:51:32,885 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.806e+02 2.109e+02 2.307e+02 2.537e+02 5.137e+02, threshold=4.614e+02, percent-clipped=2.0 2024-09-15 02:51:48,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=222385.5, ans=22.5 2024-09-15 02:51:49,575 INFO [train.py:1198] (1/2) Epoch 13, batch 1800, loss[loss=0.2403, ctc_loss=0.1656, cr_loss=0.3734, over 20980.00 frames. ], tot_loss[loss=0.2571, ctc_loss=0.1784, cr_loss=0.3933, over 4095230.82 frames. ], batch size: 52, lr: 6.53e-03, grad_scale: 16.0 2024-09-15 02:52:22,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=222442.16666666666, ans=0.125 2024-09-15 02:52:27,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=222442.16666666666, ans=0.09899494936611666 2024-09-15 02:52:31,287 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.90 vs. limit=22.5 2024-09-15 02:52:46,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=222470.5, ans=0.5 2024-09-15 02:53:05,102 INFO [train.py:1198] (1/2) Epoch 13, batch 1850, loss[loss=0.2546, ctc_loss=0.1778, cr_loss=0.3841, over 20829.00 frames. ], tot_loss[loss=0.2558, ctc_loss=0.1774, cr_loss=0.3919, over 4102483.17 frames. 
], batch size: 59, lr: 6.53e-03, grad_scale: 16.0 2024-09-15 02:53:34,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=222583.83333333334, ans=0.125 2024-09-15 02:53:36,022 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.54 vs. limit=15.0 2024-09-15 02:53:42,054 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.16 vs. limit=22.5 2024-09-15 02:53:43,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=222583.83333333334, ans=15.0 2024-09-15 02:54:03,251 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.765e+02 2.071e+02 2.261e+02 2.540e+02 4.776e+02, threshold=4.521e+02, percent-clipped=1.0 2024-09-15 02:54:12,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=222640.5, ans=0.125 2024-09-15 02:54:19,955 INFO [train.py:1198] (1/2) Epoch 13, batch 1900, loss[loss=0.2486, ctc_loss=0.1685, cr_loss=0.4001, over 20801.00 frames. ], tot_loss[loss=0.2567, ctc_loss=0.178, cr_loss=0.3933, over 4109407.65 frames. ], batch size: 53, lr: 6.52e-03, grad_scale: 16.0 2024-09-15 02:54:23,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=222668.83333333334, ans=0.0 2024-09-15 02:54:30,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=222668.83333333334, ans=0.2 2024-09-15 02:54:52,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=222725.5, ans=0.125 2024-09-15 02:55:32,098 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.05 vs. limit=6.0 2024-09-15 02:55:33,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=222782.16666666666, ans=0.125 2024-09-15 02:55:40,336 INFO [train.py:1198] (1/2) Epoch 13, batch 1950, loss[loss=0.2724, ctc_loss=0.1886, cr_loss=0.4191, over 20845.00 frames. ], tot_loss[loss=0.2564, ctc_loss=0.1777, cr_loss=0.3932, over 4115013.79 frames. ], batch size: 65, lr: 6.52e-03, grad_scale: 16.0 2024-09-15 02:55:51,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=222810.5, ans=0.05 2024-09-15 02:56:25,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=222895.5, ans=0.125 2024-09-15 02:56:28,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=222895.5, ans=0.125 2024-09-15 02:56:38,618 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.687e+02 2.128e+02 2.318e+02 2.625e+02 5.150e+02, threshold=4.636e+02, percent-clipped=1.0 2024-09-15 02:56:55,082 INFO [train.py:1198] (1/2) Epoch 13, batch 2000, loss[loss=0.2705, ctc_loss=0.1895, cr_loss=0.4046, over 20707.00 frames. ], tot_loss[loss=0.2567, ctc_loss=0.1779, cr_loss=0.3941, over 4116511.42 frames. 
], batch size: 71, lr: 6.52e-03, grad_scale: 32.0 2024-09-15 02:57:07,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=222952.16666666666, ans=0.0 2024-09-15 02:57:16,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=222980.5, ans=0.125 2024-09-15 02:57:19,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=222980.5, ans=0.125 2024-09-15 02:57:38,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=223037.16666666666, ans=0.125 2024-09-15 02:57:42,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=223037.16666666666, ans=0.0 2024-09-15 02:57:53,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=223065.5, ans=0.0 2024-09-15 02:58:04,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=223065.5, ans=0.125 2024-09-15 02:58:08,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=223093.83333333334, ans=0.0 2024-09-15 02:58:09,932 INFO [train.py:1198] (1/2) Epoch 13, batch 2050, loss[loss=0.2383, ctc_loss=0.1593, cr_loss=0.3948, over 20883.00 frames. ], tot_loss[loss=0.2556, ctc_loss=0.1768, cr_loss=0.3937, over 4124196.54 frames. ], batch size: 54, lr: 6.52e-03, grad_scale: 32.0 2024-09-15 02:58:37,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=223122.16666666666, ans=0.125 2024-09-15 02:58:46,494 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 02:59:08,356 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.726e+02 2.051e+02 2.212e+02 2.362e+02 4.328e+02, threshold=4.424e+02, percent-clipped=0.0 2024-09-15 02:59:24,850 INFO [train.py:1198] (1/2) Epoch 13, batch 2100, loss[loss=0.2855, ctc_loss=0.1977, cr_loss=0.439, over 20841.00 frames. ], tot_loss[loss=0.2555, ctc_loss=0.1769, cr_loss=0.393, over 4110374.11 frames. ], batch size: 59, lr: 6.52e-03, grad_scale: 32.0 2024-09-15 02:59:26,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=223235.5, ans=0.2 2024-09-15 02:59:34,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=223235.5, ans=0.125 2024-09-15 02:59:38,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=223263.83333333334, ans=0.0 2024-09-15 03:00:40,510 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.63 vs. limit=15.0 2024-09-15 03:00:42,936 INFO [train.py:1198] (1/2) Epoch 13, batch 2150, loss[loss=0.2214, ctc_loss=0.1498, cr_loss=0.3582, over 20950.00 frames. ], tot_loss[loss=0.2558, ctc_loss=0.1771, cr_loss=0.3931, over 4102503.78 frames. 
], batch size: 51, lr: 6.51e-03, grad_scale: 32.0 2024-09-15 03:01:43,990 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.743e+02 2.118e+02 2.303e+02 2.599e+02 3.957e+02, threshold=4.607e+02, percent-clipped=0.0 2024-09-15 03:01:59,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=223518.83333333334, ans=0.0 2024-09-15 03:01:59,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=223518.83333333334, ans=0.0 2024-09-15 03:02:00,600 INFO [train.py:1198] (1/2) Epoch 13, batch 2200, loss[loss=0.2768, ctc_loss=0.1934, cr_loss=0.4168, over 21030.00 frames. ], tot_loss[loss=0.2557, ctc_loss=0.1771, cr_loss=0.3932, over 4106229.77 frames. ], batch size: 63, lr: 6.51e-03, grad_scale: 32.0 2024-09-15 03:02:35,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=223575.5, ans=0.125 2024-09-15 03:02:47,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=223603.83333333334, ans=0.125 2024-09-15 03:03:13,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=223632.16666666666, ans=0.0 2024-09-15 03:03:16,149 INFO [train.py:1198] (1/2) Epoch 13, batch 2250, loss[loss=0.2017, ctc_loss=0.1363, cr_loss=0.3271, over 20272.00 frames. ], tot_loss[loss=0.2539, ctc_loss=0.1757, cr_loss=0.3911, over 4114915.98 frames. ], batch size: 45, lr: 6.51e-03, grad_scale: 32.0 2024-09-15 03:03:36,643 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.76 vs. limit=15.0 2024-09-15 03:03:40,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=223688.83333333334, ans=0.025 2024-09-15 03:04:14,711 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.694e+02 2.093e+02 2.283e+02 2.553e+02 6.758e+02, threshold=4.565e+02, percent-clipped=1.0 2024-09-15 03:04:30,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=223802.16666666666, ans=0.125 2024-09-15 03:04:31,380 INFO [train.py:1198] (1/2) Epoch 13, batch 2300, loss[loss=0.267, ctc_loss=0.1799, cr_loss=0.4353, over 21076.00 frames. ], tot_loss[loss=0.2553, ctc_loss=0.1767, cr_loss=0.3931, over 4101946.12 frames. ], batch size: 59, lr: 6.51e-03, grad_scale: 32.0 2024-09-15 03:04:31,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=223802.16666666666, ans=0.125 2024-09-15 03:04:43,134 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.74 vs. limit=10.0 2024-09-15 03:05:15,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=223887.16666666666, ans=0.0 2024-09-15 03:05:16,938 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=223887.16666666666, ans=0.125 2024-09-15 03:05:47,217 INFO [train.py:1198] (1/2) Epoch 13, batch 2350, loss[loss=0.2769, ctc_loss=0.1944, cr_loss=0.4122, over 20821.00 frames. 
], tot_loss[loss=0.2555, ctc_loss=0.1768, cr_loss=0.3934, over 4107674.38 frames. ], batch size: 59, lr: 6.51e-03, grad_scale: 32.0 2024-09-15 03:06:50,767 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.767e+02 2.070e+02 2.265e+02 2.565e+02 3.309e+02, threshold=4.531e+02, percent-clipped=0.0 2024-09-15 03:07:07,012 INFO [train.py:1198] (1/2) Epoch 13, batch 2400, loss[loss=0.2197, ctc_loss=0.1489, cr_loss=0.3537, over 20962.00 frames. ], tot_loss[loss=0.2546, ctc_loss=0.1763, cr_loss=0.3916, over 4096232.44 frames. ], batch size: 49, lr: 6.50e-03, grad_scale: 32.0 2024-09-15 03:07:15,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=224085.5, ans=0.1 2024-09-15 03:07:18,470 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.50 vs. limit=22.5 2024-09-15 03:07:42,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=224142.16666666666, ans=0.0 2024-09-15 03:07:45,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=224142.16666666666, ans=0.125 2024-09-15 03:08:00,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=224170.5, ans=0.1 2024-09-15 03:08:01,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=224170.5, ans=0.125 2024-09-15 03:08:22,764 INFO [train.py:1198] (1/2) Epoch 13, batch 2450, loss[loss=0.2794, ctc_loss=0.194, cr_loss=0.427, over 20950.00 frames. ], tot_loss[loss=0.2547, ctc_loss=0.1764, cr_loss=0.3914, over 4087757.95 frames. ], batch size: 64, lr: 6.50e-03, grad_scale: 32.0 2024-09-15 03:08:42,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=224255.5, ans=0.025 2024-09-15 03:09:20,935 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.802e+02 1.968e+02 2.112e+02 2.233e+02 3.264e+02, threshold=4.224e+02, percent-clipped=0.0 2024-09-15 03:09:21,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=224340.5, ans=0.0 2024-09-15 03:09:24,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=224340.5, ans=0.0 2024-09-15 03:09:37,180 INFO [train.py:1198] (1/2) Epoch 13, batch 2500, loss[loss=0.261, ctc_loss=0.1773, cr_loss=0.4185, over 21062.00 frames. ], tot_loss[loss=0.2543, ctc_loss=0.176, cr_loss=0.3911, over 4090719.40 frames. 
], batch size: 56, lr: 6.50e-03, grad_scale: 32.0 2024-09-15 03:09:51,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=224397.16666666666, ans=0.125 2024-09-15 03:10:11,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=224425.5, ans=0.125 2024-09-15 03:10:28,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=224453.83333333334, ans=0.125 2024-09-15 03:10:32,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=224453.83333333334, ans=0.125 2024-09-15 03:10:52,322 INFO [train.py:1198] (1/2) Epoch 13, batch 2550, loss[loss=0.1947, ctc_loss=0.1333, cr_loss=0.3071, over 20997.00 frames. ], tot_loss[loss=0.2551, ctc_loss=0.1768, cr_loss=0.3914, over 4081207.12 frames. ], batch size: 51, lr: 6.50e-03, grad_scale: 32.0 2024-09-15 03:11:16,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=224538.83333333334, ans=0.1 2024-09-15 03:11:39,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=224595.5, ans=0.125 2024-09-15 03:11:40,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=224595.5, ans=0.1 2024-09-15 03:11:48,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=224595.5, ans=0.125 2024-09-15 03:11:53,648 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.797e+02 2.084e+02 2.228e+02 2.537e+02 4.090e+02, threshold=4.457e+02, percent-clipped=0.0 2024-09-15 03:11:58,837 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.83 vs. limit=22.5 2024-09-15 03:12:10,205 INFO [train.py:1198] (1/2) Epoch 13, batch 2600, loss[loss=0.2495, ctc_loss=0.1749, cr_loss=0.3729, over 20956.00 frames. ], tot_loss[loss=0.2563, ctc_loss=0.1777, cr_loss=0.393, over 4090120.65 frames. ], batch size: 51, lr: 6.50e-03, grad_scale: 32.0 2024-09-15 03:12:10,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=224652.16666666666, ans=0.025 2024-09-15 03:12:22,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=224652.16666666666, ans=0.0 2024-09-15 03:12:53,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=224708.83333333334, ans=0.125 2024-09-15 03:12:56,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=224737.16666666666, ans=0.0 2024-09-15 03:12:59,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=224737.16666666666, ans=0.2 2024-09-15 03:13:27,660 INFO [train.py:1198] (1/2) Epoch 13, batch 2650, loss[loss=0.2775, ctc_loss=0.2001, cr_loss=0.3866, over 20981.00 frames. ], tot_loss[loss=0.2563, ctc_loss=0.1778, cr_loss=0.3925, over 4092769.26 frames. 
], batch size: 67, lr: 6.49e-03, grad_scale: 32.0 2024-09-15 03:13:30,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=224793.83333333334, ans=0.125 2024-09-15 03:13:51,107 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.59 vs. limit=15.0 2024-09-15 03:14:05,019 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.98 vs. limit=15.0 2024-09-15 03:14:18,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=224878.83333333334, ans=0.07 2024-09-15 03:14:24,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=224878.83333333334, ans=0.05 2024-09-15 03:14:26,944 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.696e+02 2.046e+02 2.182e+02 2.356e+02 3.113e+02, threshold=4.364e+02, percent-clipped=0.0 2024-09-15 03:14:32,131 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.72 vs. limit=15.0 2024-09-15 03:14:43,303 INFO [train.py:1198] (1/2) Epoch 13, batch 2700, loss[loss=0.3277, ctc_loss=0.2414, cr_loss=0.4316, over 13963.00 frames. ], tot_loss[loss=0.2556, ctc_loss=0.1772, cr_loss=0.392, over 4086290.81 frames. ], batch size: 149, lr: 6.49e-03, grad_scale: 32.0 2024-09-15 03:14:43,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=224935.5, ans=0.125 2024-09-15 03:14:57,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=224963.83333333334, ans=0.0 2024-09-15 03:15:03,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=224963.83333333334, ans=0.125 2024-09-15 03:15:28,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=225020.5, ans=0.125 2024-09-15 03:15:30,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=225020.5, ans=0.125 2024-09-15 03:15:58,632 INFO [train.py:1198] (1/2) Epoch 13, batch 2750, loss[loss=0.2571, ctc_loss=0.1767, cr_loss=0.4021, over 20778.00 frames. ], tot_loss[loss=0.2556, ctc_loss=0.1772, cr_loss=0.3923, over 4088044.55 frames. 
], batch size: 53, lr: 6.49e-03, grad_scale: 32.0 2024-09-15 03:16:22,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=225105.5, ans=0.025 2024-09-15 03:16:23,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=225105.5, ans=0.1 2024-09-15 03:16:34,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=225133.83333333334, ans=0.2 2024-09-15 03:16:37,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=225133.83333333334, ans=0.0 2024-09-15 03:16:42,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=225162.16666666666, ans=0.2 2024-09-15 03:16:51,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=225162.16666666666, ans=0.0 2024-09-15 03:16:54,671 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.10 vs. limit=12.0 2024-09-15 03:16:56,814 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.702e+02 2.036e+02 2.256e+02 2.486e+02 3.603e+02, threshold=4.512e+02, percent-clipped=0.0 2024-09-15 03:17:00,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=225190.5, ans=0.125 2024-09-15 03:17:00,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=225190.5, ans=0.04949747468305833 2024-09-15 03:17:03,651 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.10 vs. limit=15.0 2024-09-15 03:17:16,027 INFO [train.py:1198] (1/2) Epoch 13, batch 2800, loss[loss=0.2499, ctc_loss=0.1696, cr_loss=0.4012, over 20976.00 frames. ], tot_loss[loss=0.2558, ctc_loss=0.1773, cr_loss=0.3926, over 4083754.58 frames. ], batch size: 55, lr: 6.49e-03, grad_scale: 32.0 2024-09-15 03:17:31,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=225247.16666666666, ans=0.0 2024-09-15 03:17:42,044 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 03:17:58,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=225275.5, ans=0.1 2024-09-15 03:18:20,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=225332.16666666666, ans=0.125 2024-09-15 03:18:22,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=225332.16666666666, ans=0.04949747468305833 2024-09-15 03:18:31,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=225332.16666666666, ans=0.125 2024-09-15 03:18:34,285 INFO [train.py:1198] (1/2) Epoch 13, batch 2850, loss[loss=0.2583, ctc_loss=0.1804, cr_loss=0.3896, over 21083.00 frames. ], tot_loss[loss=0.2553, ctc_loss=0.1769, cr_loss=0.3919, over 4089910.16 frames. 
], batch size: 59, lr: 6.49e-03, grad_scale: 32.0 2024-09-15 03:18:34,816 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=12.26 vs. limit=12.0 2024-09-15 03:18:37,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=225360.5, ans=0.125 2024-09-15 03:19:01,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=225388.83333333334, ans=0.05 2024-09-15 03:19:04,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=225417.16666666666, ans=0.1 2024-09-15 03:19:04,920 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.55 vs. limit=15.0 2024-09-15 03:19:32,676 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.707e+02 2.056e+02 2.193e+02 2.407e+02 3.132e+02, threshold=4.386e+02, percent-clipped=0.0 2024-09-15 03:19:49,189 INFO [train.py:1198] (1/2) Epoch 13, batch 2900, loss[loss=0.2265, ctc_loss=0.1572, cr_loss=0.3465, over 20922.00 frames. ], tot_loss[loss=0.2553, ctc_loss=0.1769, cr_loss=0.3921, over 4096359.22 frames. ], batch size: 49, lr: 6.48e-03, grad_scale: 32.0 2024-09-15 03:19:49,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=225502.16666666666, ans=0.1 2024-09-15 03:20:36,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=225587.16666666666, ans=0.125 2024-09-15 03:20:42,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=225587.16666666666, ans=0.07 2024-09-15 03:20:43,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=225587.16666666666, ans=0.125 2024-09-15 03:21:04,642 INFO [train.py:1198] (1/2) Epoch 13, batch 2950, loss[loss=0.2421, ctc_loss=0.1683, cr_loss=0.3692, over 20967.00 frames. ], tot_loss[loss=0.254, ctc_loss=0.1759, cr_loss=0.3908, over 4109923.42 frames. ], batch size: 58, lr: 6.48e-03, grad_scale: 32.0 2024-09-15 03:21:14,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=225643.83333333334, ans=0.95 2024-09-15 03:22:03,242 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.800e+02 2.072e+02 2.214e+02 2.416e+02 4.093e+02, threshold=4.428e+02, percent-clipped=0.0 2024-09-15 03:22:20,043 INFO [train.py:1198] (1/2) Epoch 13, batch 3000, loss[loss=0.2382, ctc_loss=0.1652, cr_loss=0.3654, over 20988.00 frames. ], tot_loss[loss=0.2542, ctc_loss=0.1761, cr_loss=0.3906, over 4093054.96 frames. ], batch size: 51, lr: 6.48e-03, grad_scale: 32.0 2024-09-15 03:22:20,044 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-15 03:22:43,007 INFO [train.py:1230] (1/2) Epoch 13, validation: loss=0.0496, ctc_loss=0.0496, cr_loss=9.603e-15, over 944034.00 frames. 
2024-09-15 03:22:43,008 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-15 03:22:46,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=225785.5, ans=0.035 2024-09-15 03:23:54,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=225898.83333333334, ans=0.125 2024-09-15 03:24:01,449 INFO [train.py:1198] (1/2) Epoch 13, batch 3050, loss[loss=0.2723, ctc_loss=0.1854, cr_loss=0.4343, over 21089.00 frames. ], tot_loss[loss=0.2531, ctc_loss=0.1752, cr_loss=0.3896, over 4098157.16 frames. ], batch size: 59, lr: 6.48e-03, grad_scale: 32.0 2024-09-15 03:24:09,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=225927.16666666666, ans=0.025 2024-09-15 03:24:18,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=225955.5, ans=0.125 2024-09-15 03:24:25,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=225955.5, ans=0.125 2024-09-15 03:24:33,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=225983.83333333334, ans=0.125 2024-09-15 03:24:42,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=225983.83333333334, ans=0.0 2024-09-15 03:25:00,748 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.769e+02 2.130e+02 2.273e+02 2.517e+02 7.134e+02, threshold=4.545e+02, percent-clipped=1.0 2024-09-15 03:25:17,350 INFO [train.py:1198] (1/2) Epoch 13, batch 3100, loss[loss=0.2716, ctc_loss=0.1852, cr_loss=0.4318, over 20954.00 frames. ], tot_loss[loss=0.2543, ctc_loss=0.176, cr_loss=0.3915, over 4108741.64 frames. ], batch size: 64, lr: 6.48e-03, grad_scale: 32.0 2024-09-15 03:25:37,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=226097.16666666666, ans=0.125 2024-09-15 03:26:01,184 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 03:26:05,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=226153.83333333334, ans=0.0 2024-09-15 03:26:08,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=226153.83333333334, ans=0.1 2024-09-15 03:26:32,574 INFO [train.py:1198] (1/2) Epoch 13, batch 3150, loss[loss=0.2798, ctc_loss=0.1951, cr_loss=0.4233, over 20701.00 frames. ], tot_loss[loss=0.2569, ctc_loss=0.178, cr_loss=0.3944, over 4093914.46 frames. 
], batch size: 71, lr: 6.47e-03, grad_scale: 32.0 2024-09-15 03:27:26,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=226295.5, ans=0.125 2024-09-15 03:27:29,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=226295.5, ans=0.125 2024-09-15 03:27:31,727 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.848e+02 2.107e+02 2.272e+02 2.467e+02 3.174e+02, threshold=4.545e+02, percent-clipped=0.0 2024-09-15 03:27:33,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=226323.83333333334, ans=0.0 2024-09-15 03:27:44,504 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.66 vs. limit=15.0 2024-09-15 03:27:48,129 INFO [train.py:1198] (1/2) Epoch 13, batch 3200, loss[loss=0.2976, ctc_loss=0.2055, cr_loss=0.4602, over 20729.00 frames. ], tot_loss[loss=0.2571, ctc_loss=0.1782, cr_loss=0.3943, over 4080847.35 frames. ], batch size: 71, lr: 6.47e-03, grad_scale: 32.0 2024-09-15 03:27:48,517 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 03:27:55,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=226352.16666666666, ans=0.125 2024-09-15 03:28:00,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=226352.16666666666, ans=0.0 2024-09-15 03:28:15,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=226380.5, ans=0.2 2024-09-15 03:28:45,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=226437.16666666666, ans=0.125 2024-09-15 03:28:47,501 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.61 vs. limit=22.5 2024-09-15 03:28:59,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=226465.5, ans=0.0 2024-09-15 03:29:09,378 INFO [train.py:1198] (1/2) Epoch 13, batch 3250, loss[loss=0.2687, ctc_loss=0.1885, cr_loss=0.401, over 20829.00 frames. ], tot_loss[loss=0.2559, ctc_loss=0.1772, cr_loss=0.3931, over 4086416.19 frames. ], batch size: 59, lr: 6.47e-03, grad_scale: 32.0 2024-09-15 03:29:24,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=226522.16666666666, ans=0.125 2024-09-15 03:29:30,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=226522.16666666666, ans=0.0 2024-09-15 03:30:07,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=226578.83333333334, ans=0.1 2024-09-15 03:30:08,195 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.762e+02 2.068e+02 2.228e+02 2.357e+02 4.229e+02, threshold=4.456e+02, percent-clipped=0.0 2024-09-15 03:30:25,070 INFO [train.py:1198] (1/2) Epoch 13, batch 3300, loss[loss=0.2381, ctc_loss=0.1612, cr_loss=0.3845, over 20765.00 frames. 
], tot_loss[loss=0.255, ctc_loss=0.1765, cr_loss=0.3923, over 4092952.98 frames. ], batch size: 56, lr: 6.47e-03, grad_scale: 32.0 2024-09-15 03:31:37,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=226748.83333333334, ans=0.125 2024-09-15 03:31:42,156 INFO [train.py:1198] (1/2) Epoch 13, batch 3350, loss[loss=0.1955, ctc_loss=0.1307, cr_loss=0.3239, over 19442.00 frames. ], tot_loss[loss=0.2548, ctc_loss=0.1763, cr_loss=0.3925, over 4087948.56 frames. ], batch size: 43, lr: 6.46e-03, grad_scale: 32.0 2024-09-15 03:31:49,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=226777.16666666666, ans=0.1 2024-09-15 03:32:27,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=226862.16666666666, ans=0.1 2024-09-15 03:32:31,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=226862.16666666666, ans=0.125 2024-09-15 03:32:40,485 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.814e+02 2.020e+02 2.116e+02 2.354e+02 3.758e+02, threshold=4.231e+02, percent-clipped=0.0 2024-09-15 03:32:56,761 INFO [train.py:1198] (1/2) Epoch 13, batch 3400, loss[loss=0.2709, ctc_loss=0.1896, cr_loss=0.4067, over 20286.00 frames. ], tot_loss[loss=0.2541, ctc_loss=0.1759, cr_loss=0.391, over 4087013.56 frames. ], batch size: 74, lr: 6.46e-03, grad_scale: 32.0 2024-09-15 03:33:07,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=226918.83333333334, ans=0.125 2024-09-15 03:33:43,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=227003.83333333334, ans=0.0 2024-09-15 03:34:15,207 INFO [train.py:1198] (1/2) Epoch 13, batch 3450, loss[loss=0.3454, ctc_loss=0.2447, cr_loss=0.5035, over 18272.00 frames. ], tot_loss[loss=0.2558, ctc_loss=0.177, cr_loss=0.3936, over 4086240.09 frames. ], batch size: 108, lr: 6.46e-03, grad_scale: 32.0 2024-09-15 03:34:43,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=227117.16666666666, ans=0.125 2024-09-15 03:34:57,039 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.40 vs. limit=22.5 2024-09-15 03:35:15,780 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.752e+02 2.054e+02 2.143e+02 2.366e+02 3.326e+02, threshold=4.286e+02, percent-clipped=0.0 2024-09-15 03:35:26,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=227173.83333333334, ans=0.125 2024-09-15 03:35:32,285 INFO [train.py:1198] (1/2) Epoch 13, batch 3500, loss[loss=0.2862, ctc_loss=0.2014, cr_loss=0.424, over 20970.00 frames. ], tot_loss[loss=0.2559, ctc_loss=0.1772, cr_loss=0.394, over 4095124.85 frames. 
], batch size: 67, lr: 6.46e-03, grad_scale: 32.0 2024-09-15 03:35:37,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=227202.16666666666, ans=0.025 2024-09-15 03:35:42,438 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.23 vs. limit=15.0 2024-09-15 03:35:46,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=227230.5, ans=0.09899494936611666 2024-09-15 03:36:21,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=227287.16666666666, ans=0.1 2024-09-15 03:36:33,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=227315.5, ans=0.125 2024-09-15 03:36:48,057 INFO [train.py:1198] (1/2) Epoch 13, batch 3550, loss[loss=0.2654, ctc_loss=0.1829, cr_loss=0.4124, over 20866.00 frames. ], tot_loss[loss=0.2564, ctc_loss=0.1775, cr_loss=0.3945, over 4095451.04 frames. ], batch size: 57, lr: 6.46e-03, grad_scale: 32.0 2024-09-15 03:37:29,936 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.39 vs. limit=15.0 2024-09-15 03:37:36,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=227428.83333333334, ans=0.0 2024-09-15 03:37:39,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=227428.83333333334, ans=0.125 2024-09-15 03:37:47,349 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.715e+02 1.995e+02 2.164e+02 2.282e+02 3.713e+02, threshold=4.329e+02, percent-clipped=0.0 2024-09-15 03:37:49,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=227457.16666666666, ans=0.1 2024-09-15 03:38:03,621 INFO [train.py:1198] (1/2) Epoch 13, batch 3600, loss[loss=0.2545, ctc_loss=0.1776, cr_loss=0.3847, over 21000.00 frames. ], tot_loss[loss=0.2563, ctc_loss=0.1775, cr_loss=0.394, over 4092363.02 frames. ], batch size: 61, lr: 6.45e-03, grad_scale: 32.0 2024-09-15 03:38:23,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=227513.83333333334, ans=0.0 2024-09-15 03:39:17,838 INFO [train.py:1198] (1/2) Epoch 13, batch 3650, loss[loss=0.2669, ctc_loss=0.1842, cr_loss=0.4136, over 20976.00 frames. ], tot_loss[loss=0.2577, ctc_loss=0.1785, cr_loss=0.3958, over 4093108.25 frames. 
], batch size: 58, lr: 6.45e-03, grad_scale: 32.0 2024-09-15 03:39:18,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=227627.16666666666, ans=0.125 2024-09-15 03:39:18,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=227627.16666666666, ans=0.0 2024-09-15 03:40:01,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=227683.83333333334, ans=0.125 2024-09-15 03:40:18,850 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.843e+02 2.048e+02 2.204e+02 2.376e+02 6.268e+02, threshold=4.409e+02, percent-clipped=1.0 2024-09-15 03:40:29,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=227740.5, ans=0.1 2024-09-15 03:40:38,054 INFO [train.py:1198] (1/2) Epoch 13, batch 3700, loss[loss=0.2332, ctc_loss=0.1594, cr_loss=0.369, over 20985.00 frames. ], tot_loss[loss=0.2575, ctc_loss=0.1785, cr_loss=0.3951, over 4095837.36 frames. ], batch size: 48, lr: 6.45e-03, grad_scale: 32.0 2024-09-15 03:40:38,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=227768.83333333334, ans=0.1 2024-09-15 03:40:42,018 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.85 vs. limit=12.0 2024-09-15 03:41:53,033 INFO [train.py:1198] (1/2) Epoch 13, batch 3750, loss[loss=0.2518, ctc_loss=0.1766, cr_loss=0.3759, over 20892.00 frames. ], tot_loss[loss=0.258, ctc_loss=0.1789, cr_loss=0.3957, over 4092004.13 frames. ], batch size: 57, lr: 6.45e-03, grad_scale: 64.0 2024-09-15 03:42:02,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=227910.5, ans=0.125 2024-09-15 03:42:06,651 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=227938.83333333334, ans=0.035 2024-09-15 03:42:15,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=227938.83333333334, ans=0.07 2024-09-15 03:42:18,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=227938.83333333334, ans=0.2 2024-09-15 03:42:51,729 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.751e+02 2.063e+02 2.193e+02 2.473e+02 8.023e+02, threshold=4.386e+02, percent-clipped=1.0 2024-09-15 03:43:05,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=228023.83333333334, ans=0.1 2024-09-15 03:43:08,139 INFO [train.py:1198] (1/2) Epoch 13, batch 3800, loss[loss=0.2904, ctc_loss=0.2023, cr_loss=0.4408, over 21017.00 frames. ], tot_loss[loss=0.2581, ctc_loss=0.1789, cr_loss=0.396, over 4090361.89 frames. ], batch size: 63, lr: 6.45e-03, grad_scale: 64.0 2024-09-15 03:43:34,441 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.87 vs. 
limit=22.5 2024-09-15 03:43:50,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=228108.83333333334, ans=0.125 2024-09-15 03:44:07,097 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.96 vs. limit=22.5 2024-09-15 03:44:12,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=228165.5, ans=0.2 2024-09-15 03:44:23,056 INFO [train.py:1198] (1/2) Epoch 13, batch 3850, loss[loss=0.2481, ctc_loss=0.1714, cr_loss=0.3834, over 20787.00 frames. ], tot_loss[loss=0.2577, ctc_loss=0.1787, cr_loss=0.3948, over 4091739.50 frames. ], batch size: 53, lr: 6.45e-03, grad_scale: 32.0 2024-09-15 03:44:33,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=228193.83333333334, ans=0.2 2024-09-15 03:44:44,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=228222.16666666666, ans=0.125 2024-09-15 03:45:26,160 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.763e+02 2.056e+02 2.219e+02 2.407e+02 4.578e+02, threshold=4.439e+02, percent-clipped=1.0 2024-09-15 03:45:33,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=228307.16666666666, ans=0.125 2024-09-15 03:45:41,257 INFO [train.py:1198] (1/2) Epoch 13, batch 3900, loss[loss=0.2292, ctc_loss=0.1557, cr_loss=0.3675, over 20885.00 frames. ], tot_loss[loss=0.2554, ctc_loss=0.177, cr_loss=0.3922, over 4100071.58 frames. ], batch size: 54, lr: 6.44e-03, grad_scale: 32.0 2024-09-15 03:45:46,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=228335.5, ans=0.2 2024-09-15 03:45:47,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=228335.5, ans=0.125 2024-09-15 03:45:53,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=228335.5, ans=0.2 2024-09-15 03:46:50,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=228448.83333333334, ans=0.0 2024-09-15 03:46:53,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=228448.83333333334, ans=0.1 2024-09-15 03:46:58,769 INFO [train.py:1198] (1/2) Epoch 13, batch 3950, loss[loss=0.2315, ctc_loss=0.1571, cr_loss=0.3721, over 20884.00 frames. ], tot_loss[loss=0.2557, ctc_loss=0.1772, cr_loss=0.3927, over 4094047.81 frames. ], batch size: 54, lr: 6.44e-03, grad_scale: 32.0 2024-09-15 03:47:13,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=228505.5, ans=0.2 2024-09-15 03:47:58,004 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.831e+02 2.016e+02 2.161e+02 2.409e+02 5.821e+02, threshold=4.323e+02, percent-clipped=1.0 2024-09-15 03:48:13,142 INFO [train.py:1198] (1/2) Epoch 13, batch 4000, loss[loss=0.2368, ctc_loss=0.1619, cr_loss=0.3742, over 20775.00 frames. ], tot_loss[loss=0.2551, ctc_loss=0.1767, cr_loss=0.392, over 4093966.07 frames. 
], batch size: 53, lr: 6.44e-03, grad_scale: 32.0 2024-09-15 03:48:36,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=228647.16666666666, ans=0.125 2024-09-15 03:48:39,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=228647.16666666666, ans=0.125 2024-09-15 03:48:43,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=228675.5, ans=0.1 2024-09-15 03:48:47,404 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.01 vs. limit=15.0 2024-09-15 03:48:55,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=228675.5, ans=0.0 2024-09-15 03:49:05,485 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.11 vs. limit=10.0 2024-09-15 03:49:16,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=228732.16666666666, ans=0.125 2024-09-15 03:49:28,335 INFO [train.py:1198] (1/2) Epoch 13, batch 4050, loss[loss=0.2585, ctc_loss=0.1806, cr_loss=0.3894, over 20967.00 frames. ], tot_loss[loss=0.2563, ctc_loss=0.1777, cr_loss=0.393, over 4091511.91 frames. ], batch size: 58, lr: 6.44e-03, grad_scale: 32.0 2024-09-15 03:49:46,035 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.64 vs. limit=22.5 2024-09-15 03:49:51,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=228788.83333333334, ans=0.1 2024-09-15 03:50:05,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=228817.16666666666, ans=0.5 2024-09-15 03:50:19,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=228845.5, ans=0.025 2024-09-15 03:50:28,433 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.700e+02 2.064e+02 2.269e+02 2.553e+02 4.076e+02, threshold=4.538e+02, percent-clipped=0.0 2024-09-15 03:50:43,597 INFO [train.py:1198] (1/2) Epoch 13, batch 4100, loss[loss=0.2499, ctc_loss=0.1717, cr_loss=0.3911, over 20981.00 frames. ], tot_loss[loss=0.2552, ctc_loss=0.1767, cr_loss=0.3926, over 4107290.65 frames. ], batch size: 58, lr: 6.44e-03, grad_scale: 32.0 2024-09-15 03:50:54,788 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.59 vs. limit=22.5 2024-09-15 03:50:57,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=228930.5, ans=0.1 2024-09-15 03:51:08,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=228930.5, ans=0.1 2024-09-15 03:51:20,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=228958.83333333334, ans=0.125 2024-09-15 03:52:03,890 INFO [train.py:1198] (1/2) Epoch 13, batch 4150, loss[loss=0.2513, ctc_loss=0.1728, cr_loss=0.3922, over 20759.00 frames. 
], tot_loss[loss=0.2547, ctc_loss=0.1763, cr_loss=0.3919, over 4115335.26 frames. ], batch size: 56, lr: 6.43e-03, grad_scale: 32.0 2024-09-15 03:52:07,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=229043.83333333334, ans=0.025 2024-09-15 03:52:14,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=229043.83333333334, ans=0.0 2024-09-15 03:53:01,650 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.98 vs. limit=15.0 2024-09-15 03:53:03,884 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.839e+02 1.981e+02 2.107e+02 2.340e+02 3.732e+02, threshold=4.214e+02, percent-clipped=0.0 2024-09-15 03:53:17,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=229185.5, ans=0.125 2024-09-15 03:53:19,033 INFO [train.py:1198] (1/2) Epoch 13, batch 4200, loss[loss=0.2875, ctc_loss=0.2004, cr_loss=0.4354, over 20841.00 frames. ], tot_loss[loss=0.2548, ctc_loss=0.1763, cr_loss=0.3925, over 4118612.82 frames. ], batch size: 65, lr: 6.43e-03, grad_scale: 32.0 2024-09-15 03:54:22,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=229298.83333333334, ans=0.125 2024-09-15 03:54:34,189 INFO [train.py:1198] (1/2) Epoch 13, batch 4250, loss[loss=0.257, ctc_loss=0.1768, cr_loss=0.4006, over 20937.00 frames. ], tot_loss[loss=0.2545, ctc_loss=0.176, cr_loss=0.3922, over 4116599.26 frames. ], batch size: 60, lr: 6.43e-03, grad_scale: 16.0 2024-09-15 03:54:36,508 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.98 vs. limit=15.0 2024-09-15 03:54:43,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=229327.16666666666, ans=0.02 2024-09-15 03:54:48,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=229355.5, ans=0.125 2024-09-15 03:55:14,539 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.70 vs. limit=15.0 2024-09-15 03:55:20,503 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.73 vs. limit=12.0 2024-09-15 03:55:36,918 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.748e+02 2.023e+02 2.160e+02 2.370e+02 3.150e+02, threshold=4.320e+02, percent-clipped=0.0 2024-09-15 03:55:50,491 INFO [train.py:1198] (1/2) Epoch 13, batch 4300, loss[loss=0.2288, ctc_loss=0.1575, cr_loss=0.3565, over 20987.00 frames. ], tot_loss[loss=0.2526, ctc_loss=0.1745, cr_loss=0.3902, over 4117947.42 frames. 
], batch size: 55, lr: 6.43e-03, grad_scale: 16.0 2024-09-15 03:55:56,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=229468.83333333334, ans=0.2 2024-09-15 03:56:25,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=229525.5, ans=0.0 2024-09-15 03:56:27,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=229525.5, ans=0.0 2024-09-15 03:57:08,882 INFO [train.py:1198] (1/2) Epoch 13, batch 4350, loss[loss=0.2556, ctc_loss=0.1738, cr_loss=0.4089, over 21038.00 frames. ], tot_loss[loss=0.2519, ctc_loss=0.1739, cr_loss=0.3897, over 4123206.56 frames. ], batch size: 62, lr: 6.43e-03, grad_scale: 16.0 2024-09-15 03:57:15,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=229610.5, ans=0.0 2024-09-15 03:57:21,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=229610.5, ans=0.125 2024-09-15 03:57:31,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=229638.83333333334, ans=0.1 2024-09-15 03:58:13,181 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.733e+02 2.054e+02 2.239e+02 2.432e+02 3.142e+02, threshold=4.477e+02, percent-clipped=0.0 2024-09-15 03:58:17,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=229723.83333333334, ans=0.09899494936611666 2024-09-15 03:58:26,471 INFO [train.py:1198] (1/2) Epoch 13, batch 4400, loss[loss=0.2171, ctc_loss=0.1467, cr_loss=0.352, over 20994.00 frames. ], tot_loss[loss=0.2517, ctc_loss=0.1738, cr_loss=0.3896, over 4121276.15 frames. ], batch size: 51, lr: 6.42e-03, grad_scale: 32.0 2024-09-15 03:58:26,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=229752.16666666666, ans=0.035 2024-09-15 03:58:30,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=229752.16666666666, ans=0.1 2024-09-15 03:58:44,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=229780.5, ans=0.125 2024-09-15 03:58:47,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=229780.5, ans=0.0 2024-09-15 03:59:35,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=229865.5, ans=0.0 2024-09-15 03:59:37,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=229865.5, ans=0.2 2024-09-15 03:59:41,610 INFO [train.py:1198] (1/2) Epoch 13, batch 4450, loss[loss=0.254, ctc_loss=0.1761, cr_loss=0.3895, over 21014.00 frames. ], tot_loss[loss=0.2525, ctc_loss=0.1745, cr_loss=0.3898, over 4110944.75 frames. 
], batch size: 63, lr: 6.42e-03, grad_scale: 32.0 2024-09-15 03:59:44,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=229893.83333333334, ans=0.015 2024-09-15 04:00:30,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=229978.83333333334, ans=0.025 2024-09-15 04:00:43,867 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.771e+02 2.099e+02 2.285e+02 2.613e+02 3.993e+02, threshold=4.569e+02, percent-clipped=0.0 2024-09-15 04:00:57,435 INFO [train.py:1198] (1/2) Epoch 13, batch 4500, loss[loss=0.2678, ctc_loss=0.1821, cr_loss=0.4281, over 21068.00 frames. ], tot_loss[loss=0.2523, ctc_loss=0.1745, cr_loss=0.3893, over 4101205.67 frames. ], batch size: 56, lr: 6.42e-03, grad_scale: 32.0 2024-09-15 04:01:33,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=230092.16666666666, ans=0.05 2024-09-15 04:02:12,521 INFO [train.py:1198] (1/2) Epoch 13, batch 4550, loss[loss=0.2494, ctc_loss=0.1681, cr_loss=0.4063, over 20890.00 frames. ], tot_loss[loss=0.2533, ctc_loss=0.1752, cr_loss=0.3902, over 4100124.07 frames. ], batch size: 54, lr: 6.42e-03, grad_scale: 32.0 2024-09-15 04:02:34,298 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.89 vs. limit=15.0 2024-09-15 04:02:36,792 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=230205.5, ans=0.04949747468305833 2024-09-15 04:03:16,969 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.711e+02 2.085e+02 2.275e+02 2.544e+02 3.349e+02, threshold=4.549e+02, percent-clipped=0.0 2024-09-15 04:03:20,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=230290.5, ans=0.2 2024-09-15 04:03:33,382 INFO [train.py:1198] (1/2) Epoch 13, batch 4600, loss[loss=0.2433, ctc_loss=0.167, cr_loss=0.3812, over 20979.00 frames. ], tot_loss[loss=0.2544, ctc_loss=0.1761, cr_loss=0.3911, over 4098346.98 frames. ], batch size: 48, lr: 6.42e-03, grad_scale: 32.0 2024-09-15 04:03:33,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=230318.83333333334, ans=0.0 2024-09-15 04:03:39,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=230318.83333333334, ans=0.125 2024-09-15 04:03:51,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=230347.16666666666, ans=0.05 2024-09-15 04:04:14,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=230375.5, ans=0.125 2024-09-15 04:04:32,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=230432.16666666666, ans=0.015 2024-09-15 04:04:48,250 INFO [train.py:1198] (1/2) Epoch 13, batch 4650, loss[loss=0.2705, ctc_loss=0.1852, cr_loss=0.4264, over 20675.00 frames. ], tot_loss[loss=0.2552, ctc_loss=0.1768, cr_loss=0.3923, over 4087103.49 frames. 
], batch size: 68, lr: 6.41e-03, grad_scale: 32.0 2024-09-15 04:05:09,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=230488.83333333334, ans=0.1 2024-09-15 04:05:44,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=230545.5, ans=0.0 2024-09-15 04:05:50,266 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.812e+02 2.016e+02 2.167e+02 2.350e+02 3.389e+02, threshold=4.334e+02, percent-clipped=0.0 2024-09-15 04:05:56,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=230573.83333333334, ans=0.2 2024-09-15 04:06:01,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=230573.83333333334, ans=22.5 2024-09-15 04:06:02,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=230602.16666666666, ans=0.0 2024-09-15 04:06:03,793 INFO [train.py:1198] (1/2) Epoch 13, batch 4700, loss[loss=0.2462, ctc_loss=0.1697, cr_loss=0.3823, over 21081.00 frames. ], tot_loss[loss=0.2557, ctc_loss=0.1771, cr_loss=0.3929, over 4089974.78 frames. ], batch size: 59, lr: 6.41e-03, grad_scale: 32.0 2024-09-15 04:07:18,826 INFO [train.py:1198] (1/2) Epoch 13, batch 4750, loss[loss=0.2236, ctc_loss=0.1519, cr_loss=0.3584, over 20954.00 frames. ], tot_loss[loss=0.2545, ctc_loss=0.1761, cr_loss=0.3917, over 4088017.91 frames. ], batch size: 48, lr: 6.41e-03, grad_scale: 32.0 2024-09-15 04:07:19,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=230743.83333333334, ans=0.125 2024-09-15 04:07:28,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=230743.83333333334, ans=0.04949747468305833 2024-09-15 04:07:37,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=230772.16666666666, ans=0.2 2024-09-15 04:07:50,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=230800.5, ans=0.015 2024-09-15 04:08:06,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=230828.83333333334, ans=0.1 2024-09-15 04:08:21,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=230857.16666666666, ans=0.2 2024-09-15 04:08:22,793 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.838e+02 2.050e+02 2.167e+02 2.373e+02 4.077e+02, threshold=4.334e+02, percent-clipped=0.0 2024-09-15 04:08:36,402 INFO [train.py:1198] (1/2) Epoch 13, batch 4800, loss[loss=0.2745, ctc_loss=0.1866, cr_loss=0.4391, over 21036.00 frames. ], tot_loss[loss=0.2551, ctc_loss=0.1766, cr_loss=0.3924, over 4086832.50 frames. 
], batch size: 62, lr: 6.41e-03, grad_scale: 32.0 2024-09-15 04:09:06,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=230942.16666666666, ans=0.025 2024-09-15 04:09:24,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=230970.5, ans=0.1 2024-09-15 04:09:28,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=230970.5, ans=10.0 2024-09-15 04:09:35,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=230970.5, ans=0.125 2024-09-15 04:09:42,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=230998.83333333334, ans=0.0 2024-09-15 04:09:48,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=230998.83333333334, ans=0.025 2024-09-15 04:09:54,125 INFO [train.py:1198] (1/2) Epoch 13, batch 4850, loss[loss=0.3223, ctc_loss=0.2355, cr_loss=0.4336, over 14616.00 frames. ], tot_loss[loss=0.2546, ctc_loss=0.1763, cr_loss=0.3915, over 4077783.50 frames. ], batch size: 149, lr: 6.41e-03, grad_scale: 32.0 2024-09-15 04:09:59,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=231027.16666666666, ans=0.125 2024-09-15 04:10:11,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=231055.5, ans=0.2 2024-09-15 04:10:12,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=231055.5, ans=0.125 2024-09-15 04:10:19,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=231055.5, ans=0.125 2024-09-15 04:10:42,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=231112.16666666666, ans=0.1 2024-09-15 04:10:54,841 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.34 vs. limit=15.0 2024-09-15 04:10:55,612 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.691e+02 2.070e+02 2.229e+02 2.492e+02 4.047e+02, threshold=4.459e+02, percent-clipped=0.0 2024-09-15 04:11:08,626 INFO [train.py:1198] (1/2) Epoch 13, batch 4900, loss[loss=0.2481, ctc_loss=0.1716, cr_loss=0.3824, over 20280.00 frames. ], tot_loss[loss=0.2553, ctc_loss=0.1769, cr_loss=0.3922, over 4087077.47 frames. ], batch size: 74, lr: 6.40e-03, grad_scale: 32.0 2024-09-15 04:11:23,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=231197.16666666666, ans=0.02 2024-09-15 04:11:57,150 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.50 vs. 
limit=22.5 2024-09-15 04:11:59,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=231253.83333333334, ans=0.125 2024-09-15 04:12:02,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=231253.83333333334, ans=0.2 2024-09-15 04:12:15,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=231282.16666666666, ans=0.0 2024-09-15 04:12:22,478 INFO [train.py:1198] (1/2) Epoch 13, batch 4950, loss[loss=0.2708, ctc_loss=0.185, cr_loss=0.4291, over 20958.00 frames. ], tot_loss[loss=0.2556, ctc_loss=0.177, cr_loss=0.3927, over 4088615.75 frames. ], batch size: 64, lr: 6.40e-03, grad_scale: 32.0 2024-09-15 04:12:33,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=231310.5, ans=0.0 2024-09-15 04:12:34,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=231310.5, ans=0.04949747468305833 2024-09-15 04:12:39,931 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.85 vs. limit=15.0 2024-09-15 04:12:43,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=231338.83333333334, ans=0.0 2024-09-15 04:12:48,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=231338.83333333334, ans=0.0 2024-09-15 04:12:55,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=231367.16666666666, ans=0.1 2024-09-15 04:13:23,302 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.792e+02 2.061e+02 2.191e+02 2.435e+02 4.036e+02, threshold=4.383e+02, percent-clipped=0.0 2024-09-15 04:13:26,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=231423.83333333334, ans=0.0 2024-09-15 04:13:36,785 INFO [train.py:1198] (1/2) Epoch 13, batch 5000, loss[loss=0.2276, ctc_loss=0.1542, cr_loss=0.367, over 19922.00 frames. ], tot_loss[loss=0.2564, ctc_loss=0.1777, cr_loss=0.3935, over 4082191.70 frames. ], batch size: 44, lr: 6.40e-03, grad_scale: 32.0 2024-09-15 04:13:46,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=231452.16666666666, ans=0.125 2024-09-15 04:13:56,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=231480.5, ans=0.2 2024-09-15 04:14:16,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=231508.83333333334, ans=0.025 2024-09-15 04:14:51,157 INFO [train.py:1198] (1/2) Epoch 13, batch 5050, loss[loss=0.2941, ctc_loss=0.2068, cr_loss=0.4364, over 20844.00 frames. ], tot_loss[loss=0.2564, ctc_loss=0.1777, cr_loss=0.3934, over 4091683.11 frames. 
], batch size: 65, lr: 6.40e-03, grad_scale: 32.0 2024-09-15 04:14:52,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=231593.83333333334, ans=0.125 2024-09-15 04:14:55,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=231593.83333333334, ans=10.0 2024-09-15 04:15:08,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=231622.16666666666, ans=0.1 2024-09-15 04:15:35,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=231650.5, ans=0.125 2024-09-15 04:15:37,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=231678.83333333334, ans=0.125 2024-09-15 04:15:54,720 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.652e+02 2.020e+02 2.145e+02 2.341e+02 4.925e+02, threshold=4.290e+02, percent-clipped=1.0 2024-09-15 04:15:55,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=231707.16666666666, ans=0.1 2024-09-15 04:16:04,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=231707.16666666666, ans=0.0 2024-09-15 04:16:08,601 INFO [train.py:1198] (1/2) Epoch 13, batch 5100, loss[loss=0.2679, ctc_loss=0.1857, cr_loss=0.411, over 20844.00 frames. ], tot_loss[loss=0.2564, ctc_loss=0.1777, cr_loss=0.3936, over 4090030.26 frames. ], batch size: 65, lr: 6.40e-03, grad_scale: 32.0 2024-09-15 04:16:12,486 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.65 vs. limit=12.0 2024-09-15 04:16:17,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=231735.5, ans=0.1 2024-09-15 04:16:25,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=231763.83333333334, ans=0.2 2024-09-15 04:16:26,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=231763.83333333334, ans=0.125 2024-09-15 04:16:50,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=231792.16666666666, ans=0.2 2024-09-15 04:17:00,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=231820.5, ans=0.125 2024-09-15 04:17:11,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=231848.83333333334, ans=0.125 2024-09-15 04:17:24,835 INFO [train.py:1198] (1/2) Epoch 13, batch 5150, loss[loss=0.2081, ctc_loss=0.1397, cr_loss=0.3421, over 20949.00 frames. ], tot_loss[loss=0.2565, ctc_loss=0.1778, cr_loss=0.3932, over 4087967.16 frames. 
], batch size: 48, lr: 6.39e-03, grad_scale: 32.0 2024-09-15 04:17:52,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=231933.83333333334, ans=0.125 2024-09-15 04:18:25,535 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.795e+02 2.103e+02 2.398e+02 2.600e+02 4.837e+02, threshold=4.796e+02, percent-clipped=2.0 2024-09-15 04:18:38,899 INFO [train.py:1198] (1/2) Epoch 13, batch 5200, loss[loss=0.2514, ctc_loss=0.1739, cr_loss=0.3875, over 20975.00 frames. ], tot_loss[loss=0.2557, ctc_loss=0.1772, cr_loss=0.3925, over 4084750.30 frames. ], batch size: 52, lr: 6.39e-03, grad_scale: 32.0 2024-09-15 04:18:46,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=232018.83333333334, ans=0.125 2024-09-15 04:18:55,792 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.31 vs. limit=6.0 2024-09-15 04:19:11,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=232075.5, ans=0.0 2024-09-15 04:19:36,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=232132.16666666666, ans=0.2 2024-09-15 04:19:49,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=232132.16666666666, ans=0.125 2024-09-15 04:19:52,373 INFO [train.py:1198] (1/2) Epoch 13, batch 5250, loss[loss=0.1968, ctc_loss=0.1304, cr_loss=0.3322, over 20947.00 frames. ], tot_loss[loss=0.2542, ctc_loss=0.176, cr_loss=0.3911, over 4087443.78 frames. ], batch size: 51, lr: 6.39e-03, grad_scale: 32.0 2024-09-15 04:20:22,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=232217.16666666666, ans=0.125 2024-09-15 04:20:45,915 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.11 vs. limit=10.0 2024-09-15 04:20:51,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=232273.83333333334, ans=0.125 2024-09-15 04:20:51,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=232273.83333333334, ans=0.0 2024-09-15 04:20:52,483 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.743e+02 2.002e+02 2.168e+02 2.399e+02 3.136e+02, threshold=4.335e+02, percent-clipped=0.0 2024-09-15 04:20:55,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=232273.83333333334, ans=0.125 2024-09-15 04:21:05,950 INFO [train.py:1198] (1/2) Epoch 13, batch 5300, loss[loss=0.2499, ctc_loss=0.1718, cr_loss=0.3907, over 20866.00 frames. ], tot_loss[loss=0.2545, ctc_loss=0.1763, cr_loss=0.3911, over 4090927.29 frames. 
2024-09-15 04:21:25,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=232330.5, ans=0.125
2024-09-15 04:21:35,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=232358.83333333334, ans=0.125
2024-09-15 04:21:41,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=232358.83333333334, ans=0.015
2024-09-15 04:21:41,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=232358.83333333334, ans=0.1
2024-09-15 04:22:06,156 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.49 vs. limit=8.0
2024-09-15 04:22:06,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=232415.5, ans=0.0
2024-09-15 04:22:19,830 INFO [train.py:1198] (1/2) Epoch 13, batch 5350, loss[loss=0.2009, ctc_loss=0.1345, cr_loss=0.3322, over 20962.00 frames. ], tot_loss[loss=0.254, ctc_loss=0.176, cr_loss=0.39, over 4088904.35 frames. ], batch size: 48, lr: 6.39e-03, grad_scale: 32.0
2024-09-15 04:22:22,100 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.25 vs. limit=10.0
2024-09-15 04:23:14,840 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.10 vs. limit=22.5
2024-09-15 04:23:19,919 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.833e+02 2.073e+02 2.241e+02 2.493e+02 6.265e+02, threshold=4.482e+02, percent-clipped=2.0
2024-09-15 04:23:33,057 INFO [train.py:1198] (1/2) Epoch 13, batch 5400, loss[loss=0.3069, ctc_loss=0.2193, cr_loss=0.4382, over 20071.00 frames. ], tot_loss[loss=0.2542, ctc_loss=0.176, cr_loss=0.3909, over 4093144.98 frames. ], batch size: 80, lr: 6.38e-03, grad_scale: 32.0
2024-09-15 04:23:34,076 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.62 vs. limit=15.0
2024-09-15 04:23:39,616 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0
2024-09-15 04:23:55,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=232613.83333333334, ans=0.125
2024-09-15 04:24:26,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=232670.5, ans=0.0
2024-09-15 04:24:40,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=232698.83333333334, ans=0.125
2024-09-15 04:24:48,893 INFO [train.py:1198] (1/2) Epoch 13, batch 5450, loss[loss=0.2897, ctc_loss=0.2038, cr_loss=0.4293, over 20835.00 frames. ], tot_loss[loss=0.254, ctc_loss=0.1758, cr_loss=0.3911, over 4097757.70 frames. ], batch size: 65, lr: 6.38e-03, grad_scale: 32.0
2024-09-15 04:25:04,324 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.39 vs. limit=15.0
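The Whitening lines compare a per-module metric against a (sometimes scheduled) limit. A hedged reconstruction of what such a metric could be: the spread of the eigenvalues of each channel group's activation covariance, which equals 1.0 when the covariance is isotropic (fully "white") and grows when a few directions dominate, with a penalty applied only while metric exceeds limit. The formula below is an assumption, not the actual scaling.py code:

```python
import torch

# x: (num_frames, num_channels). Split channels into groups, estimate each
# group's covariance, and measure eigenvalue spread; >= 1, with equality
# exactly when the spectrum is flat (activations already whitened).
def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
    n, c = x.shape
    x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)  # (g, n, d)
    x = x - x.mean(dim=1, keepdim=True)
    cov = torch.matmul(x.transpose(1, 2), x) / n                   # (g, d, d)
    eig = torch.linalg.eigvalsh(cov)                               # (g, d)
    return (eig.pow(2).mean(dim=1) / eig.mean(dim=1).pow(2)).mean()
```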
2024-09-15 04:25:24,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=232783.83333333334, ans=0.125
2024-09-15 04:25:45,443 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.09 vs. limit=15.0
2024-09-15 04:25:49,079 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.793e+02 2.039e+02 2.194e+02 2.418e+02 3.248e+02, threshold=4.388e+02, percent-clipped=0.0
2024-09-15 04:26:02,164 INFO [train.py:1198] (1/2) Epoch 13, batch 5500, loss[loss=0.3101, ctc_loss=0.2303, cr_loss=0.3986, over 14144.00 frames. ], tot_loss[loss=0.2529, ctc_loss=0.1748, cr_loss=0.3901, over 4105678.64 frames. ], batch size: 149, lr: 6.38e-03, grad_scale: 32.0
2024-09-15 04:26:09,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=232868.83333333334, ans=0.1
2024-09-15 04:26:20,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=232897.16666666666, ans=0.125
2024-09-15 04:26:23,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=232897.16666666666, ans=0.0
2024-09-15 04:26:31,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=232897.16666666666, ans=0.2
2024-09-15 04:27:18,467 INFO [train.py:1198] (1/2) Epoch 13, batch 5550, loss[loss=0.2672, ctc_loss=0.1906, cr_loss=0.3829, over 20838.00 frames. ], tot_loss[loss=0.2534, ctc_loss=0.1753, cr_loss=0.391, over 4110080.62 frames. ], batch size: 59, lr: 6.38e-03, grad_scale: 32.0
2024-09-15 04:27:29,311 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.54 vs. limit=22.5
2024-09-15 04:27:38,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=233038.83333333334, ans=0.125
2024-09-15 04:27:45,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=233038.83333333334, ans=0.1
2024-09-15 04:27:45,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=233038.83333333334, ans=0.125
2024-09-15 04:27:48,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=233067.16666666666, ans=0.025
2024-09-15 04:27:48,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=233067.16666666666, ans=0.0
2024-09-15 04:28:13,475 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.24 vs. limit=10.0
2024-09-15 04:28:18,707 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.809e+02 2.073e+02 2.221e+02 2.441e+02 3.591e+02, threshold=4.443e+02, percent-clipped=0.0
2024-09-15 04:28:31,923 INFO [train.py:1198] (1/2) Epoch 13, batch 5600, loss[loss=0.2804, ctc_loss=0.195, cr_loss=0.4271, over 20680.00 frames. ], tot_loss[loss=0.2539, ctc_loss=0.1756, cr_loss=0.3914, over 4114010.17 frames. ], batch size: 66, lr: 6.38e-03, grad_scale: 32.0
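Batch sizes above swing from ~50 up to 149 while typical per-batch frame counts sit near 21000: the dynamic bucketing sampler packs utterances up to a fixed duration budget, so long cuts mean few per batch and short cuts mean many. A back-of-the-envelope check, assuming 100 fbank frames per second, subsampling factor 4, and the max_duration=850 budget used for this run:

```python
# Frame-budget arithmetic behind the "over N frames" figures (constants
# are assumptions about this training setup, not read from the code).
fps, subsampling, max_duration_s = 100, 4, 850
budget_frames = max_duration_s * fps // subsampling
print(budget_frames)   # 21250, close to the ~20800-21000 logged for typical batches
print(14144 / 149)     # ~95 frames/utt: many short cuts, so padding eats the budget
```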
2024-09-15 04:28:32,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=233152.16666666666, ans=0.1
2024-09-15 04:28:39,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=233152.16666666666, ans=0.125
2024-09-15 04:28:42,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=233152.16666666666, ans=0.1
2024-09-15 04:28:59,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=233180.5, ans=22.5
2024-09-15 04:29:42,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=233265.5, ans=0.125
2024-09-15 04:29:45,345 INFO [train.py:1198] (1/2) Epoch 13, batch 5650, loss[loss=0.2433, ctc_loss=0.1719, cr_loss=0.3567, over 19432.00 frames. ], tot_loss[loss=0.2526, ctc_loss=0.1747, cr_loss=0.3895, over 4121047.51 frames. ], batch size: 90, lr: 6.37e-03, grad_scale: 32.0
2024-09-15 04:30:05,211 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.42 vs. limit=15.0
2024-09-15 04:30:46,136 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.730e+02 1.990e+02 2.120e+02 2.287e+02 3.734e+02, threshold=4.241e+02, percent-clipped=0.0
2024-09-15 04:30:59,517 INFO [train.py:1198] (1/2) Epoch 13, batch 5700, loss[loss=0.2134, ctc_loss=0.1433, cr_loss=0.3507, over 20947.00 frames. ], tot_loss[loss=0.2528, ctc_loss=0.1748, cr_loss=0.3902, over 4127805.65 frames. ], batch size: 51, lr: 6.37e-03, grad_scale: 32.0
2024-09-15 04:30:59,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=233435.5, ans=0.05
2024-09-15 04:31:29,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=233492.16666666666, ans=0.125
2024-09-15 04:31:33,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=233492.16666666666, ans=0.0
2024-09-15 04:31:51,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=233520.5, ans=0.0
2024-09-15 04:31:54,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=233520.5, ans=0.0
2024-09-15 04:31:59,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=233548.83333333334, ans=0.1
2024-09-15 04:32:13,579 INFO [train.py:1198] (1/2) Epoch 13, batch 5750, loss[loss=0.2779, ctc_loss=0.1945, cr_loss=0.417, over 20284.00 frames. ], tot_loss[loss=0.2529, ctc_loss=0.1747, cr_loss=0.3908, over 4126253.84 frames. ], batch size: 74, lr: 6.37e-03, grad_scale: 32.0
2024-09-15 04:32:30,959 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.55 vs. limit=15.0
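In each train.py line, loss[...] is the current batch and tot_loss[...] a frame-weighted average over recent batches, which is why its frame count hovers around 4.1M. A minimal sketch of such a tracker, assuming a fixed-size window; the actual tracker in train.py may decay old batches instead:

```python
from collections import deque

# Frame-weighted running average behind the tot_loss[...] entries (sketch).
class RunningLoss:
    def __init__(self, max_batches=500):
        self.window = deque(maxlen=max_batches)

    def update(self, loss: float, num_frames: float):
        self.window.append((loss * num_frames, num_frames))

    def average(self):
        weighted = sum(w for w, _ in self.window)
        frames = sum(f for _, f in self.window)
        return weighted / frames, frames

tracker = RunningLoss()
tracker.update(0.2779, 20284)   # batch 5750 values from the log
avg, frames = tracker.average()
```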
2024-09-15 04:32:31,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=233605.5, ans=0.1
2024-09-15 04:32:31,934 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.04 vs. limit=15.0
2024-09-15 04:32:39,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=233605.5, ans=0.125
2024-09-15 04:32:50,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=233633.83333333334, ans=6.0
2024-09-15 04:32:59,430 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.05 vs. limit=15.0
2024-09-15 04:33:01,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=233662.16666666666, ans=0.125
2024-09-15 04:33:14,737 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.779e+02 2.064e+02 2.192e+02 2.396e+02 3.460e+02, threshold=4.385e+02, percent-clipped=0.0
2024-09-15 04:33:30,313 INFO [train.py:1198] (1/2) Epoch 13, batch 5800, loss[loss=0.2592, ctc_loss=0.1831, cr_loss=0.3804, over 21004.00 frames. ], tot_loss[loss=0.2518, ctc_loss=0.1739, cr_loss=0.3893, over 4129239.27 frames. ], batch size: 63, lr: 6.37e-03, grad_scale: 32.0
2024-09-15 04:33:56,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=233747.16666666666, ans=0.125
2024-09-15 04:34:40,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=233832.16666666666, ans=0.2
2024-09-15 04:34:44,699 INFO [train.py:1198] (1/2) Epoch 13, batch 5850, loss[loss=0.2143, ctc_loss=0.1438, cr_loss=0.3525, over 20993.00 frames. ], tot_loss[loss=0.2514, ctc_loss=0.1737, cr_loss=0.3887, over 4110074.64 frames. ], batch size: 48, lr: 6.37e-03, grad_scale: 32.0
2024-09-15 04:34:44,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=233860.5, ans=0.125
2024-09-15 04:35:33,361 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.32 vs. limit=10.0
2024-09-15 04:35:37,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=233945.5, ans=0.1
2024-09-15 04:35:40,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=233945.5, ans=0.125
2024-09-15 04:35:47,277 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.684e+02 2.052e+02 2.224e+02 2.433e+02 4.392e+02, threshold=4.449e+02, percent-clipped=1.0
2024-09-15 04:36:00,662 INFO [train.py:1198] (1/2) Epoch 13, batch 5900, loss[loss=0.3147, ctc_loss=0.2259, cr_loss=0.444, over 14396.00 frames. ], tot_loss[loss=0.2523, ctc_loss=0.1745, cr_loss=0.3891, over 4099010.35 frames. ], batch size: 149, lr: 6.37e-03, grad_scale: 32.0
2024-09-15 04:36:07,577 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.58 vs. limit=15.0
2024-09-15 04:36:20,874 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.25 vs. limit=15.0
2024-09-15 04:36:30,781 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-15 04:36:38,512 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.20 vs. limit=15.0
2024-09-15 04:36:40,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=234058.83333333334, ans=0.125
2024-09-15 04:36:42,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=234058.83333333334, ans=0.1
2024-09-15 04:37:14,405 INFO [train.py:1198] (1/2) Epoch 13, batch 5950, loss[loss=0.2234, ctc_loss=0.1519, cr_loss=0.3577, over 20939.00 frames. ], tot_loss[loss=0.2533, ctc_loss=0.1753, cr_loss=0.3905, over 4103144.98 frames. ], batch size: 50, lr: 6.36e-03, grad_scale: 32.0
2024-09-15 04:37:19,305 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.11 vs. limit=22.5
2024-09-15 04:37:50,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=234200.5, ans=0.125
2024-09-15 04:38:14,701 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.78 vs. limit=22.5
2024-09-15 04:38:15,592 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.809e+02 2.071e+02 2.203e+02 2.385e+02 5.164e+02, threshold=4.405e+02, percent-clipped=1.0
2024-09-15 04:38:28,276 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.86 vs. limit=15.0
2024-09-15 04:38:28,665 INFO [train.py:1198] (1/2) Epoch 13, batch 6000, loss[loss=0.25, ctc_loss=0.1726, cr_loss=0.3873, over 20949.00 frames. ], tot_loss[loss=0.2537, ctc_loss=0.1756, cr_loss=0.3905, over 4094175.99 frames. ], batch size: 60, lr: 6.36e-03, grad_scale: 32.0
2024-09-15 04:38:28,666 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-15 04:38:50,996 INFO [train.py:1230] (1/2) Epoch 13, validation: loss=0.04815, ctc_loss=0.04815, cr_loss=9.906e-15, over 944034.00 frames.
2024-09-15 04:38:50,996 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB
2024-09-15 04:39:09,570 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.59 vs. limit=15.0
2024-09-15 04:39:10,025 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.77 vs. limit=15.0
2024-09-15 04:40:05,217 INFO [train.py:1198] (1/2) Epoch 13, batch 6050, loss[loss=0.2115, ctc_loss=0.1465, cr_loss=0.325, over 20970.00 frames. ], tot_loss[loss=0.2543, ctc_loss=0.1761, cr_loss=0.3909, over 4083125.75 frames. ], batch size: 51, lr: 6.36e-03, grad_scale: 32.0
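The periodic validation pass above always covers the same 944034.00 frames, and its cr_loss (~1e-14) is effectively zero, consistent with consistency regularization comparing two identical, unmasked forward passes at eval time; that reading is an inference, not something the log states. A sketch of the loop, where `model`, `criterion`, and the batch keys are stand-ins:

```python
import torch

# Hedged sketch of "Computing validation loss": iterate the dev set with
# gradients disabled and report a frame-weighted average.
@torch.no_grad()
def compute_validation_loss(model, dev_loader, criterion, device):
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    for batch in dev_loader:
        feats = batch["inputs"].to(device)
        targets = batch["targets"].to(device)
        loss, num_frames = criterion(model, feats, targets)  # assumed signature
        tot_loss += float(loss) * num_frames
        tot_frames += num_frames
    model.train()
    return tot_loss / tot_frames
```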
2024-09-15 04:40:20,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=234455.5, ans=0.1
2024-09-15 04:40:20,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=234455.5, ans=0.07
2024-09-15 04:40:29,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=234455.5, ans=0.125
2024-09-15 04:40:41,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=234483.83333333334, ans=0.0
2024-09-15 04:40:45,956 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.02 vs. limit=10.0
2024-09-15 04:40:56,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=234512.16666666666, ans=0.125
2024-09-15 04:41:01,557 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.27 vs. limit=15.0
2024-09-15 04:41:06,758 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.805e+02 2.090e+02 2.204e+02 2.375e+02 4.545e+02, threshold=4.407e+02, percent-clipped=1.0
2024-09-15 04:41:18,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=234540.5, ans=0.125
2024-09-15 04:41:21,323 INFO [train.py:1198] (1/2) Epoch 13, batch 6100, loss[loss=0.2637, ctc_loss=0.1828, cr_loss=0.4047, over 20659.00 frames. ], tot_loss[loss=0.2545, ctc_loss=0.1763, cr_loss=0.3912, over 4084780.74 frames. ], batch size: 68, lr: 6.36e-03, grad_scale: 32.0
2024-09-15 04:41:48,824 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.04 vs. limit=12.0
2024-09-15 04:41:51,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=234625.5, ans=0.0
2024-09-15 04:42:07,276 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.49 vs. limit=15.0
2024-09-15 04:42:08,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=234653.83333333334, ans=0.0
2024-09-15 04:42:21,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=234682.16666666666, ans=0.0
2024-09-15 04:42:25,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=234682.16666666666, ans=0.2
2024-09-15 04:42:30,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=234682.16666666666, ans=0.0
2024-09-15 04:42:31,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=234682.16666666666, ans=0.125
2024-09-15 04:42:34,388 INFO [train.py:1198] (1/2) Epoch 13, batch 6150, loss[loss=0.2563, ctc_loss=0.1714, cr_loss=0.4245, over 20967.00 frames. ], tot_loss[loss=0.2541, ctc_loss=0.1759, cr_loss=0.3909, over 4081314.06 frames. ], batch size: 55, lr: 6.36e-03, grad_scale: 32.0
2024-09-15 04:42:34,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=234710.5, ans=0.125
2024-09-15 04:42:59,523 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.84 vs. limit=12.0
2024-09-15 04:43:05,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=234767.16666666666, ans=0.2
2024-09-15 04:43:21,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=234795.5, ans=0.2
2024-09-15 04:43:35,982 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.789e+02 2.073e+02 2.219e+02 2.414e+02 7.028e+02, threshold=4.438e+02, percent-clipped=2.0
2024-09-15 04:43:39,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=234823.83333333334, ans=0.1
2024-09-15 04:43:48,885 INFO [train.py:1198] (1/2) Epoch 13, batch 6200, loss[loss=0.2451, ctc_loss=0.1643, cr_loss=0.4038, over 20955.00 frames. ], tot_loss[loss=0.2546, ctc_loss=0.1764, cr_loss=0.3913, over 4072943.79 frames. ], batch size: 55, lr: 6.35e-03, grad_scale: 32.0
2024-09-15 04:43:49,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=234852.16666666666, ans=0.2
2024-09-15 04:43:52,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=234852.16666666666, ans=0.05
2024-09-15 04:43:57,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=234852.16666666666, ans=0.2
2024-09-15 04:44:28,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=234908.83333333334, ans=0.0
2024-09-15 04:44:31,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=234937.16666666666, ans=0.125
2024-09-15 04:44:41,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=234937.16666666666, ans=0.125
2024-09-15 04:44:50,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=234965.5, ans=0.0
2024-09-15 04:44:52,109 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=7.39 vs. limit=22.5
2024-09-15 04:44:56,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=234965.5, ans=0.0
2024-09-15 04:45:01,608 INFO [train.py:1198] (1/2) Epoch 13, batch 6250, loss[loss=0.2795, ctc_loss=0.1938, cr_loss=0.4284, over 20016.00 frames. ], tot_loss[loss=0.2575, ctc_loss=0.179, cr_loss=0.3927, over 4004177.24 frames. ], batch size: 80, lr: 6.35e-03, grad_scale: 64.0
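Note grad_scale doubling from 32.0 to 64.0 at batch 6250 and falling back to 32.0 by batch 6300 (a later record even shows 16.0): this is float16 dynamic loss scaling, which grows the scale after a stretch of overflow-free steps and halves it on overflow. A sketch of the same mechanism with PyTorch's stock GradScaler; init_scale and growth_interval here are assumptions, and `loss_fn` is a stand-in:

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

def amp_step(model, optimizer, loss_fn, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():         # fp16 forward pass
        loss = loss_fn(model, batch)
    scaler.scale(loss).backward()           # scale up to protect small fp16 grads
    scaler.step(optimizer)                  # unscales; skips the step on inf/nan
    scaler.update()                         # grows scale periodically, halves on overflow
    return loss
```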
2024-09-15 04:45:33,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=235050.5, ans=0.125
2024-09-15 04:45:35,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=235050.5, ans=0.125
2024-09-15 04:45:41,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=235050.5, ans=0.0
2024-09-15 04:46:00,745 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.784e+02 2.192e+02 2.367e+02 2.698e+02 3.520e+02, threshold=4.734e+02, percent-clipped=0.0
2024-09-15 04:46:12,894 INFO [train.py:1198] (1/2) Epoch 13, batch 6300, loss[loss=0.2779, ctc_loss=0.1961, cr_loss=0.4091, over 20225.00 frames. ], tot_loss[loss=0.2645, ctc_loss=0.1848, cr_loss=0.3988, over 3924797.45 frames. ], batch size: 80, lr: 6.35e-03, grad_scale: 32.0
2024-09-15 04:46:26,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=235163.83333333334, ans=0.125
2024-09-15 04:46:39,949 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 04:47:10,900 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.18 vs. limit=6.0
2024-09-15 04:47:22,415 INFO [train.py:1198] (1/2) Epoch 13, batch 6350, loss[loss=0.2853, ctc_loss=0.2116, cr_loss=0.3685, over 14147.00 frames. ], tot_loss[loss=0.2705, ctc_loss=0.19, cr_loss=0.4022, over 3782098.36 frames. ], batch size: 151, lr: 6.35e-03, grad_scale: 32.0
2024-09-15 04:49:06,942 INFO [train.py:1198] (1/2) Epoch 14, batch 0, loss[loss=0.2709, ctc_loss=0.1862, cr_loss=0.4234, over 20957.00 frames. ], tot_loss[loss=0.2709, ctc_loss=0.1862, cr_loss=0.4234, over 20957.00 frames. ], batch size: 64, lr: 6.12e-03, grad_scale: 32.0
2024-09-15 04:49:06,942 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-15 04:49:25,008 INFO [train.py:1230] (1/2) Epoch 14, validation: loss=0.04984, ctc_loss=0.04984, cr_loss=9.807e-15, over 944034.00 frames.
2024-09-15 04:49:25,008 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB
2024-09-15 04:49:26,456 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.954e+02 2.389e+02 2.523e+02 2.822e+02 4.019e+02, threshold=5.045e+02, percent-clipped=0.0
2024-09-15 04:50:18,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=235478.33333333334, ans=0.0
2024-09-15 04:50:36,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=235506.66666666666, ans=0.035
2024-09-15 04:50:40,379 INFO [train.py:1198] (1/2) Epoch 14, batch 50, loss[loss=0.2537, ctc_loss=0.1758, cr_loss=0.3895, over 20877.00 frames. ], tot_loss[loss=0.2489, ctc_loss=0.172, cr_loss=0.3844, over 930232.54 frames. ], batch size: 57, lr: 6.11e-03, grad_scale: 32.0
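The "Maximum memory allocated" figure printed at each validation pass comes from CUDA allocator statistics; PyTorch exposes the same counter directly (standard API, not icefall-specific; the device index below assumes this rank's GPU):

```python
import torch

# High-water mark of allocator usage on this rank's device.
max_mb = torch.cuda.max_memory_allocated(device=1) // (1024 * 1024)
print(f"Maximum memory allocated so far is {max_mb}MB")
```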
2024-09-15 04:50:55,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=235563.33333333334, ans=0.05
2024-09-15 04:51:11,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=235591.66666666666, ans=0.0
2024-09-15 04:51:45,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=235648.33333333334, ans=10.0
2024-09-15 04:51:50,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=235648.33333333334, ans=0.125
2024-09-15 04:51:55,707 INFO [train.py:1198] (1/2) Epoch 14, batch 100, loss[loss=0.2107, ctc_loss=0.1427, cr_loss=0.3399, over 19847.00 frames. ], tot_loss[loss=0.2492, ctc_loss=0.172, cr_loss=0.3863, over 1631966.01 frames. ], batch size: 44, lr: 6.11e-03, grad_scale: 32.0
2024-09-15 04:51:56,100 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 04:51:57,197 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.803e+02 2.024e+02 2.145e+02 2.322e+02 3.133e+02, threshold=4.290e+02, percent-clipped=0.0
2024-09-15 04:52:28,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=235733.33333333334, ans=15.0
2024-09-15 04:52:43,672 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.24 vs. limit=15.0
2024-09-15 04:52:47,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=235761.66666666666, ans=0.125
2024-09-15 04:53:13,800 INFO [train.py:1198] (1/2) Epoch 14, batch 150, loss[loss=0.2554, ctc_loss=0.1782, cr_loss=0.3862, over 20776.00 frames. ], tot_loss[loss=0.2519, ctc_loss=0.1738, cr_loss=0.3907, over 2182850.26 frames. ], batch size: 56, lr: 6.11e-03, grad_scale: 32.0
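The learning rate drops from 6.35e-03 to 6.12e-03 at the epoch 13 to 14 boundary and keeps shrinking slowly within the epoch, i.e. the schedule decays in both batch count and epoch. A sketch of an Eden-style rule with that shape; the exponents and constants are assumptions, and the logged values need not match this formula exactly:

```python
# Illustrative two-factor LR decay: both factors start near 1.0 and shrink
# as training progresses, so a new epoch causes a visible step down.
def eden_lr(base_lr: float, batch: float, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    batch_factor = ((batch / lr_batches) ** 2 + 1.0) ** -0.25
    epoch_factor = ((epoch / lr_epochs) ** 2 + 1.0) ** -0.25
    return base_lr * batch_factor * epoch_factor
```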
2024-09-15 04:53:18,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=235818.33333333334, ans=0.0
2024-09-15 04:53:24,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=235818.33333333334, ans=0.02
2024-09-15 04:53:26,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=235818.33333333334, ans=0.0
2024-09-15 04:53:43,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=235875.0, ans=0.125
2024-09-15 04:53:48,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=235875.0, ans=0.1
2024-09-15 04:54:16,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=235931.66666666666, ans=0.125
2024-09-15 04:54:21,243 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=235931.66666666666, ans=0.1
2024-09-15 04:54:24,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=235931.66666666666, ans=0.125
2024-09-15 04:54:27,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=235960.0, ans=0.0
2024-09-15 04:54:28,541 INFO [train.py:1198] (1/2) Epoch 14, batch 200, loss[loss=0.2743, ctc_loss=0.1935, cr_loss=0.404, over 19964.00 frames. ], tot_loss[loss=0.2526, ctc_loss=0.1743, cr_loss=0.3916, over 2598375.61 frames. ], batch size: 80, lr: 6.11e-03, grad_scale: 32.0
2024-09-15 04:54:30,056 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.793e+02 2.027e+02 2.156e+02 2.358e+02 4.767e+02, threshold=4.313e+02, percent-clipped=1.0
2024-09-15 04:54:47,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=235988.33333333334, ans=0.125
2024-09-15 04:54:56,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=235988.33333333334, ans=0.2
2024-09-15 04:55:39,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=236073.33333333334, ans=0.2
2024-09-15 04:55:46,404 INFO [train.py:1198] (1/2) Epoch 14, batch 250, loss[loss=0.2987, ctc_loss=0.2061, cr_loss=0.4631, over 20649.00 frames. ], tot_loss[loss=0.2535, ctc_loss=0.1752, cr_loss=0.3918, over 2902156.63 frames. ], batch size: 66, lr: 6.11e-03, grad_scale: 32.0
2024-09-15 04:55:47,265 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.31 vs. limit=15.0
2024-09-15 04:56:38,063 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-15 04:56:41,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=236186.66666666666, ans=0.125
2024-09-15 04:56:50,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=236215.0, ans=0.1
2024-09-15 04:56:57,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=236215.0, ans=0.07
2024-09-15 04:57:01,675 INFO [train.py:1198] (1/2) Epoch 14, batch 300, loss[loss=0.2535, ctc_loss=0.1745, cr_loss=0.3949, over 21069.00 frames. ], tot_loss[loss=0.2545, ctc_loss=0.1759, cr_loss=0.3929, over 3159123.40 frames. ], batch size: 59, lr: 6.10e-03, grad_scale: 32.0
2024-09-15 04:57:03,090 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.866e+02 2.107e+02 2.203e+02 2.405e+02 3.359e+02, threshold=4.406e+02, percent-clipped=0.0
2024-09-15 04:57:06,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=236243.33333333334, ans=0.0
2024-09-15 04:57:06,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=236243.33333333334, ans=0.125
2024-09-15 04:57:13,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=236243.33333333334, ans=0.125
2024-09-15 04:57:38,558 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.24 vs. limit=15.0
2024-09-15 04:57:43,089 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.78 vs. limit=12.0
2024-09-15 04:58:07,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=236356.66666666666, ans=0.2
2024-09-15 04:58:19,366 INFO [train.py:1198] (1/2) Epoch 14, batch 350, loss[loss=0.2518, ctc_loss=0.1766, cr_loss=0.376, over 20827.00 frames. ], tot_loss[loss=0.2558, ctc_loss=0.177, cr_loss=0.3941, over 3351532.88 frames. ], batch size: 65, lr: 6.10e-03, grad_scale: 16.0
2024-09-15 04:58:30,929 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.63 vs. limit=15.0
2024-09-15 04:58:39,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=236413.33333333334, ans=0.125
2024-09-15 04:58:47,350 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.89 vs. limit=15.0
2024-09-15 04:59:12,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=236470.0, ans=0.125
2024-09-15 04:59:18,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=236498.33333333334, ans=0.0
2024-09-15 04:59:34,564 INFO [train.py:1198] (1/2) Epoch 14, batch 400, loss[loss=0.2106, ctc_loss=0.142, cr_loss=0.3428, over 20963.00 frames. ], tot_loss[loss=0.2548, ctc_loss=0.1762, cr_loss=0.393, over 3517821.28 frames. ], batch size: 50, lr: 6.10e-03, grad_scale: 32.0
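The balancer entries carry per-channel constraints (min_positive, max_abs, min_abs) plus a prob that reads like the chance the constraint is enforced on a given batch. A schematic of the statistics such a module would test; the shapes, bounds, and enforcement mechanism below are assumptions for illustration, not the scaling.py implementation:

```python
import torch

# x: (batch, time, channels). Channels whose statistics drift outside the
# configured bounds would receive a corrective gradient nudge; the logged
# prob=0.125 suggests the check runs only on a fraction of batches.
def balancer_violations(x: torch.Tensor, min_positive=0.05, max_abs=10.0):
    frac_positive = (x > 0).float().mean(dim=(0, 1))  # per-channel stats
    mean_abs = x.abs().mean(dim=(0, 1))
    return frac_positive < min_positive, mean_abs > max_abs
```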
2024-09-15 04:59:37,468 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.727e+02 2.028e+02 2.148e+02 2.357e+02 3.906e+02, threshold=4.296e+02, percent-clipped=0.0
2024-09-15 04:59:56,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=236555.0, ans=6.0
2024-09-15 05:00:10,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=236583.33333333334, ans=0.125
2024-09-15 05:00:24,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=236611.66666666666, ans=0.1
2024-09-15 05:00:52,943 INFO [train.py:1198] (1/2) Epoch 14, batch 450, loss[loss=0.2763, ctc_loss=0.1933, cr_loss=0.4148, over 20057.00 frames. ], tot_loss[loss=0.2539, ctc_loss=0.1755, cr_loss=0.3921, over 3643609.56 frames. ], batch size: 80, lr: 6.10e-03, grad_scale: 32.0
2024-09-15 05:00:54,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=236668.33333333334, ans=0.125
2024-09-15 05:01:21,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=236725.0, ans=0.125
2024-09-15 05:02:08,283 INFO [train.py:1198] (1/2) Epoch 14, batch 500, loss[loss=0.2497, ctc_loss=0.1702, cr_loss=0.3972, over 20966.00 frames. ], tot_loss[loss=0.2533, ctc_loss=0.1749, cr_loss=0.3918, over 3740318.27 frames. ], batch size: 64, lr: 6.10e-03, grad_scale: 32.0
2024-09-15 05:02:11,189 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.841e+02 2.083e+02 2.195e+02 2.401e+02 5.537e+02, threshold=4.389e+02, percent-clipped=1.0
2024-09-15 05:02:13,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=236810.0, ans=0.125
2024-09-15 05:02:19,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=236810.0, ans=0.1
2024-09-15 05:02:22,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=236838.33333333334, ans=0.125
2024-09-15 05:02:46,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=236866.66666666666, ans=0.0
2024-09-15 05:03:06,714 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.02 vs. limit=10.0
2024-09-15 05:03:19,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=236923.33333333334, ans=0.0
2024-09-15 05:03:22,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=236923.33333333334, ans=0.125
2024-09-15 05:03:26,740 INFO [train.py:1198] (1/2) Epoch 14, batch 550, loss[loss=0.2571, ctc_loss=0.1751, cr_loss=0.41, over 21015.00 frames. ], tot_loss[loss=0.2527, ctc_loss=0.1748, cr_loss=0.3899, over 3803144.61 frames. ], batch size: 61, lr: 6.10e-03, grad_scale: 32.0
2024-09-15 05:04:41,119 INFO [train.py:1198] (1/2) Epoch 14, batch 600, loss[loss=0.2973, ctc_loss=0.2152, cr_loss=0.4101, over 14302.00 frames. ], tot_loss[loss=0.2538, ctc_loss=0.1755, cr_loss=0.3912, over 3859717.62 frames. ], batch size: 149, lr: 6.09e-03, grad_scale: 32.0
2024-09-15 05:04:44,105 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.771e+02 2.005e+02 2.152e+02 2.363e+02 2.998e+02, threshold=4.304e+02, percent-clipped=0.0
2024-09-15 05:04:57,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=237121.66666666666, ans=0.0
2024-09-15 05:05:19,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=237150.0, ans=0.0
2024-09-15 05:05:32,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=237178.33333333334, ans=0.125
2024-09-15 05:05:41,910 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.28 vs. limit=6.0
2024-09-15 05:05:53,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=237206.66666666666, ans=0.1
2024-09-15 05:05:56,092 INFO [train.py:1198] (1/2) Epoch 14, batch 650, loss[loss=0.2711, ctc_loss=0.1862, cr_loss=0.4245, over 20666.00 frames. ], tot_loss[loss=0.2534, ctc_loss=0.1752, cr_loss=0.3909, over 3915765.27 frames. ], batch size: 68, lr: 6.09e-03, grad_scale: 32.0
2024-09-15 05:06:56,937 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.85 vs. limit=15.0
2024-09-15 05:07:15,604 INFO [train.py:1198] (1/2) Epoch 14, batch 700, loss[loss=0.2761, ctc_loss=0.1887, cr_loss=0.4368, over 20700.00 frames. ], tot_loss[loss=0.2543, ctc_loss=0.1758, cr_loss=0.3922, over 3952456.99 frames. ], batch size: 71, lr: 6.09e-03, grad_scale: 32.0
2024-09-15 05:07:18,641 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.792e+02 2.061e+02 2.179e+02 2.353e+02 3.627e+02, threshold=4.359e+02, percent-clipped=0.0
2024-09-15 05:07:29,981 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.04 vs. limit=15.0
2024-09-15 05:07:45,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=237433.33333333334, ans=0.125
2024-09-15 05:08:21,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=237490.0, ans=0.125
2024-09-15 05:08:30,847 INFO [train.py:1198] (1/2) Epoch 14, batch 750, loss[loss=0.2593, ctc_loss=0.1809, cr_loss=0.3922, over 19408.00 frames. ], tot_loss[loss=0.2523, ctc_loss=0.1743, cr_loss=0.3899, over 3993751.69 frames. ], batch size: 90, lr: 6.09e-03, grad_scale: 32.0
2024-09-15 05:08:43,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=237518.33333333334, ans=0.5
2024-09-15 05:09:10,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=237575.0, ans=0.025
2024-09-15 05:09:28,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=237603.33333333334, ans=0.0
2024-09-15 05:09:39,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=237631.66666666666, ans=0.125
2024-09-15 05:09:48,991 INFO [train.py:1198] (1/2) Epoch 14, batch 800, loss[loss=0.2749, ctc_loss=0.1896, cr_loss=0.427, over 19567.00 frames. ], tot_loss[loss=0.2528, ctc_loss=0.1747, cr_loss=0.3906, over 4011377.40 frames. ], batch size: 90, lr: 6.09e-03, grad_scale: 32.0
2024-09-15 05:09:53,330 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.780e+02 2.029e+02 2.193e+02 2.419e+02 2.978e+02, threshold=4.386e+02, percent-clipped=0.0
2024-09-15 05:09:53,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=237660.0, ans=0.125
2024-09-15 05:11:03,914 INFO [train.py:1198] (1/2) Epoch 14, batch 850, loss[loss=0.2645, ctc_loss=0.1825, cr_loss=0.4098, over 21054.00 frames. ], tot_loss[loss=0.2522, ctc_loss=0.1741, cr_loss=0.3902, over 4040836.12 frames. ], batch size: 59, lr: 6.08e-03, grad_scale: 32.0
2024-09-15 05:11:27,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=237830.0, ans=0.1
2024-09-15 05:12:23,309 INFO [train.py:1198] (1/2) Epoch 14, batch 900, loss[loss=0.2533, ctc_loss=0.172, cr_loss=0.4065, over 20804.00 frames. ], tot_loss[loss=0.2512, ctc_loss=0.1732, cr_loss=0.3898, over 4067248.99 frames. ], batch size: 53, lr: 6.08e-03, grad_scale: 32.0
2024-09-15 05:12:27,785 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.814e+02 2.079e+02 2.233e+02 2.451e+02 3.481e+02, threshold=4.467e+02, percent-clipped=0.0
2024-09-15 05:12:36,066 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.10 vs. limit=15.0
2024-09-15 05:12:49,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=237971.66666666666, ans=0.125
2024-09-15 05:13:20,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=238028.33333333334, ans=0.09899494936611666
2024-09-15 05:13:34,058 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.32 vs. limit=15.0
2024-09-15 05:13:39,340 INFO [train.py:1198] (1/2) Epoch 14, batch 950, loss[loss=0.2368, ctc_loss=0.1599, cr_loss=0.3849, over 20872.00 frames. ], tot_loss[loss=0.2524, ctc_loss=0.1742, cr_loss=0.3907, over 4055392.29 frames. ], batch size: 54, lr: 6.08e-03, grad_scale: 32.0
2024-09-15 05:14:05,912 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.06 vs. limit=15.0
2024-09-15 05:14:26,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=238170.0, ans=0.2
2024-09-15 05:14:40,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=238170.0, ans=0.125
2024-09-15 05:14:46,520 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-15 05:14:58,402 INFO [train.py:1198] (1/2) Epoch 14, batch 1000, loss[loss=0.271, ctc_loss=0.1884, cr_loss=0.4128, over 20960.00 frames. ], tot_loss[loss=0.2529, ctc_loss=0.1745, cr_loss=0.3918, over 4069257.93 frames. ], batch size: 67, lr: 6.08e-03, grad_scale: 32.0
2024-09-15 05:15:02,195 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.83 vs. limit=15.0
2024-09-15 05:15:03,007 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.819e+02 2.087e+02 2.235e+02 2.382e+02 7.832e+02, threshold=4.470e+02, percent-clipped=1.0
2024-09-15 05:15:14,114 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.97 vs. limit=15.0
2024-09-15 05:15:28,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=238283.33333333334, ans=0.125
2024-09-15 05:15:29,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=238283.33333333334, ans=0.04949747468305833
2024-09-15 05:15:34,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=238283.33333333334, ans=0.125
2024-09-15 05:15:45,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=238311.66666666666, ans=0.0
2024-09-15 05:16:11,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=238340.0, ans=0.0
2024-09-15 05:16:13,917 INFO [train.py:1198] (1/2) Epoch 14, batch 1050, loss[loss=0.2139, ctc_loss=0.1428, cr_loss=0.3553, over 20981.00 frames. ], tot_loss[loss=0.253, ctc_loss=0.1747, cr_loss=0.3917, over 4087629.32 frames. ], batch size: 52, lr: 6.08e-03, grad_scale: 32.0
2024-09-15 05:16:21,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=238368.33333333334, ans=0.1
2024-09-15 05:16:47,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=238425.0, ans=10.0
2024-09-15 05:17:02,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=238453.33333333334, ans=0.025
2024-09-15 05:17:32,103 INFO [train.py:1198] (1/2) Epoch 14, batch 1100, loss[loss=0.2138, ctc_loss=0.1434, cr_loss=0.3522, over 20968.00 frames. ], tot_loss[loss=0.2539, ctc_loss=0.1753, cr_loss=0.3928, over 4083965.63 frames. ], batch size: 49, lr: 6.08e-03, grad_scale: 32.0
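The bypass entries (bypass.skip_rate, bypass.scale_min, bypass_mid.scale_min) suggest each layer is wrapped in a learned residual bypass whose mixing weight is floored at scale_min (0.2 here) and which is skipped outright with a small scheduled probability during training. A hedged sketch of such a wrapper, not the actual zipformer.py module:

```python
import torch

class Bypass(torch.nn.Module):
    def __init__(self, channels, scale_min=0.2, skip_rate=0.0):
        super().__init__()
        self.scale = torch.nn.Parameter(torch.full((channels,), 0.5))
        self.scale_min, self.skip_rate = scale_min, skip_rate

    def forward(self, x, layer_out):
        if self.training and torch.rand(()) < self.skip_rate:
            return x                           # skip the layer entirely
        s = self.scale.clamp(min=self.scale_min, max=1.0)
        return x + s * (layer_out - x)         # interpolate input and output
```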
2024-09-15 05:17:36,637 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.788e+02 2.070e+02 2.185e+02 2.490e+02 3.373e+02, threshold=4.371e+02, percent-clipped=0.0
2024-09-15 05:17:49,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=238538.33333333334, ans=0.2
2024-09-15 05:18:14,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=238566.66666666666, ans=0.1
2024-09-15 05:18:41,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=238623.33333333334, ans=0.2
2024-09-15 05:18:46,764 INFO [train.py:1198] (1/2) Epoch 14, batch 1150, loss[loss=0.269, ctc_loss=0.1855, cr_loss=0.4175, over 20936.00 frames. ], tot_loss[loss=0.2538, ctc_loss=0.1753, cr_loss=0.3924, over 4081107.73 frames. ], batch size: 60, lr: 6.07e-03, grad_scale: 32.0
2024-09-15 05:19:09,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=238680.0, ans=0.125
2024-09-15 05:19:18,572 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 05:19:26,618 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.95 vs. limit=15.0
2024-09-15 05:19:41,649 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.31 vs. limit=12.0
2024-09-15 05:19:51,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=238765.0, ans=0.09899494936611666
2024-09-15 05:20:01,941 INFO [train.py:1198] (1/2) Epoch 14, batch 1200, loss[loss=0.2713, ctc_loss=0.1865, cr_loss=0.4243, over 20832.00 frames. ], tot_loss[loss=0.2533, ctc_loss=0.175, cr_loss=0.3913, over 4093199.75 frames. ], batch size: 59, lr: 6.07e-03, grad_scale: 32.0
2024-09-15 05:20:09,733 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.610e+02 2.055e+02 2.194e+02 2.371e+02 5.778e+02, threshold=4.387e+02, percent-clipped=1.0
2024-09-15 05:20:10,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=238793.33333333334, ans=0.07
2024-09-15 05:20:12,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=238793.33333333334, ans=0.125
2024-09-15 05:20:17,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=238793.33333333334, ans=0.0
2024-09-15 05:21:19,928 INFO [train.py:1198] (1/2) Epoch 14, batch 1250, loss[loss=0.2361, ctc_loss=0.1633, cr_loss=0.3643, over 20884.00 frames. ], tot_loss[loss=0.2533, ctc_loss=0.175, cr_loss=0.3915, over 4102799.39 frames. ], batch size: 54, lr: 6.07e-03, grad_scale: 32.0
2024-09-15 05:21:38,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=238963.33333333334, ans=0.0
2024-09-15 05:22:35,388 INFO [train.py:1198] (1/2) Epoch 14, batch 1300, loss[loss=0.2672, ctc_loss=0.1856, cr_loss=0.4078, over 21004.00 frames. ], tot_loss[loss=0.2533, ctc_loss=0.1751, cr_loss=0.3913, over 4091119.45 frames. ], batch size: 63, lr: 6.07e-03, grad_scale: 32.0
2024-09-15 05:22:40,000 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.698e+02 2.077e+02 2.243e+02 2.476e+02 4.179e+02, threshold=4.487e+02, percent-clipped=0.0
2024-09-15 05:22:43,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=239076.66666666666, ans=0.0
2024-09-15 05:23:04,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=239105.0, ans=0.07
2024-09-15 05:23:34,443 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=239161.66666666666, ans=0.1
2024-09-15 05:23:43,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=239190.0, ans=0.125
2024-09-15 05:23:53,706 INFO [train.py:1198] (1/2) Epoch 14, batch 1350, loss[loss=0.2288, ctc_loss=0.1559, cr_loss=0.3649, over 20884.00 frames. ], tot_loss[loss=0.2535, ctc_loss=0.175, cr_loss=0.3923, over 4096031.82 frames. ], batch size: 54, lr: 6.07e-03, grad_scale: 32.0
2024-09-15 05:23:55,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=239218.33333333334, ans=0.0
2024-09-15 05:24:40,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=239303.33333333334, ans=0.1
2024-09-15 05:24:47,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=239303.33333333334, ans=0.2
2024-09-15 05:25:08,710 INFO [train.py:1198] (1/2) Epoch 14, batch 1400, loss[loss=0.2671, ctc_loss=0.1839, cr_loss=0.4159, over 20967.00 frames. ], tot_loss[loss=0.2522, ctc_loss=0.174, cr_loss=0.3911, over 4099884.48 frames. ], batch size: 58, lr: 6.06e-03, grad_scale: 32.0
2024-09-15 05:25:10,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=239360.0, ans=0.1
2024-09-15 05:25:13,241 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.862e+02 2.109e+02 2.226e+02 2.450e+02 4.273e+02, threshold=4.452e+02, percent-clipped=0.0
2024-09-15 05:25:25,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=239388.33333333334, ans=0.125
2024-09-15 05:25:25,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=239388.33333333334, ans=0.04949747468305833
2024-09-15 05:25:26,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=239388.33333333334, ans=0.025
2024-09-15 05:25:39,533 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.78 vs. limit=12.0
2024-09-15 05:26:01,774 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 05:26:09,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=239445.0, ans=0.0
2024-09-15 05:26:13,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=239473.33333333334, ans=0.1
2024-09-15 05:26:26,843 INFO [train.py:1198] (1/2) Epoch 14, batch 1450, loss[loss=0.2756, ctc_loss=0.193, cr_loss=0.4126, over 21037.00 frames. ], tot_loss[loss=0.2528, ctc_loss=0.1745, cr_loss=0.3914, over 4105179.61 frames. ], batch size: 62, lr: 6.06e-03, grad_scale: 32.0
2024-09-15 05:27:15,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=239586.66666666666, ans=0.0
2024-09-15 05:27:42,255 INFO [train.py:1198] (1/2) Epoch 14, batch 1500, loss[loss=0.2866, ctc_loss=0.1995, cr_loss=0.4356, over 20019.00 frames. ], tot_loss[loss=0.2526, ctc_loss=0.1743, cr_loss=0.3912, over 4113126.26 frames. ], batch size: 80, lr: 6.06e-03, grad_scale: 32.0
2024-09-15 05:27:46,879 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.821e+02 2.067e+02 2.258e+02 2.457e+02 5.718e+02, threshold=4.515e+02, percent-clipped=1.0
2024-09-15 05:27:49,120 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.03 vs. limit=12.0
2024-09-15 05:27:54,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=239643.33333333334, ans=0.125
2024-09-15 05:27:57,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=239671.66666666666, ans=0.025
2024-09-15 05:28:43,562 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.08 vs. limit=15.0
2024-09-15 05:28:44,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=239756.66666666666, ans=0.2
2024-09-15 05:29:00,729 INFO [train.py:1198] (1/2) Epoch 14, batch 1550, loss[loss=0.2951, ctc_loss=0.2072, cr_loss=0.4393, over 18207.00 frames. ], tot_loss[loss=0.2535, ctc_loss=0.1751, cr_loss=0.3923, over 4100172.09 frames. ], batch size: 108, lr: 6.06e-03, grad_scale: 32.0
2024-09-15 05:29:02,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=239785.0, ans=0.0
2024-09-15 05:29:49,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=239870.0, ans=0.125
2024-09-15 05:30:16,016 INFO [train.py:1198] (1/2) Epoch 14, batch 1600, loss[loss=0.2509, ctc_loss=0.1745, cr_loss=0.3821, over 21052.00 frames. ], tot_loss[loss=0.2529, ctc_loss=0.1746, cr_loss=0.3916, over 4101161.21 frames. ], batch size: 53, lr: 6.06e-03, grad_scale: 32.0
2024-09-15 05:30:20,506 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.832e+02 2.103e+02 2.239e+02 2.541e+02 3.639e+02, threshold=4.478e+02, percent-clipped=0.0 2024-09-15 05:30:29,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=239955.0, ans=0.125 2024-09-15 05:30:40,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=239955.0, ans=0.0 2024-09-15 05:30:51,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=239983.33333333334, ans=0.125 2024-09-15 05:31:01,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=240011.66666666666, ans=0.2 2024-09-15 05:31:15,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=240040.0, ans=0.0 2024-09-15 05:31:22,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=240040.0, ans=0.125 2024-09-15 05:31:34,264 INFO [train.py:1198] (1/2) Epoch 14, batch 1650, loss[loss=0.2323, ctc_loss=0.1596, cr_loss=0.3635, over 21016.00 frames. ], tot_loss[loss=0.2524, ctc_loss=0.1741, cr_loss=0.3915, over 4113238.40 frames. ], batch size: 61, lr: 6.06e-03, grad_scale: 32.0 2024-09-15 05:31:37,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=240068.33333333334, ans=0.09899494936611666 2024-09-15 05:31:43,719 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=240068.33333333334, ans=0.05 2024-09-15 05:31:48,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=240096.66666666666, ans=0.0 2024-09-15 05:32:49,543 INFO [train.py:1198] (1/2) Epoch 14, batch 1700, loss[loss=0.2696, ctc_loss=0.1912, cr_loss=0.3923, over 20805.00 frames. ], tot_loss[loss=0.2521, ctc_loss=0.1739, cr_loss=0.3908, over 4121405.77 frames. ], batch size: 59, lr: 6.05e-03, grad_scale: 32.0 2024-09-15 05:32:53,744 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.766e+02 2.028e+02 2.161e+02 2.311e+02 5.289e+02, threshold=4.322e+02, percent-clipped=1.0 2024-09-15 05:33:08,301 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.27 vs. limit=22.5 2024-09-15 05:33:45,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=240295.0, ans=0.125 2024-09-15 05:34:04,945 INFO [train.py:1198] (1/2) Epoch 14, batch 1750, loss[loss=0.2553, ctc_loss=0.1742, cr_loss=0.4052, over 20962.00 frames. ], tot_loss[loss=0.2515, ctc_loss=0.1735, cr_loss=0.3901, over 4123660.06 frames.
], batch size: 60, lr: 6.05e-03, grad_scale: 32.0 2024-09-15 05:34:06,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=240351.66666666666, ans=0.125 2024-09-15 05:34:36,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=240408.33333333334, ans=0.2 2024-09-15 05:35:19,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=240465.0, ans=0.125 2024-09-15 05:35:23,981 INFO [train.py:1198] (1/2) Epoch 14, batch 1800, loss[loss=0.2426, ctc_loss=0.1654, cr_loss=0.386, over 20768.00 frames. ], tot_loss[loss=0.2516, ctc_loss=0.1736, cr_loss=0.3903, over 4117408.61 frames. ], batch size: 56, lr: 6.05e-03, grad_scale: 32.0 2024-09-15 05:35:28,509 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.749e+02 2.022e+02 2.173e+02 2.326e+02 2.841e+02, threshold=4.346e+02, percent-clipped=0.0 2024-09-15 05:35:41,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=240521.66666666666, ans=0.0 2024-09-15 05:36:00,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=240550.0, ans=0.0 2024-09-15 05:36:02,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=240550.0, ans=0.125 2024-09-15 05:36:37,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=240606.66666666666, ans=0.125 2024-09-15 05:36:40,181 INFO [train.py:1198] (1/2) Epoch 14, batch 1850, loss[loss=0.2212, ctc_loss=0.1504, cr_loss=0.354, over 20973.00 frames. ], tot_loss[loss=0.2524, ctc_loss=0.1742, cr_loss=0.3909, over 4102944.78 frames. ], batch size: 48, lr: 6.05e-03, grad_scale: 32.0 2024-09-15 05:36:53,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=240663.33333333334, ans=0.025 2024-09-15 05:37:01,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=240663.33333333334, ans=0.0 2024-09-15 05:37:10,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=240691.66666666666, ans=0.2 2024-09-15 05:37:10,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=240691.66666666666, ans=0.125 2024-09-15 05:37:53,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=240748.33333333334, ans=0.0 2024-09-15 05:37:58,080 INFO [train.py:1198] (1/2) Epoch 14, batch 1900, loss[loss=0.3145, ctc_loss=0.2269, cr_loss=0.4382, over 14677.00 frames. ], tot_loss[loss=0.2534, ctc_loss=0.1751, cr_loss=0.3915, over 4089664.51 frames. ], batch size: 149, lr: 6.05e-03, grad_scale: 32.0 2024-09-15 05:38:02,355 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.788e+02 2.062e+02 2.220e+02 2.398e+02 2.948e+02, threshold=4.440e+02, percent-clipped=0.0 2024-09-15 05:38:15,041 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.67 vs. 
limit=15.0 2024-09-15 05:38:33,501 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.39 vs. limit=15.0 2024-09-15 05:38:37,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=240833.33333333334, ans=0.2 2024-09-15 05:38:51,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=240861.66666666666, ans=0.125 2024-09-15 05:38:52,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=240861.66666666666, ans=0.125 2024-09-15 05:39:07,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=240890.0, ans=0.0 2024-09-15 05:39:13,480 INFO [train.py:1198] (1/2) Epoch 14, batch 1950, loss[loss=0.2611, ctc_loss=0.178, cr_loss=0.4153, over 20630.00 frames. ], tot_loss[loss=0.2536, ctc_loss=0.1751, cr_loss=0.3921, over 4100809.92 frames. ], batch size: 68, lr: 6.05e-03, grad_scale: 32.0 2024-09-15 05:39:53,421 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.03 vs. limit=15.0 2024-09-15 05:40:27,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=241031.66666666666, ans=0.125 2024-09-15 05:40:30,956 INFO [train.py:1198] (1/2) Epoch 14, batch 2000, loss[loss=0.269, ctc_loss=0.189, cr_loss=0.3997, over 19543.00 frames. ], tot_loss[loss=0.2513, ctc_loss=0.1734, cr_loss=0.3895, over 4104677.42 frames. ], batch size: 90, lr: 6.04e-03, grad_scale: 32.0 2024-09-15 05:40:35,459 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.748e+02 2.043e+02 2.236e+02 2.518e+02 3.651e+02, threshold=4.471e+02, percent-clipped=0.0 2024-09-15 05:41:02,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=241116.66666666666, ans=0.125 2024-09-15 05:41:10,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=241116.66666666666, ans=0.2 2024-09-15 05:41:37,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=241173.33333333334, ans=0.2 2024-09-15 05:41:43,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=241173.33333333334, ans=0.0 2024-09-15 05:41:46,190 INFO [train.py:1198] (1/2) Epoch 14, batch 2050, loss[loss=0.2064, ctc_loss=0.1402, cr_loss=0.3309, over 20938.00 frames. ], tot_loss[loss=0.2507, ctc_loss=0.1729, cr_loss=0.389, over 4100584.32 frames. 
], batch size: 50, lr: 6.04e-03, grad_scale: 32.0 2024-09-15 05:41:55,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=241201.66666666666, ans=0.0 2024-09-15 05:42:03,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=241230.0, ans=0.025 2024-09-15 05:42:19,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=241258.33333333334, ans=0.125 2024-09-15 05:42:19,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=241258.33333333334, ans=0.125 2024-09-15 05:43:02,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=241343.33333333334, ans=0.125 2024-09-15 05:43:04,061 INFO [train.py:1198] (1/2) Epoch 14, batch 2100, loss[loss=0.2858, ctc_loss=0.2008, cr_loss=0.4249, over 20654.00 frames. ], tot_loss[loss=0.2501, ctc_loss=0.1724, cr_loss=0.3886, over 4101076.93 frames. ], batch size: 66, lr: 6.04e-03, grad_scale: 32.0 2024-09-15 05:43:08,651 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.808e+02 1.980e+02 2.120e+02 2.281e+02 2.836e+02, threshold=4.240e+02, percent-clipped=0.0 2024-09-15 05:43:11,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=241343.33333333334, ans=0.5 2024-09-15 05:43:33,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=241400.0, ans=0.125 2024-09-15 05:43:48,622 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.39 vs. limit=15.0 2024-09-15 05:43:56,279 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.42 vs. limit=15.0 2024-09-15 05:44:00,287 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.65 vs. limit=10.0 2024-09-15 05:44:19,156 INFO [train.py:1198] (1/2) Epoch 14, batch 2150, loss[loss=0.2634, ctc_loss=0.1794, cr_loss=0.4201, over 20837.00 frames. ], tot_loss[loss=0.252, ctc_loss=0.174, cr_loss=0.3899, over 4076664.69 frames. ], batch size: 65, lr: 6.04e-03, grad_scale: 32.0 2024-09-15 05:44:34,924 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.83 vs. limit=15.0 2024-09-15 05:44:40,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=241513.33333333334, ans=0.125 2024-09-15 05:44:56,251 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.59 vs. limit=22.5 2024-09-15 05:45:35,089 INFO [train.py:1198] (1/2) Epoch 14, batch 2200, loss[loss=0.2674, ctc_loss=0.1862, cr_loss=0.4062, over 21086.00 frames. ], tot_loss[loss=0.2521, ctc_loss=0.1742, cr_loss=0.3896, over 4084987.36 frames. 
], batch size: 59, lr: 6.04e-03, grad_scale: 32.0 2024-09-15 05:45:39,532 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.843e+02 2.074e+02 2.247e+02 2.504e+02 4.445e+02, threshold=4.495e+02, percent-clipped=1.0 2024-09-15 05:45:51,045 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.00 vs. limit=6.0 2024-09-15 05:46:17,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=241683.33333333334, ans=0.0 2024-09-15 05:46:25,407 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.02 vs. limit=22.5 2024-09-15 05:46:51,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=241768.33333333334, ans=0.125 2024-09-15 05:46:52,931 INFO [train.py:1198] (1/2) Epoch 14, batch 2250, loss[loss=0.2758, ctc_loss=0.1923, cr_loss=0.4173, over 18302.00 frames. ], tot_loss[loss=0.2524, ctc_loss=0.1745, cr_loss=0.3894, over 4068984.25 frames. ], batch size: 108, lr: 6.03e-03, grad_scale: 32.0 2024-09-15 05:47:56,900 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.58 vs. limit=22.5 2024-09-15 05:48:08,194 INFO [train.py:1198] (1/2) Epoch 14, batch 2300, loss[loss=0.2601, ctc_loss=0.1781, cr_loss=0.4098, over 21005.00 frames. ], tot_loss[loss=0.2516, ctc_loss=0.1738, cr_loss=0.3894, over 4086043.93 frames. ], batch size: 61, lr: 6.03e-03, grad_scale: 32.0 2024-09-15 05:48:11,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=241910.0, ans=0.2 2024-09-15 05:48:12,807 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.675e+02 2.050e+02 2.168e+02 2.378e+02 4.598e+02, threshold=4.337e+02, percent-clipped=1.0 2024-09-15 05:48:38,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=241938.33333333334, ans=0.125 2024-09-15 05:49:26,729 INFO [train.py:1198] (1/2) Epoch 14, batch 2350, loss[loss=0.2771, ctc_loss=0.1952, cr_loss=0.4095, over 20927.00 frames. ], tot_loss[loss=0.2511, ctc_loss=0.1734, cr_loss=0.3886, over 4087485.19 frames. ], batch size: 60, lr: 6.03e-03, grad_scale: 32.0 2024-09-15 05:50:05,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=242108.33333333334, ans=0.0 2024-09-15 05:50:20,243 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=242136.66666666666, ans=0.1 2024-09-15 05:50:23,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=242136.66666666666, ans=0.0 2024-09-15 05:50:36,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=242165.0, ans=0.125 2024-09-15 05:50:42,256 INFO [train.py:1198] (1/2) Epoch 14, batch 2400, loss[loss=0.2765, ctc_loss=0.193, cr_loss=0.4176, over 21078.00 frames. ], tot_loss[loss=0.2537, ctc_loss=0.1754, cr_loss=0.3915, over 4086001.58 frames. 
], batch size: 59, lr: 6.03e-03, grad_scale: 32.0 2024-09-15 05:50:48,147 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.769e+02 2.003e+02 2.156e+02 2.309e+02 2.945e+02, threshold=4.312e+02, percent-clipped=0.0 2024-09-15 05:52:00,233 INFO [train.py:1198] (1/2) Epoch 14, batch 2450, loss[loss=0.2253, ctc_loss=0.155, cr_loss=0.3517, over 20781.00 frames. ], tot_loss[loss=0.2533, ctc_loss=0.1751, cr_loss=0.3911, over 4071300.63 frames. ], batch size: 53, lr: 6.03e-03, grad_scale: 32.0 2024-09-15 05:52:05,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=242335.0, ans=0.0 2024-09-15 05:52:10,138 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.39 vs. limit=15.0 2024-09-15 05:52:14,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=242363.33333333334, ans=0.015 2024-09-15 05:52:34,529 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.67 vs. limit=15.0 2024-09-15 05:52:41,979 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.59 vs. limit=15.0 2024-09-15 05:53:04,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=242448.33333333334, ans=0.125 2024-09-15 05:53:15,279 INFO [train.py:1198] (1/2) Epoch 14, batch 2500, loss[loss=0.2557, ctc_loss=0.1771, cr_loss=0.3931, over 20823.00 frames. ], tot_loss[loss=0.2534, ctc_loss=0.1751, cr_loss=0.3916, over 4077163.48 frames. ], batch size: 59, lr: 6.03e-03, grad_scale: 32.0 2024-09-15 05:53:21,382 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.683e+02 2.007e+02 2.160e+02 2.318e+02 3.852e+02, threshold=4.320e+02, percent-clipped=0.0 2024-09-15 05:53:24,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=242476.66666666666, ans=0.2 2024-09-15 05:53:39,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=242505.0, ans=0.1 2024-09-15 05:53:56,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=242533.33333333334, ans=0.2 2024-09-15 05:54:13,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=242561.66666666666, ans=0.025 2024-09-15 05:54:32,629 INFO [train.py:1198] (1/2) Epoch 14, batch 2550, loss[loss=0.2626, ctc_loss=0.1807, cr_loss=0.4098, over 21054.00 frames. ], tot_loss[loss=0.2535, ctc_loss=0.1752, cr_loss=0.3915, over 4074628.11 frames. 
], batch size: 56, lr: 6.02e-03, grad_scale: 32.0 2024-09-15 05:54:59,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=242646.66666666666, ans=0.125 2024-09-15 05:55:14,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=242675.0, ans=0.125 2024-09-15 05:55:34,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=242731.66666666666, ans=0.07 2024-09-15 05:55:35,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=242731.66666666666, ans=0.125 2024-09-15 05:55:47,021 INFO [train.py:1198] (1/2) Epoch 14, batch 2600, loss[loss=0.2544, ctc_loss=0.1766, cr_loss=0.3888, over 21074.00 frames. ], tot_loss[loss=0.254, ctc_loss=0.1757, cr_loss=0.3918, over 4085997.04 frames. ], batch size: 56, lr: 6.02e-03, grad_scale: 32.0 2024-09-15 05:55:53,011 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.722e+02 2.049e+02 2.221e+02 2.410e+02 3.900e+02, threshold=4.443e+02, percent-clipped=0.0 2024-09-15 05:56:05,980 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.42 vs. limit=15.0 2024-09-15 05:56:49,391 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=242873.33333333334, ans=0.0 2024-09-15 05:57:02,603 INFO [train.py:1198] (1/2) Epoch 14, batch 2650, loss[loss=0.2594, ctc_loss=0.1789, cr_loss=0.4022, over 20321.00 frames. ], tot_loss[loss=0.2539, ctc_loss=0.1755, cr_loss=0.3919, over 4080251.98 frames. ], batch size: 74, lr: 6.02e-03, grad_scale: 32.0 2024-09-15 05:57:14,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=242901.66666666666, ans=0.025 2024-09-15 05:57:20,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=242930.0, ans=0.125 2024-09-15 05:57:37,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=242958.33333333334, ans=0.125 2024-09-15 05:58:02,049 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.44 vs. limit=6.0 2024-09-15 05:58:03,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=242986.66666666666, ans=0.1 2024-09-15 05:58:20,790 INFO [train.py:1198] (1/2) Epoch 14, batch 2700, loss[loss=0.2011, ctc_loss=0.1346, cr_loss=0.3323, over 20978.00 frames. ], tot_loss[loss=0.253, ctc_loss=0.1747, cr_loss=0.3914, over 4085943.14 frames. 
], batch size: 51, lr: 6.02e-03, grad_scale: 32.0 2024-09-15 05:58:26,742 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.713e+02 2.056e+02 2.186e+02 2.405e+02 3.188e+02, threshold=4.372e+02, percent-clipped=0.0 2024-09-15 05:58:48,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=243071.66666666666, ans=0.1 2024-09-15 05:58:49,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=243100.0, ans=0.1 2024-09-15 05:59:07,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=243128.33333333334, ans=0.1 2024-09-15 05:59:18,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=243128.33333333334, ans=0.04949747468305833 2024-09-15 05:59:20,216 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.66 vs. limit=10.0 2024-09-15 05:59:21,997 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.55 vs. limit=15.0 2024-09-15 05:59:36,327 INFO [train.py:1198] (1/2) Epoch 14, batch 2750, loss[loss=0.2566, ctc_loss=0.1768, cr_loss=0.3989, over 21028.00 frames. ], tot_loss[loss=0.2534, ctc_loss=0.175, cr_loss=0.3922, over 4091699.90 frames. ], batch size: 62, lr: 6.02e-03, grad_scale: 32.0 2024-09-15 05:59:38,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=243185.0, ans=0.1 2024-09-15 05:59:45,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=243185.0, ans=0.0 2024-09-15 06:00:01,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=243213.33333333334, ans=0.0 2024-09-15 06:00:24,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=243270.0, ans=0.05 2024-09-15 06:00:35,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=243270.0, ans=0.125 2024-09-15 06:00:35,477 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.64 vs. limit=10.0 2024-09-15 06:00:38,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=243298.33333333334, ans=0.0 2024-09-15 06:00:48,662 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 06:00:54,161 INFO [train.py:1198] (1/2) Epoch 14, batch 2800, loss[loss=0.2605, ctc_loss=0.1798, cr_loss=0.4034, over 20657.00 frames. ], tot_loss[loss=0.2518, ctc_loss=0.1738, cr_loss=0.3901, over 4103648.93 frames. 
], batch size: 68, lr: 6.02e-03, grad_scale: 32.0 2024-09-15 06:01:00,156 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.705e+02 1.994e+02 2.164e+02 2.389e+02 3.554e+02, threshold=4.328e+02, percent-clipped=0.0 2024-09-15 06:01:38,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=243411.66666666666, ans=0.025 2024-09-15 06:01:49,138 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.12 vs. limit=15.0 2024-09-15 06:01:57,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=243440.0, ans=0.1 2024-09-15 06:02:09,094 INFO [train.py:1198] (1/2) Epoch 14, batch 2850, loss[loss=0.2057, ctc_loss=0.1406, cr_loss=0.3254, over 20960.00 frames. ], tot_loss[loss=0.2504, ctc_loss=0.1727, cr_loss=0.3884, over 4114375.99 frames. ], batch size: 52, lr: 6.01e-03, grad_scale: 32.0 2024-09-15 06:03:09,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=243581.66666666666, ans=0.05 2024-09-15 06:03:26,362 INFO [train.py:1198] (1/2) Epoch 14, batch 2900, loss[loss=0.284, ctc_loss=0.1992, cr_loss=0.4244, over 20672.00 frames. ], tot_loss[loss=0.25, ctc_loss=0.1724, cr_loss=0.388, over 4112102.59 frames. ], batch size: 66, lr: 6.01e-03, grad_scale: 32.0 2024-09-15 06:03:32,347 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.788e+02 2.084e+02 2.234e+02 2.381e+02 7.796e+02, threshold=4.469e+02, percent-clipped=1.0 2024-09-15 06:03:51,792 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=243638.33333333334, ans=0.07 2024-09-15 06:03:54,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=243666.66666666666, ans=0.035 2024-09-15 06:04:32,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=243723.33333333334, ans=0.07 2024-09-15 06:04:35,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=243723.33333333334, ans=0.1 2024-09-15 06:04:37,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=243723.33333333334, ans=0.2 2024-09-15 06:04:38,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=243723.33333333334, ans=0.0 2024-09-15 06:04:41,259 INFO [train.py:1198] (1/2) Epoch 14, batch 2950, loss[loss=0.3046, ctc_loss=0.2217, cr_loss=0.4142, over 14913.00 frames. ], tot_loss[loss=0.2513, ctc_loss=0.1735, cr_loss=0.3894, over 4101291.14 frames. ], batch size: 150, lr: 6.01e-03, grad_scale: 32.0 2024-09-15 06:04:48,130 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.39 vs. limit=15.0
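The scaling.py:1024 Whitening entries fire for modules whose activations drift away from a "white" (decorrelated, equal-variance) distribution: a metric computed from the per-group feature covariance is compared with a limit (metric=3.39 vs. limit=15.0 in the entry just above), and a corrective gradient contribution is applied when the limit is exceeded. One plausible reading of the metric, assumed here rather than copied from scaling.py, is mean(lambda^2) / mean(lambda)^2 over the covariance eigenvalues, which is exactly 1.0 for white features and grows as variance concentrates in a few directions:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
    # x: (num_frames, num_channels). Returns a value >= 1.0; equals 1.0
    # when the per-group covariance has equal eigenvalues (white).
    # Assumed formulation, not icefall's exact code.
    num_frames, num_channels = x.shape
    c = num_channels // num_groups
    xg = x.reshape(num_frames, num_groups, c).permute(1, 0, 2)  # (g, n, c)
    cov = xg.transpose(1, 2) @ xg / num_frames                  # (g, c, c)
    # mean(lambda^2) = tr(C @ C) / c; for symmetric C this is sum(C*C) / c.
    mean_eig_sq = (cov * cov).sum(dim=(1, 2)) / c
    mean_eig = cov.diagonal(dim1=-2, dim2=-1).sum(dim=-1) / c   # tr(C) / c
    return (mean_eig_sq / mean_eig.square().clamp(min=1e-20)).mean().item()

x = torch.randn(1000, 256)  # near-white input: metric close to 1
assert abs(whitening_metric(x, num_groups=1) - 1.0) < 0.5

Under this reading, the num_groups field splits the channels into independently measured groups, matching the num_groups=4, num_channels=128 whiten_keys entries seen elsewhere in this log.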
2024-09-15 06:05:05,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=243780.0, ans=0.0 2024-09-15 06:05:14,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=243808.33333333334, ans=0.125 2024-09-15 06:05:29,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=243836.66666666666, ans=10.0 2024-09-15 06:05:29,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=243836.66666666666, ans=0.0 2024-09-15 06:05:49,138 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 06:05:53,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=243865.0, ans=0.0 2024-09-15 06:05:59,125 INFO [train.py:1198] (1/2) Epoch 14, batch 3000, loss[loss=0.2368, ctc_loss=0.166, cr_loss=0.3538, over 20955.00 frames. ], tot_loss[loss=0.2518, ctc_loss=0.1738, cr_loss=0.3897, over 4090296.99 frames. ], batch size: 60, lr: 6.01e-03, grad_scale: 32.0 2024-09-15 06:05:59,125 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-15 06:06:30,071 INFO [train.py:1230] (1/2) Epoch 14, validation: loss=0.04865, ctc_loss=0.04865, cr_loss=9.82e-15, over 944034.00 frames. 2024-09-15 06:06:30,071 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-15 06:06:36,115 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.732e+02 2.009e+02 2.131e+02 2.391e+02 3.022e+02, threshold=4.261e+02, percent-clipped=0.0 2024-09-15 06:06:47,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=243921.66666666666, ans=0.125 2024-09-15 06:06:55,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=243921.66666666666, ans=0.1 2024-09-15 06:07:07,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=243950.0, ans=0.125 2024-09-15 06:07:12,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=243950.0, ans=0.2 2024-09-15 06:07:34,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=244006.66666666666, ans=0.125 2024-09-15 06:07:40,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=244006.66666666666, ans=0.125 2024-09-15 06:07:45,162 INFO [train.py:1198] (1/2) Epoch 14, batch 3050, loss[loss=0.2404, ctc_loss=0.1672, cr_loss=0.3662, over 20882.00 frames. ], tot_loss[loss=0.2529, ctc_loss=0.1747, cr_loss=0.3909, over 4085772.35 frames. ], batch size: 54, lr: 6.01e-03, grad_scale: 32.0
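The validation entry above is worth a note: cr_loss collapses to 9.82e-15 at validation while it sits near 0.39 during training. CR-CTC's consistency term compares the CTC posteriors of two differently masked views of each utterance, so with augmentation disabled at validation the two views coincide and the term vanishes. The logged totals are also consistent with tot_loss = ctc_loss + 0.2 * cr_loss (batch 3050 just above: 0.1747 + 0.2 * 0.3909 = 0.2529). A sketch of that combination, assuming a symmetric KL consistency term and a hypothetical model interface returning (T, N, V) frame log-probs (neither is icefall's exact API):

import torch
import torch.nn.functional as F

def cr_ctc_loss(model, feats_a, feats_b, targets, input_lens, target_lens,
                cr_loss_scale: float = 0.2):
    """loss = ctc + cr_loss_scale * consistency; sketch only.

    feats_a / feats_b are two differently masked views of the same
    batch; `model` is a hypothetical callable, not icefall's interface.
    """
    lp_a = model(feats_a)  # (T, N, V) log-probs
    lp_b = model(feats_b)
    ctc = 0.5 * (
        F.ctc_loss(lp_a, targets, input_lens, target_lens)
        + F.ctc_loss(lp_b, targets, input_lens, target_lens)
    )
    # Symmetric KL between frame-level posteriors, each view treating
    # the other's (detached) output as its target; identical views
    # (validation, no masking) give ~0 up to floating-point noise.
    cr = 0.5 * (
        F.kl_div(lp_a, lp_b.detach(), log_target=True, reduction="batchmean")
        + F.kl_div(lp_b, lp_a.detach(), log_target=True, reduction="batchmean")
    )
    return ctc + cr_loss_scale * cr

At validation one would pass the same unmasked features as both views, reproducing the ~1e-14 cr_loss printed above.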
2024-09-15 06:08:00,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=244063.33333333334, ans=0.025 2024-09-15 06:08:10,785 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=244063.33333333334, ans=0.025 2024-09-15 06:08:16,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=244091.66666666666, ans=0.0 2024-09-15 06:08:19,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=244091.66666666666, ans=0.0 2024-09-15 06:08:49,981 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.51 vs. limit=15.0 2024-09-15 06:09:02,763 INFO [train.py:1198] (1/2) Epoch 14, batch 3100, loss[loss=0.2561, ctc_loss=0.1801, cr_loss=0.3796, over 20932.00 frames. ], tot_loss[loss=0.2536, ctc_loss=0.1753, cr_loss=0.3913, over 4074834.07 frames. ], batch size: 60, lr: 6.01e-03, grad_scale: 32.0 2024-09-15 06:09:08,625 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.737e+02 2.054e+02 2.242e+02 2.513e+02 3.967e+02, threshold=4.483e+02, percent-clipped=0.0 2024-09-15 06:09:15,144 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.70 vs. limit=12.0 2024-09-15 06:09:19,743 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.47 vs. limit=15.0 2024-09-15 06:10:17,626 INFO [train.py:1198] (1/2) Epoch 14, batch 3150, loss[loss=0.2544, ctc_loss=0.1766, cr_loss=0.3892, over 20904.00 frames. ], tot_loss[loss=0.2546, ctc_loss=0.1761, cr_loss=0.3924, over 4068852.36 frames. ], batch size: 54, lr: 6.00e-03, grad_scale: 32.0 2024-09-15 06:10:39,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=244346.66666666666, ans=0.125 2024-09-15 06:11:20,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=244431.66666666666, ans=0.0 2024-09-15 06:11:25,776 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.18 vs. limit=6.0 2024-09-15 06:11:26,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=244431.66666666666, ans=0.1 2024-09-15 06:11:35,539 INFO [train.py:1198] (1/2) Epoch 14, batch 3200, loss[loss=0.3406, ctc_loss=0.2523, cr_loss=0.4417, over 14297.00 frames. ], tot_loss[loss=0.2556, ctc_loss=0.1769, cr_loss=0.3935, over 4069150.23 frames. ], batch size: 150, lr: 6.00e-03, grad_scale: 32.0 2024-09-15 06:11:41,419 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.682e+02 2.096e+02 2.276e+02 2.509e+02 3.651e+02, threshold=4.553e+02, percent-clipped=0.0 2024-09-15 06:12:25,776 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.89 vs.
limit=15.0 2024-09-15 06:12:37,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=244573.33333333334, ans=0.125 2024-09-15 06:12:50,351 INFO [train.py:1198] (1/2) Epoch 14, batch 3250, loss[loss=0.2465, ctc_loss=0.1692, cr_loss=0.3868, over 21086.00 frames. ], tot_loss[loss=0.2557, ctc_loss=0.177, cr_loss=0.3935, over 4064453.73 frames. ], batch size: 59, lr: 6.00e-03, grad_scale: 32.0 2024-09-15 06:13:12,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=244630.0, ans=0.0 2024-09-15 06:14:05,048 INFO [train.py:1198] (1/2) Epoch 14, batch 3300, loss[loss=0.2071, ctc_loss=0.1381, cr_loss=0.3447, over 20964.00 frames. ], tot_loss[loss=0.2559, ctc_loss=0.1772, cr_loss=0.3938, over 4071659.51 frames. ], batch size: 50, lr: 6.00e-03, grad_scale: 32.0 2024-09-15 06:14:11,010 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.798e+02 2.081e+02 2.220e+02 2.406e+02 3.394e+02, threshold=4.440e+02, percent-clipped=0.0 2024-09-15 06:14:11,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=244743.33333333334, ans=0.025 2024-09-15 06:14:21,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=244771.66666666666, ans=0.0 2024-09-15 06:14:22,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=244771.66666666666, ans=0.2 2024-09-15 06:14:32,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=244771.66666666666, ans=0.2 2024-09-15 06:14:41,742 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.73 vs. limit=15.0 2024-09-15 06:14:46,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=244800.0, ans=0.1 2024-09-15 06:15:08,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=244856.66666666666, ans=0.0 2024-09-15 06:15:23,400 INFO [train.py:1198] (1/2) Epoch 14, batch 3350, loss[loss=0.3052, ctc_loss=0.2186, cr_loss=0.4326, over 14357.00 frames. ], tot_loss[loss=0.2539, ctc_loss=0.1756, cr_loss=0.3914, over 4071054.54 frames. ], batch size: 149, lr: 6.00e-03, grad_scale: 32.0 2024-09-15 06:15:49,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=244913.33333333334, ans=0.1 2024-09-15 06:15:55,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=244941.66666666666, ans=0.0 2024-09-15 06:16:20,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=244970.0, ans=0.0 2024-09-15 06:16:41,361 INFO [train.py:1198] (1/2) Epoch 14, batch 3400, loss[loss=0.2183, ctc_loss=0.1453, cr_loss=0.3648, over 20953.00 frames. ], tot_loss[loss=0.2549, ctc_loss=0.1762, cr_loss=0.3932, over 4063221.02 frames. 
], batch size: 48, lr: 5.99e-03, grad_scale: 32.0 2024-09-15 06:16:47,523 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.821e+02 2.093e+02 2.255e+02 2.468e+02 3.691e+02, threshold=4.510e+02, percent-clipped=0.0 2024-09-15 06:16:59,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=245055.0, ans=0.125 2024-09-15 06:17:20,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=245083.33333333334, ans=0.2 2024-09-15 06:17:37,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=245111.66666666666, ans=0.1 2024-09-15 06:17:52,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=245140.0, ans=0.125 2024-09-15 06:17:56,728 INFO [train.py:1198] (1/2) Epoch 14, batch 3450, loss[loss=0.2162, ctc_loss=0.1449, cr_loss=0.3563, over 21057.00 frames. ], tot_loss[loss=0.2548, ctc_loss=0.1763, cr_loss=0.3928, over 4066431.59 frames. ], batch size: 53, lr: 5.99e-03, grad_scale: 32.0 2024-09-15 06:18:07,355 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=245168.33333333334, ans=0.125 2024-09-15 06:18:07,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=245168.33333333334, ans=0.125 2024-09-15 06:18:30,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=245225.0, ans=0.0 2024-09-15 06:18:48,678 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.31 vs. limit=12.0 2024-09-15 06:18:57,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=245281.66666666666, ans=0.125 2024-09-15 06:19:04,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=245281.66666666666, ans=0.0 2024-09-15 06:19:04,938 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.70 vs. limit=15.0 2024-09-15 06:19:11,772 INFO [train.py:1198] (1/2) Epoch 14, batch 3500, loss[loss=0.3065, ctc_loss=0.2172, cr_loss=0.4462, over 20074.00 frames. ], tot_loss[loss=0.2537, ctc_loss=0.1754, cr_loss=0.3915, over 4067563.59 frames. ], batch size: 80, lr: 5.99e-03, grad_scale: 32.0 2024-09-15 06:19:17,932 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.768e+02 2.025e+02 2.149e+02 2.398e+02 4.057e+02, threshold=4.299e+02, percent-clipped=0.0 2024-09-15 06:20:00,883 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.57 vs. limit=15.0 2024-09-15 06:20:07,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=245395.0, ans=0.07 2024-09-15 06:20:29,959 INFO [train.py:1198] (1/2) Epoch 14, batch 3550, loss[loss=0.2486, ctc_loss=0.1702, cr_loss=0.392, over 21033.00 frames. ], tot_loss[loss=0.2541, ctc_loss=0.1756, cr_loss=0.3927, over 4079450.47 frames. 
], batch size: 63, lr: 5.99e-03, grad_scale: 32.0 2024-09-15 06:20:31,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=245451.66666666666, ans=0.125 2024-09-15 06:20:48,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=245480.0, ans=0.125 2024-09-15 06:21:20,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=245536.66666666666, ans=0.04949747468305833 2024-09-15 06:21:44,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=245593.33333333334, ans=0.1 2024-09-15 06:21:45,368 INFO [train.py:1198] (1/2) Epoch 14, batch 3600, loss[loss=0.2836, ctc_loss=0.1987, cr_loss=0.4246, over 20669.00 frames. ], tot_loss[loss=0.2522, ctc_loss=0.1742, cr_loss=0.3901, over 4093006.79 frames. ], batch size: 68, lr: 5.99e-03, grad_scale: 32.0 2024-09-15 06:21:47,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=245593.33333333334, ans=0.1 2024-09-15 06:21:51,360 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.816e+02 2.011e+02 2.147e+02 2.307e+02 3.614e+02, threshold=4.295e+02, percent-clipped=0.0 2024-09-15 06:22:03,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=245621.66666666666, ans=0.0 2024-09-15 06:22:05,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=245621.66666666666, ans=0.0 2024-09-15 06:22:47,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=245706.66666666666, ans=0.1 2024-09-15 06:22:53,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=245706.66666666666, ans=0.0 2024-09-15 06:23:03,203 INFO [train.py:1198] (1/2) Epoch 14, batch 3650, loss[loss=0.2639, ctc_loss=0.1766, cr_loss=0.4368, over 20667.00 frames. ], tot_loss[loss=0.252, ctc_loss=0.174, cr_loss=0.3899, over 4100730.14 frames. ], batch size: 66, lr: 5.99e-03, grad_scale: 32.0 2024-09-15 06:23:21,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=245763.33333333334, ans=0.035 2024-09-15 06:23:21,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=245763.33333333334, ans=0.125 2024-09-15 06:23:42,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=245791.66666666666, ans=0.125 2024-09-15 06:23:53,595 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.50 vs. limit=15.0 2024-09-15 06:23:56,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=245820.0, ans=0.025 2024-09-15 06:24:18,498 INFO [train.py:1198] (1/2) Epoch 14, batch 3700, loss[loss=0.2643, ctc_loss=0.1847, cr_loss=0.398, over 20669.00 frames. ], tot_loss[loss=0.2527, ctc_loss=0.1746, cr_loss=0.3903, over 4085611.85 frames. 
], batch size: 68, lr: 5.98e-03, grad_scale: 16.0 2024-09-15 06:24:26,027 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.823e+02 2.125e+02 2.305e+02 2.570e+02 9.865e+02, threshold=4.609e+02, percent-clipped=1.0 2024-09-15 06:24:39,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=245905.0, ans=0.0 2024-09-15 06:24:45,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=245905.0, ans=0.125 2024-09-15 06:24:52,390 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. limit=6.0 2024-09-15 06:24:59,723 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.42 vs. limit=15.0 2024-09-15 06:25:08,908 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.26 vs. limit=15.0 2024-09-15 06:25:33,982 INFO [train.py:1198] (1/2) Epoch 14, batch 3750, loss[loss=0.2959, ctc_loss=0.2095, cr_loss=0.4318, over 18106.00 frames. ], tot_loss[loss=0.2526, ctc_loss=0.1745, cr_loss=0.3907, over 4086915.63 frames. ], batch size: 108, lr: 5.98e-03, grad_scale: 16.0 2024-09-15 06:25:58,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=246046.66666666666, ans=0.125 2024-09-15 06:26:04,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=246046.66666666666, ans=0.125 2024-09-15 06:26:14,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=246075.0, ans=0.2 2024-09-15 06:26:46,160 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.07 vs. limit=15.0 2024-09-15 06:26:48,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=246131.66666666666, ans=0.0 2024-09-15 06:26:52,942 INFO [train.py:1198] (1/2) Epoch 14, batch 3800, loss[loss=0.2813, ctc_loss=0.1976, cr_loss=0.4187, over 20967.00 frames. ], tot_loss[loss=0.2525, ctc_loss=0.1744, cr_loss=0.3906, over 4091605.19 frames. ], batch size: 64, lr: 5.98e-03, grad_scale: 16.0 2024-09-15 06:26:53,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=246160.0, ans=0.0 2024-09-15 06:27:00,516 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.653e+02 2.046e+02 2.162e+02 2.407e+02 1.230e+03, threshold=4.325e+02, percent-clipped=1.0 2024-09-15 06:27:14,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=246188.33333333334, ans=0.125 2024-09-15 06:27:51,603 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 06:27:52,172 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.71 vs. 
limit=10.0 2024-09-15 06:27:53,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=246245.0, ans=0.125 2024-09-15 06:28:10,467 INFO [train.py:1198] (1/2) Epoch 14, batch 3850, loss[loss=0.2142, ctc_loss=0.1451, cr_loss=0.3455, over 20881.00 frames. ], tot_loss[loss=0.2536, ctc_loss=0.1752, cr_loss=0.3919, over 4090100.52 frames. ], batch size: 54, lr: 5.98e-03, grad_scale: 16.0 2024-09-15 06:28:19,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=246301.66666666666, ans=0.125 2024-09-15 06:28:27,975 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.00 vs. limit=15.0 2024-09-15 06:29:11,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=246415.0, ans=0.125 2024-09-15 06:29:26,110 INFO [train.py:1198] (1/2) Epoch 14, batch 3900, loss[loss=0.2556, ctc_loss=0.1735, cr_loss=0.4106, over 20879.00 frames. ], tot_loss[loss=0.2528, ctc_loss=0.1745, cr_loss=0.3913, over 4100742.36 frames. ], batch size: 54, lr: 5.98e-03, grad_scale: 16.0 2024-09-15 06:29:27,183 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. limit=6.0 2024-09-15 06:29:27,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=246443.33333333334, ans=0.025 2024-09-15 06:29:29,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=246443.33333333334, ans=0.95 2024-09-15 06:29:33,462 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.806e+02 2.090e+02 2.335e+02 2.590e+02 3.484e+02, threshold=4.670e+02, percent-clipped=0.0 2024-09-15 06:30:41,072 INFO [train.py:1198] (1/2) Epoch 14, batch 3950, loss[loss=0.245, ctc_loss=0.1651, cr_loss=0.3994, over 20940.00 frames. ], tot_loss[loss=0.2527, ctc_loss=0.1746, cr_loss=0.3909, over 4089209.51 frames. ], batch size: 48, lr: 5.98e-03, grad_scale: 16.0 2024-09-15 06:30:47,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=246585.0, ans=0.1 2024-09-15 06:30:59,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=246613.33333333334, ans=0.125 2024-09-15 06:31:17,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=246641.66666666666, ans=0.025 2024-09-15 06:31:40,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=246670.0, ans=0.125 2024-09-15 06:31:41,654 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.79 vs. limit=15.0 2024-09-15 06:31:51,943 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.30 vs. limit=10.0 2024-09-15 06:31:58,570 INFO [train.py:1198] (1/2) Epoch 14, batch 4000, loss[loss=0.2667, ctc_loss=0.1847, cr_loss=0.4099, over 18413.00 frames. ], tot_loss[loss=0.2529, ctc_loss=0.1746, cr_loss=0.3915, over 4075341.51 frames. 
], batch size: 108, lr: 5.97e-03, grad_scale: 32.0 2024-09-15 06:32:06,006 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.784e+02 2.001e+02 2.127e+02 2.354e+02 3.662e+02, threshold=4.253e+02, percent-clipped=0.0 2024-09-15 06:32:42,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=246811.66666666666, ans=0.125 2024-09-15 06:33:00,881 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.91 vs. limit=15.0 2024-09-15 06:33:13,720 INFO [train.py:1198] (1/2) Epoch 14, batch 4050, loss[loss=0.2865, ctc_loss=0.2019, cr_loss=0.4231, over 19286.00 frames. ], tot_loss[loss=0.2518, ctc_loss=0.1738, cr_loss=0.3904, over 4078605.57 frames. ], batch size: 90, lr: 5.97e-03, grad_scale: 32.0 2024-09-15 06:33:15,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=246868.33333333334, ans=0.125 2024-09-15 06:33:24,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=246868.33333333334, ans=0.125 2024-09-15 06:33:26,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=246868.33333333334, ans=0.025 2024-09-15 06:33:27,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=246868.33333333334, ans=0.0 2024-09-15 06:33:46,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=246925.0, ans=0.125 2024-09-15 06:33:48,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=246925.0, ans=0.2 2024-09-15 06:33:59,448 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.06 vs. limit=15.0 2024-09-15 06:34:31,191 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.00 vs. limit=10.0 2024-09-15 06:34:31,830 INFO [train.py:1198] (1/2) Epoch 14, batch 4100, loss[loss=0.2248, ctc_loss=0.1527, cr_loss=0.3603, over 20785.00 frames. ], tot_loss[loss=0.2504, ctc_loss=0.1726, cr_loss=0.3892, over 4089505.05 frames. ], batch size: 56, lr: 5.97e-03, grad_scale: 16.0 2024-09-15 06:34:40,766 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.701e+02 2.018e+02 2.160e+02 2.373e+02 3.553e+02, threshold=4.319e+02, percent-clipped=0.0 2024-09-15 06:34:41,514 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.11 vs. 
limit=15.0 2024-09-15 06:35:05,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=247066.66666666666, ans=0.125 2024-09-15 06:35:05,532 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 06:35:08,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=247066.66666666666, ans=0.125 2024-09-15 06:35:22,587 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.25 vs. limit=15.0 2024-09-15 06:35:45,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=247123.33333333334, ans=0.125 2024-09-15 06:35:47,584 INFO [train.py:1198] (1/2) Epoch 14, batch 4150, loss[loss=0.2971, ctc_loss=0.2075, cr_loss=0.448, over 20344.00 frames. ], tot_loss[loss=0.2498, ctc_loss=0.1722, cr_loss=0.3881, over 4086975.85 frames. ], batch size: 74, lr: 5.97e-03, grad_scale: 16.0 2024-09-15 06:36:07,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=247180.0, ans=0.2 2024-09-15 06:36:30,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=247208.33333333334, ans=0.2 2024-09-15 06:36:33,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=247236.66666666666, ans=0.125 2024-09-15 06:36:36,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=247236.66666666666, ans=0.09899494936611666 2024-09-15 06:37:02,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=247265.0, ans=0.02 2024-09-15 06:37:06,382 INFO [train.py:1198] (1/2) Epoch 14, batch 4200, loss[loss=0.2534, ctc_loss=0.1738, cr_loss=0.398, over 21003.00 frames. ], tot_loss[loss=0.2505, ctc_loss=0.1728, cr_loss=0.3887, over 4083287.43 frames. ], batch size: 63, lr: 5.97e-03, grad_scale: 16.0 2024-09-15 06:37:10,299 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.45 vs. limit=15.0 2024-09-15 06:37:15,287 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.818e+02 2.044e+02 2.236e+02 2.474e+02 3.327e+02, threshold=4.472e+02, percent-clipped=0.0 2024-09-15 06:37:24,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=247321.66666666666, ans=0.125 2024-09-15 06:37:35,171 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 06:37:44,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=247350.0, ans=0.125 2024-09-15 06:37:44,386 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.58 vs. 
2024-09-15 06:37:56,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=247378.33333333334, ans=0.1
2024-09-15 06:37:59,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=247378.33333333334, ans=0.0
2024-09-15 06:38:09,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=247406.66666666666, ans=0.125
2024-09-15 06:38:20,659 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.72 vs. limit=22.5
2024-09-15 06:38:21,417 INFO [train.py:1198] (1/2) Epoch 14, batch 4250, loss[loss=0.2426, ctc_loss=0.1671, cr_loss=0.3778, over 20932.00 frames. ], tot_loss[loss=0.2502, ctc_loss=0.1726, cr_loss=0.3879, over 4078332.42 frames. ], batch size: 60, lr: 5.97e-03, grad_scale: 16.0
2024-09-15 06:38:23,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=247435.0, ans=0.0
2024-09-15 06:39:04,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=247491.66666666666, ans=0.125
2024-09-15 06:39:04,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=247491.66666666666, ans=0.125
2024-09-15 06:39:16,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=247520.0, ans=0.125
2024-09-15 06:39:39,259 INFO [train.py:1198] (1/2) Epoch 14, batch 4300, loss[loss=0.2572, ctc_loss=0.1763, cr_loss=0.4046, over 20831.00 frames. ], tot_loss[loss=0.2509, ctc_loss=0.1731, cr_loss=0.389, over 4087112.89 frames. ], batch size: 59, lr: 5.96e-03, grad_scale: 16.0
2024-09-15 06:39:48,234 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.779e+02 2.070e+02 2.227e+02 2.452e+02 3.530e+02, threshold=4.454e+02, percent-clipped=0.0
2024-09-15 06:40:09,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=247633.33333333334, ans=0.125
2024-09-15 06:40:15,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=247633.33333333334, ans=0.05
2024-09-15 06:40:27,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=247661.66666666666, ans=0.07
2024-09-15 06:40:29,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=247661.66666666666, ans=0.1
2024-09-15 06:40:33,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=247661.66666666666, ans=0.0
2024-09-15 06:40:48,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=247690.0, ans=0.125
2024-09-15 06:40:54,175 INFO [train.py:1198] (1/2) Epoch 14, batch 4350, loss[loss=0.2499, ctc_loss=0.1715, cr_loss=0.3922, over 20855.00 frames. ], tot_loss[loss=0.2513, ctc_loss=0.1733, cr_loss=0.3899, over 4079188.22 frames. ], batch size: 65, lr: 5.96e-03, grad_scale: 16.0
2024-09-15 06:40:54,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=247718.33333333334, ans=0.035
2024-09-15 06:41:18,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=247746.66666666666, ans=0.2
2024-09-15 06:41:47,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=247803.33333333334, ans=0.0
2024-09-15 06:42:09,608 INFO [train.py:1198] (1/2) Epoch 14, batch 4400, loss[loss=0.2974, ctc_loss=0.22, cr_loss=0.3873, over 13535.00 frames. ], tot_loss[loss=0.2531, ctc_loss=0.1749, cr_loss=0.391, over 4052807.12 frames. ], batch size: 149, lr: 5.96e-03, grad_scale: 32.0
2024-09-15 06:42:18,683 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.841e+02 2.032e+02 2.155e+02 2.365e+02 1.102e+03, threshold=4.310e+02, percent-clipped=1.0
2024-09-15 06:42:32,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=247888.33333333334, ans=0.125
2024-09-15 06:42:35,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=247888.33333333334, ans=0.2
2024-09-15 06:42:50,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=247916.66666666666, ans=0.2
2024-09-15 06:42:59,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=247945.0, ans=0.1
2024-09-15 06:43:04,355 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.72 vs. limit=22.5
2024-09-15 06:43:05,838 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.10 vs. limit=15.0
2024-09-15 06:43:23,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=247973.33333333334, ans=0.1
2024-09-15 06:43:27,594 INFO [train.py:1198] (1/2) Epoch 14, batch 4450, loss[loss=0.268, ctc_loss=0.1881, cr_loss=0.3998, over 20045.00 frames. ], tot_loss[loss=0.2552, ctc_loss=0.1766, cr_loss=0.3932, over 4040195.18 frames. ], batch size: 80, lr: 5.96e-03, grad_scale: 32.0
2024-09-15 06:43:49,276 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 06:44:07,201 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-15 06:44:11,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=248086.66666666666, ans=0.05
2024-09-15 06:44:45,685 INFO [train.py:1198] (1/2) Epoch 14, batch 4500, loss[loss=0.2931, ctc_loss=0.2088, cr_loss=0.4218, over 20346.00 frames. ], tot_loss[loss=0.2557, ctc_loss=0.1768, cr_loss=0.3944, over 4057974.87 frames. ], batch size: 74, lr: 5.96e-03, grad_scale: 32.0
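In each optim.py:487 warning the reported threshold is exactly Clipping_scale times the median of the grad-norm quartiles: in the batch 4400 entry above, 2.0 x 2.155e+02 = 4.310e+02, and the one step whose norm (max 1.102e+03) exceeded it is what percent-clipped=1.0 counts. A hedged sketch of that bookkeeping; the function name and window size are illustrative, not the optimizer's exact code:

import torch

def clip_by_median(params, norm_history, clipping_scale=2.0, window=1000):
    # Global gradient norm for this step.
    grads = [p.grad for p in params if p.grad is not None]
    total_norm = torch.norm(torch.stack([g.norm() for g in grads]))
    norm_history.append(total_norm.item())
    recent = torch.tensor(norm_history[-window:])
    q = torch.quantile(recent, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]  # 2.0 * median, matching the logged values
    if total_norm > threshold:
        for g in grads:
            g.mul_((threshold / total_norm).item())
    return q, threshold  # the quartiles and threshold printed in the WARNING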
2024-09-15 06:44:54,780 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.848e+02 2.109e+02 2.256e+02 2.511e+02 3.955e+02, threshold=4.512e+02, percent-clipped=0.0
2024-09-15 06:45:02,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=248171.66666666666, ans=0.125
2024-09-15 06:45:19,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=248200.0, ans=0.0
2024-09-15 06:45:35,143 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.22 vs. limit=6.0
2024-09-15 06:45:48,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=248256.66666666666, ans=0.125
2024-09-15 06:45:57,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=248256.66666666666, ans=0.125
2024-09-15 06:46:01,302 INFO [train.py:1198] (1/2) Epoch 14, batch 4550, loss[loss=0.2373, ctc_loss=0.1617, cr_loss=0.3779, over 20965.00 frames. ], tot_loss[loss=0.2542, ctc_loss=0.1757, cr_loss=0.3925, over 4066627.85 frames. ], batch size: 64, lr: 5.96e-03, grad_scale: 32.0
2024-09-15 06:46:03,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=248285.0, ans=0.125
2024-09-15 06:46:23,069 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.55 vs. limit=15.0
2024-09-15 06:46:41,681 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.05 vs. limit=8.0
2024-09-15 06:46:46,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=248370.0, ans=0.125
2024-09-15 06:47:14,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=248398.33333333334, ans=0.125
2024-09-15 06:47:16,755 INFO [train.py:1198] (1/2) Epoch 14, batch 4600, loss[loss=0.2152, ctc_loss=0.1441, cr_loss=0.3555, over 20952.00 frames. ], tot_loss[loss=0.2531, ctc_loss=0.1748, cr_loss=0.3917, over 4074217.93 frames. ], batch size: 52, lr: 5.95e-03, grad_scale: 32.0
2024-09-15 06:47:25,874 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.797e+02 2.058e+02 2.228e+02 2.403e+02 3.695e+02, threshold=4.455e+02, percent-clipped=0.0
2024-09-15 06:47:27,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=248426.66666666666, ans=0.04949747468305833
2024-09-15 06:47:28,208 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.42 vs. limit=6.0
2024-09-15 06:47:35,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=248455.0, ans=0.0
2024-09-15 06:48:29,015 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.76 vs. limit=12.0
2024-09-15 06:48:33,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=248568.33333333334, ans=0.125
2024-09-15 06:48:34,270 INFO [train.py:1198] (1/2) Epoch 14, batch 4650, loss[loss=0.2709, ctc_loss=0.1866, cr_loss=0.4213, over 20782.00 frames. ], tot_loss[loss=0.2527, ctc_loss=0.1744, cr_loss=0.3911, over 4080628.06 frames. ], batch size: 56, lr: 5.95e-03, grad_scale: 32.0
2024-09-15 06:48:40,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=248568.33333333334, ans=0.2
2024-09-15 06:48:51,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=248596.66666666666, ans=0.2
2024-09-15 06:49:14,205 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.80 vs. limit=15.0
2024-09-15 06:49:47,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=248681.66666666666, ans=0.5
2024-09-15 06:49:47,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=248681.66666666666, ans=0.0
2024-09-15 06:49:50,411 INFO [train.py:1198] (1/2) Epoch 14, batch 4700, loss[loss=0.2844, ctc_loss=0.1987, cr_loss=0.4288, over 20046.00 frames. ], tot_loss[loss=0.2516, ctc_loss=0.1737, cr_loss=0.3897, over 4090765.49 frames. ], batch size: 80, lr: 5.95e-03, grad_scale: 32.0
2024-09-15 06:49:53,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=248710.0, ans=0.2
2024-09-15 06:49:59,261 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.866e+02 2.024e+02 2.186e+02 2.369e+02 4.541e+02, threshold=4.371e+02, percent-clipped=1.0
2024-09-15 06:50:10,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=248738.33333333334, ans=0.125
2024-09-15 06:50:31,847 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.80 vs. limit=15.0
2024-09-15 06:50:54,737 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.39 vs. limit=15.0
2024-09-15 06:51:01,996 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.14 vs. limit=15.0
2024-09-15 06:51:08,853 INFO [train.py:1198] (1/2) Epoch 14, batch 4750, loss[loss=0.2225, ctc_loss=0.1478, cr_loss=0.3733, over 19395.00 frames. ], tot_loss[loss=0.2504, ctc_loss=0.1728, cr_loss=0.3882, over 4096655.09 frames. ], batch size: 43, lr: 5.95e-03, grad_scale: 32.0
2024-09-15 06:51:19,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=248851.66666666666, ans=0.1
2024-09-15 06:51:27,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=248880.0, ans=0.2
2024-09-15 06:51:50,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=248908.33333333334, ans=0.125
2024-09-15 06:52:23,997 INFO [train.py:1198] (1/2) Epoch 14, batch 4800, loss[loss=0.259, ctc_loss=0.1803, cr_loss=0.3937, over 20666.00 frames. ], tot_loss[loss=0.2507, ctc_loss=0.173, cr_loss=0.3886, over 4096628.30 frames. ], batch size: 68, lr: 5.95e-03, grad_scale: 32.0
2024-09-15 06:52:33,197 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.832e+02 2.079e+02 2.188e+02 2.473e+02 4.707e+02, threshold=4.376e+02, percent-clipped=1.0
2024-09-15 06:52:36,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=248993.33333333334, ans=0.125
2024-09-15 06:52:47,993 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.41 vs. limit=22.5
2024-09-15 06:53:04,404 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.56 vs. limit=22.5
2024-09-15 06:53:40,005 INFO [train.py:1198] (1/2) Epoch 14, batch 4850, loss[loss=0.3247, ctc_loss=0.2355, cr_loss=0.446, over 14510.00 frames. ], tot_loss[loss=0.2513, ctc_loss=0.1735, cr_loss=0.3893, over 4098724.14 frames. ], batch size: 149, lr: 5.95e-03, grad_scale: 32.0
2024-09-15 06:53:40,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=249135.0, ans=0.09899494936611666
2024-09-15 06:53:52,415 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 06:54:04,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=249163.33333333334, ans=0.125
2024-09-15 06:54:08,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=249191.66666666666, ans=0.125
2024-09-15 06:54:57,545 INFO [train.py:1198] (1/2) Epoch 14, batch 4900, loss[loss=0.2578, ctc_loss=0.178, cr_loss=0.3991, over 18175.00 frames. ], tot_loss[loss=0.2513, ctc_loss=0.1735, cr_loss=0.3889, over 4081079.30 frames. ], batch size: 108, lr: 5.94e-03, grad_scale: 32.0
2024-09-15 06:55:06,666 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.720e+02 1.997e+02 2.129e+02 2.287e+02 3.273e+02, threshold=4.258e+02, percent-clipped=0.0
2024-09-15 06:55:22,154 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.37 vs. limit=15.0
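Each train.py:1198 entry reports the same decomposition, and the totals satisfy loss = ctc_loss + 0.2 * cr_loss (for batch 4850 above, 0.1735 + 0.2 x 0.3893 = 0.2514, matching the printed 0.2513 up to rounding). A sketch of a CR-CTC-style objective that produces numbers of this shape: CTC on two differently masked views of each utterance plus a consistency term between their frame posteriors. The symmetric-KL form of the consistency term is an assumption about the recipe, not something shown in the log:

import torch.nn.functional as F

def cr_ctc_loss(log_probs_a, log_probs_b, targets, input_lengths, target_lengths,
                cr_loss_scale=0.2):
    # log_probs_*: (T, N, C) log-posteriors from two augmented views.
    ctc = 0.5 * (F.ctc_loss(log_probs_a, targets, input_lengths, target_lengths)
                 + F.ctc_loss(log_probs_b, targets, input_lengths, target_lengths))
    # Consistency regularization: make the two views agree frame by frame.
    cr = 0.5 * (F.kl_div(log_probs_a, log_probs_b, log_target=True, reduction="batchmean")
                + F.kl_div(log_probs_b, log_probs_a, log_target=True, reduction="batchmean"))
    return ctc + cr_loss_scale * cr, ctc, cr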
2024-09-15 06:55:46,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=249361.66666666666, ans=0.0
2024-09-15 06:55:49,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=249361.66666666666, ans=0.125
2024-09-15 06:55:53,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=249361.66666666666, ans=0.125
2024-09-15 06:56:12,686 INFO [train.py:1198] (1/2) Epoch 14, batch 4950, loss[loss=0.2124, ctc_loss=0.1427, cr_loss=0.3487, over 20971.00 frames. ], tot_loss[loss=0.2512, ctc_loss=0.1734, cr_loss=0.3892, over 4085358.08 frames. ], batch size: 50, lr: 5.94e-03, grad_scale: 32.0
2024-09-15 06:56:45,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=249475.0, ans=0.05
2024-09-15 06:56:51,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=249475.0, ans=0.0
2024-09-15 06:57:18,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=249531.66666666666, ans=0.2
2024-09-15 06:57:29,706 INFO [train.py:1198] (1/2) Epoch 14, batch 5000, loss[loss=0.2264, ctc_loss=0.1526, cr_loss=0.3689, over 20237.00 frames. ], tot_loss[loss=0.2517, ctc_loss=0.1738, cr_loss=0.3898, over 4087677.92 frames. ], batch size: 45, lr: 5.94e-03, grad_scale: 32.0
2024-09-15 06:57:40,062 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.819e+02 2.060e+02 2.176e+02 2.400e+02 3.518e+02, threshold=4.352e+02, percent-clipped=0.0
2024-09-15 06:58:07,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=249616.66666666666, ans=0.0
2024-09-15 06:58:07,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=249616.66666666666, ans=0.2
2024-09-15 06:58:24,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=249645.0, ans=0.0
2024-09-15 06:58:41,037 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=249673.33333333334, ans=0.125
2024-09-15 06:58:43,844 INFO [train.py:1198] (1/2) Epoch 14, batch 5050, loss[loss=0.2754, ctc_loss=0.1919, cr_loss=0.4172, over 20974.00 frames. ], tot_loss[loss=0.2508, ctc_loss=0.173, cr_loss=0.3887, over 4096949.46 frames. ], batch size: 58, lr: 5.94e-03, grad_scale: 16.0
2024-09-15 06:58:57,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=249730.0, ans=0.125
2024-09-15 06:58:57,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=249730.0, ans=0.125
2024-09-15 06:59:22,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=249758.33333333334, ans=0.125
2024-09-15 06:59:22,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=249758.33333333334, ans=0.125
2024-09-15 06:59:32,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=249786.66666666666, ans=0.5
2024-09-15 06:59:34,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=249786.66666666666, ans=0.125
2024-09-15 06:59:36,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=249786.66666666666, ans=0.2
2024-09-15 06:59:44,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=249815.0, ans=0.125
2024-09-15 06:59:53,885 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.68 vs. limit=10.0
2024-09-15 06:59:57,828 INFO [train.py:1198] (1/2) Epoch 14, batch 5100, loss[loss=0.2143, ctc_loss=0.1454, cr_loss=0.3443, over 21051.00 frames. ], tot_loss[loss=0.2515, ctc_loss=0.1736, cr_loss=0.3896, over 4092190.76 frames. ], batch size: 53, lr: 5.94e-03, grad_scale: 16.0
2024-09-15 07:00:07,983 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.835e+02 2.085e+02 2.221e+02 2.455e+02 4.461e+02, threshold=4.442e+02, percent-clipped=1.0
2024-09-15 07:00:26,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=249900.0, ans=0.1
2024-09-15 07:00:29,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=249900.0, ans=0.125
2024-09-15 07:00:54,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=249928.33333333334, ans=0.0
2024-09-15 07:01:12,415 INFO [train.py:1198] (1/2) Epoch 14, batch 5150, loss[loss=0.2533, ctc_loss=0.1721, cr_loss=0.406, over 21023.00 frames. ], tot_loss[loss=0.2513, ctc_loss=0.1734, cr_loss=0.3893, over 4090620.78 frames. ], batch size: 61, lr: 5.94e-03, grad_scale: 16.0
2024-09-15 07:01:45,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=250041.66666666666, ans=0.125
2024-09-15 07:02:26,338 INFO [train.py:1198] (1/2) Epoch 14, batch 5200, loss[loss=0.2598, ctc_loss=0.1809, cr_loss=0.3948, over 21092.00 frames. ], tot_loss[loss=0.2505, ctc_loss=0.1728, cr_loss=0.3884, over 4095653.02 frames. ], batch size: 59, lr: 5.93e-03, grad_scale: 32.0
2024-09-15 07:02:36,368 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.826e+02 2.076e+02 2.222e+02 2.520e+02 3.444e+02, threshold=4.444e+02, percent-clipped=0.0
2024-09-15 07:02:49,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=250155.0, ans=0.0
2024-09-15 07:02:58,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=250183.33333333334, ans=0.5
2024-09-15 07:03:03,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=250183.33333333334, ans=0.125
2024-09-15 07:03:09,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=250211.66666666666, ans=0.125
2024-09-15 07:03:18,812 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. limit=6.0
2024-09-15 07:03:28,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=250240.0, ans=0.125
2024-09-15 07:03:40,821 INFO [train.py:1198] (1/2) Epoch 14, batch 5250, loss[loss=0.259, ctc_loss=0.1725, cr_loss=0.4325, over 21017.00 frames. ], tot_loss[loss=0.2506, ctc_loss=0.1729, cr_loss=0.3886, over 4092612.09 frames. ], batch size: 61, lr: 5.93e-03, grad_scale: 32.0
2024-09-15 07:03:49,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=250268.33333333334, ans=0.125
2024-09-15 07:03:51,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=250268.33333333334, ans=0.1
2024-09-15 07:03:55,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=250296.66666666666, ans=0.125
2024-09-15 07:04:03,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=250296.66666666666, ans=0.2
2024-09-15 07:04:16,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=250325.0, ans=0.125
2024-09-15 07:04:57,655 INFO [train.py:1198] (1/2) Epoch 14, batch 5300, loss[loss=0.2391, ctc_loss=0.1623, cr_loss=0.3839, over 20951.00 frames. ], tot_loss[loss=0.2512, ctc_loss=0.1732, cr_loss=0.39, over 4092006.88 frames. ], batch size: 51, lr: 5.93e-03, grad_scale: 16.0
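The grad_scale value alternating between 32.0 and 16.0 in the entries above is fp16 dynamic loss scaling: the scale is halved whenever a step sees inf/NaN gradients and grown again after a run of clean steps. torch.cuda.amp.GradScaler implements exactly that bookkeeping; the surrounding loop below is an illustrative stand-in for the real training loop, not a transcript of it:

import torch

def train_steps(model, loader, optimizer, compute_loss):
    scaler = torch.cuda.amp.GradScaler(init_scale=2.0)  # scale stays a power of two
    for batch in loader:
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = compute_loss(model, batch)  # stand-in for the model's loss
        scaler.scale(loss).backward()
        scaler.step(optimizer)    # silently skipped if infs/NaNs are found
        scaler.update()           # halve on overflow, double after enough clean steps
        yield scaler.get_scale()  # the value printed as grad_scale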
2024-09-15 07:05:05,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=250410.0, ans=0.2
2024-09-15 07:05:09,303 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.719e+02 2.047e+02 2.170e+02 2.342e+02 4.334e+02, threshold=4.340e+02, percent-clipped=0.0
2024-09-15 07:05:15,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=250438.33333333334, ans=0.125
2024-09-15 07:05:20,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=250438.33333333334, ans=0.1
2024-09-15 07:05:21,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=250438.33333333334, ans=0.0
2024-09-15 07:05:25,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=250466.66666666666, ans=0.125
2024-09-15 07:05:38,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=250466.66666666666, ans=0.2
2024-09-15 07:05:48,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=250495.0, ans=0.125
2024-09-15 07:06:11,617 INFO [train.py:1198] (1/2) Epoch 14, batch 5350, loss[loss=0.2906, ctc_loss=0.2064, cr_loss=0.4209, over 18410.00 frames. ], tot_loss[loss=0.2507, ctc_loss=0.1729, cr_loss=0.389, over 4093234.70 frames. ], batch size: 108, lr: 5.93e-03, grad_scale: 16.0
2024-09-15 07:06:30,303 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.73 vs. limit=22.5
2024-09-15 07:06:46,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=250608.33333333334, ans=0.0
2024-09-15 07:06:59,085 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.68 vs. limit=10.0
2024-09-15 07:07:01,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=250636.66666666666, ans=0.1
2024-09-15 07:07:28,155 INFO [train.py:1198] (1/2) Epoch 14, batch 5400, loss[loss=0.2473, ctc_loss=0.1698, cr_loss=0.3873, over 21092.00 frames. ], tot_loss[loss=0.2504, ctc_loss=0.1726, cr_loss=0.389, over 4092220.74 frames. ], batch size: 59, lr: 5.93e-03, grad_scale: 16.0
2024-09-15 07:07:40,097 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.761e+02 2.016e+02 2.163e+02 2.322e+02 4.968e+02, threshold=4.325e+02, percent-clipped=1.0
2024-09-15 07:07:40,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=250693.33333333334, ans=0.125
2024-09-15 07:07:41,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=250721.66666666666, ans=0.125
2024-09-15 07:08:05,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=250750.0, ans=0.125
2024-09-15 07:08:14,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=250778.33333333334, ans=0.0
2024-09-15 07:08:31,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=250806.66666666666, ans=0.125
2024-09-15 07:08:38,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=250806.66666666666, ans=0.0
2024-09-15 07:08:41,489 INFO [train.py:1198] (1/2) Epoch 14, batch 5450, loss[loss=0.2512, ctc_loss=0.1711, cr_loss=0.4002, over 20777.00 frames. ], tot_loss[loss=0.251, ctc_loss=0.1729, cr_loss=0.3905, over 4102382.97 frames. ], batch size: 71, lr: 5.93e-03, grad_scale: 16.0
2024-09-15 07:08:50,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=250835.0, ans=0.1
2024-09-15 07:09:55,818 INFO [train.py:1198] (1/2) Epoch 14, batch 5500, loss[loss=0.3071, ctc_loss=0.2174, cr_loss=0.4485, over 19468.00 frames. ], tot_loss[loss=0.2527, ctc_loss=0.1742, cr_loss=0.3926, over 4100574.44 frames. ], batch size: 90, lr: 5.92e-03, grad_scale: 16.0
2024-09-15 07:10:07,555 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.840e+02 2.084e+02 2.215e+02 2.384e+02 4.753e+02, threshold=4.429e+02, percent-clipped=1.0
2024-09-15 07:10:15,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=251005.0, ans=0.125
2024-09-15 07:10:31,015 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.19 vs. limit=15.0
2024-09-15 07:10:33,600 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.19 vs. limit=22.5
2024-09-15 07:10:39,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=251061.66666666666, ans=0.125
2024-09-15 07:11:09,774 INFO [train.py:1198] (1/2) Epoch 14, batch 5550, loss[loss=0.2435, ctc_loss=0.1673, cr_loss=0.3813, over 21010.00 frames. ], tot_loss[loss=0.2527, ctc_loss=0.1742, cr_loss=0.3925, over 4102784.49 frames. ], batch size: 63, lr: 5.92e-03, grad_scale: 16.0
2024-09-15 07:11:13,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=251118.33333333334, ans=0.04949747468305833
2024-09-15 07:11:17,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=251118.33333333334, ans=0.0
2024-09-15 07:11:30,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=251146.66666666666, ans=0.125
2024-09-15 07:11:45,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=251175.0, ans=0.125
2024-09-15 07:11:51,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=251175.0, ans=0.05
2024-09-15 07:11:55,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=251203.33333333334, ans=0.2
2024-09-15 07:12:01,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=251203.33333333334, ans=0.1
2024-09-15 07:12:23,637 INFO [train.py:1198] (1/2) Epoch 14, batch 5600, loss[loss=0.2734, ctc_loss=0.1889, cr_loss=0.4226, over 20218.00 frames. ], tot_loss[loss=0.2531, ctc_loss=0.1744, cr_loss=0.3933, over 4097657.58 frames. ], batch size: 74, lr: 5.92e-03, grad_scale: 32.0
2024-09-15 07:12:35,412 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.798e+02 2.071e+02 2.210e+02 2.433e+02 4.468e+02, threshold=4.420e+02, percent-clipped=1.0
2024-09-15 07:12:37,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=251288.33333333334, ans=0.125
2024-09-15 07:12:40,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=251288.33333333334, ans=0.125
2024-09-15 07:12:42,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=251288.33333333334, ans=0.125
2024-09-15 07:12:44,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=251288.33333333334, ans=0.0
2024-09-15 07:13:05,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=251316.66666666666, ans=0.125
2024-09-15 07:13:13,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=251345.0, ans=0.1
2024-09-15 07:13:40,038 INFO [train.py:1198] (1/2) Epoch 14, batch 5650, loss[loss=0.233, ctc_loss=0.1591, cr_loss=0.3695, over 20999.00 frames. ], tot_loss[loss=0.2522, ctc_loss=0.1737, cr_loss=0.3925, over 4108159.38 frames. ], batch size: 49, lr: 5.92e-03, grad_scale: 32.0
2024-09-15 07:13:43,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=251401.66666666666, ans=0.025
2024-09-15 07:14:17,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=251458.33333333334, ans=0.0
2024-09-15 07:14:20,215 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=251458.33333333334, ans=0.125
2024-09-15 07:14:27,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=251486.66666666666, ans=0.2
2024-09-15 07:14:27,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=251486.66666666666, ans=0.2
2024-09-15 07:14:44,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=251515.0, ans=0.125
2024-09-15 07:14:49,086 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.06 vs. limit=22.5
2024-09-15 07:14:54,264 INFO [train.py:1198] (1/2) Epoch 14, batch 5700, loss[loss=0.2531, ctc_loss=0.1781, cr_loss=0.3749, over 19345.00 frames. ], tot_loss[loss=0.252, ctc_loss=0.1737, cr_loss=0.3916, over 4096549.14 frames. ], batch size: 90, lr: 5.92e-03, grad_scale: 32.0
2024-09-15 07:15:05,986 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.774e+02 2.065e+02 2.164e+02 2.402e+02 3.994e+02, threshold=4.329e+02, percent-clipped=0.0
2024-09-15 07:15:07,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=251571.66666666666, ans=0.1
2024-09-15 07:15:09,237 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-15 07:15:28,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=251600.0, ans=0.1
2024-09-15 07:15:42,470 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.07 vs. limit=12.0
2024-09-15 07:15:49,988 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. limit=6.0
2024-09-15 07:16:09,348 INFO [train.py:1198] (1/2) Epoch 14, batch 5750, loss[loss=0.2387, ctc_loss=0.1651, cr_loss=0.3677, over 20873.00 frames. ], tot_loss[loss=0.2502, ctc_loss=0.1725, cr_loss=0.3885, over 4088109.34 frames. ], batch size: 57, lr: 5.92e-03, grad_scale: 32.0
2024-09-15 07:16:17,249 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.74 vs. limit=12.0
2024-09-15 07:16:35,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=251713.33333333334, ans=0.1
2024-09-15 07:16:53,963 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.01 vs. limit=12.0
2024-09-15 07:17:22,954 INFO [train.py:1198] (1/2) Epoch 14, batch 5800, loss[loss=0.2681, ctc_loss=0.1841, cr_loss=0.4201, over 20314.00 frames. ], tot_loss[loss=0.2512, ctc_loss=0.1733, cr_loss=0.39, over 4091556.94 frames. ], batch size: 74, lr: 5.91e-03, grad_scale: 32.0
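The Whitening lines compare a per-module statistic against a scheduled limit and occasionally come close to it (21.06 vs. limit=22.5 for self_attn2.whiten above). One plausible reading of such a metric is how far the feature covariance is from a multiple of the identity, e.g. the eigenvalue ratio E[lambda^2] / E[lambda]^2, which is 1.0 for perfectly white features and grows as variance concentrates in a few directions. The sketch below computes that quantity and is an assumption about the general idea, not icefall's exact code:

import torch

def whitening_metric(x, num_groups=1):
    # x: (num_frames, num_channels) activations for one module.
    metrics = []
    for g in x.chunk(num_groups, dim=1):
        g = g - g.mean(dim=0, keepdim=True)
        cov = (g.t() @ g) / g.shape[0]              # channel covariance
        d = cov.shape[0]
        mean_eig = torch.diagonal(cov).mean()       # trace/d = E[lambda]
        mean_eig_sq = (cov * cov).sum() / d         # trace(C^2)/d = E[lambda^2]
        metrics.append(mean_eig_sq / mean_eig.clamp(min=1e-20) ** 2)
    return torch.stack(metrics).mean()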
2024-09-15 07:17:34,723 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.808e+02 2.056e+02 2.207e+02 2.375e+02 3.164e+02, threshold=4.414e+02, percent-clipped=0.0
2024-09-15 07:17:34,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=251826.66666666666, ans=0.0
2024-09-15 07:18:08,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=251911.66666666666, ans=0.0
2024-09-15 07:18:13,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=251911.66666666666, ans=0.0
2024-09-15 07:18:36,475 INFO [train.py:1198] (1/2) Epoch 14, batch 5850, loss[loss=0.2943, ctc_loss=0.2084, cr_loss=0.4294, over 19979.00 frames. ], tot_loss[loss=0.2528, ctc_loss=0.1744, cr_loss=0.3918, over 4091577.94 frames. ], batch size: 80, lr: 5.91e-03, grad_scale: 16.0
2024-09-15 07:19:50,228 INFO [train.py:1198] (1/2) Epoch 14, batch 5900, loss[loss=0.2281, ctc_loss=0.1554, cr_loss=0.3632, over 20968.00 frames. ], tot_loss[loss=0.2528, ctc_loss=0.1744, cr_loss=0.392, over 4095077.65 frames. ], batch size: 55, lr: 5.91e-03, grad_scale: 16.0
2024-09-15 07:20:03,434 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.811e+02 2.085e+02 2.209e+02 2.390e+02 5.116e+02, threshold=4.418e+02, percent-clipped=1.0
2024-09-15 07:20:06,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=252138.33333333334, ans=0.0
2024-09-15 07:20:07,132 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.78 vs. limit=15.0
2024-09-15 07:20:14,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=252138.33333333334, ans=0.0
2024-09-15 07:20:19,258 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.89 vs. limit=15.0
2024-09-15 07:20:36,487 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=252195.0, ans=0.0
2024-09-15 07:21:03,927 INFO [train.py:1198] (1/2) Epoch 14, batch 5950, loss[loss=0.2985, ctc_loss=0.2134, cr_loss=0.4255, over 18497.00 frames. ], tot_loss[loss=0.2526, ctc_loss=0.1742, cr_loss=0.3919, over 4090884.09 frames. ], batch size: 108, lr: 5.91e-03, grad_scale: 16.0
2024-09-15 07:21:41,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=252308.33333333334, ans=0.0
2024-09-15 07:21:57,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=252336.66666666666, ans=0.125
2024-09-15 07:22:01,025 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.00 vs. limit=15.0
2024-09-15 07:22:19,591 INFO [train.py:1198] (1/2) Epoch 14, batch 6000, loss[loss=0.2695, ctc_loss=0.1858, cr_loss=0.4184, over 20885.00 frames. ], tot_loss[loss=0.2517, ctc_loss=0.1735, cr_loss=0.3908, over 4099010.86 frames. ], batch size: 57, lr: 5.91e-03, grad_scale: 32.0
2024-09-15 07:22:19,591 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-15 07:22:46,105 INFO [train.py:1230] (1/2) Epoch 14, validation: loss=0.04719, ctc_loss=0.04719, cr_loss=9.826e-15, over 944034.00 frames.
2024-09-15 07:22:46,106 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB
2024-09-15 07:23:00,786 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.90 vs. limit=22.5
2024-09-15 07:23:01,736 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.783e+02 2.076e+02 2.229e+02 2.452e+02 3.696e+02, threshold=4.458e+02, percent-clipped=0.0
2024-09-15 07:23:45,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=252478.33333333334, ans=0.0
2024-09-15 07:24:02,646 INFO [train.py:1198] (1/2) Epoch 14, batch 6050, loss[loss=0.2509, ctc_loss=0.1719, cr_loss=0.3949, over 20870.00 frames. ], tot_loss[loss=0.2495, ctc_loss=0.1719, cr_loss=0.388, over 4102445.04 frames. ], batch size: 54, lr: 5.91e-03, grad_scale: 32.0
2024-09-15 07:24:32,755 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.57 vs. limit=15.0
2024-09-15 07:24:44,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=252591.66666666666, ans=0.0
2024-09-15 07:24:54,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=252620.0, ans=0.125
2024-09-15 07:25:16,315 INFO [train.py:1198] (1/2) Epoch 14, batch 6100, loss[loss=0.3312, ctc_loss=0.2467, cr_loss=0.4225, over 13828.00 frames. ], tot_loss[loss=0.2484, ctc_loss=0.1712, cr_loss=0.3861, over 4096035.76 frames. ], batch size: 149, lr: 5.90e-03, grad_scale: 32.0
2024-09-15 07:25:20,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=252676.66666666666, ans=0.2
2024-09-15 07:25:29,244 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.689e+02 2.050e+02 2.187e+02 2.369e+02 4.237e+02, threshold=4.374e+02, percent-clipped=0.0
2024-09-15 07:25:51,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=252733.33333333334, ans=0.0
2024-09-15 07:25:51,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=252733.33333333334, ans=0.1
2024-09-15 07:26:06,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=252761.66666666666, ans=0.125
2024-09-15 07:26:09,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=252761.66666666666, ans=0.125
2024-09-15 07:26:12,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=252761.66666666666, ans=0.125
2024-09-15 07:26:23,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=252790.0, ans=0.025
2024-09-15 07:26:29,717 INFO [train.py:1198] (1/2) Epoch 14, batch 6150, loss[loss=0.2759, ctc_loss=0.1928, cr_loss=0.4155, over 20697.00 frames. ], tot_loss[loss=0.248, ctc_loss=0.171, cr_loss=0.3848, over 4067807.06 frames. ], batch size: 71, lr: 5.90e-03, grad_scale: 32.0
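The "Computing validation loss" entries above report a frame-weighted average over the whole dev set ("loss=0.04719 ... over 944034.00 frames"), in the same format as the running tot_loss; the essentially zero cr_loss (9.826e-15) suggests the consistency term is inactive when validation runs without masking. A sketch of that accumulation, with compute_loss a hypothetical stand-in returning a frame-summed loss:

import torch

@torch.no_grad()
def validation_loss(model, dev_loader, compute_loss):
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    for batch in dev_loader:
        loss_sum, num_frames = compute_loss(model, batch)  # loss summed over frames
        tot_loss += loss_sum.item()
        tot_frames += num_frames
    model.train()
    return tot_loss / tot_frames, tot_frames  # printed as "loss=..., over N frames."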
2024-09-15 07:26:44,339 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.58 vs. limit=22.5
2024-09-15 07:27:11,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=252875.0, ans=0.125
2024-09-15 07:27:38,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=252931.66666666666, ans=0.05
2024-09-15 07:27:42,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=252960.0, ans=0.1
2024-09-15 07:27:43,174 INFO [train.py:1198] (1/2) Epoch 14, batch 6200, loss[loss=0.2759, ctc_loss=0.1905, cr_loss=0.4267, over 20717.00 frames. ], tot_loss[loss=0.2478, ctc_loss=0.1707, cr_loss=0.3853, over 4071845.73 frames. ], batch size: 71, lr: 5.90e-03, grad_scale: 32.0
2024-09-15 07:27:46,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=252960.0, ans=0.1
2024-09-15 07:27:49,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=252960.0, ans=0.125
2024-09-15 07:27:56,270 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.728e+02 2.054e+02 2.224e+02 2.506e+02 4.746e+02, threshold=4.447e+02, percent-clipped=2.0
2024-09-15 07:28:07,216 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.68 vs. limit=15.0
2024-09-15 07:28:15,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=253016.66666666666, ans=0.125
2024-09-15 07:28:17,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=253016.66666666666, ans=0.125
2024-09-15 07:28:18,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=253016.66666666666, ans=0.0
2024-09-15 07:28:25,350 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.03 vs. limit=15.0
2024-09-15 07:28:31,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=253045.0, ans=0.0
2024-09-15 07:28:37,943 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 07:28:49,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=253073.33333333334, ans=0.125
2024-09-15 07:28:57,450 INFO [train.py:1198] (1/2) Epoch 14, batch 6250, loss[loss=0.2852, ctc_loss=0.198, cr_loss=0.4357, over 20699.00 frames. ], tot_loss[loss=0.2475, ctc_loss=0.1705, cr_loss=0.3848, over 4051542.38 frames. ], batch size: 71, lr: 5.90e-03, grad_scale: 32.0
2024-09-15 07:29:08,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=253101.66666666666, ans=6.0
2024-09-15 07:29:27,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=253158.33333333334, ans=0.125
2024-09-15 07:29:38,168 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.43 vs. limit=15.0
2024-09-15 07:30:11,502 INFO [train.py:1198] (1/2) Epoch 14, batch 6300, loss[loss=0.2676, ctc_loss=0.1882, cr_loss=0.3971, over 21043.00 frames. ], tot_loss[loss=0.2485, ctc_loss=0.1714, cr_loss=0.3855, over 4044309.81 frames. ], batch size: 62, lr: 5.90e-03, grad_scale: 32.0
2024-09-15 07:30:25,106 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.775e+02 2.051e+02 2.204e+02 2.467e+02 4.348e+02, threshold=4.408e+02, percent-clipped=0.0
2024-09-15 07:30:51,352 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.94 vs. limit=22.5
2024-09-15 07:31:11,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=253356.66666666666, ans=0.0
2024-09-15 07:31:22,444 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.26 vs. limit=15.0
2024-09-15 07:31:25,664 INFO [train.py:1198] (1/2) Epoch 14, batch 6350, loss[loss=0.322, ctc_loss=0.2375, cr_loss=0.4222, over 14206.00 frames. ], tot_loss[loss=0.2542, ctc_loss=0.1762, cr_loss=0.3899, over 3967845.72 frames. ], batch size: 149, lr: 5.90e-03, grad_scale: 32.0
2024-09-15 07:31:31,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=253385.0, ans=0.0
2024-09-15 07:32:02,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=253441.66666666666, ans=0.125
2024-09-15 07:33:10,806 INFO [train.py:1198] (1/2) Epoch 15, batch 0, loss[loss=0.2445, ctc_loss=0.1669, cr_loss=0.3876, over 21045.00 frames. ], tot_loss[loss=0.2445, ctc_loss=0.1669, cr_loss=0.3876, over 21045.00 frames. ], batch size: 56, lr: 5.69e-03, grad_scale: 32.0
2024-09-15 07:33:10,807 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-15 07:33:16,871 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.9814, 5.6187, 5.5766, 5.8172], device='cuda:1')
2024-09-15 07:33:18,471 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.0869, 4.6496, 3.5213, 4.1414], device='cuda:1')
2024-09-15 07:33:29,097 INFO [train.py:1230] (1/2) Epoch 15, validation: loss=0.04831, ctc_loss=0.04831, cr_loss=9.472e-15, over 944034.00 frames.
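The zipformer.py:1858 diagnostics above print one entropy value per attention head (four heads here), in nats: a value near ln(source length) means a head attends almost uniformly, while a low value means sharply peaked attention. A minimal version of that statistic, with the tensor layout an assumption:

import torch

def attn_weights_entropy(attn):
    # attn: (num_heads, tgt_len, src_len) attention probabilities.
    ent = -(attn * attn.clamp(min=1e-20).log()).sum(dim=-1)  # (num_heads, tgt_len)
    return ent.mean(dim=-1)  # one value per head, as in the log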
2024-09-15 07:33:29,098 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB
2024-09-15 07:33:30,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=253501.16666666666, ans=0.125
2024-09-15 07:33:32,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=253501.16666666666, ans=0.125
2024-09-15 07:33:33,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=253501.16666666666, ans=0.0
2024-09-15 07:33:37,683 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.54 vs. limit=22.5
2024-09-15 07:33:43,412 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.13 vs. limit=15.0
2024-09-15 07:33:56,379 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.786e+02 2.310e+02 2.553e+02 2.783e+02 4.142e+02, threshold=5.106e+02, percent-clipped=0.0
2024-09-15 07:34:01,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=253557.83333333334, ans=0.035
2024-09-15 07:34:05,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=253557.83333333334, ans=0.125
2024-09-15 07:34:08,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=253557.83333333334, ans=0.1
2024-09-15 07:34:24,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=253586.16666666666, ans=0.125
2024-09-15 07:34:32,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=253614.5, ans=0.025
2024-09-15 07:34:41,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=253614.5, ans=0.2
2024-09-15 07:34:46,211 INFO [train.py:1198] (1/2) Epoch 15, batch 50, loss[loss=0.2359, ctc_loss=0.1601, cr_loss=0.3794, over 20941.00 frames. ], tot_loss[loss=0.2533, ctc_loss=0.1742, cr_loss=0.3955, over 936131.50 frames. ], batch size: 60, lr: 5.69e-03, grad_scale: 32.0
2024-09-15 07:34:49,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=253642.83333333334, ans=0.125
2024-09-15 07:34:55,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.whiten.whitening_limit, batch_count=253642.83333333334, ans=12.0
2024-09-15 07:35:19,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=253699.5, ans=0.125
2024-09-15 07:35:40,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=253727.83333333334, ans=0.125
2024-09-15 07:36:00,857 INFO [train.py:1198] (1/2) Epoch 15, batch 100, loss[loss=0.2096, ctc_loss=0.1417, cr_loss=0.3393, over 19860.00 frames. ], tot_loss[loss=0.254, ctc_loss=0.1752, cr_loss=0.3939, over 1628599.13 frames. ], batch size: 44, lr: 5.69e-03, grad_scale: 32.0
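Between the last epoch 14 entries (lr: 5.90e-03) and the epoch 15 entries above (lr: 5.69e-03), the learning rate steps down at the epoch boundary while also decaying smoothly with batch count, which is the shape of the Eden-style schedule used in zipformer recipes. A sketch of such a schedule; treat the constants and the exact exponents as assumptions rather than values read from this log:

def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
    # Smooth decay in the batch dimension plus a slower decay in the epoch
    # dimension; stepping `epoch` by one at an epoch boundary produces the
    # kind of discrete lr drop visible above.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor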
2024-09-15 07:36:13,846 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.98 vs. limit=22.5
2024-09-15 07:36:15,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=253812.83333333334, ans=0.125
2024-09-15 07:36:27,769 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.738e+02 2.058e+02 2.257e+02 2.510e+02 3.786e+02, threshold=4.515e+02, percent-clipped=0.0
2024-09-15 07:37:14,965 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.13 vs. limit=15.0
2024-09-15 07:37:18,679 INFO [train.py:1198] (1/2) Epoch 15, batch 150, loss[loss=0.2578, ctc_loss=0.1774, cr_loss=0.4019, over 19982.00 frames. ], tot_loss[loss=0.2517, ctc_loss=0.1732, cr_loss=0.3928, over 2184396.40 frames. ], batch size: 80, lr: 5.69e-03, grad_scale: 32.0
2024-09-15 07:37:50,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=253982.83333333334, ans=0.125
2024-09-15 07:38:02,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=254011.16666666666, ans=0.125
2024-09-15 07:38:07,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=254011.16666666666, ans=0.0
2024-09-15 07:38:11,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=254011.16666666666, ans=0.125
2024-09-15 07:38:19,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=254039.5, ans=0.0
2024-09-15 07:38:34,104 INFO [train.py:1198] (1/2) Epoch 15, batch 200, loss[loss=0.2766, ctc_loss=0.1959, cr_loss=0.4038, over 20619.00 frames. ], tot_loss[loss=0.2523, ctc_loss=0.1737, cr_loss=0.3928, over 2597750.14 frames. ], batch size: 66, lr: 5.69e-03, grad_scale: 32.0
2024-09-15 07:38:38,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=254067.83333333334, ans=0.125
2024-09-15 07:38:50,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=254096.16666666666, ans=0.125
2024-09-15 07:38:59,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=254096.16666666666, ans=0.125
2024-09-15 07:39:00,994 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.702e+02 1.990e+02 2.141e+02 2.354e+02 4.678e+02, threshold=4.282e+02, percent-clipped=1.0
2024-09-15 07:39:03,326 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.13 vs. limit=6.0
limit=6.0 2024-09-15 07:39:13,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=254124.5, ans=0.0 2024-09-15 07:39:13,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=254124.5, ans=0.125 2024-09-15 07:39:48,702 INFO [train.py:1198] (1/2) Epoch 15, batch 250, loss[loss=0.2646, ctc_loss=0.1822, cr_loss=0.4119, over 20684.00 frames. ], tot_loss[loss=0.2528, ctc_loss=0.1743, cr_loss=0.3926, over 2927381.06 frames. ], batch size: 66, lr: 5.69e-03, grad_scale: 32.0 2024-09-15 07:40:27,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=254266.16666666666, ans=0.0 2024-09-15 07:40:46,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=254294.5, ans=22.5 2024-09-15 07:40:49,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=254294.5, ans=0.125 2024-09-15 07:41:07,330 INFO [train.py:1198] (1/2) Epoch 15, batch 300, loss[loss=0.2789, ctc_loss=0.1935, cr_loss=0.4269, over 19591.00 frames. ], tot_loss[loss=0.2514, ctc_loss=0.1731, cr_loss=0.3916, over 3195001.09 frames. ], batch size: 90, lr: 5.68e-03, grad_scale: 32.0 2024-09-15 07:41:09,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=254351.16666666666, ans=0.1 2024-09-15 07:41:21,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=254379.5, ans=10.0 2024-09-15 07:41:31,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=254379.5, ans=0.125 2024-09-15 07:41:34,593 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.810e+02 2.029e+02 2.143e+02 2.306e+02 3.853e+02, threshold=4.287e+02, percent-clipped=0.0 2024-09-15 07:41:41,149 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.65 vs. limit=15.0 2024-09-15 07:42:06,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=254464.5, ans=0.1 2024-09-15 07:42:22,478 INFO [train.py:1198] (1/2) Epoch 15, batch 350, loss[loss=0.2768, ctc_loss=0.1944, cr_loss=0.4122, over 20664.00 frames. ], tot_loss[loss=0.2523, ctc_loss=0.1737, cr_loss=0.3926, over 3392096.85 frames. ], batch size: 66, lr: 5.68e-03, grad_scale: 32.0 2024-09-15 07:42:25,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=254492.83333333334, ans=0.125 2024-09-15 07:42:35,216 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.40 vs. 
limit=15.0 2024-09-15 07:43:19,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=254577.83333333334, ans=0.125 2024-09-15 07:43:23,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=254577.83333333334, ans=0.125 2024-09-15 07:43:40,458 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.034e-02 2024-09-15 07:43:41,531 INFO [train.py:1198] (1/2) Epoch 15, batch 400, loss[loss=0.2494, ctc_loss=0.1704, cr_loss=0.3948, over 21034.00 frames. ], tot_loss[loss=0.2514, ctc_loss=0.1731, cr_loss=0.3912, over 3545915.78 frames. ], batch size: 62, lr: 5.68e-03, grad_scale: 32.0 2024-09-15 07:43:56,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=254662.83333333334, ans=0.2 2024-09-15 07:44:08,627 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.858e+02 2.048e+02 2.187e+02 2.410e+02 6.467e+02, threshold=4.375e+02, percent-clipped=1.0 2024-09-15 07:44:56,512 INFO [train.py:1198] (1/2) Epoch 15, batch 450, loss[loss=0.247, ctc_loss=0.17, cr_loss=0.3852, over 20334.00 frames. ], tot_loss[loss=0.2499, ctc_loss=0.1722, cr_loss=0.3885, over 3639344.83 frames. ], batch size: 74, lr: 5.68e-03, grad_scale: 32.0 2024-09-15 07:45:14,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=254804.5, ans=0.125 2024-09-15 07:45:27,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=254832.83333333334, ans=0.05 2024-09-15 07:45:35,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=254832.83333333334, ans=0.125 2024-09-15 07:46:02,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=254889.5, ans=0.125 2024-09-15 07:46:14,415 INFO [train.py:1198] (1/2) Epoch 15, batch 500, loss[loss=0.2135, ctc_loss=0.1469, cr_loss=0.3331, over 20966.00 frames. ], tot_loss[loss=0.2498, ctc_loss=0.1723, cr_loss=0.3876, over 3730341.11 frames. 
], batch size: 49, lr: 5.68e-03, grad_scale: 32.0 2024-09-15 07:46:29,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=254946.16666666666, ans=0.09899494936611666 2024-09-15 07:46:41,401 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.788e+02 2.092e+02 2.215e+02 2.567e+02 3.534e+02, threshold=4.431e+02, percent-clipped=0.0 2024-09-15 07:46:46,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=254974.5, ans=10.0 2024-09-15 07:46:59,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=255002.83333333334, ans=0.2 2024-09-15 07:47:06,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=255002.83333333334, ans=0.125 2024-09-15 07:47:12,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=255002.83333333334, ans=0.125 2024-09-15 07:47:20,433 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.90 vs. limit=6.0 2024-09-15 07:47:30,109 INFO [train.py:1198] (1/2) Epoch 15, batch 550, loss[loss=0.2438, ctc_loss=0.1666, cr_loss=0.3856, over 20666.00 frames. ], tot_loss[loss=0.2477, ctc_loss=0.1706, cr_loss=0.3855, over 3815774.30 frames. ], batch size: 68, lr: 5.68e-03, grad_scale: 32.0 2024-09-15 07:47:36,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=255059.5, ans=0.125 2024-09-15 07:47:41,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=255059.5, ans=0.0 2024-09-15 07:47:41,493 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.52 vs. limit=15.0 2024-09-15 07:48:05,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=255116.16666666666, ans=0.0 2024-09-15 07:48:06,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=255116.16666666666, ans=0.125 2024-09-15 07:48:31,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=255172.83333333334, ans=0.125 2024-09-15 07:48:48,606 INFO [train.py:1198] (1/2) Epoch 15, batch 600, loss[loss=0.2779, ctc_loss=0.1929, cr_loss=0.4255, over 20671.00 frames. ], tot_loss[loss=0.2472, ctc_loss=0.1701, cr_loss=0.3854, over 3881468.11 frames. 
], batch size: 71, lr: 5.67e-03, grad_scale: 32.0 2024-09-15 07:49:13,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=255229.5, ans=0.2 2024-09-15 07:49:16,050 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.757e+02 1.988e+02 2.133e+02 2.327e+02 2.983e+02, threshold=4.266e+02, percent-clipped=0.0 2024-09-15 07:49:28,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=255257.83333333334, ans=0.125 2024-09-15 07:49:46,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn2.whiten.whitening_limit, batch_count=255286.16666666666, ans=22.5 2024-09-15 07:49:50,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=255314.5, ans=0.2 2024-09-15 07:49:53,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=255314.5, ans=0.0 2024-09-15 07:50:03,867 INFO [train.py:1198] (1/2) Epoch 15, batch 650, loss[loss=0.2319, ctc_loss=0.1592, cr_loss=0.3633, over 21003.00 frames. ], tot_loss[loss=0.2483, ctc_loss=0.171, cr_loss=0.3866, over 3926753.93 frames. ], batch size: 61, lr: 5.67e-03, grad_scale: 32.0 2024-09-15 07:50:44,749 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.17 vs. limit=6.0 2024-09-15 07:50:47,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=255427.83333333334, ans=0.025 2024-09-15 07:50:59,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=255427.83333333334, ans=0.2 2024-09-15 07:51:12,908 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.04 vs. limit=6.0 2024-09-15 07:51:17,974 INFO [train.py:1198] (1/2) Epoch 15, batch 700, loss[loss=0.2431, ctc_loss=0.1683, cr_loss=0.374, over 20763.00 frames. ], tot_loss[loss=0.2494, ctc_loss=0.1718, cr_loss=0.3881, over 3968455.82 frames. ], batch size: 56, lr: 5.67e-03, grad_scale: 32.0 2024-09-15 07:51:48,183 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.710e+02 2.037e+02 2.195e+02 2.389e+02 6.615e+02, threshold=4.391e+02, percent-clipped=1.0 2024-09-15 07:52:09,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=255569.5, ans=0.0 2024-09-15 07:52:28,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=255597.83333333334, ans=0.125 2024-09-15 07:52:36,128 INFO [train.py:1198] (1/2) Epoch 15, batch 750, loss[loss=0.2226, ctc_loss=0.1522, cr_loss=0.3519, over 21020.00 frames. ], tot_loss[loss=0.2506, ctc_loss=0.1728, cr_loss=0.3891, over 3985653.73 frames. 
], batch size: 61, lr: 5.67e-03, grad_scale: 32.0 2024-09-15 07:52:37,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=255626.16666666666, ans=0.5 2024-09-15 07:52:39,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=255626.16666666666, ans=0.09899494936611666 2024-09-15 07:52:44,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.whiten.whitening_limit, batch_count=255626.16666666666, ans=12.0 2024-09-15 07:53:02,552 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.91 vs. limit=15.0 2024-09-15 07:53:08,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=255682.83333333334, ans=10.0 2024-09-15 07:53:15,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=255682.83333333334, ans=0.125 2024-09-15 07:53:22,801 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.04 vs. limit=15.0 2024-09-15 07:53:51,478 INFO [train.py:1198] (1/2) Epoch 15, batch 800, loss[loss=0.2569, ctc_loss=0.1792, cr_loss=0.3883, over 21040.00 frames. ], tot_loss[loss=0.2498, ctc_loss=0.1721, cr_loss=0.388, over 4013089.79 frames. ], batch size: 62, lr: 5.67e-03, grad_scale: 32.0 2024-09-15 07:53:59,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=255767.83333333334, ans=0.04949747468305833 2024-09-15 07:54:00,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=255767.83333333334, ans=0.0 2024-09-15 07:54:21,842 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.653e+02 2.007e+02 2.109e+02 2.286e+02 3.573e+02, threshold=4.219e+02, percent-clipped=0.0 2024-09-15 07:54:40,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=255852.83333333334, ans=0.0 2024-09-15 07:54:46,723 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.17 vs. limit=22.5 2024-09-15 07:55:08,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=255909.5, ans=0.0 2024-09-15 07:55:09,704 INFO [train.py:1198] (1/2) Epoch 15, batch 850, loss[loss=0.242, ctc_loss=0.1689, cr_loss=0.3655, over 20828.00 frames. ], tot_loss[loss=0.248, ctc_loss=0.1707, cr_loss=0.3863, over 4043639.31 frames. 
], batch size: 59, lr: 5.67e-03, grad_scale: 32.0 2024-09-15 07:55:13,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=255909.5, ans=0.0 2024-09-15 07:55:25,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=255937.83333333334, ans=0.0 2024-09-15 07:55:56,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=255994.5, ans=0.2 2024-09-15 07:56:20,703 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.21 vs. limit=15.0 2024-09-15 07:56:24,574 INFO [train.py:1198] (1/2) Epoch 15, batch 900, loss[loss=0.2248, ctc_loss=0.1517, cr_loss=0.3652, over 20978.00 frames. ], tot_loss[loss=0.2473, ctc_loss=0.1702, cr_loss=0.3854, over 4052350.36 frames. ], batch size: 58, lr: 5.67e-03, grad_scale: 32.0 2024-09-15 07:56:51,856 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.828e+02 2.068e+02 2.217e+02 2.381e+02 3.265e+02, threshold=4.435e+02, percent-clipped=0.0 2024-09-15 07:56:52,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=256079.5, ans=0.1 2024-09-15 07:57:21,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=256136.16666666666, ans=0.025 2024-09-15 07:57:23,093 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.74 vs. limit=15.0 2024-09-15 07:57:43,554 INFO [train.py:1198] (1/2) Epoch 15, batch 950, loss[loss=0.2638, ctc_loss=0.1855, cr_loss=0.3912, over 20621.00 frames. ], tot_loss[loss=0.2477, ctc_loss=0.1705, cr_loss=0.3861, over 4071161.74 frames. ], batch size: 68, lr: 5.66e-03, grad_scale: 32.0 2024-09-15 07:57:45,577 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.87 vs. limit=15.0 2024-09-15 07:58:42,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=256306.16666666666, ans=0.125 2024-09-15 07:58:58,335 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.79 vs. limit=15.0 2024-09-15 07:58:59,015 INFO [train.py:1198] (1/2) Epoch 15, batch 1000, loss[loss=0.2687, ctc_loss=0.1861, cr_loss=0.4125, over 21060.00 frames. ], tot_loss[loss=0.2483, ctc_loss=0.171, cr_loss=0.3867, over 4071443.70 frames. 
], batch size: 59, lr: 5.66e-03, grad_scale: 32.0 2024-09-15 07:59:06,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=256334.5, ans=0.0 2024-09-15 07:59:14,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=256362.83333333334, ans=0.95 2024-09-15 07:59:26,007 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.800e+02 2.055e+02 2.232e+02 2.541e+02 4.564e+02, threshold=4.464e+02, percent-clipped=1.0 2024-09-15 07:59:38,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=256391.16666666666, ans=0.125 2024-09-15 07:59:38,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=256391.16666666666, ans=0.125 2024-09-15 07:59:47,417 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.25 vs. limit=15.0 2024-09-15 08:00:00,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=256447.83333333334, ans=0.0 2024-09-15 08:00:04,830 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.02 vs. limit=15.0 2024-09-15 08:00:16,217 INFO [train.py:1198] (1/2) Epoch 15, batch 1050, loss[loss=0.2276, ctc_loss=0.1568, cr_loss=0.3538, over 20810.00 frames. ], tot_loss[loss=0.2495, ctc_loss=0.1719, cr_loss=0.3882, over 4073344.18 frames. ], batch size: 53, lr: 5.66e-03, grad_scale: 32.0 2024-09-15 08:00:28,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=256476.16666666666, ans=0.025 2024-09-15 08:01:03,828 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.92 vs. limit=10.0 2024-09-15 08:01:07,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=256561.16666666666, ans=0.1 2024-09-15 08:01:27,543 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.12 vs. limit=15.0 2024-09-15 08:01:31,363 INFO [train.py:1198] (1/2) Epoch 15, batch 1100, loss[loss=0.255, ctc_loss=0.1738, cr_loss=0.4058, over 20691.00 frames. ], tot_loss[loss=0.2497, ctc_loss=0.172, cr_loss=0.3885, over 4078112.64 frames. 
], batch size: 71, lr: 5.66e-03, grad_scale: 32.0 2024-09-15 08:01:51,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=256646.16666666666, ans=0.125 2024-09-15 08:01:51,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=256646.16666666666, ans=0.125 2024-09-15 08:01:52,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=256646.16666666666, ans=0.025 2024-09-15 08:01:58,720 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.818e+02 2.042e+02 2.228e+02 2.394e+02 4.223e+02, threshold=4.456e+02, percent-clipped=0.0 2024-09-15 08:02:21,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=256702.83333333334, ans=0.125 2024-09-15 08:02:26,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=256702.83333333334, ans=0.1 2024-09-15 08:02:34,154 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.61 vs. limit=15.0 2024-09-15 08:02:36,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=256731.16666666666, ans=0.5 2024-09-15 08:02:46,889 INFO [train.py:1198] (1/2) Epoch 15, batch 1150, loss[loss=0.251, ctc_loss=0.1764, cr_loss=0.3731, over 21058.00 frames. ], tot_loss[loss=0.2502, ctc_loss=0.1724, cr_loss=0.3891, over 4087485.84 frames. ], batch size: 63, lr: 5.66e-03, grad_scale: 32.0 2024-09-15 08:02:50,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=256759.5, ans=0.2 2024-09-15 08:02:54,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=256759.5, ans=0.035 2024-09-15 08:03:47,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=256844.5, ans=0.125 2024-09-15 08:03:54,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=256872.83333333334, ans=0.125 2024-09-15 08:04:00,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=256872.83333333334, ans=0.125 2024-09-15 08:04:04,725 INFO [train.py:1198] (1/2) Epoch 15, batch 1200, loss[loss=0.2772, ctc_loss=0.1964, cr_loss=0.4038, over 19425.00 frames. ], tot_loss[loss=0.2504, ctc_loss=0.1724, cr_loss=0.3898, over 4088666.72 frames. ], batch size: 90, lr: 5.66e-03, grad_scale: 32.0 2024-09-15 08:04:04,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=256901.16666666666, ans=0.015 2024-09-15 08:04:31,912 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.882e+02 2.078e+02 2.265e+02 2.538e+02 5.320e+02, threshold=4.531e+02, percent-clipped=1.0 2024-09-15 08:04:42,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=256957.83333333334, ans=0.1 2024-09-15 08:05:19,819 INFO [train.py:1198] (1/2) Epoch 15, batch 1250, loss[loss=0.2028, ctc_loss=0.1381, cr_loss=0.3232, over 20934.00 frames. 
], tot_loss[loss=0.2509, ctc_loss=0.1729, cr_loss=0.3903, over 4085153.73 frames. ], batch size: 49, lr: 5.65e-03, grad_scale: 32.0 2024-09-15 08:05:31,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=257042.83333333334, ans=6.0 2024-09-15 08:05:34,398 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.42 vs. limit=15.0 2024-09-15 08:06:03,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=257099.5, ans=0.125 2024-09-15 08:06:08,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=257127.83333333334, ans=0.1 2024-09-15 08:06:37,933 INFO [train.py:1198] (1/2) Epoch 15, batch 1300, loss[loss=0.3533, ctc_loss=0.2634, cr_loss=0.4493, over 14803.00 frames. ], tot_loss[loss=0.2515, ctc_loss=0.1733, cr_loss=0.3908, over 4069691.50 frames. ], batch size: 150, lr: 5.65e-03, grad_scale: 32.0 2024-09-15 08:07:03,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=257212.83333333334, ans=0.2 2024-09-15 08:07:04,974 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.779e+02 2.016e+02 2.233e+02 2.486e+02 3.513e+02, threshold=4.465e+02, percent-clipped=0.0 2024-09-15 08:07:14,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=257241.16666666666, ans=0.125 2024-09-15 08:07:19,446 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.51 vs. limit=22.5 2024-09-15 08:07:45,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=257297.83333333334, ans=0.025 2024-09-15 08:07:53,154 INFO [train.py:1198] (1/2) Epoch 15, batch 1350, loss[loss=0.2172, ctc_loss=0.1458, cr_loss=0.3572, over 19840.00 frames. ], tot_loss[loss=0.2512, ctc_loss=0.1732, cr_loss=0.39, over 4078080.51 frames. ], batch size: 44, lr: 5.65e-03, grad_scale: 32.0 2024-09-15 08:08:00,135 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.28 vs. limit=15.0 2024-09-15 08:08:20,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=257354.5, ans=0.0 2024-09-15 08:08:54,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=257411.16666666666, ans=0.125 2024-09-15 08:08:58,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=257439.5, ans=0.125 2024-09-15 08:09:11,578 INFO [train.py:1198] (1/2) Epoch 15, batch 1400, loss[loss=0.2638, ctc_loss=0.1825, cr_loss=0.4068, over 20661.00 frames. ], tot_loss[loss=0.2503, ctc_loss=0.1725, cr_loss=0.3891, over 4084310.36 frames. 
], batch size: 68, lr: 5.65e-03, grad_scale: 32.0 2024-09-15 08:09:17,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=257467.83333333334, ans=0.2 2024-09-15 08:09:38,525 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.727e+02 2.041e+02 2.205e+02 2.399e+02 3.235e+02, threshold=4.410e+02, percent-clipped=0.0 2024-09-15 08:09:40,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=257524.5, ans=0.0 2024-09-15 08:09:40,548 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 08:10:26,211 INFO [train.py:1198] (1/2) Epoch 15, batch 1450, loss[loss=0.2667, ctc_loss=0.182, cr_loss=0.4235, over 21074.00 frames. ], tot_loss[loss=0.2502, ctc_loss=0.1723, cr_loss=0.3894, over 4089735.54 frames. ], batch size: 59, lr: 5.65e-03, grad_scale: 64.0 2024-09-15 08:10:31,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=257609.5, ans=0.125 2024-09-15 08:10:44,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=257637.83333333334, ans=0.125 2024-09-15 08:11:14,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=257694.5, ans=0.0 2024-09-15 08:11:31,635 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.32 vs. limit=15.0 2024-09-15 08:11:41,128 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. limit=6.0 2024-09-15 08:11:44,373 INFO [train.py:1198] (1/2) Epoch 15, batch 1500, loss[loss=0.2659, ctc_loss=0.1845, cr_loss=0.4067, over 21009.00 frames. ], tot_loss[loss=0.2499, ctc_loss=0.172, cr_loss=0.3892, over 4090879.59 frames. ], batch size: 63, lr: 5.65e-03, grad_scale: 32.0 2024-09-15 08:12:12,540 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.619e+02 2.042e+02 2.175e+02 2.452e+02 3.722e+02, threshold=4.351e+02, percent-clipped=0.0 2024-09-15 08:12:59,190 INFO [train.py:1198] (1/2) Epoch 15, batch 1550, loss[loss=0.2327, ctc_loss=0.1594, cr_loss=0.3663, over 21058.00 frames. ], tot_loss[loss=0.2509, ctc_loss=0.1729, cr_loss=0.3899, over 4083257.54 frames. ], batch size: 56, lr: 5.65e-03, grad_scale: 32.0 2024-09-15 08:13:10,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=257892.83333333334, ans=0.1 2024-09-15 08:13:11,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=257892.83333333334, ans=0.0 2024-09-15 08:13:11,984 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.92 vs. 
limit=15.0 2024-09-15 08:13:29,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=257949.5, ans=0.125 2024-09-15 08:13:39,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=257949.5, ans=10.0 2024-09-15 08:13:51,229 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.63 vs. limit=10.0 2024-09-15 08:14:16,802 INFO [train.py:1198] (1/2) Epoch 15, batch 1600, loss[loss=0.2308, ctc_loss=0.1577, cr_loss=0.3653, over 21048.00 frames. ], tot_loss[loss=0.2496, ctc_loss=0.1719, cr_loss=0.3884, over 4088426.78 frames. ], batch size: 53, lr: 5.64e-03, grad_scale: 32.0 2024-09-15 08:14:38,540 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0 2024-09-15 08:14:45,452 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.827e+02 2.047e+02 2.165e+02 2.331e+02 3.063e+02, threshold=4.331e+02, percent-clipped=0.0 2024-09-15 08:14:53,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=258091.16666666666, ans=0.125 2024-09-15 08:15:02,739 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.99 vs. limit=15.0 2024-09-15 08:15:11,483 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 08:15:16,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=258147.83333333334, ans=15.0 2024-09-15 08:15:19,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=258147.83333333334, ans=0.125 2024-09-15 08:15:32,754 INFO [train.py:1198] (1/2) Epoch 15, batch 1650, loss[loss=0.2784, ctc_loss=0.1965, cr_loss=0.4099, over 20073.00 frames. ], tot_loss[loss=0.2482, ctc_loss=0.1708, cr_loss=0.3867, over 4095713.08 frames. ], batch size: 80, lr: 5.64e-03, grad_scale: 32.0 2024-09-15 08:16:15,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=258261.16666666666, ans=0.0 2024-09-15 08:16:21,018 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.57 vs. limit=15.0 2024-09-15 08:16:25,138 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=8.972e-03 2024-09-15 08:16:28,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=258261.16666666666, ans=0.125 2024-09-15 08:16:32,872 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.65 vs. limit=12.0 2024-09-15 08:16:47,117 INFO [train.py:1198] (1/2) Epoch 15, batch 1700, loss[loss=0.29, ctc_loss=0.2011, cr_loss=0.4445, over 20679.00 frames. ], tot_loss[loss=0.2494, ctc_loss=0.1718, cr_loss=0.3882, over 4095640.97 frames. 
], batch size: 66, lr: 5.64e-03, grad_scale: 32.0 2024-09-15 08:17:18,469 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.813e+02 2.071e+02 2.208e+02 2.414e+02 4.241e+02, threshold=4.417e+02, percent-clipped=0.0 2024-09-15 08:17:36,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=258402.83333333334, ans=0.035 2024-09-15 08:17:53,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=258431.16666666666, ans=0.0 2024-09-15 08:18:03,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=258459.5, ans=0.125 2024-09-15 08:18:04,974 INFO [train.py:1198] (1/2) Epoch 15, batch 1750, loss[loss=0.2406, ctc_loss=0.1652, cr_loss=0.3769, over 20773.00 frames. ], tot_loss[loss=0.2497, ctc_loss=0.1718, cr_loss=0.3892, over 4106513.23 frames. ], batch size: 56, lr: 5.64e-03, grad_scale: 32.0 2024-09-15 08:18:57,122 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 08:19:03,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=258544.5, ans=0.2 2024-09-15 08:19:20,807 INFO [train.py:1198] (1/2) Epoch 15, batch 1800, loss[loss=0.2442, ctc_loss=0.1692, cr_loss=0.3749, over 21062.00 frames. ], tot_loss[loss=0.2498, ctc_loss=0.1719, cr_loss=0.3894, over 4115121.13 frames. ], batch size: 59, lr: 5.64e-03, grad_scale: 32.0 2024-09-15 08:19:31,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=258601.16666666666, ans=0.0 2024-09-15 08:19:49,451 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.786e+02 2.042e+02 2.182e+02 2.362e+02 3.275e+02, threshold=4.364e+02, percent-clipped=0.0 2024-09-15 08:20:28,757 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 08:20:36,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=258714.5, ans=0.125 2024-09-15 08:20:37,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=258742.83333333334, ans=0.125 2024-09-15 08:20:38,881 INFO [train.py:1198] (1/2) Epoch 15, batch 1850, loss[loss=0.245, ctc_loss=0.1699, cr_loss=0.3757, over 20799.00 frames. ], tot_loss[loss=0.2501, ctc_loss=0.1722, cr_loss=0.3894, over 4109420.98 frames. ], batch size: 53, lr: 5.64e-03, grad_scale: 32.0 2024-09-15 08:20:43,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=258742.83333333334, ans=0.0 2024-09-15 08:20:57,670 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.03 vs. limit=15.0 2024-09-15 08:20:58,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=258771.16666666666, ans=0.0 2024-09-15 08:21:15,937 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.16 vs. 
limit=15.0 2024-09-15 08:21:33,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=258827.83333333334, ans=0.125 2024-09-15 08:21:43,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=258856.16666666666, ans=0.1 2024-09-15 08:21:47,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=258856.16666666666, ans=0.125 2024-09-15 08:21:54,076 INFO [train.py:1198] (1/2) Epoch 15, batch 1900, loss[loss=0.271, ctc_loss=0.1847, cr_loss=0.4313, over 20870.00 frames. ], tot_loss[loss=0.25, ctc_loss=0.1721, cr_loss=0.3893, over 4114178.12 frames. ], batch size: 65, lr: 5.63e-03, grad_scale: 32.0 2024-09-15 08:22:02,037 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 08:22:04,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=258884.5, ans=0.0 2024-09-15 08:22:06,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=258884.5, ans=0.2 2024-09-15 08:22:22,543 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.725e+02 2.066e+02 2.248e+02 2.501e+02 3.115e+02, threshold=4.495e+02, percent-clipped=0.0 2024-09-15 08:22:55,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=258997.83333333334, ans=0.1 2024-09-15 08:23:05,398 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.37 vs. limit=15.0 2024-09-15 08:23:11,930 INFO [train.py:1198] (1/2) Epoch 15, batch 1950, loss[loss=0.283, ctc_loss=0.1995, cr_loss=0.4172, over 18098.00 frames. ], tot_loss[loss=0.25, ctc_loss=0.1722, cr_loss=0.3889, over 4106579.90 frames. ], batch size: 108, lr: 5.63e-03, grad_scale: 32.0 2024-09-15 08:23:21,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=259026.16666666666, ans=10.0 2024-09-15 08:23:25,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=259054.5, ans=0.07 2024-09-15 08:23:56,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=259111.16666666666, ans=0.0 2024-09-15 08:24:15,789 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.66 vs. limit=15.0 2024-09-15 08:24:26,630 INFO [train.py:1198] (1/2) Epoch 15, batch 2000, loss[loss=0.2245, ctc_loss=0.1529, cr_loss=0.3579, over 20963.00 frames. ], tot_loss[loss=0.2504, ctc_loss=0.1725, cr_loss=0.3897, over 4108155.25 frames. 
], batch size: 58, lr: 5.63e-03, grad_scale: 32.0 2024-09-15 08:24:54,960 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.739e+02 2.039e+02 2.219e+02 2.423e+02 5.335e+02, threshold=4.438e+02, percent-clipped=1.0 2024-09-15 08:25:35,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=259281.16666666666, ans=0.125 2024-09-15 08:25:38,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=259281.16666666666, ans=0.0 2024-09-15 08:25:44,111 INFO [train.py:1198] (1/2) Epoch 15, batch 2050, loss[loss=0.2621, ctc_loss=0.1774, cr_loss=0.4238, over 20904.00 frames. ], tot_loss[loss=0.2514, ctc_loss=0.1731, cr_loss=0.3914, over 4102844.38 frames. ], batch size: 54, lr: 5.63e-03, grad_scale: 32.0 2024-09-15 08:26:01,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=259337.83333333334, ans=0.0 2024-09-15 08:26:59,098 INFO [train.py:1198] (1/2) Epoch 15, batch 2100, loss[loss=0.3013, ctc_loss=0.2148, cr_loss=0.4325, over 18452.00 frames. ], tot_loss[loss=0.2514, ctc_loss=0.1732, cr_loss=0.3909, over 4091122.42 frames. ], batch size: 108, lr: 5.63e-03, grad_scale: 32.0 2024-09-15 08:27:04,456 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.94 vs. limit=15.0 2024-09-15 08:27:08,780 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.39 vs. limit=10.0 2024-09-15 08:27:27,454 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.836e+02 2.077e+02 2.200e+02 2.420e+02 5.649e+02, threshold=4.400e+02, percent-clipped=2.0 2024-09-15 08:27:39,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=259507.83333333334, ans=0.0 2024-09-15 08:28:14,521 INFO [train.py:1198] (1/2) Epoch 15, batch 2150, loss[loss=0.2738, ctc_loss=0.1917, cr_loss=0.4105, over 20646.00 frames. ], tot_loss[loss=0.2489, ctc_loss=0.1714, cr_loss=0.3873, over 4090494.36 frames. ], batch size: 66, lr: 5.63e-03, grad_scale: 32.0 2024-09-15 08:28:18,399 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.88 vs. limit=15.0 2024-09-15 08:28:19,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=259592.83333333334, ans=0.0 2024-09-15 08:29:04,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=259677.83333333334, ans=0.125 2024-09-15 08:29:32,235 INFO [train.py:1198] (1/2) Epoch 15, batch 2200, loss[loss=0.2767, ctc_loss=0.189, cr_loss=0.4386, over 20945.00 frames. ], tot_loss[loss=0.2505, ctc_loss=0.1725, cr_loss=0.3899, over 4096238.18 frames. 
], batch size: 60, lr: 5.63e-03, grad_scale: 32.0 2024-09-15 08:29:32,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=259734.5, ans=0.2 2024-09-15 08:29:53,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=259762.83333333334, ans=0.2 2024-09-15 08:30:00,977 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.818e+02 2.042e+02 2.122e+02 2.305e+02 3.906e+02, threshold=4.245e+02, percent-clipped=0.0 2024-09-15 08:30:25,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=259819.5, ans=0.1 2024-09-15 08:30:29,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=259819.5, ans=0.1 2024-09-15 08:30:41,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=259847.83333333334, ans=0.1 2024-09-15 08:30:46,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=259876.16666666666, ans=0.0 2024-09-15 08:30:47,714 INFO [train.py:1198] (1/2) Epoch 15, batch 2250, loss[loss=0.2604, ctc_loss=0.18, cr_loss=0.402, over 20994.00 frames. ], tot_loss[loss=0.2504, ctc_loss=0.1724, cr_loss=0.3902, over 4099245.39 frames. ], batch size: 61, lr: 5.62e-03, grad_scale: 32.0 2024-09-15 08:30:48,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=259876.16666666666, ans=0.125 2024-09-15 08:30:49,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=259876.16666666666, ans=0.0 2024-09-15 08:30:58,958 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.85 vs. limit=6.0 2024-09-15 08:31:13,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=259904.5, ans=0.0 2024-09-15 08:31:22,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=259932.83333333334, ans=0.125 2024-09-15 08:31:25,935 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.27 vs. limit=10.0 2024-09-15 08:31:26,015 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.02 vs. limit=22.5 2024-09-15 08:31:40,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=259961.16666666666, ans=0.2 2024-09-15 08:31:57,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=259989.5, ans=0.125 2024-09-15 08:32:07,767 INFO [train.py:1198] (1/2) Epoch 15, batch 2300, loss[loss=0.2091, ctc_loss=0.1407, cr_loss=0.3419, over 20973.00 frames. ], tot_loss[loss=0.25, ctc_loss=0.172, cr_loss=0.3901, over 4110603.48 frames. 
], batch size: 49, lr: 5.62e-03, grad_scale: 32.0 2024-09-15 08:32:29,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=260046.16666666666, ans=0.125 2024-09-15 08:32:32,623 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.13 vs. limit=15.0 2024-09-15 08:32:36,366 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.725e+02 2.068e+02 2.299e+02 2.502e+02 4.196e+02, threshold=4.599e+02, percent-clipped=0.0 2024-09-15 08:33:00,007 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.96 vs. limit=15.0 2024-09-15 08:33:02,782 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.14 vs. limit=15.0 2024-09-15 08:33:09,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=260131.16666666666, ans=0.025 2024-09-15 08:33:22,748 INFO [train.py:1198] (1/2) Epoch 15, batch 2350, loss[loss=0.2675, ctc_loss=0.1891, cr_loss=0.3919, over 19287.00 frames. ], tot_loss[loss=0.2505, ctc_loss=0.1723, cr_loss=0.391, over 4113051.62 frames. ], batch size: 90, lr: 5.62e-03, grad_scale: 32.0 2024-09-15 08:33:48,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=260187.83333333334, ans=0.125 2024-09-15 08:33:54,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=260216.16666666666, ans=0.125 2024-09-15 08:34:41,280 INFO [train.py:1198] (1/2) Epoch 15, batch 2400, loss[loss=0.2243, ctc_loss=0.1535, cr_loss=0.3542, over 20961.00 frames. ], tot_loss[loss=0.2494, ctc_loss=0.1714, cr_loss=0.3902, over 4115258.51 frames. ], batch size: 55, lr: 5.62e-03, grad_scale: 32.0 2024-09-15 08:34:44,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=260301.16666666666, ans=0.04949747468305833 2024-09-15 08:34:54,159 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.14 vs. limit=15.0 2024-09-15 08:34:55,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=260329.5, ans=0.125 2024-09-15 08:35:09,565 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.733e+02 2.020e+02 2.145e+02 2.307e+02 3.011e+02, threshold=4.290e+02, percent-clipped=0.0 2024-09-15 08:35:46,346 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 08:35:56,280 INFO [train.py:1198] (1/2) Epoch 15, batch 2450, loss[loss=0.2131, ctc_loss=0.1433, cr_loss=0.3491, over 20991.00 frames. ], tot_loss[loss=0.2479, ctc_loss=0.1702, cr_loss=0.3885, over 4116458.22 frames. 
], batch size: 52, lr: 5.62e-03, grad_scale: 32.0 2024-09-15 08:35:59,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=260442.83333333334, ans=0.125 2024-09-15 08:36:10,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=260471.16666666666, ans=0.1 2024-09-15 08:36:57,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=260556.16666666666, ans=0.0 2024-09-15 08:36:58,545 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.47 vs. limit=15.0 2024-09-15 08:37:14,339 INFO [train.py:1198] (1/2) Epoch 15, batch 2500, loss[loss=0.2546, ctc_loss=0.174, cr_loss=0.4032, over 20929.00 frames. ], tot_loss[loss=0.2474, ctc_loss=0.1698, cr_loss=0.388, over 4111677.77 frames. ], batch size: 49, lr: 5.62e-03, grad_scale: 32.0 2024-09-15 08:37:30,103 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.59 vs. limit=15.0 2024-09-15 08:37:42,930 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.723e+02 2.050e+02 2.265e+02 2.531e+02 4.236e+02, threshold=4.531e+02, percent-clipped=0.0 2024-09-15 08:38:30,656 INFO [train.py:1198] (1/2) Epoch 15, batch 2550, loss[loss=0.256, ctc_loss=0.1771, cr_loss=0.3942, over 20524.00 frames. ], tot_loss[loss=0.248, ctc_loss=0.1703, cr_loss=0.3884, over 4109414.83 frames. ], batch size: 75, lr: 5.61e-03, grad_scale: 32.0 2024-09-15 08:38:59,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=260782.83333333334, ans=0.125 2024-09-15 08:39:17,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=260811.16666666666, ans=0.125 2024-09-15 08:39:44,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=260867.83333333334, ans=0.1 2024-09-15 08:39:45,562 INFO [train.py:1198] (1/2) Epoch 15, batch 2600, loss[loss=0.2626, ctc_loss=0.1849, cr_loss=0.3885, over 20952.00 frames. ], tot_loss[loss=0.247, ctc_loss=0.1697, cr_loss=0.3868, over 4106906.67 frames. 
], batch size: 60, lr: 5.61e-03, grad_scale: 32.0 2024-09-15 08:39:47,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=260867.83333333334, ans=0.125 2024-09-15 08:39:48,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=260867.83333333334, ans=0.0 2024-09-15 08:40:12,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=260896.16666666666, ans=0.125 2024-09-15 08:40:15,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=260896.16666666666, ans=0.125 2024-09-15 08:40:16,374 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.658e+02 2.096e+02 2.224e+02 2.407e+02 3.923e+02, threshold=4.448e+02, percent-clipped=0.0 2024-09-15 08:40:24,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=260924.5, ans=0.0 2024-09-15 08:40:30,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=260924.5, ans=0.125 2024-09-15 08:40:40,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=260952.83333333334, ans=0.04949747468305833 2024-09-15 08:40:49,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=260981.16666666666, ans=0.125 2024-09-15 08:41:02,502 INFO [train.py:1198] (1/2) Epoch 15, batch 2650, loss[loss=0.28, ctc_loss=0.194, cr_loss=0.4299, over 20928.00 frames. ], tot_loss[loss=0.2488, ctc_loss=0.171, cr_loss=0.3886, over 4102439.99 frames. ], batch size: 60, lr: 5.61e-03, grad_scale: 32.0 2024-09-15 08:41:02,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=261009.5, ans=0.1 2024-09-15 08:41:03,173 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.00 vs. limit=10.0 2024-09-15 08:41:20,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=261037.83333333334, ans=0.0 2024-09-15 08:41:28,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=261037.83333333334, ans=0.125 2024-09-15 08:42:17,942 INFO [train.py:1198] (1/2) Epoch 15, batch 2700, loss[loss=0.2532, ctc_loss=0.1712, cr_loss=0.4101, over 20776.00 frames. ], tot_loss[loss=0.2467, ctc_loss=0.1695, cr_loss=0.3861, over 4107755.07 frames. ], batch size: 56, lr: 5.61e-03, grad_scale: 32.0 2024-09-15 08:42:41,374 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.18 vs. 
limit=10.0 2024-09-15 08:42:46,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=261179.5, ans=0.125 2024-09-15 08:42:49,447 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.791e+02 2.120e+02 2.293e+02 2.561e+02 3.584e+02, threshold=4.586e+02, percent-clipped=0.0 2024-09-15 08:42:49,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=261207.83333333334, ans=0.025 2024-09-15 08:43:25,243 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=261264.5, ans=0.125 2024-09-15 08:43:36,949 INFO [train.py:1198] (1/2) Epoch 15, batch 2750, loss[loss=0.2464, ctc_loss=0.17, cr_loss=0.3822, over 20759.00 frames. ], tot_loss[loss=0.2469, ctc_loss=0.1696, cr_loss=0.3866, over 4111730.29 frames. ], batch size: 56, lr: 5.61e-03, grad_scale: 32.0 2024-09-15 08:43:41,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=261292.83333333334, ans=0.2 2024-09-15 08:44:16,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=261349.5, ans=0.125 2024-09-15 08:44:19,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=261349.5, ans=0.125 2024-09-15 08:44:28,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=261377.83333333334, ans=0.125 2024-09-15 08:44:52,131 INFO [train.py:1198] (1/2) Epoch 15, batch 2800, loss[loss=0.2129, ctc_loss=0.1448, cr_loss=0.3405, over 20962.00 frames. ], tot_loss[loss=0.2479, ctc_loss=0.1704, cr_loss=0.3874, over 4107577.21 frames. ], batch size: 49, lr: 5.61e-03, grad_scale: 32.0 2024-09-15 08:44:56,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=261434.5, ans=0.0 2024-09-15 08:44:58,922 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.26 vs. limit=15.0 2024-09-15 08:44:58,960 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.76 vs. limit=15.0 2024-09-15 08:45:20,986 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.745e+02 2.044e+02 2.217e+02 2.446e+02 3.231e+02, threshold=4.433e+02, percent-clipped=0.0 2024-09-15 08:45:40,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=261519.5, ans=0.125 2024-09-15 08:45:43,994 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 08:46:10,647 INFO [train.py:1198] (1/2) Epoch 15, batch 2850, loss[loss=0.277, ctc_loss=0.1934, cr_loss=0.4182, over 19106.00 frames. ], tot_loss[loss=0.2468, ctc_loss=0.1696, cr_loss=0.3861, over 4111709.65 frames. 
], batch size: 90, lr: 5.61e-03, grad_scale: 32.0 2024-09-15 08:46:12,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=261576.16666666666, ans=0.1 2024-09-15 08:46:25,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=261604.5, ans=0.125 2024-09-15 08:46:31,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=261604.5, ans=0.125 2024-09-15 08:46:38,063 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.22 vs. limit=10.0 2024-09-15 08:47:16,681 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 08:47:25,503 INFO [train.py:1198] (1/2) Epoch 15, batch 2900, loss[loss=0.2695, ctc_loss=0.1892, cr_loss=0.4018, over 20323.00 frames. ], tot_loss[loss=0.2472, ctc_loss=0.1698, cr_loss=0.3869, over 4109907.35 frames. ], batch size: 74, lr: 5.60e-03, grad_scale: 32.0 2024-09-15 08:47:47,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=261746.16666666666, ans=0.2 2024-09-15 08:47:53,917 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.804e+02 2.036e+02 2.221e+02 2.380e+02 4.191e+02, threshold=4.443e+02, percent-clipped=0.0 2024-09-15 08:47:54,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=261774.5, ans=0.125 2024-09-15 08:47:57,755 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.73 vs. limit=6.0 2024-09-15 08:48:02,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=261774.5, ans=0.0 2024-09-15 08:48:32,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=261831.16666666666, ans=0.125 2024-09-15 08:48:42,824 INFO [train.py:1198] (1/2) Epoch 15, batch 2950, loss[loss=0.2758, ctc_loss=0.1901, cr_loss=0.4287, over 20954.00 frames. ], tot_loss[loss=0.2469, ctc_loss=0.1696, cr_loss=0.3864, over 4099031.07 frames. ], batch size: 60, lr: 5.60e-03, grad_scale: 32.0 2024-09-15 08:48:47,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=261859.5, ans=0.0 2024-09-15 08:48:58,762 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.27 vs. limit=15.0 2024-09-15 08:49:09,769 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.45 vs. limit=22.5 2024-09-15 08:49:58,454 INFO [train.py:1198] (1/2) Epoch 15, batch 3000, loss[loss=0.2486, ctc_loss=0.1727, cr_loss=0.3795, over 21041.00 frames. ], tot_loss[loss=0.2471, ctc_loss=0.1698, cr_loss=0.3866, over 4091715.31 frames. 
], batch size: 61, lr: 5.60e-03, grad_scale: 32.0 2024-09-15 08:49:58,455 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-15 08:50:07,264 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.2386, 3.8933, 3.9538, 3.9899], device='cuda:1') 2024-09-15 08:50:08,047 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.4468, 5.0483, 4.7551, 4.5972], device='cuda:1') 2024-09-15 08:50:17,950 INFO [train.py:1230] (1/2) Epoch 15, validation: loss=0.04693, ctc_loss=0.04693, cr_loss=9.851e-15, over 944034.00 frames. 2024-09-15 08:50:17,951 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-15 08:50:24,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=262001.16666666666, ans=0.125 2024-09-15 08:50:34,117 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.68 vs. limit=22.5 2024-09-15 08:50:46,782 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.717e+02 2.044e+02 2.249e+02 2.455e+02 4.170e+02, threshold=4.498e+02, percent-clipped=0.0 2024-09-15 08:50:51,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=262057.83333333334, ans=0.1 2024-09-15 08:51:35,672 INFO [train.py:1198] (1/2) Epoch 15, batch 3050, loss[loss=0.2658, ctc_loss=0.1786, cr_loss=0.4361, over 21028.00 frames. ], tot_loss[loss=0.247, ctc_loss=0.1696, cr_loss=0.3868, over 4098630.79 frames. ], batch size: 62, lr: 5.60e-03, grad_scale: 32.0 2024-09-15 08:51:37,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=262142.83333333334, ans=0.125 2024-09-15 08:51:56,196 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.68 vs. limit=15.0 2024-09-15 08:52:19,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=262227.8333333333, ans=0.125 2024-09-15 08:52:31,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=262227.8333333333, ans=0.1 2024-09-15 08:52:49,684 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 08:52:50,879 INFO [train.py:1198] (1/2) Epoch 15, batch 3100, loss[loss=0.2515, ctc_loss=0.1702, cr_loss=0.4069, over 20992.00 frames. ], tot_loss[loss=0.2485, ctc_loss=0.1709, cr_loss=0.388, over 4087139.28 frames. ], batch size: 55, lr: 5.60e-03, grad_scale: 32.0 2024-09-15 08:53:01,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=262284.5, ans=0.0 2024-09-15 08:53:19,505 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.757e+02 2.058e+02 2.194e+02 2.429e+02 3.199e+02, threshold=4.387e+02, percent-clipped=0.0 2024-09-15 08:53:36,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=262369.5, ans=0.1 2024-09-15 08:54:08,279 INFO [train.py:1198] (1/2) Epoch 15, batch 3150, loss[loss=0.2417, ctc_loss=0.1644, cr_loss=0.3863, over 20900.00 frames. 
], tot_loss[loss=0.2494, ctc_loss=0.1717, cr_loss=0.3887, over 4088348.44 frames. ], batch size: 54, lr: 5.60e-03, grad_scale: 32.0 2024-09-15 08:54:42,251 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.52 vs. limit=22.5 2024-09-15 08:54:47,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=262482.8333333333, ans=0.1 2024-09-15 08:54:56,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=262511.1666666667, ans=0.125 2024-09-15 08:55:18,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=262539.5, ans=0.125 2024-09-15 08:55:23,009 INFO [train.py:1198] (1/2) Epoch 15, batch 3200, loss[loss=0.2405, ctc_loss=0.1646, cr_loss=0.3797, over 20994.00 frames. ], tot_loss[loss=0.2498, ctc_loss=0.1719, cr_loss=0.3892, over 4088419.16 frames. ], batch size: 55, lr: 5.59e-03, grad_scale: 32.0 2024-09-15 08:55:37,211 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.38 vs. limit=22.5 2024-09-15 08:55:51,669 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.753e+02 1.993e+02 2.115e+02 2.319e+02 3.282e+02, threshold=4.229e+02, percent-clipped=0.0 2024-09-15 08:55:57,037 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.29 vs. limit=10.0 2024-09-15 08:56:05,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=262624.5, ans=0.0 2024-09-15 08:56:37,943 INFO [train.py:1198] (1/2) Epoch 15, batch 3250, loss[loss=0.2313, ctc_loss=0.1572, cr_loss=0.3705, over 20871.00 frames. ], tot_loss[loss=0.2498, ctc_loss=0.172, cr_loss=0.3894, over 4089701.19 frames. ], batch size: 57, lr: 5.59e-03, grad_scale: 32.0 2024-09-15 08:56:50,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=262709.5, ans=0.025 2024-09-15 08:57:14,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=262766.1666666667, ans=0.0 2024-09-15 08:57:55,816 INFO [train.py:1198] (1/2) Epoch 15, batch 3300, loss[loss=0.2737, ctc_loss=0.1857, cr_loss=0.44, over 20614.00 frames. ], tot_loss[loss=0.2491, ctc_loss=0.1713, cr_loss=0.3888, over 4089209.40 frames. ], batch size: 75, lr: 5.59e-03, grad_scale: 32.0 2024-09-15 08:58:20,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=262879.5, ans=0.125 2024-09-15 08:58:24,325 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.772e+02 2.013e+02 2.168e+02 2.340e+02 4.023e+02, threshold=4.337e+02, percent-clipped=0.0 2024-09-15 08:58:41,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=262936.1666666667, ans=0.025 2024-09-15 08:59:08,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=262964.5, ans=0.125 2024-09-15 08:59:10,872 INFO [train.py:1198] (1/2) Epoch 15, batch 3350, loss[loss=0.2321, ctc_loss=0.1559, cr_loss=0.381, over 20889.00 frames. 
], tot_loss[loss=0.2486, ctc_loss=0.1708, cr_loss=0.3889, over 4103332.83 frames. ], batch size: 57, lr: 5.59e-03, grad_scale: 32.0 2024-09-15 08:59:23,583 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.41 vs. limit=15.0 2024-09-15 08:59:29,542 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.87 vs. limit=6.0 2024-09-15 09:00:18,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=263106.1666666667, ans=0.2 2024-09-15 09:00:28,413 INFO [train.py:1198] (1/2) Epoch 15, batch 3400, loss[loss=0.2362, ctc_loss=0.1613, cr_loss=0.3746, over 20756.00 frames. ], tot_loss[loss=0.2489, ctc_loss=0.1712, cr_loss=0.3888, over 4094911.55 frames. ], batch size: 53, lr: 5.59e-03, grad_scale: 32.0 2024-09-15 09:00:45,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=263162.8333333333, ans=0.0 2024-09-15 09:00:56,910 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.807e+02 2.056e+02 2.235e+02 2.445e+02 8.480e+02, threshold=4.471e+02, percent-clipped=1.0 2024-09-15 09:01:07,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=263191.1666666667, ans=0.0 2024-09-15 09:01:28,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=263247.8333333333, ans=0.0 2024-09-15 09:01:43,512 INFO [train.py:1198] (1/2) Epoch 15, batch 3450, loss[loss=0.2641, ctc_loss=0.1799, cr_loss=0.421, over 20992.00 frames. ], tot_loss[loss=0.2485, ctc_loss=0.1707, cr_loss=0.3886, over 4097080.62 frames. ], batch size: 61, lr: 5.59e-03, grad_scale: 32.0 2024-09-15 09:01:52,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=263276.1666666667, ans=0.0 2024-09-15 09:01:58,048 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.88 vs. limit=15.0 2024-09-15 09:02:05,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=263304.5, ans=0.2 2024-09-15 09:03:00,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=263417.8333333333, ans=0.5 2024-09-15 09:03:01,558 INFO [train.py:1198] (1/2) Epoch 15, batch 3500, loss[loss=0.2911, ctc_loss=0.2095, cr_loss=0.4079, over 14708.00 frames. ], tot_loss[loss=0.2488, ctc_loss=0.171, cr_loss=0.3891, over 4088472.90 frames. 
], batch size: 149, lr: 5.59e-03, grad_scale: 64.0 2024-09-15 09:03:01,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=263417.8333333333, ans=0.025 2024-09-15 09:03:12,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=263417.8333333333, ans=0.125 2024-09-15 09:03:15,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=263446.1666666667, ans=0.125 2024-09-15 09:03:31,506 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.720e+02 2.017e+02 2.226e+02 2.415e+02 4.174e+02, threshold=4.452e+02, percent-clipped=0.0 2024-09-15 09:03:35,406 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.57 vs. limit=22.5 2024-09-15 09:03:37,872 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=263474.5, ans=0.0 2024-09-15 09:03:43,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=263474.5, ans=0.05 2024-09-15 09:03:51,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=263502.8333333333, ans=0.125 2024-09-15 09:03:54,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=263502.8333333333, ans=0.0 2024-09-15 09:04:13,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=263531.1666666667, ans=0.125 2024-09-15 09:04:16,499 INFO [train.py:1198] (1/2) Epoch 15, batch 3550, loss[loss=0.2642, ctc_loss=0.1857, cr_loss=0.393, over 20782.00 frames. ], tot_loss[loss=0.2494, ctc_loss=0.1714, cr_loss=0.3899, over 4088789.68 frames. ], batch size: 56, lr: 5.58e-03, grad_scale: 32.0 2024-09-15 09:04:19,097 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.75 vs. limit=6.0 2024-09-15 09:04:33,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=263587.8333333333, ans=0.1 2024-09-15 09:04:38,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=263587.8333333333, ans=0.0 2024-09-15 09:05:12,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=263644.5, ans=0.125 2024-09-15 09:05:34,763 INFO [train.py:1198] (1/2) Epoch 15, batch 3600, loss[loss=0.2415, ctc_loss=0.1647, cr_loss=0.384, over 21002.00 frames. ], tot_loss[loss=0.2502, ctc_loss=0.172, cr_loss=0.391, over 4087658.05 frames. ], batch size: 61, lr: 5.58e-03, grad_scale: 32.0 2024-09-15 09:05:37,098 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.56 vs. limit=15.0 2024-09-15 09:05:43,124 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.14 vs. 
limit=22.5 2024-09-15 09:06:01,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=263729.5, ans=0.0 2024-09-15 09:06:05,311 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.714e+02 2.071e+02 2.205e+02 2.391e+02 4.670e+02, threshold=4.411e+02, percent-clipped=1.0 2024-09-15 09:06:05,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=263757.8333333333, ans=0.125 2024-09-15 09:06:50,291 INFO [train.py:1198] (1/2) Epoch 15, batch 3650, loss[loss=0.256, ctc_loss=0.1771, cr_loss=0.3944, over 20889.00 frames. ], tot_loss[loss=0.2505, ctc_loss=0.1722, cr_loss=0.392, over 4102608.41 frames. ], batch size: 57, lr: 5.58e-03, grad_scale: 32.0 2024-09-15 09:06:55,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=263842.8333333333, ans=0.07 2024-09-15 09:07:51,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=263956.1666666667, ans=0.1 2024-09-15 09:07:51,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=263956.1666666667, ans=0.2 2024-09-15 09:07:51,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=263956.1666666667, ans=0.125 2024-09-15 09:08:05,009 INFO [train.py:1198] (1/2) Epoch 15, batch 3700, loss[loss=0.2864, ctc_loss=0.1979, cr_loss=0.4422, over 20031.00 frames. ], tot_loss[loss=0.2502, ctc_loss=0.1719, cr_loss=0.3912, over 4114802.36 frames. ], batch size: 80, lr: 5.58e-03, grad_scale: 32.0 2024-09-15 09:08:05,927 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.33 vs. limit=15.0 2024-09-15 09:08:21,715 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 09:08:23,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=264012.8333333333, ans=0.2 2024-09-15 09:08:37,738 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.694e+02 2.012e+02 2.130e+02 2.276e+02 5.003e+02, threshold=4.259e+02, percent-clipped=1.0 2024-09-15 09:08:44,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=264041.1666666667, ans=0.125 2024-09-15 09:09:05,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=264069.5, ans=0.07 2024-09-15 09:09:22,688 INFO [train.py:1198] (1/2) Epoch 15, batch 3750, loss[loss=0.2017, ctc_loss=0.1337, cr_loss=0.3403, over 20992.00 frames. ], tot_loss[loss=0.2499, ctc_loss=0.1717, cr_loss=0.3906, over 4113997.17 frames. 
], batch size: 52, lr: 5.58e-03, grad_scale: 32.0 2024-09-15 09:09:25,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=264126.1666666667, ans=0.125 2024-09-15 09:09:54,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=264182.8333333333, ans=0.125 2024-09-15 09:10:00,738 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 09:10:12,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=264211.1666666667, ans=0.07 2024-09-15 09:10:18,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=264211.1666666667, ans=0.125 2024-09-15 09:10:37,783 INFO [train.py:1198] (1/2) Epoch 15, batch 3800, loss[loss=0.235, ctc_loss=0.1611, cr_loss=0.3692, over 20963.00 frames. ], tot_loss[loss=0.2495, ctc_loss=0.1717, cr_loss=0.3889, over 4093535.04 frames. ], batch size: 55, lr: 5.58e-03, grad_scale: 32.0 2024-09-15 09:10:57,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=264296.1666666667, ans=0.125 2024-09-15 09:11:02,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=264296.1666666667, ans=0.125 2024-09-15 09:11:10,492 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.787e+02 2.055e+02 2.289e+02 2.466e+02 4.211e+02, threshold=4.578e+02, percent-clipped=0.0 2024-09-15 09:11:18,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=264324.5, ans=0.0 2024-09-15 09:11:44,101 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.30 vs. limit=15.0 2024-09-15 09:11:52,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=264381.1666666667, ans=0.1 2024-09-15 09:11:53,095 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.24 vs. limit=10.0 2024-09-15 09:11:55,256 INFO [train.py:1198] (1/2) Epoch 15, batch 3850, loss[loss=0.2909, ctc_loss=0.2135, cr_loss=0.3869, over 14384.00 frames. ], tot_loss[loss=0.2502, ctc_loss=0.1722, cr_loss=0.39, over 4090011.46 frames. ], batch size: 151, lr: 5.58e-03, grad_scale: 32.0 2024-09-15 09:11:57,443 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.50 vs. limit=10.0 2024-09-15 09:12:05,209 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.25 vs. 
limit=22.5 2024-09-15 09:12:29,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=264466.1666666667, ans=0.025 2024-09-15 09:12:30,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=264466.1666666667, ans=0.1 2024-09-15 09:12:32,340 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.17 vs. limit=15.0 2024-09-15 09:12:41,256 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 09:13:10,784 INFO [train.py:1198] (1/2) Epoch 15, batch 3900, loss[loss=0.2429, ctc_loss=0.1637, cr_loss=0.3964, over 20671.00 frames. ], tot_loss[loss=0.2496, ctc_loss=0.1716, cr_loss=0.3899, over 4084833.92 frames. ], batch size: 68, lr: 5.57e-03, grad_scale: 32.0 2024-09-15 09:13:11,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=264551.1666666667, ans=0.1 2024-09-15 09:13:29,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=264579.5, ans=0.025 2024-09-15 09:13:33,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=264579.5, ans=0.1 2024-09-15 09:13:41,352 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.779e+02 2.094e+02 2.280e+02 2.508e+02 3.421e+02, threshold=4.561e+02, percent-clipped=0.0 2024-09-15 09:13:52,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=264607.8333333333, ans=0.1 2024-09-15 09:14:08,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=264636.1666666667, ans=0.2 2024-09-15 09:14:29,225 INFO [train.py:1198] (1/2) Epoch 15, batch 3950, loss[loss=0.2756, ctc_loss=0.1899, cr_loss=0.4282, over 21043.00 frames. ], tot_loss[loss=0.2494, ctc_loss=0.1715, cr_loss=0.3895, over 4088793.02 frames. ], batch size: 62, lr: 5.57e-03, grad_scale: 32.0 2024-09-15 09:15:03,252 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.84 vs. limit=15.0 2024-09-15 09:15:07,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=264749.5, ans=0.125 2024-09-15 09:15:08,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=264749.5, ans=0.1 2024-09-15 09:15:19,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=264777.8333333333, ans=0.1 2024-09-15 09:15:22,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=264777.8333333333, ans=0.0 2024-09-15 09:15:34,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=264806.1666666667, ans=0.2 2024-09-15 09:15:40,202 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.61 vs. 
limit=10.0 2024-09-15 09:15:44,087 INFO [train.py:1198] (1/2) Epoch 15, batch 4000, loss[loss=0.2784, ctc_loss=0.1956, cr_loss=0.4139, over 21034.00 frames. ], tot_loss[loss=0.2498, ctc_loss=0.1719, cr_loss=0.3898, over 4094106.99 frames. ], batch size: 62, lr: 5.57e-03, grad_scale: 32.0 2024-09-15 09:16:00,020 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.32 vs. limit=15.0 2024-09-15 09:16:14,194 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.838e+02 2.071e+02 2.223e+02 2.375e+02 3.814e+02, threshold=4.445e+02, percent-clipped=0.0 2024-09-15 09:16:14,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=264891.1666666667, ans=0.1 2024-09-15 09:16:17,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=264891.1666666667, ans=0.1 2024-09-15 09:16:28,460 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.98 vs. limit=15.0 2024-09-15 09:17:02,172 INFO [train.py:1198] (1/2) Epoch 15, batch 4050, loss[loss=0.2587, ctc_loss=0.1751, cr_loss=0.4181, over 20830.00 frames. ], tot_loss[loss=0.25, ctc_loss=0.1719, cr_loss=0.3906, over 4096983.87 frames. ], batch size: 65, lr: 5.57e-03, grad_scale: 32.0 2024-09-15 09:17:06,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=264976.1666666667, ans=0.07 2024-09-15 09:17:11,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=264976.1666666667, ans=0.125 2024-09-15 09:17:19,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=265004.5, ans=0.125 2024-09-15 09:17:59,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=265061.1666666667, ans=0.025 2024-09-15 09:18:17,683 INFO [train.py:1198] (1/2) Epoch 15, batch 4100, loss[loss=0.2474, ctc_loss=0.1678, cr_loss=0.3977, over 20979.00 frames. ], tot_loss[loss=0.2487, ctc_loss=0.171, cr_loss=0.3887, over 4094256.18 frames. ], batch size: 55, lr: 5.57e-03, grad_scale: 32.0 2024-09-15 09:18:47,836 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.806e+02 2.093e+02 2.224e+02 2.504e+02 3.147e+02, threshold=4.449e+02, percent-clipped=0.0 2024-09-15 09:19:03,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=265202.8333333333, ans=0.125 2024-09-15 09:19:21,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=265231.1666666667, ans=0.0 2024-09-15 09:19:36,580 INFO [train.py:1198] (1/2) Epoch 15, batch 4150, loss[loss=0.2473, ctc_loss=0.1683, cr_loss=0.395, over 21074.00 frames. ], tot_loss[loss=0.2486, ctc_loss=0.1708, cr_loss=0.3892, over 4101749.31 frames. 
], batch size: 59, lr: 5.57e-03, grad_scale: 32.0 2024-09-15 09:19:50,691 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=265287.8333333333, ans=0.125 2024-09-15 09:20:02,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=265287.8333333333, ans=0.125 2024-09-15 09:20:23,895 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.38 vs. limit=15.0 2024-09-15 09:20:51,800 INFO [train.py:1198] (1/2) Epoch 15, batch 4200, loss[loss=0.2567, ctc_loss=0.1825, cr_loss=0.3709, over 14566.00 frames. ], tot_loss[loss=0.2489, ctc_loss=0.1711, cr_loss=0.3891, over 4088209.47 frames. ], batch size: 150, lr: 5.57e-03, grad_scale: 32.0 2024-09-15 09:20:55,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=265401.1666666667, ans=0.125 2024-09-15 09:21:13,511 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.46 vs. limit=12.0 2024-09-15 09:21:19,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=265429.5, ans=0.125 2024-09-15 09:21:21,750 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.839e+02 2.038e+02 2.154e+02 2.360e+02 3.093e+02, threshold=4.308e+02, percent-clipped=0.0 2024-09-15 09:22:04,786 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.24 vs. limit=10.0 2024-09-15 09:22:10,048 INFO [train.py:1198] (1/2) Epoch 15, batch 4250, loss[loss=0.2713, ctc_loss=0.1869, cr_loss=0.4219, over 20952.00 frames. ], tot_loss[loss=0.2485, ctc_loss=0.1708, cr_loss=0.3888, over 4089302.86 frames. ], batch size: 64, lr: 5.56e-03, grad_scale: 32.0 2024-09-15 09:22:43,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=265599.5, ans=0.125 2024-09-15 09:22:49,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=265599.5, ans=0.125 2024-09-15 09:23:26,075 INFO [train.py:1198] (1/2) Epoch 15, batch 4300, loss[loss=0.2505, ctc_loss=0.1736, cr_loss=0.3842, over 21012.00 frames. ], tot_loss[loss=0.2481, ctc_loss=0.1704, cr_loss=0.3885, over 4091118.65 frames. ], batch size: 63, lr: 5.56e-03, grad_scale: 16.0 2024-09-15 09:23:57,740 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.746e+02 2.040e+02 2.249e+02 2.478e+02 3.601e+02, threshold=4.499e+02, percent-clipped=0.0 2024-09-15 09:23:59,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=265741.1666666667, ans=0.125 2024-09-15 09:24:00,042 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.21 vs. limit=15.0 2024-09-15 09:24:02,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=265741.1666666667, ans=0.025 2024-09-15 09:24:03,023 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.05 vs. 
limit=15.0 2024-09-15 09:24:16,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=265769.5, ans=0.125 2024-09-15 09:24:22,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=265769.5, ans=0.0 2024-09-15 09:24:33,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=265797.8333333333, ans=6.0 2024-09-15 09:24:41,494 INFO [train.py:1198] (1/2) Epoch 15, batch 4350, loss[loss=0.2091, ctc_loss=0.1389, cr_loss=0.351, over 19857.00 frames. ], tot_loss[loss=0.2469, ctc_loss=0.1696, cr_loss=0.3864, over 4092118.21 frames. ], batch size: 44, lr: 5.56e-03, grad_scale: 16.0 2024-09-15 09:25:04,449 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.29 vs. limit=12.0 2024-09-15 09:25:05,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=265854.5, ans=0.1 2024-09-15 09:25:51,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=265939.5, ans=0.95 2024-09-15 09:25:55,279 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.35 vs. limit=15.0 2024-09-15 09:25:59,125 INFO [train.py:1198] (1/2) Epoch 15, batch 4400, loss[loss=0.279, ctc_loss=0.1955, cr_loss=0.4176, over 19286.00 frames. ], tot_loss[loss=0.2471, ctc_loss=0.1696, cr_loss=0.3872, over 4102551.38 frames. ], batch size: 90, lr: 5.56e-03, grad_scale: 32.0 2024-09-15 09:26:02,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=265967.8333333333, ans=0.0 2024-09-15 09:26:03,058 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.49 vs. limit=15.0 2024-09-15 09:26:30,648 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.839e+02 2.051e+02 2.145e+02 2.300e+02 3.280e+02, threshold=4.289e+02, percent-clipped=0.0 2024-09-15 09:26:43,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=266052.8333333333, ans=0.125 2024-09-15 09:27:14,520 INFO [train.py:1198] (1/2) Epoch 15, batch 4450, loss[loss=0.2052, ctc_loss=0.1353, cr_loss=0.3496, over 20944.00 frames. ], tot_loss[loss=0.2475, ctc_loss=0.1699, cr_loss=0.3881, over 4109433.79 frames. ], batch size: 48, lr: 5.56e-03, grad_scale: 32.0 2024-09-15 09:28:32,389 INFO [train.py:1198] (1/2) Epoch 15, batch 4500, loss[loss=0.2698, ctc_loss=0.1876, cr_loss=0.4113, over 20985.00 frames. ], tot_loss[loss=0.2488, ctc_loss=0.1709, cr_loss=0.3895, over 4102294.54 frames. 
], batch size: 63, lr: 5.56e-03, grad_scale: 32.0 2024-09-15 09:28:56,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=266279.5, ans=0.125 2024-09-15 09:29:03,705 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.795e+02 2.071e+02 2.234e+02 2.438e+02 3.164e+02, threshold=4.468e+02, percent-clipped=0.0 2024-09-15 09:29:23,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=266336.1666666667, ans=0.0 2024-09-15 09:29:43,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=266364.5, ans=0.125 2024-09-15 09:29:47,895 INFO [train.py:1198] (1/2) Epoch 15, batch 4550, loss[loss=0.2325, ctc_loss=0.1589, cr_loss=0.368, over 20993.00 frames. ], tot_loss[loss=0.248, ctc_loss=0.1703, cr_loss=0.3886, over 4095982.03 frames. ], batch size: 48, lr: 5.55e-03, grad_scale: 32.0 2024-09-15 09:30:09,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=266421.1666666667, ans=0.125 2024-09-15 09:30:44,627 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0 2024-09-15 09:31:05,927 INFO [train.py:1198] (1/2) Epoch 15, batch 4600, loss[loss=0.2279, ctc_loss=0.1532, cr_loss=0.3734, over 20794.00 frames. ], tot_loss[loss=0.2481, ctc_loss=0.1704, cr_loss=0.3884, over 4092001.96 frames. ], batch size: 53, lr: 5.55e-03, grad_scale: 32.0 2024-09-15 09:31:09,566 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.91 vs. limit=6.0 2024-09-15 09:31:15,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=266534.5, ans=0.0 2024-09-15 09:31:18,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=266534.5, ans=0.125 2024-09-15 09:31:31,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=266562.8333333333, ans=0.2 2024-09-15 09:31:31,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=266562.8333333333, ans=0.0 2024-09-15 09:31:37,679 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.833e+02 2.069e+02 2.250e+02 2.460e+02 6.260e+02, threshold=4.500e+02, percent-clipped=2.0 2024-09-15 09:31:40,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=266591.1666666667, ans=0.0 2024-09-15 09:31:41,313 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.10 vs. limit=15.0 2024-09-15 09:32:08,421 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=7.42 vs. limit=22.5 2024-09-15 09:32:21,219 INFO [train.py:1198] (1/2) Epoch 15, batch 4650, loss[loss=0.2377, ctc_loss=0.1586, cr_loss=0.3953, over 20786.00 frames. ], tot_loss[loss=0.2474, ctc_loss=0.1699, cr_loss=0.3876, over 4091807.51 frames. 
], batch size: 53, lr: 5.55e-03, grad_scale: 32.0 2024-09-15 09:32:21,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=266676.1666666667, ans=0.2 2024-09-15 09:32:23,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=266676.1666666667, ans=0.025 2024-09-15 09:32:36,728 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 09:32:40,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=266704.5, ans=0.0 2024-09-15 09:33:39,287 INFO [train.py:1198] (1/2) Epoch 15, batch 4700, loss[loss=0.2193, ctc_loss=0.1486, cr_loss=0.3535, over 20818.00 frames. ], tot_loss[loss=0.2491, ctc_loss=0.1712, cr_loss=0.3895, over 4080835.00 frames. ], batch size: 53, lr: 5.55e-03, grad_scale: 32.0 2024-09-15 09:33:45,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=266817.8333333333, ans=0.07 2024-09-15 09:33:49,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=266817.8333333333, ans=0.125 2024-09-15 09:34:10,612 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.705e+02 2.045e+02 2.175e+02 2.360e+02 3.270e+02, threshold=4.351e+02, percent-clipped=0.0 2024-09-15 09:34:23,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=266902.8333333333, ans=0.0 2024-09-15 09:34:54,480 INFO [train.py:1198] (1/2) Epoch 15, batch 4750, loss[loss=0.195, ctc_loss=0.132, cr_loss=0.3151, over 20986.00 frames. ], tot_loss[loss=0.2475, ctc_loss=0.1699, cr_loss=0.3879, over 4101577.13 frames. ], batch size: 49, lr: 5.55e-03, grad_scale: 32.0 2024-09-15 09:34:54,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=266959.5, ans=0.125 2024-09-15 09:35:13,390 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.33 vs. limit=10.0 2024-09-15 09:35:55,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=267072.8333333333, ans=0.04949747468305833 2024-09-15 09:36:09,993 INFO [train.py:1198] (1/2) Epoch 15, batch 4800, loss[loss=0.222, ctc_loss=0.1514, cr_loss=0.3531, over 20786.00 frames. ], tot_loss[loss=0.2463, ctc_loss=0.169, cr_loss=0.3864, over 4102834.63 frames. 
], batch size: 53, lr: 5.55e-03, grad_scale: 32.0 2024-09-15 09:36:45,031 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.787e+02 2.038e+02 2.153e+02 2.343e+02 3.590e+02, threshold=4.306e+02, percent-clipped=0.0 2024-09-15 09:36:45,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=267157.8333333333, ans=0.0 2024-09-15 09:36:45,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=267157.8333333333, ans=0.125 2024-09-15 09:37:01,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=267186.1666666667, ans=0.0 2024-09-15 09:37:28,273 INFO [train.py:1198] (1/2) Epoch 15, batch 4850, loss[loss=0.223, ctc_loss=0.1526, cr_loss=0.3517, over 19831.00 frames. ], tot_loss[loss=0.2467, ctc_loss=0.1693, cr_loss=0.3871, over 4091791.54 frames. ], batch size: 44, lr: 5.55e-03, grad_scale: 32.0 2024-09-15 09:37:30,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=267242.8333333333, ans=0.2 2024-09-15 09:38:03,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=267299.5, ans=0.025 2024-09-15 09:38:20,902 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.44 vs. limit=15.0 2024-09-15 09:38:44,250 INFO [train.py:1198] (1/2) Epoch 15, batch 4900, loss[loss=0.2526, ctc_loss=0.1772, cr_loss=0.3773, over 20831.00 frames. ], tot_loss[loss=0.2473, ctc_loss=0.1698, cr_loss=0.3878, over 4096416.30 frames. ], batch size: 59, lr: 5.54e-03, grad_scale: 32.0 2024-09-15 09:38:56,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=267384.5, ans=0.125 2024-09-15 09:39:09,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=267412.8333333333, ans=0.1 2024-09-15 09:39:15,564 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.767e+02 2.047e+02 2.157e+02 2.313e+02 3.010e+02, threshold=4.313e+02, percent-clipped=0.0 2024-09-15 09:39:24,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=267441.1666666667, ans=0.1 2024-09-15 09:39:25,521 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.73 vs. limit=22.5 2024-09-15 09:39:30,170 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.69 vs. limit=8.0 2024-09-15 09:40:01,224 INFO [train.py:1198] (1/2) Epoch 15, batch 4950, loss[loss=0.253, ctc_loss=0.1738, cr_loss=0.3958, over 20882.00 frames. ], tot_loss[loss=0.2492, ctc_loss=0.1713, cr_loss=0.3897, over 4088059.03 frames. 
], batch size: 57, lr: 5.54e-03, grad_scale: 32.0 2024-09-15 09:40:22,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=267554.5, ans=0.125 2024-09-15 09:40:37,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=267582.8333333333, ans=0.04949747468305833 2024-09-15 09:40:53,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=267611.1666666667, ans=0.025 2024-09-15 09:40:55,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=267611.1666666667, ans=0.125 2024-09-15 09:41:10,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=267639.5, ans=0.0 2024-09-15 09:41:15,867 INFO [train.py:1198] (1/2) Epoch 15, batch 5000, loss[loss=0.2011, ctc_loss=0.1374, cr_loss=0.3188, over 20380.00 frames. ], tot_loss[loss=0.2477, ctc_loss=0.1701, cr_loss=0.388, over 4086205.91 frames. ], batch size: 45, lr: 5.54e-03, grad_scale: 32.0 2024-09-15 09:41:22,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=267667.8333333333, ans=0.0 2024-09-15 09:41:46,790 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.858e+02 2.086e+02 2.200e+02 2.479e+02 6.714e+02, threshold=4.400e+02, percent-clipped=2.0 2024-09-15 09:41:53,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=267724.5, ans=0.125 2024-09-15 09:42:30,021 INFO [train.py:1198] (1/2) Epoch 15, batch 5050, loss[loss=0.2492, ctc_loss=0.1701, cr_loss=0.3954, over 20948.00 frames. ], tot_loss[loss=0.2484, ctc_loss=0.1706, cr_loss=0.389, over 4093105.77 frames. ], batch size: 49, lr: 5.54e-03, grad_scale: 32.0 2024-09-15 09:42:36,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=267809.5, ans=0.07 2024-09-15 09:42:42,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=267809.5, ans=0.025 2024-09-15 09:43:35,512 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.70 vs. limit=15.0 2024-09-15 09:43:43,721 INFO [train.py:1198] (1/2) Epoch 15, batch 5100, loss[loss=0.2512, ctc_loss=0.1725, cr_loss=0.3934, over 20987.00 frames. ], tot_loss[loss=0.2478, ctc_loss=0.1703, cr_loss=0.3875, over 4086809.91 frames. ], batch size: 55, lr: 5.54e-03, grad_scale: 32.0 2024-09-15 09:44:05,447 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.36 vs. 
limit=12.0 2024-09-15 09:44:15,089 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.777e+02 2.040e+02 2.154e+02 2.335e+02 2.871e+02, threshold=4.309e+02, percent-clipped=0.0 2024-09-15 09:44:33,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=268036.1666666667, ans=0.125 2024-09-15 09:44:35,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=268036.1666666667, ans=0.025 2024-09-15 09:44:57,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=268092.8333333333, ans=0.125 2024-09-15 09:44:58,382 INFO [train.py:1198] (1/2) Epoch 15, batch 5150, loss[loss=0.2633, ctc_loss=0.1827, cr_loss=0.4033, over 20966.00 frames. ], tot_loss[loss=0.2487, ctc_loss=0.1711, cr_loss=0.3883, over 4073061.90 frames. ], batch size: 64, lr: 5.54e-03, grad_scale: 32.0 2024-09-15 09:45:01,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=268092.8333333333, ans=0.1 2024-09-15 09:45:12,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=268121.1666666667, ans=0.125 2024-09-15 09:45:13,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=268121.1666666667, ans=0.025 2024-09-15 09:46:15,344 INFO [train.py:1198] (1/2) Epoch 15, batch 5200, loss[loss=0.2858, ctc_loss=0.197, cr_loss=0.4441, over 20974.00 frames. ], tot_loss[loss=0.2472, ctc_loss=0.1699, cr_loss=0.3867, over 4077721.02 frames. ], batch size: 64, lr: 5.54e-03, grad_scale: 32.0 2024-09-15 09:46:21,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=268234.5, ans=0.125 2024-09-15 09:46:47,549 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.764e+02 2.031e+02 2.169e+02 2.299e+02 3.874e+02, threshold=4.339e+02, percent-clipped=0.0 2024-09-15 09:47:03,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=268319.5, ans=0.0 2024-09-15 09:47:11,765 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 09:47:14,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=268347.8333333333, ans=0.125 2024-09-15 09:47:14,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=268347.8333333333, ans=0.125 2024-09-15 09:47:28,898 INFO [train.py:1198] (1/2) Epoch 15, batch 5250, loss[loss=0.2914, ctc_loss=0.2065, cr_loss=0.4243, over 18310.00 frames. ], tot_loss[loss=0.2492, ctc_loss=0.1716, cr_loss=0.3879, over 4060628.68 frames. 
], batch size: 108, lr: 5.53e-03, grad_scale: 32.0
2024-09-15 09:48:10,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=268432.8333333333, ans=0.0
2024-09-15 09:48:21,433 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.660e-02
2024-09-15 09:48:28,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=268489.5, ans=0.125
2024-09-15 09:48:43,304 INFO [train.py:1198] (1/2) Epoch 15, batch 5300, loss[loss=0.2289, ctc_loss=0.156, cr_loss=0.3645, over 20789.00 frames. ], tot_loss[loss=0.2497, ctc_loss=0.172, cr_loss=0.3889, over 4063789.26 frames. ], batch size: 53, lr: 5.53e-03, grad_scale: 32.0
2024-09-15 09:49:18,287 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.770e+02 2.036e+02 2.143e+02 2.347e+02 3.939e+02, threshold=4.286e+02, percent-clipped=0.0
2024-09-15 09:49:23,350 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 09:49:44,796 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.89 vs. limit=15.0
2024-09-15 09:50:00,375 INFO [train.py:1198] (1/2) Epoch 15, batch 5350, loss[loss=0.2171, ctc_loss=0.147, cr_loss=0.3503, over 20232.00 frames. ], tot_loss[loss=0.2479, ctc_loss=0.1705, cr_loss=0.3868, over 4081783.29 frames. ], batch size: 45, lr: 5.53e-03, grad_scale: 32.0
2024-09-15 09:50:28,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=268716.1666666667, ans=0.0
2024-09-15 09:51:14,117 INFO [train.py:1198] (1/2) Epoch 15, batch 5400, loss[loss=0.2551, ctc_loss=0.1767, cr_loss=0.392, over 21019.00 frames. ], tot_loss[loss=0.2473, ctc_loss=0.17, cr_loss=0.3863, over 4087349.08 frames. ], batch size: 61, lr: 5.53e-03, grad_scale: 32.0
2024-09-15 09:51:38,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=268829.5, ans=0.125
2024-09-15 09:51:46,688 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.822e+02 2.037e+02 2.142e+02 2.320e+02 9.416e+02, threshold=4.283e+02, percent-clipped=1.0
2024-09-15 09:52:28,267 INFO [train.py:1198] (1/2) Epoch 15, batch 5450, loss[loss=0.2804, ctc_loss=0.1948, cr_loss=0.4277, over 19865.00 frames. ], tot_loss[loss=0.2474, ctc_loss=0.1701, cr_loss=0.3867, over 4078882.17 frames. ], batch size: 80, lr: 5.53e-03, grad_scale: 32.0
2024-09-15 09:53:35,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=269056.1666666667, ans=0.0
2024-09-15 09:53:42,733 INFO [train.py:1198] (1/2) Epoch 15, batch 5500, loss[loss=0.2644, ctc_loss=0.1834, cr_loss=0.4048, over 20973.00 frames. ], tot_loss[loss=0.2474, ctc_loss=0.1702, cr_loss=0.3864, over 4078497.79 frames. ], batch size: 58, lr: 5.53e-03, grad_scale: 32.0
2024-09-15 09:53:44,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=269084.5, ans=0.125
2024-09-15 09:54:15,070 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.776e+02 2.056e+02 2.198e+02 2.340e+02 4.709e+02, threshold=4.396e+02, percent-clipped=1.0
2024-09-15 09:54:16,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=269141.1666666667, ans=0.0
2024-09-15 09:54:37,212 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.65 vs. limit=22.5
2024-09-15 09:54:44,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=269197.8333333333, ans=0.125
2024-09-15 09:54:44,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=269197.8333333333, ans=0.2
2024-09-15 09:54:59,156 INFO [train.py:1198] (1/2) Epoch 15, batch 5550, loss[loss=0.2206, ctc_loss=0.1504, cr_loss=0.3508, over 20328.00 frames. ], tot_loss[loss=0.2474, ctc_loss=0.1701, cr_loss=0.3867, over 4085493.29 frames. ], batch size: 45, lr: 5.53e-03, grad_scale: 32.0
2024-09-15 09:55:10,231 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.81 vs. limit=12.0
2024-09-15 09:55:12,487 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=269254.5, ans=0.1
2024-09-15 09:55:20,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=269254.5, ans=0.125
2024-09-15 09:55:23,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=269254.5, ans=0.0
2024-09-15 09:55:27,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=269282.8333333333, ans=0.2
2024-09-15 09:55:31,189 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.31 vs. limit=15.0
2024-09-15 09:55:56,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=269311.1666666667, ans=0.125
2024-09-15 09:55:56,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=269311.1666666667, ans=0.125
2024-09-15 09:56:13,563 INFO [train.py:1198] (1/2) Epoch 15, batch 5600, loss[loss=0.2263, ctc_loss=0.1546, cr_loss=0.3587, over 20970.00 frames. ], tot_loss[loss=0.2479, ctc_loss=0.1705, cr_loss=0.387, over 4093788.11 frames. ], batch size: 51, lr: 5.52e-03, grad_scale: 32.0
2024-09-15 09:56:38,398 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.05 vs. limit=15.0
2024-09-15 09:56:46,472 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.858e+02 2.116e+02 2.356e+02 2.676e+02 4.096e+02, threshold=4.712e+02, percent-clipped=0.0
2024-09-15 09:56:58,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=269452.8333333333, ans=0.0
2024-09-15 09:57:08,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=269452.8333333333, ans=0.125
2024-09-15 09:57:22,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=269481.1666666667, ans=0.125
2024-09-15 09:57:23,719 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=269481.1666666667, ans=0.125
2024-09-15 09:57:27,851 INFO [train.py:1198] (1/2) Epoch 15, batch 5650, loss[loss=0.2773, ctc_loss=0.1898, cr_loss=0.4372, over 20955.00 frames. ], tot_loss[loss=0.2478, ctc_loss=0.1703, cr_loss=0.3878, over 4103504.74 frames. ], batch size: 60, lr: 5.52e-03, grad_scale: 32.0
2024-09-15 09:58:43,631 INFO [train.py:1198] (1/2) Epoch 15, batch 5700, loss[loss=0.255, ctc_loss=0.1738, cr_loss=0.4063, over 20903.00 frames. ], tot_loss[loss=0.2482, ctc_loss=0.1705, cr_loss=0.3885, over 4106566.64 frames. ], batch size: 54, lr: 5.52e-03, grad_scale: 32.0
2024-09-15 09:59:14,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=269707.8333333333, ans=0.1
2024-09-15 09:59:15,800 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.903e+02 2.146e+02 2.356e+02 2.674e+02 3.246e+02, threshold=4.712e+02, percent-clipped=0.0
2024-09-15 09:59:57,460 INFO [train.py:1198] (1/2) Epoch 15, batch 5750, loss[loss=0.2287, ctc_loss=0.1544, cr_loss=0.3715, over 20875.00 frames. ], tot_loss[loss=0.2483, ctc_loss=0.1706, cr_loss=0.3882, over 4099388.84 frames. ], batch size: 57, lr: 5.52e-03, grad_scale: 32.0
2024-09-15 10:00:27,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=269849.5, ans=0.0
2024-09-15 10:00:33,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=269849.5, ans=0.2
2024-09-15 10:00:40,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=269877.8333333333, ans=0.0
2024-09-15 10:00:51,266 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.83 vs. limit=15.0
2024-09-15 10:01:10,099 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.61 vs. limit=22.5
2024-09-15 10:01:10,910 INFO [train.py:1198] (1/2) Epoch 15, batch 5800, loss[loss=0.2029, ctc_loss=0.1348, cr_loss=0.3406, over 20983.00 frames. ], tot_loss[loss=0.2486, ctc_loss=0.1708, cr_loss=0.3889, over 4098099.28 frames. ], batch size: 51, lr: 5.52e-03, grad_scale: 32.0
2024-09-15 10:01:12,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=269934.5, ans=0.0
2024-09-15 10:01:43,630 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.765e+02 2.003e+02 2.144e+02 2.325e+02 2.916e+02, threshold=4.289e+02, percent-clipped=0.0
2024-09-15 10:02:25,584 INFO [train.py:1198] (1/2) Epoch 15, batch 5850, loss[loss=0.2299, ctc_loss=0.1557, cr_loss=0.3714, over 21059.00 frames. ], tot_loss[loss=0.2482, ctc_loss=0.1704, cr_loss=0.3888, over 4104485.29 frames. ], batch size: 53, lr: 5.52e-03, grad_scale: 32.0
2024-09-15 10:02:25,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=270076.1666666667, ans=0.125
2024-09-15 10:02:27,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=270076.1666666667, ans=0.0
2024-09-15 10:02:33,306 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-15 10:02:43,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=270104.5, ans=0.125
2024-09-15 10:03:03,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=270132.8333333333, ans=0.1
2024-09-15 10:03:21,354 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.43 vs. limit=22.5
2024-09-15 10:03:42,170 INFO [train.py:1198] (1/2) Epoch 15, batch 5900, loss[loss=0.2234, ctc_loss=0.1485, cr_loss=0.3746, over 19050.00 frames. ], tot_loss[loss=0.2483, ctc_loss=0.1706, cr_loss=0.3887, over 4105851.95 frames. ], batch size: 42, lr: 5.52e-03, grad_scale: 32.0
2024-09-15 10:04:02,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=270246.1666666667, ans=0.125
2024-09-15 10:04:15,039 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.820e+02 2.105e+02 2.306e+02 2.612e+02 5.347e+02, threshold=4.612e+02, percent-clipped=1.0
2024-09-15 10:04:16,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=270274.5, ans=0.0
2024-09-15 10:04:21,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=270274.5, ans=0.1
2024-09-15 10:04:27,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=270302.8333333333, ans=0.125
2024-09-15 10:04:56,750 INFO [train.py:1198] (1/2) Epoch 15, batch 5950, loss[loss=0.2368, ctc_loss=0.1614, cr_loss=0.3773, over 20870.00 frames. ], tot_loss[loss=0.2481, ctc_loss=0.1704, cr_loss=0.3883, over 4098876.96 frames. ], batch size: 57, lr: 5.51e-03, grad_scale: 32.0
2024-09-15 10:05:38,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=270416.1666666667, ans=0.125
2024-09-15 10:05:40,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=270444.5, ans=0.125
2024-09-15 10:05:41,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=270444.5, ans=0.1
2024-09-15 10:05:44,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=270444.5, ans=0.2
2024-09-15 10:05:49,653 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.04 vs. limit=15.0
2024-09-15 10:05:51,024 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.38 vs. limit=15.0
2024-09-15 10:05:59,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=270472.8333333333, ans=0.0
2024-09-15 10:06:10,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=270501.1666666667, ans=0.125
2024-09-15 10:06:11,393 INFO [train.py:1198] (1/2) Epoch 15, batch 6000, loss[loss=0.253, ctc_loss=0.1728, cr_loss=0.4011, over 21019.00 frames. ], tot_loss[loss=0.2494, ctc_loss=0.1715, cr_loss=0.3897, over 4093002.84 frames. ], batch size: 63, lr: 5.51e-03, grad_scale: 32.0
2024-09-15 10:06:11,393 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-15 10:06:31,968 INFO [train.py:1230] (1/2) Epoch 15, validation: loss=0.04684, ctc_loss=0.04684, cr_loss=1.007e-14, over 944034.00 frames.
2024-09-15 10:06:31,969 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB
2024-09-15 10:06:38,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=270501.1666666667, ans=0.125
2024-09-15 10:06:50,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=270529.5, ans=0.0
2024-09-15 10:06:53,095 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=270529.5, ans=0.0
2024-09-15 10:07:04,359 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.841e+02 2.048e+02 2.187e+02 2.421e+02 3.370e+02, threshold=4.374e+02, percent-clipped=0.0
2024-09-15 10:07:18,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=270586.1666666667, ans=0.0
2024-09-15 10:07:37,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=270614.5, ans=0.0
2024-09-15 10:07:45,964 INFO [train.py:1198] (1/2) Epoch 15, batch 6050, loss[loss=0.2382, ctc_loss=0.1602, cr_loss=0.3901, over 20966.00 frames. ], tot_loss[loss=0.2493, ctc_loss=0.1713, cr_loss=0.3898, over 4086751.93 frames. ], batch size: 49, lr: 5.51e-03, grad_scale: 32.0
2024-09-15 10:08:34,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=270727.8333333333, ans=0.125
2024-09-15 10:08:40,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=270727.8333333333, ans=0.5
2024-09-15 10:09:00,482 INFO [train.py:1198] (1/2) Epoch 15, batch 6100, loss[loss=0.2727, ctc_loss=0.1888, cr_loss=0.4195, over 20694.00 frames. ], tot_loss[loss=0.2502, ctc_loss=0.1722, cr_loss=0.39, over 4062997.94 frames. ], batch size: 68, lr: 5.51e-03, grad_scale: 32.0
2024-09-15 10:09:15,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=270812.8333333333, ans=0.0
2024-09-15 10:09:32,814 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.870e+02 2.165e+02 2.332e+02 2.616e+02 4.520e+02, threshold=4.663e+02, percent-clipped=1.0
2024-09-15 10:09:33,861 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.00 vs. limit=15.0
2024-09-15 10:09:45,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=270869.5, ans=0.0
2024-09-15 10:10:10,753 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.16 vs. limit=15.0
2024-09-15 10:10:14,439 INFO [train.py:1198] (1/2) Epoch 15, batch 6150, loss[loss=0.2445, ctc_loss=0.1656, cr_loss=0.3945, over 20968.00 frames. ], tot_loss[loss=0.249, ctc_loss=0.1713, cr_loss=0.3883, over 4063949.50 frames. ], batch size: 58, lr: 5.51e-03, grad_scale: 16.0
2024-09-15 10:11:24,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=271039.5, ans=0.125
2024-09-15 10:11:28,278 INFO [train.py:1198] (1/2) Epoch 15, batch 6200, loss[loss=0.2593, ctc_loss=0.1794, cr_loss=0.3999, over 20944.00 frames. ], tot_loss[loss=0.2505, ctc_loss=0.1726, cr_loss=0.3893, over 4042207.15 frames. ], batch size: 67, lr: 5.51e-03, grad_scale: 16.0
2024-09-15 10:11:50,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=271096.1666666667, ans=0.1
2024-09-15 10:11:51,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=271096.1666666667, ans=0.0
2024-09-15 10:12:01,619 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.763e+02 2.098e+02 2.237e+02 2.445e+02 4.631e+02, threshold=4.473e+02, percent-clipped=0.0
2024-09-15 10:12:41,276 INFO [train.py:1198] (1/2) Epoch 15, batch 6250, loss[loss=0.2409, ctc_loss=0.167, cr_loss=0.3696, over 20953.00 frames. ], tot_loss[loss=0.2504, ctc_loss=0.1727, cr_loss=0.3889, over 4032252.53 frames. ], batch size: 50, lr: 5.51e-03, grad_scale: 16.0
2024-09-15 10:13:00,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=271237.8333333333, ans=0.0
2024-09-15 10:13:18,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=271266.1666666667, ans=0.125
2024-09-15 10:13:27,085 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.26 vs. limit=22.5
2024-09-15 10:13:36,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=271294.5, ans=0.125
2024-09-15 10:13:38,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=271322.8333333333, ans=0.125
2024-09-15 10:13:51,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=271322.8333333333, ans=0.0
2024-09-15 10:13:54,228 INFO [train.py:1198] (1/2) Epoch 15, batch 6300, loss[loss=0.2836, ctc_loss=0.2006, cr_loss=0.4151, over 21035.00 frames. ], tot_loss[loss=0.2503, ctc_loss=0.1727, cr_loss=0.3876, over 4000944.75 frames. ], batch size: 62, lr: 5.50e-03, grad_scale: 16.0
2024-09-15 10:13:58,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=271351.1666666667, ans=0.125
2024-09-15 10:14:27,380 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.811e+02 2.189e+02 2.351e+02 2.723e+02 3.733e+02, threshold=4.703e+02, percent-clipped=0.0
2024-09-15 10:14:37,027 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.97 vs. limit=10.0
2024-09-15 10:15:06,339 INFO [train.py:1198] (1/2) Epoch 15, batch 6350, loss[loss=0.2994, ctc_loss=0.2195, cr_loss=0.3996, over 14698.00 frames. ], tot_loss[loss=0.2569, ctc_loss=0.1787, cr_loss=0.391, over 3804440.28 frames. ], batch size: 150, lr: 5.50e-03, grad_scale: 16.0
2024-09-15 10:15:22,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=271521.1666666667, ans=0.025
2024-09-15 10:15:47,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=271577.8333333333, ans=0.2
2024-09-15 10:15:54,222 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.90 vs. limit=15.0
2024-09-15 10:16:53,444 INFO [train.py:1198] (1/2) Epoch 16, batch 0, loss[loss=0.2777, ctc_loss=0.2012, cr_loss=0.3826, over 14800.00 frames. ], tot_loss[loss=0.2777, ctc_loss=0.2012, cr_loss=0.3826, over 14800.00 frames. ], batch size: 151, lr: 5.33e-03, grad_scale: 32.0
2024-09-15 10:16:53,444 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-15 10:17:10,729 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.6723, 5.3743, 5.3106, 5.5857], device='cuda:1')
2024-09-15 10:17:11,511 INFO [train.py:1230] (1/2) Epoch 16, validation: loss=0.04762, ctc_loss=0.04762, cr_loss=1.01e-14, over 944034.00 frames.
2024-09-15 10:17:11,512 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB
2024-09-15 10:17:28,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=271637.3333333333, ans=0.125
2024-09-15 10:17:39,601 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0
2024-09-15 10:18:00,338 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.793e+02 2.342e+02 2.569e+02 2.810e+02 3.882e+02, threshold=5.139e+02, percent-clipped=0.0
2024-09-15 10:18:05,744 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.51 vs. limit=6.0
2024-09-15 10:18:27,542 INFO [train.py:1198] (1/2) Epoch 16, batch 50, loss[loss=0.2533, ctc_loss=0.1753, cr_loss=0.39, over 20971.00 frames. ], tot_loss[loss=0.2476, ctc_loss=0.1699, cr_loss=0.3888, over 927017.73 frames. ], batch size: 58, lr: 5.32e-03, grad_scale: 32.0
2024-09-15 10:19:35,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=271864.0, ans=0.0
2024-09-15 10:19:45,704 INFO [train.py:1198] (1/2) Epoch 16, batch 100, loss[loss=0.2382, ctc_loss=0.1634, cr_loss=0.3742, over 21058.00 frames. ], tot_loss[loss=0.248, ctc_loss=0.1703, cr_loss=0.3885, over 1623079.43 frames. ], batch size: 56, lr: 5.32e-03, grad_scale: 32.0
2024-09-15 10:20:19,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=271949.0, ans=0.125
2024-09-15 10:20:34,308 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.793e+02 1.999e+02 2.139e+02 2.310e+02 4.100e+02, threshold=4.277e+02, percent-clipped=0.0
2024-09-15 10:20:58,619 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0
2024-09-15 10:21:02,486 INFO [train.py:1198] (1/2) Epoch 16, batch 150, loss[loss=0.2418, ctc_loss=0.1652, cr_loss=0.3828, over 20788.00 frames. ], tot_loss[loss=0.2465, ctc_loss=0.1691, cr_loss=0.387, over 2174709.61 frames. ], batch size: 56, lr: 5.32e-03, grad_scale: 32.0
2024-09-15 10:21:07,757 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.76 vs. limit=22.5
2024-09-15 10:21:25,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=272062.3333333333, ans=0.125
2024-09-15 10:21:49,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=272119.0, ans=0.07
2024-09-15 10:22:17,837 INFO [train.py:1198] (1/2) Epoch 16, batch 200, loss[loss=0.2626, ctc_loss=0.1808, cr_loss=0.4089, over 20970.00 frames. ], tot_loss[loss=0.2474, ctc_loss=0.1697, cr_loss=0.3882, over 2600231.72 frames. ], batch size: 58, lr: 5.32e-03, grad_scale: 32.0
2024-09-15 10:22:36,805 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.37 vs. limit=10.0
2024-09-15 10:22:42,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=272204.0, ans=0.0
2024-09-15 10:22:42,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=272204.0, ans=0.025
2024-09-15 10:22:53,488 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.06 vs. limit=15.0
2024-09-15 10:23:08,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=272260.6666666667, ans=0.125
2024-09-15 10:23:09,223 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.744e+02 2.114e+02 2.225e+02 2.373e+02 3.012e+02, threshold=4.450e+02, percent-clipped=0.0
2024-09-15 10:23:36,530 INFO [train.py:1198] (1/2) Epoch 16, batch 250, loss[loss=0.2299, ctc_loss=0.1548, cr_loss=0.3758, over 20899.00 frames. ], tot_loss[loss=0.2476, ctc_loss=0.1697, cr_loss=0.3892, over 2948165.40 frames. ], batch size: 54, lr: 5.32e-03, grad_scale: 32.0
2024-09-15 10:23:38,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=272317.3333333333, ans=0.0
2024-09-15 10:24:02,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=272345.6666666667, ans=0.125
2024-09-15 10:24:02,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=272345.6666666667, ans=0.1
2024-09-15 10:24:11,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=272374.0, ans=10.0
2024-09-15 10:24:22,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=272402.3333333333, ans=0.125
2024-09-15 10:24:32,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=272402.3333333333, ans=0.125
2024-09-15 10:24:54,682 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.03 vs. limit=6.0
2024-09-15 10:24:54,972 INFO [train.py:1198] (1/2) Epoch 16, batch 300, loss[loss=0.2574, ctc_loss=0.1768, cr_loss=0.403, over 20875.00 frames. ], tot_loss[loss=0.2473, ctc_loss=0.1696, cr_loss=0.3887, over 3208562.62 frames. ], batch size: 57, lr: 5.32e-03, grad_scale: 16.0
2024-09-15 10:25:05,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=272459.0, ans=0.125
2024-09-15 10:25:44,702 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.842e+02 2.092e+02 2.244e+02 2.386e+02 3.188e+02, threshold=4.488e+02, percent-clipped=0.0
2024-09-15 10:26:03,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=272572.3333333333, ans=0.04949747468305833
2024-09-15 10:26:10,436 INFO [train.py:1198] (1/2) Epoch 16, batch 350, loss[loss=0.2579, ctc_loss=0.1765, cr_loss=0.407, over 19978.00 frames. ], tot_loss[loss=0.2474, ctc_loss=0.1696, cr_loss=0.3888, over 3397073.52 frames. ], batch size: 80, lr: 5.32e-03, grad_scale: 16.0
2024-09-15 10:27:25,108 INFO [train.py:1198] (1/2) Epoch 16, batch 400, loss[loss=0.2923, ctc_loss=0.2094, cr_loss=0.4147, over 14703.00 frames. ], tot_loss[loss=0.248, ctc_loss=0.1702, cr_loss=0.3893, over 3552860.30 frames. ], batch size: 149, lr: 5.31e-03, grad_scale: 32.0
2024-09-15 10:27:37,556 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 10:27:38,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=272770.6666666667, ans=0.0
2024-09-15 10:28:18,131 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.808e+02 2.028e+02 2.168e+02 2.321e+02 3.493e+02, threshold=4.336e+02, percent-clipped=0.0
2024-09-15 10:28:18,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=272827.3333333333, ans=0.5
2024-09-15 10:28:25,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=272827.3333333333, ans=0.0
2024-09-15 10:28:43,423 INFO [train.py:1198] (1/2) Epoch 16, batch 450, loss[loss=0.2273, ctc_loss=0.152, cr_loss=0.3768, over 21053.00 frames. ], tot_loss[loss=0.2484, ctc_loss=0.1705, cr_loss=0.3895, over 3661841.12 frames. ], batch size: 56, lr: 5.31e-03, grad_scale: 32.0
2024-09-15 10:28:45,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=272884.0, ans=0.125
2024-09-15 10:28:58,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=272912.3333333333, ans=0.0
2024-09-15 10:29:47,276 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.08 vs. limit=22.5
2024-09-15 10:29:55,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=272997.3333333333, ans=0.125
2024-09-15 10:29:58,295 INFO [train.py:1198] (1/2) Epoch 16, batch 500, loss[loss=0.248, ctc_loss=0.1666, cr_loss=0.4072, over 20968.00 frames. ], tot_loss[loss=0.2482, ctc_loss=0.1704, cr_loss=0.389, over 3752399.17 frames. ], batch size: 58, lr: 5.31e-03, grad_scale: 32.0
2024-09-15 10:30:28,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=273054.0, ans=0.125
2024-09-15 10:30:50,670 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.791e+02 2.028e+02 2.192e+02 2.340e+02 4.949e+02, threshold=4.384e+02, percent-clipped=1.0
2024-09-15 10:31:16,492 INFO [train.py:1198] (1/2) Epoch 16, batch 550, loss[loss=0.2809, ctc_loss=0.1904, cr_loss=0.4525, over 20833.00 frames. ], tot_loss[loss=0.2483, ctc_loss=0.1704, cr_loss=0.3892, over 3832281.97 frames. ], batch size: 65, lr: 5.31e-03, grad_scale: 32.0
2024-09-15 10:31:21,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=273167.3333333333, ans=0.2
2024-09-15 10:31:21,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=273167.3333333333, ans=0.025
2024-09-15 10:32:32,325 INFO [train.py:1198] (1/2) Epoch 16, batch 600, loss[loss=0.2187, ctc_loss=0.1502, cr_loss=0.3424, over 20886.00 frames. ], tot_loss[loss=0.2467, ctc_loss=0.1692, cr_loss=0.3875, over 3888786.71 frames. ], batch size: 54, lr: 5.31e-03, grad_scale: 32.0
2024-09-15 10:32:39,743 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.17 vs. limit=15.0
2024-09-15 10:32:43,992 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.75 vs. limit=12.0
2024-09-15 10:33:24,545 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.829e+02 2.071e+02 2.221e+02 2.482e+02 4.369e+02, threshold=4.442e+02, percent-clipped=0.0
2024-09-15 10:33:30,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=273394.0, ans=0.125
2024-09-15 10:33:48,835 INFO [train.py:1198] (1/2) Epoch 16, batch 650, loss[loss=0.2429, ctc_loss=0.1672, cr_loss=0.3788, over 20944.00 frames. ], tot_loss[loss=0.2461, ctc_loss=0.1687, cr_loss=0.3869, over 3944480.85 frames. ], batch size: 58, lr: 5.31e-03, grad_scale: 16.0
2024-09-15 10:33:52,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=273450.6666666667, ans=0.07
2024-09-15 10:34:04,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=273450.6666666667, ans=0.125
2024-09-15 10:34:06,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=273479.0, ans=0.125
2024-09-15 10:34:13,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=273479.0, ans=0.1
2024-09-15 10:34:46,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=273535.6666666667, ans=0.2
2024-09-15 10:34:49,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=273535.6666666667, ans=0.1
2024-09-15 10:35:07,466 INFO [train.py:1198] (1/2) Epoch 16, batch 700, loss[loss=0.2616, ctc_loss=0.1799, cr_loss=0.4086, over 20417.00 frames. ], tot_loss[loss=0.2455, ctc_loss=0.1683, cr_loss=0.3862, over 3979651.85 frames. ], batch size: 74, lr: 5.31e-03, grad_scale: 16.0
2024-09-15 10:35:25,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=273620.6666666667, ans=0.1
2024-09-15 10:35:41,064 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.67 vs. limit=22.5
2024-09-15 10:35:46,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=273649.0, ans=0.025
2024-09-15 10:35:47,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=273649.0, ans=15.0
2024-09-15 10:35:58,388 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.719e+02 2.060e+02 2.179e+02 2.435e+02 5.198e+02, threshold=4.358e+02, percent-clipped=1.0
2024-09-15 10:36:25,823 INFO [train.py:1198] (1/2) Epoch 16, batch 750, loss[loss=0.2358, ctc_loss=0.1584, cr_loss=0.3869, over 20975.00 frames. ], tot_loss[loss=0.2457, ctc_loss=0.1684, cr_loss=0.3867, over 4013872.50 frames. ], batch size: 48, lr: 5.30e-03, grad_scale: 16.0
2024-09-15 10:36:41,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=273762.3333333333, ans=0.125
2024-09-15 10:36:56,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=273790.6666666667, ans=0.05
2024-09-15 10:37:08,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=273790.6666666667, ans=0.1
2024-09-15 10:37:14,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=273819.0, ans=0.1
2024-09-15 10:37:33,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=273847.3333333333, ans=0.025
2024-09-15 10:37:40,944 INFO [train.py:1198] (1/2) Epoch 16, batch 800, loss[loss=0.2387, ctc_loss=0.1626, cr_loss=0.3807, over 21024.00 frames. ], tot_loss[loss=0.2454, ctc_loss=0.1681, cr_loss=0.3864, over 4036009.42 frames. ], batch size: 62, lr: 5.30e-03, grad_scale: 32.0
2024-09-15 10:37:52,198 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.75 vs. limit=22.5
2024-09-15 10:37:53,769 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.94 vs. limit=15.0
2024-09-15 10:38:15,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=273932.3333333333, ans=0.1
2024-09-15 10:38:18,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=273932.3333333333, ans=0.125
2024-09-15 10:38:20,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=273932.3333333333, ans=0.07
2024-09-15 10:38:32,340 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.758e+02 2.024e+02 2.175e+02 2.334e+02 3.062e+02, threshold=4.350e+02, percent-clipped=0.0
2024-09-15 10:38:37,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=273960.6666666667, ans=0.2
2024-09-15 10:38:54,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=274017.3333333333, ans=0.0
2024-09-15 10:38:56,007 INFO [train.py:1198] (1/2) Epoch 16, batch 850, loss[loss=0.2764, ctc_loss=0.191, cr_loss=0.4273, over 20765.00 frames. ], tot_loss[loss=0.2461, ctc_loss=0.1687, cr_loss=0.3873, over 4051945.57 frames. ], batch size: 71, lr: 5.30e-03, grad_scale: 32.0
2024-09-15 10:39:10,627 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.35 vs. limit=15.0
2024-09-15 10:39:58,994 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.28 vs. limit=6.0
2024-09-15 10:40:14,864 INFO [train.py:1198] (1/2) Epoch 16, batch 900, loss[loss=0.2645, ctc_loss=0.1858, cr_loss=0.3935, over 20980.00 frames. ], tot_loss[loss=0.2478, ctc_loss=0.1701, cr_loss=0.3882, over 4053767.55 frames. ], batch size: 55, lr: 5.30e-03, grad_scale: 32.0
2024-09-15 10:40:31,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=274187.3333333333, ans=0.1
2024-09-15 10:40:47,635 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.64 vs. limit=10.0
2024-09-15 10:41:06,495 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.849e+02 2.058e+02 2.209e+02 2.546e+02 4.940e+02, threshold=4.417e+02, percent-clipped=2.0
2024-09-15 10:41:30,598 INFO [train.py:1198] (1/2) Epoch 16, batch 950, loss[loss=0.2261, ctc_loss=0.1523, cr_loss=0.3689, over 20972.00 frames. ], tot_loss[loss=0.2471, ctc_loss=0.1696, cr_loss=0.3875, over 4077933.83 frames. ], batch size: 48, lr: 5.30e-03, grad_scale: 32.0
2024-09-15 10:41:34,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=274300.6666666667, ans=0.0
2024-09-15 10:42:28,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=274385.6666666667, ans=0.025
2024-09-15 10:42:43,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=274414.0, ans=0.125
2024-09-15 10:42:48,874 INFO [train.py:1198] (1/2) Epoch 16, batch 1000, loss[loss=0.2041, ctc_loss=0.135, cr_loss=0.3459, over 20970.00 frames. ], tot_loss[loss=0.2468, ctc_loss=0.1693, cr_loss=0.3871, over 4080707.79 frames. ], batch size: 49, lr: 5.30e-03, grad_scale: 32.0
2024-09-15 10:43:07,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=274470.6666666667, ans=0.5
2024-09-15 10:43:09,117 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.62 vs. limit=10.0
2024-09-15 10:43:36,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=274527.3333333333, ans=0.125
2024-09-15 10:43:39,717 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.757e+02 2.082e+02 2.262e+02 2.543e+02 4.405e+02, threshold=4.523e+02, percent-clipped=0.0
2024-09-15 10:43:41,560 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-15 10:43:41,914 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.97 vs. limit=22.5
2024-09-15 10:44:04,051 INFO [train.py:1198] (1/2) Epoch 16, batch 1050, loss[loss=0.2089, ctc_loss=0.1448, cr_loss=0.3203, over 20982.00 frames. ], tot_loss[loss=0.2472, ctc_loss=0.1697, cr_loss=0.3876, over 4077951.69 frames. ], batch size: 48, lr: 5.30e-03, grad_scale: 32.0
2024-09-15 10:44:28,655 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.24 vs. limit=12.0
2024-09-15 10:44:46,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=274640.6666666667, ans=0.2
2024-09-15 10:44:57,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=274669.0, ans=10.0
2024-09-15 10:44:58,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=274669.0, ans=0.125
2024-09-15 10:44:59,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=274669.0, ans=0.125
2024-09-15 10:45:00,592 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.89 vs. limit=12.0
2024-09-15 10:45:02,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=274697.3333333333, ans=0.125
2024-09-15 10:45:05,947 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 10:45:16,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=274697.3333333333, ans=0.0
2024-09-15 10:45:21,813 INFO [train.py:1198] (1/2) Epoch 16, batch 1100, loss[loss=0.2208, ctc_loss=0.1498, cr_loss=0.3552, over 21074.00 frames. ], tot_loss[loss=0.247, ctc_loss=0.1696, cr_loss=0.3872, over 4079414.95 frames. ], batch size: 53, lr: 5.30e-03, grad_scale: 16.0
2024-09-15 10:45:22,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=274725.6666666667, ans=0.0
2024-09-15 10:45:42,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=274754.0, ans=0.125
2024-09-15 10:46:14,273 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.747e+02 2.064e+02 2.211e+02 2.356e+02 4.647e+02, threshold=4.423e+02, percent-clipped=1.0
2024-09-15 10:46:37,234 INFO [train.py:1198] (1/2) Epoch 16, batch 1150, loss[loss=0.2797, ctc_loss=0.1932, cr_loss=0.4326, over 20826.00 frames. ], tot_loss[loss=0.247, ctc_loss=0.1696, cr_loss=0.3871, over 4086691.49 frames. ], batch size: 59, lr: 5.29e-03, grad_scale: 16.0
2024-09-15 10:46:50,143 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.40 vs. limit=10.0
2024-09-15 10:46:55,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=274895.6666666667, ans=0.125
2024-09-15 10:46:55,945 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.60 vs. limit=22.5
2024-09-15 10:47:12,348 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.87 vs. limit=15.0
2024-09-15 10:47:18,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=274924.0, ans=0.09899494936611666
2024-09-15 10:47:36,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=274952.3333333333, ans=0.0
2024-09-15 10:47:55,697 INFO [train.py:1198] (1/2) Epoch 16, batch 1200, loss[loss=0.2965, ctc_loss=0.2144, cr_loss=0.4107, over 14348.00 frames. ], tot_loss[loss=0.246, ctc_loss=0.1688, cr_loss=0.3856, over 4089734.08 frames. ], batch size: 149, lr: 5.29e-03, grad_scale: 32.0
2024-09-15 10:47:56,465 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.98 vs. limit=15.0
2024-09-15 10:48:21,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=275037.3333333333, ans=0.0
2024-09-15 10:48:27,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=275065.6666666667, ans=0.125
2024-09-15 10:48:30,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=275065.6666666667, ans=0.125
2024-09-15 10:48:32,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=275065.6666666667, ans=0.1
2024-09-15 10:48:41,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=275094.0, ans=0.0
2024-09-15 10:48:42,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=275094.0, ans=0.2
2024-09-15 10:48:43,397 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.43 vs. limit=15.0
2024-09-15 10:48:48,513 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.711e+02 2.017e+02 2.151e+02 2.331e+02 3.562e+02, threshold=4.302e+02, percent-clipped=0.0
2024-09-15 10:48:53,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=275094.0, ans=0.0
2024-09-15 10:48:54,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn1.whiten.whitening_limit, batch_count=275094.0, ans=22.5
2024-09-15 10:48:55,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=275122.3333333333, ans=0.125
2024-09-15 10:49:11,228 INFO [train.py:1198] (1/2) Epoch 16, batch 1250, loss[loss=0.2337, ctc_loss=0.1598, cr_loss=0.3693, over 20935.00 frames. ], tot_loss[loss=0.2468, ctc_loss=0.1695, cr_loss=0.3866, over 4095374.04 frames. ], batch size: 60, lr: 5.29e-03, grad_scale: 32.0
2024-09-15 10:49:17,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=275150.6666666667, ans=0.125
2024-09-15 10:49:31,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=275179.0, ans=0.125
2024-09-15 10:49:51,282 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.39 vs. limit=10.0
2024-09-15 10:49:53,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=275207.3333333333, ans=0.125
2024-09-15 10:49:57,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=275235.6666666667, ans=0.025
2024-09-15 10:50:09,685 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.17 vs. limit=22.5
2024-09-15 10:50:16,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=275264.0, ans=0.0
2024-09-15 10:50:27,021 INFO [train.py:1198] (1/2) Epoch 16, batch 1300, loss[loss=0.2024, ctc_loss=0.1342, cr_loss=0.3407, over 20920.00 frames. ], tot_loss[loss=0.2463, ctc_loss=0.1691, cr_loss=0.3861, over 4080903.40 frames. ], batch size: 49, lr: 5.29e-03, grad_scale: 32.0
2024-09-15 10:50:57,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=275320.6666666667, ans=0.0
2024-09-15 10:51:17,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=275377.3333333333, ans=0.2
2024-09-15 10:51:20,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=275377.3333333333, ans=0.125
2024-09-15 10:51:22,985 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.884e+02 2.146e+02 2.311e+02 2.483e+02 3.541e+02, threshold=4.621e+02, percent-clipped=0.0
2024-09-15 10:51:23,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=275377.3333333333, ans=0.025
2024-09-15 10:51:23,710 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.41 vs. limit=15.0
2024-09-15 10:51:25,767 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. limit=6.0
2024-09-15 10:51:35,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=275405.6666666667, ans=0.1
2024-09-15 10:51:45,665 INFO [train.py:1198] (1/2) Epoch 16, batch 1350, loss[loss=0.2551, ctc_loss=0.1736, cr_loss=0.4077, over 20933.00 frames. ], tot_loss[loss=0.2457, ctc_loss=0.1687, cr_loss=0.3853, over 4082932.68 frames. ], batch size: 60, lr: 5.29e-03, grad_scale: 32.0
2024-09-15 10:52:07,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=275462.3333333333, ans=0.125
2024-09-15 10:52:31,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=275519.0, ans=0.125
2024-09-15 10:53:01,504 INFO [train.py:1198] (1/2) Epoch 16, batch 1400, loss[loss=0.259, ctc_loss=0.1778, cr_loss=0.4061, over 20649.00 frames. ], tot_loss[loss=0.2458, ctc_loss=0.1686, cr_loss=0.386, over 4091707.99 frames. ], batch size: 71, lr: 5.29e-03, grad_scale: 32.0
2024-09-15 10:53:21,770 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.78 vs. limit=15.0
2024-09-15 10:53:56,903 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.782e+02 2.136e+02 2.287e+02 2.701e+02 4.168e+02, threshold=4.574e+02, percent-clipped=0.0
2024-09-15 10:54:09,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=275689.0, ans=0.0
2024-09-15 10:54:11,297 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.60 vs. limit=15.0
2024-09-15 10:54:15,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=275689.0, ans=0.125
2024-09-15 10:54:19,481 INFO [train.py:1198] (1/2) Epoch 16, batch 1450, loss[loss=0.279, ctc_loss=0.1944, cr_loss=0.4226, over 20881.00 frames. ], tot_loss[loss=0.247, ctc_loss=0.1694, cr_loss=0.3879, over 4094093.62 frames. ], batch size: 65, lr: 5.29e-03, grad_scale: 32.0
2024-09-15 10:54:26,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=275717.3333333333, ans=0.0
2024-09-15 10:54:27,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=275717.3333333333, ans=0.0
2024-09-15 10:54:51,674 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 10:55:01,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=275774.0, ans=0.0
2024-09-15 10:55:34,519 INFO [train.py:1198] (1/2) Epoch 16, batch 1500, loss[loss=0.2775, ctc_loss=0.1917, cr_loss=0.4291, over 20632.00 frames. ], tot_loss[loss=0.2466, ctc_loss=0.1691, cr_loss=0.3874, over 4095854.67 frames. ], batch size: 66, lr: 5.28e-03, grad_scale: 32.0
2024-09-15 10:55:48,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=275887.3333333333, ans=0.125
2024-09-15 10:56:02,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=275887.3333333333, ans=0.125
2024-09-15 10:56:02,558 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.04 vs. limit=22.5
2024-09-15 10:56:27,160 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.778e+02 2.040e+02 2.152e+02 2.370e+02 3.364e+02, threshold=4.304e+02, percent-clipped=0.0
2024-09-15 10:56:52,714 INFO [train.py:1198] (1/2) Epoch 16, batch 1550, loss[loss=0.2557, ctc_loss=0.1737, cr_loss=0.4102, over 21008.00 frames. ], tot_loss[loss=0.2461, ctc_loss=0.1689, cr_loss=0.3864, over 4094965.77 frames. ], batch size: 63, lr: 5.28e-03, grad_scale: 32.0
2024-09-15 10:57:09,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=276029.0, ans=0.2
2024-09-15 10:57:20,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=276029.0, ans=0.025
2024-09-15 10:58:08,067 INFO [train.py:1198] (1/2) Epoch 16, batch 1600, loss[loss=0.2223, ctc_loss=0.1487, cr_loss=0.3678, over 21049.00 frames. ], tot_loss[loss=0.2468, ctc_loss=0.1693, cr_loss=0.3871, over 4097493.54 frames. ], batch size: 53, lr: 5.28e-03, grad_scale: 32.0
2024-09-15 10:58:20,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=276142.3333333333, ans=0.125
2024-09-15 10:59:05,763 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.746e+02 2.010e+02 2.161e+02 2.414e+02 4.031e+02, threshold=4.322e+02, percent-clipped=0.0
2024-09-15 10:59:26,801 INFO [train.py:1198] (1/2) Epoch 16, batch 1650, loss[loss=0.2547, ctc_loss=0.1749, cr_loss=0.3989, over 20849.00 frames. ], tot_loss[loss=0.2476, ctc_loss=0.17, cr_loss=0.3881, over 4100579.69 frames. ], batch size: 65, lr: 5.28e-03, grad_scale: 16.0
2024-09-15 10:59:40,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=276312.3333333333, ans=0.0
2024-09-15 10:59:40,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=276312.3333333333, ans=0.125
2024-09-15 10:59:41,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=276312.3333333333, ans=0.1
2024-09-15 10:59:50,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=276312.3333333333, ans=0.1
2024-09-15 11:00:32,487 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.74 vs. limit=15.0
2024-09-15 11:00:33,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=276397.3333333333, ans=22.5
2024-09-15 11:00:41,681 INFO [train.py:1198] (1/2) Epoch 16, batch 1700, loss[loss=0.1946, ctc_loss=0.1333, cr_loss=0.307, over 20964.00 frames. ], tot_loss[loss=0.2472, ctc_loss=0.1697, cr_loss=0.3876, over 4105870.21 frames. ], batch size: 50, lr: 5.28e-03, grad_scale: 16.0
2024-09-15 11:00:50,091 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.98 vs. limit=15.0
2024-09-15 11:01:01,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=276454.0, ans=0.0
2024-09-15 11:01:29,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=276510.6666666667, ans=0.1
2024-09-15 11:01:35,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=276510.6666666667, ans=0.0
2024-09-15 11:01:36,308 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.824e+02 2.054e+02 2.208e+02 2.376e+02 3.024e+02, threshold=4.416e+02, percent-clipped=0.0
2024-09-15 11:01:38,663 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.71 vs. limit=10.0
2024-09-15 11:01:39,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=276510.6666666667, ans=0.1
2024-09-15 11:01:57,116 INFO [train.py:1198] (1/2) Epoch 16, batch 1750, loss[loss=0.2449, ctc_loss=0.1685, cr_loss=0.3821, over 20932.00 frames. ], tot_loss[loss=0.2463, ctc_loss=0.169, cr_loss=0.3867, over 4114864.71 frames. ], batch size: 60, lr: 5.28e-03, grad_scale: 16.0
2024-09-15 11:02:04,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=276567.3333333333, ans=0.125
2024-09-15 11:02:46,573 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.04 vs. limit=15.0
2024-09-15 11:03:15,010 INFO [train.py:1198] (1/2) Epoch 16, batch 1800, loss[loss=0.2183, ctc_loss=0.149, cr_loss=0.3464, over 20974.00 frames. ], tot_loss[loss=0.2461, ctc_loss=0.1688, cr_loss=0.3864, over 4109463.65 frames. ], batch size: 51, lr: 5.28e-03, grad_scale: 16.0
2024-09-15 11:03:18,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=276709.0, ans=0.09899494936611666
2024-09-15 11:03:19,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=276709.0, ans=0.125
2024-09-15 11:03:21,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=276709.0, ans=0.125
2024-09-15 11:03:33,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=276737.3333333333, ans=0.125
2024-09-15 11:03:42,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=276737.3333333333, ans=0.1
2024-09-15 11:03:52,561 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.09 vs. limit=8.0
2024-09-15 11:04:00,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=276794.0, ans=0.0
2024-09-15 11:04:09,298 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.749e+02 2.054e+02 2.158e+02 2.340e+02 3.233e+02, threshold=4.315e+02, percent-clipped=0.0
2024-09-15 11:04:30,374 INFO [train.py:1198] (1/2) Epoch 16, batch 1850, loss[loss=0.2174, ctc_loss=0.1451, cr_loss=0.3613, over 20994.00 frames. ], tot_loss[loss=0.2453, ctc_loss=0.1682, cr_loss=0.3855, over 4109499.88 frames. ], batch size: 51, lr: 5.27e-03, grad_scale: 16.0
2024-09-15 11:04:32,839 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.18 vs. limit=22.5
2024-09-15 11:04:35,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=276850.6666666667, ans=0.125
2024-09-15 11:05:08,487 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=276907.3333333333, ans=0.125
2024-09-15 11:05:46,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=276964.0, ans=0.025
2024-09-15 11:05:48,764 INFO [train.py:1198] (1/2) Epoch 16, batch 1900, loss[loss=0.2906, ctc_loss=0.2006, cr_loss=0.4501, over 20982.00 frames. ], tot_loss[loss=0.2457, ctc_loss=0.1685, cr_loss=0.3861, over 4111350.69 frames. ], batch size: 64, lr: 5.27e-03, grad_scale: 16.0
2024-09-15 11:06:11,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=277020.6666666667, ans=0.125
2024-09-15 11:06:28,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=277049.0, ans=0.125
2024-09-15 11:06:43,220 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.756e+02 2.018e+02 2.183e+02 2.379e+02 4.800e+02, threshold=4.365e+02, percent-clipped=1.0
2024-09-15 11:07:00,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=277105.6666666667, ans=0.125
2024-09-15 11:07:01,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=277105.6666666667, ans=0.125
2024-09-15 11:07:04,349 INFO [train.py:1198] (1/2) Epoch 16, batch 1950, loss[loss=0.2119, ctc_loss=0.1432, cr_loss=0.3431, over 19917.00 frames. ], tot_loss[loss=0.2453, ctc_loss=0.1682, cr_loss=0.3858, over 4099296.57 frames. ], batch size: 44, lr: 5.27e-03, grad_scale: 16.0
2024-09-15 11:07:28,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=277162.3333333333, ans=0.1
2024-09-15 11:08:04,913 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.52 vs. limit=22.5
2024-09-15 11:08:16,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=277247.3333333333, ans=0.0
2024-09-15 11:08:22,181 INFO [train.py:1198] (1/2) Epoch 16, batch 2000, loss[loss=0.2273, ctc_loss=0.1521, cr_loss=0.3759, over 20967.00 frames. ], tot_loss[loss=0.2464, ctc_loss=0.169, cr_loss=0.3873, over 4091849.35 frames. ], batch size: 58, lr: 5.27e-03, grad_scale: 32.0
2024-09-15 11:08:25,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=277275.6666666667, ans=0.0
2024-09-15 11:09:16,386 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.829e+02 2.127e+02 2.263e+02 2.565e+02 4.364e+02, threshold=4.527e+02, percent-clipped=0.0
2024-09-15 11:09:29,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=277389.0, ans=0.1
2024-09-15 11:09:33,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=277389.0, ans=0.0
2024-09-15 11:09:37,735 INFO [train.py:1198] (1/2) Epoch 16, batch 2050, loss[loss=0.2659, ctc_loss=0.1853, cr_loss=0.403, over 20871.00 frames. ], tot_loss[loss=0.2468, ctc_loss=0.1693, cr_loss=0.3873, over 4093488.07 frames. ], batch size: 65, lr: 5.27e-03, grad_scale: 32.0
2024-09-15 11:09:42,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=277417.3333333333, ans=0.125
2024-09-15 11:09:50,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=277417.3333333333, ans=0.125
2024-09-15 11:10:00,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=277445.6666666667, ans=22.5
2024-09-15 11:10:03,264 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.64 vs. limit=15.0
2024-09-15 11:10:07,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=277474.0, ans=0.125
2024-09-15 11:10:42,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=277530.6666666667, ans=0.125
2024-09-15 11:10:55,894 INFO [train.py:1198] (1/2) Epoch 16, batch 2100, loss[loss=0.2171, ctc_loss=0.1462, cr_loss=0.3543, over 21070.00 frames. ], tot_loss[loss=0.2468, ctc_loss=0.1693, cr_loss=0.3873, over 4089690.13 frames. ], batch size: 53, lr: 5.27e-03, grad_scale: 32.0
2024-09-15 11:10:56,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=277559.0, ans=0.0
2024-09-15 11:11:23,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=277587.3333333333, ans=0.0
2024-09-15 11:11:44,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=277644.0, ans=0.2
2024-09-15 11:11:50,195 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.775e+02 2.070e+02 2.230e+02 2.466e+02 3.838e+02, threshold=4.460e+02, percent-clipped=0.0
2024-09-15 11:12:02,763 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.37 vs. limit=15.0
2024-09-15 11:12:11,200 INFO [train.py:1198] (1/2) Epoch 16, batch 2150, loss[loss=0.263, ctc_loss=0.181, cr_loss=0.4097, over 20849.00 frames. ], tot_loss[loss=0.2474, ctc_loss=0.1697, cr_loss=0.3885, over 4096785.37 frames. ], batch size: 65, lr: 5.27e-03, grad_scale: 32.0
2024-09-15 11:12:59,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=277785.6666666667, ans=0.025
2024-09-15 11:13:10,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=277814.0, ans=0.09899494936611666
2024-09-15 11:13:26,454 INFO [train.py:1198] (1/2) Epoch 16, batch 2200, loss[loss=0.2392, ctc_loss=0.1594, cr_loss=0.3992, over 20988.00 frames. ], tot_loss[loss=0.2473, ctc_loss=0.1696, cr_loss=0.3884, over 4101245.16 frames. ], batch size: 55, lr: 5.27e-03, grad_scale: 32.0
2024-09-15 11:14:23,376 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.754e+02 2.076e+02 2.237e+02 2.356e+02 2.843e+02, threshold=4.474e+02, percent-clipped=0.0
2024-09-15 11:14:26,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=277927.3333333333, ans=0.125
2024-09-15 11:14:44,523 INFO [train.py:1198] (1/2) Epoch 16, batch 2250, loss[loss=0.2536, ctc_loss=0.1739, cr_loss=0.3988, over 20868.00 frames. ], tot_loss[loss=0.2474, ctc_loss=0.1696, cr_loss=0.3892, over 4104624.53 frames. ], batch size: 57, lr: 5.26e-03, grad_scale: 32.0
2024-09-15 11:15:01,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=278012.3333333333, ans=0.125
2024-09-15 11:15:06,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=278012.3333333333, ans=0.125
2024-09-15 11:15:25,662 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-15 11:15:36,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=278069.0, ans=0.0
2024-09-15 11:15:59,976 INFO [train.py:1198] (1/2) Epoch 16, batch 2300, loss[loss=0.2223, ctc_loss=0.1517, cr_loss=0.3526, over 20983.00 frames. ], tot_loss[loss=0.2461, ctc_loss=0.1686, cr_loss=0.3877, over 4118349.74 frames. ], batch size: 52, lr: 5.26e-03, grad_scale: 32.0
2024-09-15 11:16:25,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=278154.0, ans=0.2
2024-09-15 11:16:26,602 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.69 vs. limit=15.0
2024-09-15 11:16:38,279 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.85 vs. limit=15.0
2024-09-15 11:16:57,152 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.815e+02 2.037e+02 2.171e+02 2.368e+02 3.330e+02, threshold=4.342e+02, percent-clipped=0.0
2024-09-15 11:17:02,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=278239.0, ans=0.0
2024-09-15 11:17:18,525 INFO [train.py:1198] (1/2) Epoch 16, batch 2350, loss[loss=0.2538, ctc_loss=0.1757, cr_loss=0.3902, over 21035.00 frames. ], tot_loss[loss=0.2444, ctc_loss=0.1673, cr_loss=0.3856, over 4127000.46 frames. ], batch size: 62, lr: 5.26e-03, grad_scale: 32.0
2024-09-15 11:17:19,654 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.28 vs. limit=8.0
2024-09-15 11:17:37,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=278295.6666666667, ans=0.025
2024-09-15 11:18:30,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=278380.6666666667, ans=0.1
2024-09-15 11:18:33,299 INFO [train.py:1198] (1/2) Epoch 16, batch 2400, loss[loss=0.2487, ctc_loss=0.1713, cr_loss=0.3872, over 20939.00 frames. ], tot_loss[loss=0.245, ctc_loss=0.1676, cr_loss=0.3867, over 4121973.92 frames.
], batch size: 60, lr: 5.26e-03, grad_scale: 32.0 2024-09-15 11:18:39,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=278409.0, ans=0.125 2024-09-15 11:18:49,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=278437.3333333333, ans=15.0 2024-09-15 11:18:50,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=278437.3333333333, ans=0.0 2024-09-15 11:19:14,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=278465.6666666667, ans=0.125 2024-09-15 11:19:30,339 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.792e+02 2.054e+02 2.170e+02 2.281e+02 3.481e+02, threshold=4.340e+02, percent-clipped=0.0 2024-09-15 11:19:51,608 INFO [train.py:1198] (1/2) Epoch 16, batch 2450, loss[loss=0.2228, ctc_loss=0.1557, cr_loss=0.3356, over 20826.00 frames. ], tot_loss[loss=0.2457, ctc_loss=0.1682, cr_loss=0.3871, over 4110181.73 frames. ], batch size: 59, lr: 5.26e-03, grad_scale: 32.0 2024-09-15 11:20:05,838 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.21 vs. limit=10.0 2024-09-15 11:20:17,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=278579.0, ans=0.0 2024-09-15 11:20:24,296 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.35 vs. limit=15.0 2024-09-15 11:20:34,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=278607.3333333333, ans=0.05 2024-09-15 11:21:07,631 INFO [train.py:1198] (1/2) Epoch 16, batch 2500, loss[loss=0.2473, ctc_loss=0.1668, cr_loss=0.4028, over 20883.00 frames. ], tot_loss[loss=0.2454, ctc_loss=0.1682, cr_loss=0.3863, over 4103229.26 frames. ], batch size: 54, lr: 5.26e-03, grad_scale: 32.0 2024-09-15 11:21:19,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=278692.3333333333, ans=0.125 2024-09-15 11:21:34,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=278720.6666666667, ans=0.125 2024-09-15 11:21:53,938 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=278777.3333333333, ans=0.125 2024-09-15 11:22:01,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=278777.3333333333, ans=0.0 2024-09-15 11:22:04,002 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.736e+02 2.114e+02 2.246e+02 2.453e+02 4.101e+02, threshold=4.493e+02, percent-clipped=0.0 2024-09-15 11:22:16,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=278805.6666666667, ans=0.125 2024-09-15 11:22:25,247 INFO [train.py:1198] (1/2) Epoch 16, batch 2550, loss[loss=0.2461, ctc_loss=0.1707, cr_loss=0.3767, over 19272.00 frames. ], tot_loss[loss=0.2461, ctc_loss=0.1687, cr_loss=0.3872, over 4098381.65 frames. 
], batch size: 90, lr: 5.26e-03, grad_scale: 32.0 2024-09-15 11:22:57,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=278890.6666666667, ans=0.125 2024-09-15 11:23:04,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=278890.6666666667, ans=0.1 2024-09-15 11:23:37,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=278947.3333333333, ans=0.2 2024-09-15 11:23:40,307 INFO [train.py:1198] (1/2) Epoch 16, batch 2600, loss[loss=0.2074, ctc_loss=0.1387, cr_loss=0.3432, over 20948.00 frames. ], tot_loss[loss=0.2467, ctc_loss=0.1691, cr_loss=0.388, over 4104593.66 frames. ], batch size: 48, lr: 5.25e-03, grad_scale: 32.0 2024-09-15 11:23:45,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=278975.6666666667, ans=0.0 2024-09-15 11:23:54,800 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.11 vs. limit=15.0 2024-09-15 11:23:56,446 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.65 vs. limit=15.0 2024-09-15 11:23:58,200 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=5.10 vs. limit=5.0 2024-09-15 11:24:04,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=279004.0, ans=0.125 2024-09-15 11:24:34,893 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.782e+02 2.072e+02 2.242e+02 2.418e+02 3.314e+02, threshold=4.483e+02, percent-clipped=0.0 2024-09-15 11:24:56,228 INFO [train.py:1198] (1/2) Epoch 16, batch 2650, loss[loss=0.3036, ctc_loss=0.2099, cr_loss=0.4685, over 20097.00 frames. ], tot_loss[loss=0.2474, ctc_loss=0.1695, cr_loss=0.3895, over 4113035.37 frames. ], batch size: 80, lr: 5.25e-03, grad_scale: 32.0 2024-09-15 11:25:01,491 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.53 vs. limit=22.5 2024-09-15 11:25:32,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=279174.0, ans=0.125 2024-09-15 11:25:34,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=279174.0, ans=0.025 2024-09-15 11:25:40,565 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.19 vs. limit=15.0 2024-09-15 11:26:05,717 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 11:26:14,114 INFO [train.py:1198] (1/2) Epoch 16, batch 2700, loss[loss=0.284, ctc_loss=0.1966, cr_loss=0.437, over 20059.00 frames. ], tot_loss[loss=0.2477, ctc_loss=0.1698, cr_loss=0.3894, over 4107090.50 frames. ], batch size: 80, lr: 5.25e-03, grad_scale: 32.0 2024-09-15 11:26:20,900 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.39 vs. 
limit=22.5 2024-09-15 11:26:39,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=279287.3333333333, ans=0.1 2024-09-15 11:26:55,229 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.66 vs. limit=15.0 2024-09-15 11:26:59,443 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=279344.0, ans=0.0 2024-09-15 11:27:07,886 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.808e+02 2.038e+02 2.202e+02 2.328e+02 3.031e+02, threshold=4.404e+02, percent-clipped=0.0 2024-09-15 11:27:11,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=279344.0, ans=0.125 2024-09-15 11:27:31,967 INFO [train.py:1198] (1/2) Epoch 16, batch 2750, loss[loss=0.2161, ctc_loss=0.1453, cr_loss=0.3536, over 20959.00 frames. ], tot_loss[loss=0.2478, ctc_loss=0.1699, cr_loss=0.389, over 4106332.23 frames. ], batch size: 48, lr: 5.25e-03, grad_scale: 32.0 2024-09-15 11:27:41,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=279400.6666666667, ans=0.125 2024-09-15 11:28:46,887 INFO [train.py:1198] (1/2) Epoch 16, batch 2800, loss[loss=0.2356, ctc_loss=0.159, cr_loss=0.3829, over 20869.00 frames. ], tot_loss[loss=0.2469, ctc_loss=0.1694, cr_loss=0.3879, over 4093805.96 frames. ], batch size: 57, lr: 5.25e-03, grad_scale: 32.0 2024-09-15 11:29:38,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=279627.3333333333, ans=0.0 2024-09-15 11:29:42,474 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.773e+02 2.064e+02 2.174e+02 2.612e+02 9.987e+02, threshold=4.347e+02, percent-clipped=2.0 2024-09-15 11:29:45,894 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.37 vs. limit=15.0 2024-09-15 11:29:58,006 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.22 vs. limit=10.0 2024-09-15 11:30:01,851 INFO [train.py:1198] (1/2) Epoch 16, batch 2850, loss[loss=0.2651, ctc_loss=0.1786, cr_loss=0.4325, over 20974.00 frames. ], tot_loss[loss=0.2467, ctc_loss=0.1691, cr_loss=0.3877, over 4086251.48 frames. ], batch size: 58, lr: 5.25e-03, grad_scale: 16.0 2024-09-15 11:30:08,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=279684.0, ans=0.0 2024-09-15 11:30:20,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=279712.3333333333, ans=0.125 2024-09-15 11:30:23,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=279712.3333333333, ans=0.2 2024-09-15 11:31:08,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=279797.3333333333, ans=0.125 2024-09-15 11:31:19,937 INFO [train.py:1198] (1/2) Epoch 16, batch 2900, loss[loss=0.2627, ctc_loss=0.1847, cr_loss=0.3901, over 20949.00 frames. ], tot_loss[loss=0.2449, ctc_loss=0.1678, cr_loss=0.3855, over 4098812.50 frames. 
], batch size: 60, lr: 5.25e-03, grad_scale: 16.0 2024-09-15 11:31:28,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=279825.6666666667, ans=0.2 2024-09-15 11:31:29,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=279825.6666666667, ans=0.1 2024-09-15 11:32:00,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=279882.3333333333, ans=0.0 2024-09-15 11:32:06,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=279910.6666666667, ans=0.125 2024-09-15 11:32:15,545 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.793e+02 2.089e+02 2.192e+02 2.388e+02 7.769e+02, threshold=4.385e+02, percent-clipped=0.0 2024-09-15 11:32:35,351 INFO [train.py:1198] (1/2) Epoch 16, batch 2950, loss[loss=0.2953, ctc_loss=0.2055, cr_loss=0.4492, over 20863.00 frames. ], tot_loss[loss=0.2461, ctc_loss=0.1689, cr_loss=0.3864, over 4089722.08 frames. ], batch size: 65, lr: 5.25e-03, grad_scale: 16.0 2024-09-15 11:33:15,614 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.40 vs. limit=15.0 2024-09-15 11:33:17,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=280024.0, ans=0.1 2024-09-15 11:33:52,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=280109.0, ans=0.125 2024-09-15 11:33:52,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=280109.0, ans=0.2 2024-09-15 11:33:53,899 INFO [train.py:1198] (1/2) Epoch 16, batch 3000, loss[loss=0.2711, ctc_loss=0.1883, cr_loss=0.4139, over 20744.00 frames. ], tot_loss[loss=0.2472, ctc_loss=0.1696, cr_loss=0.3876, over 4096100.62 frames. ], batch size: 71, lr: 5.24e-03, grad_scale: 16.0 2024-09-15 11:33:53,899 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-15 11:34:20,150 INFO [train.py:1230] (1/2) Epoch 16, validation: loss=0.04642, ctc_loss=0.04642, cr_loss=1.036e-14, over 944034.00 frames. 2024-09-15 11:34:20,150 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-15 11:34:43,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=280137.3333333333, ans=0.125 2024-09-15 11:34:48,511 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.44 vs. limit=10.0 2024-09-15 11:35:04,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=280194.0, ans=0.125 2024-09-15 11:35:16,325 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.841e+02 2.055e+02 2.187e+02 2.347e+02 3.238e+02, threshold=4.373e+02, percent-clipped=1.0 2024-09-15 11:35:36,190 INFO [train.py:1198] (1/2) Epoch 16, batch 3050, loss[loss=0.2512, ctc_loss=0.1722, cr_loss=0.3947, over 20946.00 frames. ], tot_loss[loss=0.2444, ctc_loss=0.1675, cr_loss=0.3845, over 4108102.59 frames. 
], batch size: 60, lr: 5.24e-03, grad_scale: 16.0 2024-09-15 11:35:48,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=280250.6666666667, ans=0.0 2024-09-15 11:36:24,983 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.65 vs. limit=15.0 2024-09-15 11:36:28,026 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.69 vs. limit=15.0 2024-09-15 11:36:31,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=280335.6666666667, ans=0.0 2024-09-15 11:36:43,922 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 11:36:49,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=280364.0, ans=0.125 2024-09-15 11:36:53,871 INFO [train.py:1198] (1/2) Epoch 16, batch 3100, loss[loss=0.2633, ctc_loss=0.1806, cr_loss=0.4136, over 20077.00 frames. ], tot_loss[loss=0.2451, ctc_loss=0.168, cr_loss=0.3855, over 4117014.21 frames. ], batch size: 80, lr: 5.24e-03, grad_scale: 16.0 2024-09-15 11:36:54,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=280392.3333333333, ans=0.125 2024-09-15 11:37:16,719 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=280420.6666666667, ans=0.2 2024-09-15 11:37:32,245 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.02 vs. limit=15.0 2024-09-15 11:37:49,571 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.790e+02 2.067e+02 2.238e+02 2.486e+02 3.935e+02, threshold=4.475e+02, percent-clipped=0.0 2024-09-15 11:37:56,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=280505.6666666667, ans=0.125 2024-09-15 11:38:09,360 INFO [train.py:1198] (1/2) Epoch 16, batch 3150, loss[loss=0.2617, ctc_loss=0.1791, cr_loss=0.4129, over 20857.00 frames. ], tot_loss[loss=0.2449, ctc_loss=0.1678, cr_loss=0.3855, over 4110650.20 frames. ], batch size: 65, lr: 5.24e-03, grad_scale: 16.0 2024-09-15 11:38:24,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=280562.3333333333, ans=0.125 2024-09-15 11:38:30,021 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.40 vs. limit=22.5 2024-09-15 11:38:42,995 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.85 vs. 
limit=22.5 2024-09-15 11:38:45,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=280590.6666666667, ans=0.0 2024-09-15 11:39:06,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=280619.0, ans=0.1 2024-09-15 11:39:08,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=280619.0, ans=0.1 2024-09-15 11:39:23,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=280647.3333333333, ans=0.125 2024-09-15 11:39:27,310 INFO [train.py:1198] (1/2) Epoch 16, batch 3200, loss[loss=0.2719, ctc_loss=0.1836, cr_loss=0.4415, over 20687.00 frames. ], tot_loss[loss=0.2462, ctc_loss=0.1688, cr_loss=0.3869, over 4095518.65 frames. ], batch size: 66, lr: 5.24e-03, grad_scale: 32.0 2024-09-15 11:39:31,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=280675.6666666667, ans=0.125 2024-09-15 11:40:23,310 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.821e+02 2.086e+02 2.212e+02 2.432e+02 6.840e+02, threshold=4.425e+02, percent-clipped=1.0 2024-09-15 11:40:43,319 INFO [train.py:1198] (1/2) Epoch 16, batch 3250, loss[loss=0.2607, ctc_loss=0.1799, cr_loss=0.4037, over 21050.00 frames. ], tot_loss[loss=0.2452, ctc_loss=0.168, cr_loss=0.3862, over 4110688.97 frames. ], batch size: 62, lr: 5.24e-03, grad_scale: 32.0 2024-09-15 11:41:59,156 INFO [train.py:1198] (1/2) Epoch 16, batch 3300, loss[loss=0.2072, ctc_loss=0.1393, cr_loss=0.3394, over 20932.00 frames. ], tot_loss[loss=0.2443, ctc_loss=0.1672, cr_loss=0.3857, over 4112894.51 frames. ], batch size: 49, lr: 5.24e-03, grad_scale: 32.0 2024-09-15 11:42:17,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=280987.3333333333, ans=0.1 2024-09-15 11:42:35,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=281015.6666666667, ans=0.0 2024-09-15 11:42:54,376 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.14 vs. limit=10.0 2024-09-15 11:42:58,073 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.807e+02 2.080e+02 2.243e+02 2.473e+02 3.891e+02, threshold=4.487e+02, percent-clipped=0.0 2024-09-15 11:43:07,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=281072.3333333333, ans=0.2 2024-09-15 11:43:17,765 INFO [train.py:1198] (1/2) Epoch 16, batch 3350, loss[loss=0.2438, ctc_loss=0.1633, cr_loss=0.4026, over 20831.00 frames. ], tot_loss[loss=0.2449, ctc_loss=0.1678, cr_loss=0.3857, over 4112041.84 frames. ], batch size: 59, lr: 5.24e-03, grad_scale: 32.0 2024-09-15 11:43:19,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=281100.6666666667, ans=0.1 2024-09-15 11:43:55,021 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.13 vs. 
limit=22.5 2024-09-15 11:44:21,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=281214.0, ans=0.0 2024-09-15 11:44:30,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=281214.0, ans=0.125 2024-09-15 11:44:36,166 INFO [train.py:1198] (1/2) Epoch 16, batch 3400, loss[loss=0.2539, ctc_loss=0.1781, cr_loss=0.3792, over 20631.00 frames. ], tot_loss[loss=0.2453, ctc_loss=0.1681, cr_loss=0.3863, over 4108412.98 frames. ], batch size: 68, lr: 5.23e-03, grad_scale: 16.0 2024-09-15 11:44:44,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=281242.3333333333, ans=0.025 2024-09-15 11:45:33,350 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.672e+02 2.022e+02 2.135e+02 2.409e+02 4.243e+02, threshold=4.269e+02, percent-clipped=0.0 2024-09-15 11:45:51,641 INFO [train.py:1198] (1/2) Epoch 16, batch 3450, loss[loss=0.2291, ctc_loss=0.1548, cr_loss=0.3714, over 20956.00 frames. ], tot_loss[loss=0.2445, ctc_loss=0.1675, cr_loss=0.3853, over 4111778.06 frames. ], batch size: 51, lr: 5.23e-03, grad_scale: 16.0 2024-09-15 11:45:51,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=281384.0, ans=0.125 2024-09-15 11:46:38,748 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.66 vs. limit=10.0 2024-09-15 11:47:00,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=281497.3333333333, ans=0.1 2024-09-15 11:47:05,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=281525.6666666667, ans=0.125 2024-09-15 11:47:06,500 INFO [train.py:1198] (1/2) Epoch 16, batch 3500, loss[loss=0.2557, ctc_loss=0.1735, cr_loss=0.411, over 21068.00 frames. ], tot_loss[loss=0.2445, ctc_loss=0.1674, cr_loss=0.3853, over 4102082.79 frames. ], batch size: 59, lr: 5.23e-03, grad_scale: 16.0 2024-09-15 11:47:28,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=281554.0, ans=0.2 2024-09-15 11:47:51,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=281582.3333333333, ans=0.025 2024-09-15 11:47:58,188 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.62 vs. limit=10.0 2024-09-15 11:48:06,730 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.787e+02 2.022e+02 2.139e+02 2.292e+02 3.021e+02, threshold=4.278e+02, percent-clipped=0.0 2024-09-15 11:48:24,688 INFO [train.py:1198] (1/2) Epoch 16, batch 3550, loss[loss=0.2673, ctc_loss=0.1832, cr_loss=0.4205, over 21066.00 frames. ], tot_loss[loss=0.245, ctc_loss=0.1677, cr_loss=0.3865, over 4109321.43 frames. 
], batch size: 59, lr: 5.23e-03, grad_scale: 16.0 2024-09-15 11:49:04,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=281724.0, ans=0.125 2024-09-15 11:49:06,311 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.55 vs. limit=15.0 2024-09-15 11:49:23,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=281780.6666666667, ans=0.125 2024-09-15 11:49:23,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=281780.6666666667, ans=0.125 2024-09-15 11:49:35,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=281780.6666666667, ans=0.5 2024-09-15 11:49:39,808 INFO [train.py:1198] (1/2) Epoch 16, batch 3600, loss[loss=0.242, ctc_loss=0.1652, cr_loss=0.384, over 20333.00 frames. ], tot_loss[loss=0.2452, ctc_loss=0.168, cr_loss=0.386, over 4095900.84 frames. ], batch size: 74, lr: 5.23e-03, grad_scale: 32.0 2024-09-15 11:49:47,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=281809.0, ans=0.125 2024-09-15 11:50:12,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=281865.6666666667, ans=0.125 2024-09-15 11:50:37,077 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 11:50:39,628 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.694e+02 2.090e+02 2.191e+02 2.448e+02 6.893e+02, threshold=4.383e+02, percent-clipped=1.0 2024-09-15 11:50:43,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=281922.3333333333, ans=0.1 2024-09-15 11:50:57,519 INFO [train.py:1198] (1/2) Epoch 16, batch 3650, loss[loss=0.2164, ctc_loss=0.1443, cr_loss=0.3606, over 20994.00 frames. ], tot_loss[loss=0.2446, ctc_loss=0.1676, cr_loss=0.3851, over 4095046.98 frames. ], batch size: 52, lr: 5.23e-03, grad_scale: 32.0 2024-09-15 11:51:25,636 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.12 vs. limit=12.0 2024-09-15 11:52:12,904 INFO [train.py:1198] (1/2) Epoch 16, batch 3700, loss[loss=0.2361, ctc_loss=0.1627, cr_loss=0.3669, over 20983.00 frames. ], tot_loss[loss=0.2443, ctc_loss=0.1674, cr_loss=0.3847, over 4085777.96 frames. ], batch size: 58, lr: 5.23e-03, grad_scale: 32.0 2024-09-15 11:52:13,670 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.96 vs. 
limit=22.5 2024-09-15 11:53:02,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=282177.3333333333, ans=0.0 2024-09-15 11:53:10,810 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.822e+02 2.077e+02 2.266e+02 2.521e+02 3.872e+02, threshold=4.532e+02, percent-clipped=0.0 2024-09-15 11:53:23,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=282205.6666666667, ans=0.125 2024-09-15 11:53:27,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=282205.6666666667, ans=0.125 2024-09-15 11:53:31,736 INFO [train.py:1198] (1/2) Epoch 16, batch 3750, loss[loss=0.226, ctc_loss=0.1549, cr_loss=0.3555, over 21063.00 frames. ], tot_loss[loss=0.245, ctc_loss=0.1679, cr_loss=0.3856, over 4087863.14 frames. ], batch size: 56, lr: 5.22e-03, grad_scale: 32.0 2024-09-15 11:53:39,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=282234.0, ans=0.125 2024-09-15 11:53:58,257 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.58 vs. limit=15.0 2024-09-15 11:54:00,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=282290.6666666667, ans=0.0 2024-09-15 11:54:01,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=282290.6666666667, ans=0.2 2024-09-15 11:54:17,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=282319.0, ans=0.0 2024-09-15 11:54:43,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=282347.3333333333, ans=0.125 2024-09-15 11:54:47,125 INFO [train.py:1198] (1/2) Epoch 16, batch 3800, loss[loss=0.2605, ctc_loss=0.1818, cr_loss=0.3936, over 20630.00 frames. ], tot_loss[loss=0.2448, ctc_loss=0.1677, cr_loss=0.3857, over 4100843.52 frames. ], batch size: 66, lr: 5.22e-03, grad_scale: 32.0 2024-09-15 11:55:47,378 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.844e+02 2.071e+02 2.197e+02 2.383e+02 3.469e+02, threshold=4.394e+02, percent-clipped=0.0 2024-09-15 11:56:05,314 INFO [train.py:1198] (1/2) Epoch 16, batch 3850, loss[loss=0.2298, ctc_loss=0.1554, cr_loss=0.372, over 20967.00 frames. ], tot_loss[loss=0.2455, ctc_loss=0.1681, cr_loss=0.3868, over 4107895.37 frames. ], batch size: 48, lr: 5.22e-03, grad_scale: 32.0 2024-09-15 11:56:37,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=282574.0, ans=0.125 2024-09-15 11:56:39,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=282574.0, ans=0.025 2024-09-15 11:56:56,209 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.97 vs. limit=15.0 2024-09-15 11:57:09,231 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.28 vs. limit=15.0 2024-09-15 11:57:20,666 INFO [train.py:1198] (1/2) Epoch 16, batch 3900, loss[loss=0.2287, ctc_loss=0.1542, cr_loss=0.3723, over 20972.00 frames. 
], tot_loss[loss=0.2455, ctc_loss=0.1681, cr_loss=0.3872, over 4103835.56 frames. ], batch size: 58, lr: 5.22e-03, grad_scale: 32.0 2024-09-15 11:58:18,112 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.682e+02 2.087e+02 2.211e+02 2.371e+02 3.175e+02, threshold=4.423e+02, percent-clipped=0.0 2024-09-15 11:58:36,144 INFO [train.py:1198] (1/2) Epoch 16, batch 3950, loss[loss=0.2729, ctc_loss=0.1895, cr_loss=0.417, over 17967.00 frames. ], tot_loss[loss=0.2452, ctc_loss=0.1679, cr_loss=0.3867, over 4105620.17 frames. ], batch size: 108, lr: 5.22e-03, grad_scale: 32.0 2024-09-15 11:58:59,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=282829.0, ans=0.125 2024-09-15 11:59:11,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=282857.3333333333, ans=0.125 2024-09-15 11:59:17,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=282857.3333333333, ans=0.125 2024-09-15 11:59:30,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=282885.6666666667, ans=0.1 2024-09-15 11:59:54,474 INFO [train.py:1198] (1/2) Epoch 16, batch 4000, loss[loss=0.277, ctc_loss=0.1936, cr_loss=0.4167, over 20697.00 frames. ], tot_loss[loss=0.2459, ctc_loss=0.1684, cr_loss=0.3872, over 4108824.95 frames. ], batch size: 71, lr: 5.22e-03, grad_scale: 32.0 2024-09-15 12:00:18,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=282970.6666666667, ans=0.125 2024-09-15 12:00:51,514 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.720e+02 2.048e+02 2.185e+02 2.313e+02 3.020e+02, threshold=4.371e+02, percent-clipped=0.0 2024-09-15 12:00:59,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=283055.6666666667, ans=0.0 2024-09-15 12:01:04,399 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.34 vs. limit=15.0 2024-09-15 12:01:09,762 INFO [train.py:1198] (1/2) Epoch 16, batch 4050, loss[loss=0.2118, ctc_loss=0.1413, cr_loss=0.3526, over 20967.00 frames. ], tot_loss[loss=0.2459, ctc_loss=0.1685, cr_loss=0.3872, over 4099316.84 frames. 
], batch size: 51, lr: 5.22e-03, grad_scale: 32.0 2024-09-15 12:01:13,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=283084.0, ans=0.1 2024-09-15 12:01:25,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=283112.3333333333, ans=0.2 2024-09-15 12:01:28,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=283112.3333333333, ans=0.2 2024-09-15 12:01:35,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=283112.3333333333, ans=0.0 2024-09-15 12:02:02,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=283169.0, ans=0.1 2024-09-15 12:02:26,290 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.37 vs. limit=15.0 2024-09-15 12:02:27,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=283225.6666666667, ans=0.1 2024-09-15 12:02:28,567 INFO [train.py:1198] (1/2) Epoch 16, batch 4100, loss[loss=0.2619, ctc_loss=0.1831, cr_loss=0.3939, over 20854.00 frames. ], tot_loss[loss=0.2457, ctc_loss=0.1684, cr_loss=0.3862, over 4091689.36 frames. ], batch size: 65, lr: 5.22e-03, grad_scale: 32.0 2024-09-15 12:03:03,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=283282.3333333333, ans=0.125 2024-09-15 12:03:26,523 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.781e+02 2.020e+02 2.185e+02 2.399e+02 4.031e+02, threshold=4.371e+02, percent-clipped=0.0 2024-09-15 12:03:43,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=283367.3333333333, ans=0.1 2024-09-15 12:03:44,672 INFO [train.py:1198] (1/2) Epoch 16, batch 4150, loss[loss=0.2845, ctc_loss=0.1965, cr_loss=0.4402, over 20008.00 frames. ], tot_loss[loss=0.2446, ctc_loss=0.1674, cr_loss=0.3857, over 4097696.13 frames. ], batch size: 80, lr: 5.21e-03, grad_scale: 32.0 2024-09-15 12:03:46,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=283367.3333333333, ans=0.125 2024-09-15 12:05:03,755 INFO [train.py:1198] (1/2) Epoch 16, batch 4200, loss[loss=0.3225, ctc_loss=0.2379, cr_loss=0.4231, over 13776.00 frames. ], tot_loss[loss=0.2452, ctc_loss=0.1679, cr_loss=0.3867, over 4094945.37 frames. ], batch size: 150, lr: 5.21e-03, grad_scale: 32.0 2024-09-15 12:05:11,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=283509.0, ans=0.125 2024-09-15 12:05:25,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=283537.3333333333, ans=10.0 2024-09-15 12:05:31,431 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.66 vs. 
limit=15.0 2024-09-15 12:05:32,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=283565.6666666667, ans=0.0 2024-09-15 12:05:37,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=283565.6666666667, ans=0.09899494936611666 2024-09-15 12:05:51,350 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.73 vs. limit=15.0 2024-09-15 12:06:01,455 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.764e+02 2.043e+02 2.182e+02 2.317e+02 3.070e+02, threshold=4.364e+02, percent-clipped=0.0 2024-09-15 12:06:14,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=283622.3333333333, ans=0.125 2024-09-15 12:06:19,162 INFO [train.py:1198] (1/2) Epoch 16, batch 4250, loss[loss=0.2315, ctc_loss=0.1571, cr_loss=0.3721, over 21067.00 frames. ], tot_loss[loss=0.2446, ctc_loss=0.1674, cr_loss=0.386, over 4098323.80 frames. ], batch size: 56, lr: 5.21e-03, grad_scale: 32.0 2024-09-15 12:06:21,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=283650.6666666667, ans=0.025 2024-09-15 12:06:22,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=283650.6666666667, ans=0.125 2024-09-15 12:06:34,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=283679.0, ans=0.0 2024-09-15 12:06:36,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=283679.0, ans=0.125 2024-09-15 12:06:36,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=283679.0, ans=0.125 2024-09-15 12:07:16,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=283735.6666666667, ans=0.0 2024-09-15 12:07:19,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=283764.0, ans=0.1 2024-09-15 12:07:33,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=283764.0, ans=6.0 2024-09-15 12:07:38,289 INFO [train.py:1198] (1/2) Epoch 16, batch 4300, loss[loss=0.2602, ctc_loss=0.1794, cr_loss=0.4039, over 19559.00 frames. ], tot_loss[loss=0.2437, ctc_loss=0.1668, cr_loss=0.3845, over 4091756.90 frames. ], batch size: 90, lr: 5.21e-03, grad_scale: 32.0 2024-09-15 12:07:42,137 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.77 vs. limit=15.0 2024-09-15 12:08:06,255 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.91 vs. 
limit=15.0 2024-09-15 12:08:16,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=283849.0, ans=0.2 2024-09-15 12:08:36,130 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.731e+02 2.016e+02 2.106e+02 2.323e+02 3.174e+02, threshold=4.213e+02, percent-clipped=0.0 2024-09-15 12:08:37,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=283905.6666666667, ans=0.125 2024-09-15 12:08:53,895 INFO [train.py:1198] (1/2) Epoch 16, batch 4350, loss[loss=0.2531, ctc_loss=0.1692, cr_loss=0.4195, over 20877.00 frames. ], tot_loss[loss=0.2433, ctc_loss=0.1664, cr_loss=0.3844, over 4088021.67 frames. ], batch size: 57, lr: 5.21e-03, grad_scale: 32.0 2024-09-15 12:09:30,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=283990.6666666667, ans=0.125 2024-09-15 12:09:46,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=284019.0, ans=0.125 2024-09-15 12:10:02,821 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.39 vs. limit=15.0 2024-09-15 12:10:09,243 INFO [train.py:1198] (1/2) Epoch 16, batch 4400, loss[loss=0.2556, ctc_loss=0.177, cr_loss=0.3931, over 19504.00 frames. ], tot_loss[loss=0.2428, ctc_loss=0.1661, cr_loss=0.3831, over 4095252.57 frames. ], batch size: 90, lr: 5.21e-03, grad_scale: 32.0 2024-09-15 12:10:41,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=284132.3333333333, ans=0.0 2024-09-15 12:10:45,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=284132.3333333333, ans=0.125 2024-09-15 12:11:09,326 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.787e+02 2.015e+02 2.169e+02 2.418e+02 3.055e+02, threshold=4.338e+02, percent-clipped=0.0 2024-09-15 12:11:26,526 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.70 vs. limit=15.0 2024-09-15 12:11:27,268 INFO [train.py:1198] (1/2) Epoch 16, batch 4450, loss[loss=0.303, ctc_loss=0.2119, cr_loss=0.4553, over 17960.00 frames. ], tot_loss[loss=0.2425, ctc_loss=0.1659, cr_loss=0.383, over 4101339.74 frames. ], batch size: 108, lr: 5.21e-03, grad_scale: 32.0 2024-09-15 12:11:29,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=284217.3333333333, ans=0.125 2024-09-15 12:12:04,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=284274.0, ans=0.2 2024-09-15 12:12:28,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=284330.6666666667, ans=0.0 2024-09-15 12:12:41,898 INFO [train.py:1198] (1/2) Epoch 16, batch 4500, loss[loss=0.2681, ctc_loss=0.1853, cr_loss=0.414, over 19904.00 frames. ], tot_loss[loss=0.2433, ctc_loss=0.1665, cr_loss=0.384, over 4110709.12 frames. 
], batch size: 80, lr: 5.21e-03, grad_scale: 32.0 2024-09-15 12:12:42,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=284359.0, ans=0.0 2024-09-15 12:13:06,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=284387.3333333333, ans=0.125 2024-09-15 12:13:11,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=284415.6666666667, ans=0.2 2024-09-15 12:13:37,065 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 12:13:42,890 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.751e+02 2.048e+02 2.234e+02 2.459e+02 4.069e+02, threshold=4.467e+02, percent-clipped=0.0 2024-09-15 12:13:50,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=284472.3333333333, ans=0.125 2024-09-15 12:13:56,961 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.64 vs. limit=12.0 2024-09-15 12:14:00,739 INFO [train.py:1198] (1/2) Epoch 16, batch 4550, loss[loss=0.2752, ctc_loss=0.1908, cr_loss=0.4219, over 20971.00 frames. ], tot_loss[loss=0.2448, ctc_loss=0.1678, cr_loss=0.385, over 4094436.23 frames. ], batch size: 64, lr: 5.20e-03, grad_scale: 32.0 2024-09-15 12:14:07,889 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.98 vs. limit=15.0 2024-09-15 12:14:10,498 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.26 vs. limit=15.0 2024-09-15 12:14:14,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=284529.0, ans=0.2 2024-09-15 12:14:31,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=284557.3333333333, ans=0.0 2024-09-15 12:14:49,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=284585.6666666667, ans=0.125 2024-09-15 12:15:04,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=284614.0, ans=0.125 2024-09-15 12:15:16,412 INFO [train.py:1198] (1/2) Epoch 16, batch 4600, loss[loss=0.2085, ctc_loss=0.1401, cr_loss=0.3418, over 20963.00 frames. ], tot_loss[loss=0.2447, ctc_loss=0.1677, cr_loss=0.3851, over 4093682.59 frames. ], batch size: 49, lr: 5.20e-03, grad_scale: 32.0 2024-09-15 12:15:48,771 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.37 vs. limit=15.0 2024-09-15 12:16:16,191 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.719e+02 2.086e+02 2.286e+02 2.593e+02 4.063e+02, threshold=4.572e+02, percent-clipped=0.0 2024-09-15 12:16:34,392 INFO [train.py:1198] (1/2) Epoch 16, batch 4650, loss[loss=0.3015, ctc_loss=0.2108, cr_loss=0.4532, over 18371.00 frames. ], tot_loss[loss=0.2465, ctc_loss=0.1693, cr_loss=0.386, over 4058746.51 frames. 
], batch size: 108, lr: 5.20e-03, grad_scale: 32.0 2024-09-15 12:16:48,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=284812.3333333333, ans=0.0 2024-09-15 12:16:57,741 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.18 vs. limit=15.0 2024-09-15 12:17:03,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=284840.6666666667, ans=0.2 2024-09-15 12:17:32,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=284869.0, ans=0.125 2024-09-15 12:17:50,395 INFO [train.py:1198] (1/2) Epoch 16, batch 4700, loss[loss=0.225, ctc_loss=0.1531, cr_loss=0.3591, over 20349.00 frames. ], tot_loss[loss=0.2462, ctc_loss=0.1688, cr_loss=0.3865, over 4073116.27 frames. ], batch size: 74, lr: 5.20e-03, grad_scale: 32.0 2024-09-15 12:18:10,923 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.83 vs. limit=15.0 2024-09-15 12:18:30,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=284982.3333333333, ans=0.035 2024-09-15 12:18:48,393 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.798e+02 2.074e+02 2.241e+02 2.500e+02 5.089e+02, threshold=4.482e+02, percent-clipped=1.0 2024-09-15 12:18:48,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=285010.6666666667, ans=0.125 2024-09-15 12:19:09,015 INFO [train.py:1198] (1/2) Epoch 16, batch 4750, loss[loss=0.2728, ctc_loss=0.1874, cr_loss=0.4273, over 19334.00 frames. ], tot_loss[loss=0.2458, ctc_loss=0.1686, cr_loss=0.3861, over 4076275.93 frames. ], batch size: 90, lr: 5.20e-03, grad_scale: 32.0 2024-09-15 12:19:10,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=285067.3333333333, ans=0.1 2024-09-15 12:19:47,049 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.43 vs. limit=15.0 2024-09-15 12:20:24,312 INFO [train.py:1198] (1/2) Epoch 16, batch 4800, loss[loss=0.2706, ctc_loss=0.1877, cr_loss=0.4145, over 20965.00 frames. ], tot_loss[loss=0.2461, ctc_loss=0.1688, cr_loss=0.3864, over 4076867.39 frames. ], batch size: 64, lr: 5.20e-03, grad_scale: 32.0 2024-09-15 12:20:48,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=285237.3333333333, ans=0.125 2024-09-15 12:21:04,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=285265.6666666667, ans=0.125 2024-09-15 12:21:09,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=285294.0, ans=0.0 2024-09-15 12:21:21,282 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.784e+02 2.095e+02 2.225e+02 2.467e+02 3.509e+02, threshold=4.449e+02, percent-clipped=0.0 2024-09-15 12:21:39,102 INFO [train.py:1198] (1/2) Epoch 16, batch 4850, loss[loss=0.2735, ctc_loss=0.1902, cr_loss=0.4163, over 20848.00 frames. 
], tot_loss[loss=0.2473, ctc_loss=0.1698, cr_loss=0.3877, over 4073578.51 frames. ], batch size: 65, lr: 5.20e-03, grad_scale: 16.0 2024-09-15 12:21:50,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=285350.6666666667, ans=0.125 2024-09-15 12:21:55,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=285379.0, ans=0.125 2024-09-15 12:22:18,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=285407.3333333333, ans=10.0 2024-09-15 12:22:43,281 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.81 vs. limit=12.0 2024-09-15 12:22:57,596 INFO [train.py:1198] (1/2) Epoch 16, batch 4900, loss[loss=0.2065, ctc_loss=0.1351, cr_loss=0.3571, over 20956.00 frames. ], tot_loss[loss=0.2466, ctc_loss=0.1691, cr_loss=0.3876, over 4090360.09 frames. ], batch size: 49, lr: 5.19e-03, grad_scale: 16.0 2024-09-15 12:23:10,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=285492.3333333333, ans=0.125 2024-09-15 12:23:10,907 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.45 vs. limit=15.0 2024-09-15 12:23:56,172 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.802e+02 2.062e+02 2.171e+02 2.368e+02 4.130e+02, threshold=4.342e+02, percent-clipped=0.0 2024-09-15 12:24:08,559 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.08 vs. limit=15.0 2024-09-15 12:24:12,400 INFO [train.py:1198] (1/2) Epoch 16, batch 4950, loss[loss=0.2338, ctc_loss=0.1612, cr_loss=0.3627, over 20795.00 frames. ], tot_loss[loss=0.2463, ctc_loss=0.1688, cr_loss=0.3874, over 4090065.85 frames. ], batch size: 53, lr: 5.19e-03, grad_scale: 16.0 2024-09-15 12:24:15,717 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 12:24:39,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=285662.3333333333, ans=0.5 2024-09-15 12:24:41,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=285690.6666666667, ans=0.0 2024-09-15 12:24:42,973 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.38 vs. limit=6.0 2024-09-15 12:24:56,480 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.46 vs. limit=15.0 2024-09-15 12:25:19,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=285747.3333333333, ans=0.125 2024-09-15 12:25:27,099 INFO [train.py:1198] (1/2) Epoch 16, batch 5000, loss[loss=0.2954, ctc_loss=0.2048, cr_loss=0.453, over 18332.00 frames. ], tot_loss[loss=0.2453, ctc_loss=0.168, cr_loss=0.3863, over 4091695.29 frames. 
], batch size: 108, lr: 5.19e-03, grad_scale: 16.0 2024-09-15 12:25:29,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=285775.6666666667, ans=0.2 2024-09-15 12:25:49,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=285804.0, ans=0.125 2024-09-15 12:26:02,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=285832.3333333333, ans=0.0 2024-09-15 12:26:05,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=285832.3333333333, ans=10.0 2024-09-15 12:26:27,469 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.779e+02 2.031e+02 2.174e+02 2.370e+02 4.597e+02, threshold=4.348e+02, percent-clipped=1.0 2024-09-15 12:26:32,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=285889.0, ans=0.125 2024-09-15 12:26:35,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=285889.0, ans=0.0 2024-09-15 12:26:41,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=285889.0, ans=0.1 2024-09-15 12:26:44,032 INFO [train.py:1198] (1/2) Epoch 16, batch 5050, loss[loss=0.242, ctc_loss=0.1632, cr_loss=0.3939, over 20988.00 frames. ], tot_loss[loss=0.2448, ctc_loss=0.1676, cr_loss=0.3859, over 4099792.17 frames. ], batch size: 58, lr: 5.19e-03, grad_scale: 16.0 2024-09-15 12:27:09,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=285945.6666666667, ans=0.125 2024-09-15 12:27:57,819 INFO [train.py:1198] (1/2) Epoch 16, batch 5100, loss[loss=0.2334, ctc_loss=0.1598, cr_loss=0.3682, over 21053.00 frames. ], tot_loss[loss=0.2455, ctc_loss=0.1682, cr_loss=0.3867, over 4095694.66 frames. ], batch size: 56, lr: 5.19e-03, grad_scale: 16.0 2024-09-15 12:28:04,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=286059.0, ans=0.125 2024-09-15 12:28:15,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=286087.3333333333, ans=0.025 2024-09-15 12:28:18,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=286087.3333333333, ans=0.125 2024-09-15 12:28:21,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=286087.3333333333, ans=0.125 2024-09-15 12:28:31,632 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.49 vs. 
limit=15.0 2024-09-15 12:28:48,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=286144.0, ans=0.1 2024-09-15 12:28:49,030 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 12:28:56,182 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.906e+02 2.107e+02 2.230e+02 2.385e+02 4.635e+02, threshold=4.460e+02, percent-clipped=2.0 2024-09-15 12:29:12,523 INFO [train.py:1198] (1/2) Epoch 16, batch 5150, loss[loss=0.2214, ctc_loss=0.1484, cr_loss=0.365, over 20984.00 frames. ], tot_loss[loss=0.2442, ctc_loss=0.1671, cr_loss=0.3856, over 4108431.18 frames. ], batch size: 55, lr: 5.19e-03, grad_scale: 16.0 2024-09-15 12:29:22,224 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.94 vs. limit=15.0 2024-09-15 12:29:53,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=286257.3333333333, ans=0.04949747468305833 2024-09-15 12:30:00,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=286285.6666666667, ans=0.0 2024-09-15 12:30:12,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=286314.0, ans=0.125 2024-09-15 12:30:21,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=286314.0, ans=0.125 2024-09-15 12:30:27,318 INFO [train.py:1198] (1/2) Epoch 16, batch 5200, loss[loss=0.2233, ctc_loss=0.156, cr_loss=0.3363, over 20969.00 frames. ], tot_loss[loss=0.2451, ctc_loss=0.1678, cr_loss=0.3865, over 4105944.73 frames. ], batch size: 49, lr: 5.19e-03, grad_scale: 32.0 2024-09-15 12:31:27,525 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.813e+02 2.069e+02 2.180e+02 2.413e+02 4.104e+02, threshold=4.359e+02, percent-clipped=0.0 2024-09-15 12:31:44,117 INFO [train.py:1198] (1/2) Epoch 16, batch 5250, loss[loss=0.2572, ctc_loss=0.1734, cr_loss=0.4188, over 20685.00 frames. ], tot_loss[loss=0.2439, ctc_loss=0.1669, cr_loss=0.385, over 4114480.99 frames. ], batch size: 66, lr: 5.19e-03, grad_scale: 32.0 2024-09-15 12:32:31,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=286569.0, ans=0.125 2024-09-15 12:32:40,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=286569.0, ans=0.0 2024-09-15 12:32:44,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=286597.3333333333, ans=0.125 2024-09-15 12:32:58,242 INFO [train.py:1198] (1/2) Epoch 16, batch 5300, loss[loss=0.2157, ctc_loss=0.1453, cr_loss=0.3519, over 20035.00 frames. ], tot_loss[loss=0.2449, ctc_loss=0.1678, cr_loss=0.3855, over 4102417.25 frames. 
], batch size: 44, lr: 5.18e-03, grad_scale: 32.0 2024-09-15 12:33:04,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=286625.6666666667, ans=0.125 2024-09-15 12:33:07,345 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=286625.6666666667, ans=0.07 2024-09-15 12:33:18,141 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.05 vs. limit=15.0 2024-09-15 12:33:33,040 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.27 vs. limit=15.0 2024-09-15 12:33:50,865 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.63 vs. limit=15.0 2024-09-15 12:33:56,199 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.755e+02 2.073e+02 2.226e+02 2.425e+02 4.322e+02, threshold=4.453e+02, percent-clipped=0.0 2024-09-15 12:33:56,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=286739.0, ans=0.125 2024-09-15 12:34:01,381 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.75 vs. limit=15.0 2024-09-15 12:34:12,832 INFO [train.py:1198] (1/2) Epoch 16, batch 5350, loss[loss=0.2507, ctc_loss=0.1692, cr_loss=0.4073, over 20903.00 frames. ], tot_loss[loss=0.2447, ctc_loss=0.1676, cr_loss=0.3854, over 4104097.38 frames. ], batch size: 54, lr: 5.18e-03, grad_scale: 32.0 2024-09-15 12:34:43,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=286824.0, ans=0.0 2024-09-15 12:34:46,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=286824.0, ans=0.125 2024-09-15 12:35:26,376 INFO [train.py:1198] (1/2) Epoch 16, batch 5400, loss[loss=0.2875, ctc_loss=0.1999, cr_loss=0.4381, over 20689.00 frames. ], tot_loss[loss=0.2462, ctc_loss=0.1688, cr_loss=0.3871, over 4096772.13 frames. ], batch size: 71, lr: 5.18e-03, grad_scale: 32.0 2024-09-15 12:35:47,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=286937.3333333333, ans=0.0 2024-09-15 12:36:26,257 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.836e+02 2.051e+02 2.207e+02 2.374e+02 3.301e+02, threshold=4.414e+02, percent-clipped=0.0 2024-09-15 12:36:28,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=287022.3333333333, ans=15.0 2024-09-15 12:36:42,711 INFO [train.py:1198] (1/2) Epoch 16, batch 5450, loss[loss=0.2105, ctc_loss=0.1398, cr_loss=0.3534, over 20979.00 frames. ], tot_loss[loss=0.2451, ctc_loss=0.1679, cr_loss=0.3859, over 4098330.50 frames. ], batch size: 52, lr: 5.18e-03, grad_scale: 32.0 2024-09-15 12:37:02,897 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.19 vs. 
limit=15.0 2024-09-15 12:37:05,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=287079.0, ans=0.125 2024-09-15 12:37:07,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=287079.0, ans=0.125 2024-09-15 12:37:38,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.whiten.whitening_limit, batch_count=287135.6666666667, ans=12.0 2024-09-15 12:37:41,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=287164.0, ans=0.05 2024-09-15 12:37:42,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=287164.0, ans=0.025 2024-09-15 12:37:57,210 INFO [train.py:1198] (1/2) Epoch 16, batch 5500, loss[loss=0.1972, ctc_loss=0.1287, cr_loss=0.3424, over 20348.00 frames. ], tot_loss[loss=0.2439, ctc_loss=0.1671, cr_loss=0.3842, over 4091326.34 frames. ], batch size: 45, lr: 5.18e-03, grad_scale: 32.0 2024-09-15 12:37:58,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=287192.3333333333, ans=0.2 2024-09-15 12:38:06,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=287192.3333333333, ans=0.09899494936611666 2024-09-15 12:38:26,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=287249.0, ans=0.0 2024-09-15 12:38:47,519 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.52 vs. limit=15.0 2024-09-15 12:38:50,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=287277.3333333333, ans=0.0 2024-09-15 12:38:55,741 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.781e+02 2.109e+02 2.293e+02 2.538e+02 4.168e+02, threshold=4.586e+02, percent-clipped=0.0 2024-09-15 12:39:12,243 INFO [train.py:1198] (1/2) Epoch 16, batch 5550, loss[loss=0.2749, ctc_loss=0.1903, cr_loss=0.423, over 20661.00 frames. ], tot_loss[loss=0.2457, ctc_loss=0.1682, cr_loss=0.3873, over 4099091.16 frames. ], batch size: 66, lr: 5.18e-03, grad_scale: 32.0 2024-09-15 12:39:36,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=287362.3333333333, ans=0.0 2024-09-15 12:39:36,382 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.19 vs. limit=15.0 2024-09-15 12:39:41,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=287390.6666666667, ans=0.0 2024-09-15 12:40:03,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=287419.0, ans=0.025 2024-09-15 12:40:28,472 INFO [train.py:1198] (1/2) Epoch 16, batch 5600, loss[loss=0.2505, ctc_loss=0.1708, cr_loss=0.3981, over 20829.00 frames. ], tot_loss[loss=0.2471, ctc_loss=0.1694, cr_loss=0.3888, over 4091347.25 frames. 
], batch size: 59, lr: 5.18e-03, grad_scale: 32.0 2024-09-15 12:40:55,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=287504.0, ans=0.05 2024-09-15 12:41:01,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=287532.3333333333, ans=0.0 2024-09-15 12:41:17,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=287560.6666666667, ans=0.0 2024-09-15 12:41:21,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=287560.6666666667, ans=0.0 2024-09-15 12:41:25,973 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.839e+02 2.132e+02 2.266e+02 2.445e+02 3.727e+02, threshold=4.532e+02, percent-clipped=0.0 2024-09-15 12:41:42,381 INFO [train.py:1198] (1/2) Epoch 16, batch 5650, loss[loss=0.2713, ctc_loss=0.1878, cr_loss=0.4174, over 20064.00 frames. ], tot_loss[loss=0.2467, ctc_loss=0.169, cr_loss=0.3884, over 4092667.40 frames. ], batch size: 80, lr: 5.18e-03, grad_scale: 32.0 2024-09-15 12:41:42,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=287617.3333333333, ans=0.1 2024-09-15 12:42:32,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=287702.3333333333, ans=0.2 2024-09-15 12:42:56,522 INFO [train.py:1198] (1/2) Epoch 16, batch 5700, loss[loss=0.2736, ctc_loss=0.1862, cr_loss=0.437, over 21024.00 frames. ], tot_loss[loss=0.2462, ctc_loss=0.1685, cr_loss=0.3882, over 4094854.59 frames. ], batch size: 63, lr: 5.17e-03, grad_scale: 32.0 2024-09-15 12:43:54,632 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.758e+02 2.037e+02 2.209e+02 2.367e+02 3.816e+02, threshold=4.418e+02, percent-clipped=0.0 2024-09-15 12:44:10,834 INFO [train.py:1198] (1/2) Epoch 16, batch 5750, loss[loss=0.2174, ctc_loss=0.1482, cr_loss=0.3459, over 21042.00 frames. ], tot_loss[loss=0.2471, ctc_loss=0.1692, cr_loss=0.3896, over 4092950.30 frames. ], batch size: 62, lr: 5.17e-03, grad_scale: 32.0 2024-09-15 12:44:42,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=287957.3333333333, ans=0.0 2024-09-15 12:45:28,133 INFO [train.py:1198] (1/2) Epoch 16, batch 5800, loss[loss=0.2896, ctc_loss=0.205, cr_loss=0.4231, over 14007.00 frames. ], tot_loss[loss=0.2469, ctc_loss=0.169, cr_loss=0.3897, over 4096004.39 frames. ], batch size: 149, lr: 5.17e-03, grad_scale: 16.0 2024-09-15 12:45:43,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=288070.6666666667, ans=0.1 2024-09-15 12:46:28,001 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.698e+02 2.017e+02 2.163e+02 2.358e+02 4.756e+02, threshold=4.326e+02, percent-clipped=1.0 2024-09-15 12:46:43,096 INFO [train.py:1198] (1/2) Epoch 16, batch 5850, loss[loss=0.2086, ctc_loss=0.1401, cr_loss=0.3425, over 21054.00 frames. ], tot_loss[loss=0.2468, ctc_loss=0.169, cr_loss=0.389, over 4093757.21 frames. 
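The grad_scale field dropping from 32.0 (batch 5750) to 16.0 (batch 5800) is the signature of dynamic loss scaling under fp16 mixed precision: the scale is halved whenever scaled gradients overflow and grown back after a run of clean steps, which also explains the earlier moves between 16.0 and 32.0. A minimal sketch using PyTorch's stock GradScaler; this is illustrative, and the training script may drive the scale differently:

import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0,     # matches the grad_scale values seen in the log
    growth_factor=2.0,   # doubled after a long enough run of clean steps
    backoff_factor=0.5,  # halved on overflow, e.g. 32.0 -> 16.0
)

def training_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = loss_fn(model(batch))
    scaler.scale(loss).backward()  # backprop through the scaled loss
    scaler.step(optimizer)         # skips the update if inf/nan grads are found
    scaler.update()                # this is where 32.0 -> 16.0 would happen
    return loss.detach()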
], batch size: 53, lr: 5.17e-03, grad_scale: 16.0 2024-09-15 12:46:47,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=288184.0, ans=0.0 2024-09-15 12:46:48,309 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.42 vs. limit=15.0 2024-09-15 12:46:52,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=288184.0, ans=0.2 2024-09-15 12:47:02,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=288212.3333333333, ans=0.125 2024-09-15 12:47:04,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=288212.3333333333, ans=0.125 2024-09-15 12:47:11,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=288240.6666666667, ans=0.125 2024-09-15 12:47:11,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=288240.6666666667, ans=0.07 2024-09-15 12:47:20,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=288240.6666666667, ans=0.0 2024-09-15 12:47:29,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=288269.0, ans=0.125 2024-09-15 12:47:43,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=288297.3333333333, ans=0.0 2024-09-15 12:47:44,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=288297.3333333333, ans=0.0 2024-09-15 12:47:48,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=288297.3333333333, ans=0.0 2024-09-15 12:47:53,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=288297.3333333333, ans=0.0 2024-09-15 12:47:57,536 INFO [train.py:1198] (1/2) Epoch 16, batch 5900, loss[loss=0.2741, ctc_loss=0.1912, cr_loss=0.4144, over 20103.00 frames. ], tot_loss[loss=0.2468, ctc_loss=0.169, cr_loss=0.3886, over 4084557.08 frames. ], batch size: 80, lr: 5.17e-03, grad_scale: 16.0 2024-09-15 12:48:31,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=288382.3333333333, ans=0.125 2024-09-15 12:48:59,166 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.687e+02 2.052e+02 2.188e+02 2.422e+02 3.889e+02, threshold=4.376e+02, percent-clipped=0.0 2024-09-15 12:49:14,160 INFO [train.py:1198] (1/2) Epoch 16, batch 5950, loss[loss=0.2925, ctc_loss=0.2145, cr_loss=0.3897, over 14369.00 frames. ], tot_loss[loss=0.2466, ctc_loss=0.169, cr_loss=0.3882, over 4084467.83 frames. ], batch size: 149, lr: 5.17e-03, grad_scale: 16.0 2024-09-15 12:49:26,843 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.51 vs. 
limit=22.5 2024-09-15 12:49:30,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=288495.6666666667, ans=0.125 2024-09-15 12:49:45,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=288524.0, ans=0.125 2024-09-15 12:49:58,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=288552.3333333333, ans=0.125 2024-09-15 12:50:00,911 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.93 vs. limit=10.0 2024-09-15 12:50:02,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=288552.3333333333, ans=0.125 2024-09-15 12:50:07,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=288552.3333333333, ans=0.0 2024-09-15 12:50:10,776 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.38 vs. limit=15.0 2024-09-15 12:50:13,720 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.11 vs. limit=6.0 2024-09-15 12:50:28,263 INFO [train.py:1198] (1/2) Epoch 16, batch 6000, loss[loss=0.2434, ctc_loss=0.1671, cr_loss=0.3814, over 20980.00 frames. ], tot_loss[loss=0.2461, ctc_loss=0.1685, cr_loss=0.3879, over 4102202.76 frames. ], batch size: 55, lr: 5.17e-03, grad_scale: 32.0 2024-09-15 12:50:28,264 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-15 12:50:52,617 INFO [train.py:1230] (1/2) Epoch 16, validation: loss=0.04619, ctc_loss=0.04619, cr_loss=1.014e-14, over 944034.00 frames. 2024-09-15 12:50:52,618 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-15 12:50:56,899 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.93 vs. limit=22.5 2024-09-15 12:51:13,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=288637.3333333333, ans=0.2 2024-09-15 12:51:51,902 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.786e+02 2.049e+02 2.226e+02 2.408e+02 3.127e+02, threshold=4.451e+02, percent-clipped=0.0 2024-09-15 12:52:06,477 INFO [train.py:1198] (1/2) Epoch 16, batch 6050, loss[loss=0.2967, ctc_loss=0.2098, cr_loss=0.4343, over 14773.00 frames. ], tot_loss[loss=0.2458, ctc_loss=0.1683, cr_loss=0.3877, over 4104062.30 frames. ], batch size: 150, lr: 5.17e-03, grad_scale: 32.0 2024-09-15 12:52:15,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=288750.6666666667, ans=0.125 2024-09-15 12:52:28,608 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.76 vs. 
limit=15.0 2024-09-15 12:52:40,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=288807.3333333333, ans=0.125 2024-09-15 12:53:09,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=288864.0, ans=0.125 2024-09-15 12:53:22,811 INFO [train.py:1198] (1/2) Epoch 16, batch 6100, loss[loss=0.2164, ctc_loss=0.1471, cr_loss=0.3462, over 20978.00 frames. ], tot_loss[loss=0.2459, ctc_loss=0.1684, cr_loss=0.3873, over 4106822.96 frames. ], batch size: 51, lr: 5.16e-03, grad_scale: 32.0 2024-09-15 12:53:24,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=288892.3333333333, ans=0.125 2024-09-15 12:53:35,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=288892.3333333333, ans=0.125 2024-09-15 12:53:38,524 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.15 vs. limit=15.0 2024-09-15 12:53:51,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=288949.0, ans=0.1 2024-09-15 12:54:18,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=288977.3333333333, ans=0.1 2024-09-15 12:54:19,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=288977.3333333333, ans=0.125 2024-09-15 12:54:22,466 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.775e+02 2.023e+02 2.194e+02 2.376e+02 3.598e+02, threshold=4.388e+02, percent-clipped=0.0 2024-09-15 12:54:32,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=289005.6666666667, ans=0.0 2024-09-15 12:54:37,065 INFO [train.py:1198] (1/2) Epoch 16, batch 6150, loss[loss=0.2273, ctc_loss=0.1555, cr_loss=0.3589, over 20785.00 frames. ], tot_loss[loss=0.2462, ctc_loss=0.1688, cr_loss=0.3867, over 4082450.84 frames. ], batch size: 53, lr: 5.16e-03, grad_scale: 32.0 2024-09-15 12:55:16,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=289090.6666666667, ans=0.0 2024-09-15 12:55:25,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=289119.0, ans=0.125 2024-09-15 12:55:30,467 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.75 vs. limit=15.0 2024-09-15 12:55:41,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=289147.3333333333, ans=0.025 2024-09-15 12:55:50,304 INFO [train.py:1198] (1/2) Epoch 16, batch 6200, loss[loss=0.23, ctc_loss=0.1544, cr_loss=0.3778, over 20933.00 frames. ], tot_loss[loss=0.2471, ctc_loss=0.1697, cr_loss=0.3872, over 4065247.57 frames. 
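In the batch-6000 validation record above, cr_loss is numerically zero (1.014e-14) even though training batches show cr_loss near 0.39: the consistency term only produces a signal between differently augmented views of the input, so on unaugmented validation data it vanishes and the reported validation loss is pure CTC. As a rough illustration of such a consistency term between two views of the same utterance (a generic sketch; the symmetric-KL form is an assumption, not necessarily what this recipe computes):

import torch
import torch.nn.functional as F

def consistency_loss(logits_a: torch.Tensor, logits_b: torch.Tensor) -> torch.Tensor:
    # logits_*: (N, T, vocab) frame-level outputs from two augmented views.
    # Symmetric KL between the two posterior sequences; identical views
    # (as at validation) give exactly zero, matching cr_loss ~ 1e-14 above.
    log_p_a = F.log_softmax(logits_a, dim=-1)
    log_p_b = F.log_softmax(logits_b, dim=-1)
    kl_ab = F.kl_div(log_p_a, log_p_b, log_target=True, reduction="batchmean")
    kl_ba = F.kl_div(log_p_b, log_p_a, log_target=True, reduction="batchmean")
    return 0.5 * (kl_ab + kl_ba)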
], batch size: 48, lr: 5.16e-03, grad_scale: 32.0 2024-09-15 12:55:59,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=289175.6666666667, ans=0.2 2024-09-15 12:56:02,963 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.60 vs. limit=15.0 2024-09-15 12:56:27,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=289232.3333333333, ans=0.125 2024-09-15 12:56:38,938 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=289260.6666666667, ans=0.05 2024-09-15 12:56:47,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=289260.6666666667, ans=0.125 2024-09-15 12:56:50,265 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.800e+02 2.012e+02 2.192e+02 2.343e+02 9.139e+02, threshold=4.384e+02, percent-clipped=1.0 2024-09-15 12:56:53,544 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 12:56:53,991 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.67 vs. limit=22.5 2024-09-15 12:57:04,915 INFO [train.py:1198] (1/2) Epoch 16, batch 6250, loss[loss=0.2291, ctc_loss=0.1548, cr_loss=0.3717, over 20985.00 frames. ], tot_loss[loss=0.2483, ctc_loss=0.1708, cr_loss=0.3875, over 4037441.60 frames. ], batch size: 48, lr: 5.16e-03, grad_scale: 32.0 2024-09-15 12:57:08,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=289317.3333333333, ans=0.125 2024-09-15 12:57:08,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=289317.3333333333, ans=0.2 2024-09-15 12:57:55,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=289402.3333333333, ans=0.07 2024-09-15 12:58:18,562 INFO [train.py:1198] (1/2) Epoch 16, batch 6300, loss[loss=0.2634, ctc_loss=0.1802, cr_loss=0.4163, over 21034.00 frames. ], tot_loss[loss=0.2486, ctc_loss=0.1712, cr_loss=0.3871, over 4002400.71 frames. ], batch size: 62, lr: 5.16e-03, grad_scale: 32.0 2024-09-15 12:58:37,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=289487.3333333333, ans=0.125 2024-09-15 12:59:16,045 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.818e+02 2.225e+02 2.457e+02 2.678e+02 4.874e+02, threshold=4.914e+02, percent-clipped=1.0 2024-09-15 12:59:23,591 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.27 vs. limit=15.0 2024-09-15 12:59:30,118 INFO [train.py:1198] (1/2) Epoch 16, batch 6350, loss[loss=0.3006, ctc_loss=0.2164, cr_loss=0.4209, over 14353.00 frames. ], tot_loss[loss=0.2561, ctc_loss=0.1779, cr_loss=0.3908, over 3789420.07 frames. 
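The scaling.py:1024 Whitening lines report, per named submodule, a measured statistic ('metric') against a configured ceiling ('limit'); the module evidently logs when the measurement approaches or exceeds its limit, and larger values mean the activations' covariance is farther from white (isotropic). A toy version of one such anisotropy score, the ratio of the mean squared eigenvalue to the squared mean eigenvalue of the covariance (illustrative only; the exact definition in scaling.py may differ):

import torch

def whiteness_metric(x: torch.Tensor) -> float:
    # x: (N, C) activations. Returns 1.0 for perfectly white features and
    # grows as the eigenvalue spectrum of the covariance becomes lopsided.
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]       # (C, C) covariance estimate
    eig = torch.linalg.eigvalsh(cov)   # real eigenvalues, ascending
    return float((eig ** 2).mean() / eig.mean() ** 2)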
], batch size: 150, lr: 5.16e-03, grad_scale: 32.0 2024-09-15 12:59:49,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=289629.0, ans=10.0 2024-09-15 12:59:59,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=289657.3333333333, ans=0.0 2024-09-15 13:00:06,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=289657.3333333333, ans=0.125 2024-09-15 13:00:10,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=289657.3333333333, ans=0.125 2024-09-15 13:00:13,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=289685.6666666667, ans=0.0 2024-09-15 13:00:23,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=289685.6666666667, ans=0.1 2024-09-15 13:01:18,115 INFO [train.py:1198] (1/2) Epoch 17, batch 0, loss[loss=0.2686, ctc_loss=0.1872, cr_loss=0.407, over 21065.00 frames. ], tot_loss[loss=0.2686, ctc_loss=0.1872, cr_loss=0.407, over 21065.00 frames. ], batch size: 59, lr: 5.00e-03, grad_scale: 32.0 2024-09-15 13:01:18,116 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-15 13:01:33,169 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.3865, 2.4469, 3.0629, 2.3605], device='cuda:1') 2024-09-15 13:01:36,327 INFO [train.py:1230] (1/2) Epoch 17, validation: loss=0.04638, ctc_loss=0.04638, cr_loss=9.883e-15, over 944034.00 frames. 2024-09-15 13:01:36,328 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-15 13:02:05,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=289773.5, ans=0.0 2024-09-15 13:02:07,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=289773.5, ans=0.0 2024-09-15 13:02:33,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=289801.8333333333, ans=0.125 2024-09-15 13:02:53,268 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.804e+02 2.178e+02 2.559e+02 2.883e+02 3.390e+02, threshold=5.119e+02, percent-clipped=0.0 2024-09-15 13:02:54,846 INFO [train.py:1198] (1/2) Epoch 17, batch 50, loss[loss=0.2467, ctc_loss=0.1675, cr_loss=0.396, over 20970.00 frames. ], tot_loss[loss=0.2466, ctc_loss=0.1695, cr_loss=0.3857, over 916625.49 frames. ], batch size: 55, lr: 5.00e-03, grad_scale: 32.0 2024-09-15 13:02:58,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=289858.5, ans=0.2 2024-09-15 13:03:19,926 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.25 vs. 
limit=10.0 2024-09-15 13:03:20,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=289886.8333333333, ans=0.025 2024-09-15 13:03:28,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=289915.1666666667, ans=15.0 2024-09-15 13:03:48,363 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. limit=6.0 2024-09-15 13:03:49,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=289943.5, ans=0.0 2024-09-15 13:04:01,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=289971.8333333333, ans=0.0 2024-09-15 13:04:09,980 INFO [train.py:1198] (1/2) Epoch 17, batch 100, loss[loss=0.2412, ctc_loss=0.1661, cr_loss=0.3754, over 21013.00 frames. ], tot_loss[loss=0.2466, ctc_loss=0.1694, cr_loss=0.3859, over 1612373.82 frames. ], batch size: 61, lr: 5.00e-03, grad_scale: 32.0 2024-09-15 13:04:16,589 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0 2024-09-15 13:04:26,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=290028.5, ans=0.1 2024-09-15 13:05:27,371 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.842e+02 2.028e+02 2.174e+02 2.312e+02 4.055e+02, threshold=4.348e+02, percent-clipped=0.0 2024-09-15 13:05:28,940 INFO [train.py:1198] (1/2) Epoch 17, batch 150, loss[loss=0.2617, ctc_loss=0.1734, cr_loss=0.4417, over 20669.00 frames. ], tot_loss[loss=0.2467, ctc_loss=0.1692, cr_loss=0.3873, over 2166549.63 frames. ], batch size: 66, lr: 5.00e-03, grad_scale: 32.0 2024-09-15 13:05:39,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=290141.8333333333, ans=0.0 2024-09-15 13:06:25,148 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.88 vs. limit=15.0 2024-09-15 13:06:34,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=290255.1666666667, ans=0.025 2024-09-15 13:06:43,741 INFO [train.py:1198] (1/2) Epoch 17, batch 200, loss[loss=0.2488, ctc_loss=0.1728, cr_loss=0.3801, over 21071.00 frames. ], tot_loss[loss=0.2446, ctc_loss=0.1677, cr_loss=0.3846, over 2594627.87 frames. 
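The scaling.py:214 ScheduledFloat lines print a named hyper-parameter's current value ('ans') at the current training position ('batch_count'), so quantities like skip rates, balancer probabilities, and min/max bounds are schedules over progress rather than constants. A minimal sketch of such a schedule as piecewise-linear interpolation over (batch_count, value) breakpoints; the breakpoints below are invented for illustration, since the log only ever shows the current value:

from typing import List, Tuple

def scheduled_float(batch_count: float, points: List[Tuple[float, float]]) -> float:
    # points: sorted (batch_count, value) breakpoints; flat outside the ends.
    if batch_count <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if batch_count <= x1:
            t = (batch_count - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    return points[-1][1]

# e.g. a rate decaying from 0.5 at the start of training to a floor of 0.025,
# reproducing ans=0.025 at batch_count=289886.8 as in the record above:
rate = scheduled_float(289886.8, [(0.0, 0.5), (20000.0, 0.05), (50000.0, 0.025)])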
], batch size: 62, lr: 5.00e-03, grad_scale: 32.0 2024-09-15 13:06:48,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=290283.5, ans=0.125 2024-09-15 13:06:51,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=290283.5, ans=0.0 2024-09-15 13:07:00,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=290311.8333333333, ans=0.2 2024-09-15 13:07:01,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=290311.8333333333, ans=0.0 2024-09-15 13:07:04,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=290311.8333333333, ans=0.1 2024-09-15 13:07:12,802 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.15 vs. limit=15.0 2024-09-15 13:07:13,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=290340.1666666667, ans=0.125 2024-09-15 13:07:19,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=290340.1666666667, ans=0.2 2024-09-15 13:07:34,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=290368.5, ans=0.125 2024-09-15 13:07:35,285 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.43 vs. limit=5.0 2024-09-15 13:07:56,243 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.746e+02 2.025e+02 2.195e+02 2.365e+02 4.015e+02, threshold=4.390e+02, percent-clipped=0.0 2024-09-15 13:07:57,831 INFO [train.py:1198] (1/2) Epoch 17, batch 250, loss[loss=0.2515, ctc_loss=0.1702, cr_loss=0.4068, over 20982.00 frames. ], tot_loss[loss=0.2444, ctc_loss=0.1675, cr_loss=0.3845, over 2918578.92 frames. ], batch size: 63, lr: 5.00e-03, grad_scale: 32.0 2024-09-15 13:08:05,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=290425.1666666667, ans=0.0 2024-09-15 13:09:04,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=290538.5, ans=0.02 2024-09-15 13:09:16,235 INFO [train.py:1198] (1/2) Epoch 17, batch 300, loss[loss=0.2435, ctc_loss=0.1673, cr_loss=0.3812, over 21015.00 frames. ], tot_loss[loss=0.244, ctc_loss=0.1671, cr_loss=0.3846, over 3184059.32 frames. ], batch size: 63, lr: 4.99e-03, grad_scale: 32.0 2024-09-15 13:09:47,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=290623.5, ans=0.025 2024-09-15 13:10:32,664 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.795e+02 2.028e+02 2.166e+02 2.390e+02 6.262e+02, threshold=4.332e+02, percent-clipped=1.0 2024-09-15 13:10:34,259 INFO [train.py:1198] (1/2) Epoch 17, batch 350, loss[loss=0.244, ctc_loss=0.1679, cr_loss=0.3806, over 20978.00 frames. ], tot_loss[loss=0.2452, ctc_loss=0.168, cr_loss=0.386, over 3359968.35 frames. 
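One reading aid for these records: loss[...] is measured on the current batch alone, while tot_loss[...] is a running average whose frame count grows through the epoch (1612373 frames at batch 100, 2594627 at batch 200, 3359968 at batch 350 above) and then saturates near the ~4.1M seen throughout epoch 16, consistent with an exponentially decayed running sum rather than a plain cumulative mean. A sketch under that assumption; the decay constant 0.995 is fitted to the growth and saturation pattern, not taken from the code:

class RunningLoss:
    def __init__(self, decay: float = 0.995):
        # With roughly 20k frames per batch, frames saturates near
        # 20000 / (1 - 0.995) = 4.0M, close to the totals in the log.
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss_sum: float, batch_frames: float) -> None:
        self.loss_sum = self.decay * self.loss_sum + batch_loss_sum
        self.frames = self.decay * self.frames + batch_frames

    @property
    def avg(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)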
], batch size: 55, lr: 4.99e-03, grad_scale: 32.0 2024-09-15 13:10:57,750 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.41 vs. limit=22.5 2024-09-15 13:11:00,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=290736.8333333333, ans=0.0 2024-09-15 13:11:44,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=290821.8333333333, ans=0.07 2024-09-15 13:11:44,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=290821.8333333333, ans=0.0 2024-09-15 13:11:49,040 INFO [train.py:1198] (1/2) Epoch 17, batch 400, loss[loss=0.2125, ctc_loss=0.1396, cr_loss=0.3645, over 20960.00 frames. ], tot_loss[loss=0.2466, ctc_loss=0.1692, cr_loss=0.3872, over 3514728.74 frames. ], batch size: 48, lr: 4.99e-03, grad_scale: 32.0 2024-09-15 13:11:55,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=290850.1666666667, ans=0.015 2024-09-15 13:12:34,029 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.85 vs. limit=15.0 2024-09-15 13:12:43,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=290935.1666666667, ans=0.125 2024-09-15 13:13:03,054 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.699e+02 2.081e+02 2.190e+02 2.341e+02 4.195e+02, threshold=4.381e+02, percent-clipped=0.0 2024-09-15 13:13:04,605 INFO [train.py:1198] (1/2) Epoch 17, batch 450, loss[loss=0.2386, ctc_loss=0.1639, cr_loss=0.3734, over 19359.00 frames. ], tot_loss[loss=0.2465, ctc_loss=0.169, cr_loss=0.3873, over 3653952.86 frames. ], batch size: 90, lr: 4.99e-03, grad_scale: 32.0 2024-09-15 13:13:25,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=291020.1666666667, ans=0.125 2024-09-15 13:13:37,840 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.53 vs. limit=15.0 2024-09-15 13:14:22,412 INFO [train.py:1198] (1/2) Epoch 17, batch 500, loss[loss=0.2194, ctc_loss=0.148, cr_loss=0.3571, over 20883.00 frames. ], tot_loss[loss=0.246, ctc_loss=0.1685, cr_loss=0.3874, over 3763720.16 frames. ], batch size: 54, lr: 4.99e-03, grad_scale: 32.0 2024-09-15 13:14:33,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=291133.5, ans=0.025 2024-09-15 13:14:43,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=291161.8333333333, ans=0.0 2024-09-15 13:14:51,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=291190.1666666667, ans=0.125 2024-09-15 13:14:59,539 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.99 vs. 
limit=15.0 2024-09-15 13:15:17,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=291218.5, ans=0.125 2024-09-15 13:15:37,111 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.671e+02 2.007e+02 2.130e+02 2.264e+02 2.791e+02, threshold=4.259e+02, percent-clipped=0.0 2024-09-15 13:15:38,649 INFO [train.py:1198] (1/2) Epoch 17, batch 550, loss[loss=0.211, ctc_loss=0.1414, cr_loss=0.3479, over 20960.00 frames. ], tot_loss[loss=0.2452, ctc_loss=0.1677, cr_loss=0.3872, over 3830804.69 frames. ], batch size: 50, lr: 4.99e-03, grad_scale: 32.0 2024-09-15 13:15:39,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=291275.1666666667, ans=0.2 2024-09-15 13:15:46,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=291275.1666666667, ans=0.2 2024-09-15 13:16:07,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=291303.5, ans=0.125 2024-09-15 13:16:15,025 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 13:16:27,019 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. limit=6.0 2024-09-15 13:16:56,483 INFO [train.py:1198] (1/2) Epoch 17, batch 600, loss[loss=0.2593, ctc_loss=0.1793, cr_loss=0.4, over 20881.00 frames. ], tot_loss[loss=0.2461, ctc_loss=0.1686, cr_loss=0.3877, over 3871636.67 frames. ], batch size: 57, lr: 4.99e-03, grad_scale: 32.0 2024-09-15 13:17:04,810 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.65 vs. limit=22.5 2024-09-15 13:17:17,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=291445.1666666667, ans=0.2 2024-09-15 13:17:19,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=291445.1666666667, ans=0.0 2024-09-15 13:17:38,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=291473.5, ans=0.125 2024-09-15 13:17:59,947 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.31 vs. limit=15.0 2024-09-15 13:18:09,615 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.852e+02 2.050e+02 2.229e+02 2.481e+02 3.764e+02, threshold=4.457e+02, percent-clipped=0.0 2024-09-15 13:18:11,139 INFO [train.py:1198] (1/2) Epoch 17, batch 650, loss[loss=0.2759, ctc_loss=0.1949, cr_loss=0.4051, over 20828.00 frames. ], tot_loss[loss=0.2466, ctc_loss=0.169, cr_loss=0.3877, over 3900950.50 frames. 
], batch size: 59, lr: 4.99e-03, grad_scale: 32.0 2024-09-15 13:18:11,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=291558.5, ans=0.125 2024-09-15 13:18:35,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=291586.8333333333, ans=0.1 2024-09-15 13:19:05,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=291643.5, ans=0.125 2024-09-15 13:19:24,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=291700.1666666667, ans=0.125 2024-09-15 13:19:26,133 INFO [train.py:1198] (1/2) Epoch 17, batch 700, loss[loss=0.2624, ctc_loss=0.1819, cr_loss=0.4026, over 20316.00 frames. ], tot_loss[loss=0.2454, ctc_loss=0.168, cr_loss=0.3868, over 3948431.20 frames. ], batch size: 74, lr: 4.98e-03, grad_scale: 32.0 2024-09-15 13:19:40,343 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.99 vs. limit=15.0 2024-09-15 13:20:02,400 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.07 vs. limit=15.0 2024-09-15 13:20:06,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=291756.8333333333, ans=0.125 2024-09-15 13:20:42,328 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.737e+02 2.063e+02 2.189e+02 2.376e+02 3.729e+02, threshold=4.378e+02, percent-clipped=0.0 2024-09-15 13:20:43,871 INFO [train.py:1198] (1/2) Epoch 17, batch 750, loss[loss=0.2496, ctc_loss=0.1684, cr_loss=0.4058, over 21030.00 frames. ], tot_loss[loss=0.2442, ctc_loss=0.1671, cr_loss=0.3853, over 3981958.65 frames. ], batch size: 63, lr: 4.98e-03, grad_scale: 32.0 2024-09-15 13:20:50,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=291841.8333333333, ans=0.125 2024-09-15 13:20:56,269 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 13:21:20,175 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 13:21:23,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=291898.5, ans=0.0 2024-09-15 13:21:30,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=291926.8333333333, ans=0.025 2024-09-15 13:22:02,438 INFO [train.py:1198] (1/2) Epoch 17, batch 800, loss[loss=0.2303, ctc_loss=0.1559, cr_loss=0.3722, over 20810.00 frames. ], tot_loss[loss=0.2437, ctc_loss=0.1667, cr_loss=0.3848, over 4007344.98 frames. ], batch size: 53, lr: 4.98e-03, grad_scale: 32.0 2024-09-15 13:23:16,026 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.823e+02 2.069e+02 2.163e+02 2.357e+02 5.044e+02, threshold=4.326e+02, percent-clipped=1.0 2024-09-15 13:23:17,491 INFO [train.py:1198] (1/2) Epoch 17, batch 850, loss[loss=0.2753, ctc_loss=0.1959, cr_loss=0.3974, over 18096.00 frames. ], tot_loss[loss=0.2448, ctc_loss=0.1676, cr_loss=0.3861, over 4017099.15 frames. 
], batch size: 108, lr: 4.98e-03, grad_scale: 32.0 2024-09-15 13:24:00,468 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 13:24:12,916 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.44 vs. limit=10.0 2024-09-15 13:24:13,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=292210.1666666667, ans=0.015 2024-09-15 13:24:31,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=292266.8333333333, ans=0.025 2024-09-15 13:24:32,836 INFO [train.py:1198] (1/2) Epoch 17, batch 900, loss[loss=0.2644, ctc_loss=0.1821, cr_loss=0.4116, over 21090.00 frames. ], tot_loss[loss=0.2448, ctc_loss=0.1675, cr_loss=0.3862, over 4034382.76 frames. ], batch size: 59, lr: 4.98e-03, grad_scale: 32.0 2024-09-15 13:24:41,128 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.43 vs. limit=12.0 2024-09-15 13:24:45,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=292266.8333333333, ans=0.0 2024-09-15 13:24:52,100 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.30 vs. limit=22.5 2024-09-15 13:24:58,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=292295.1666666667, ans=0.125 2024-09-15 13:25:06,771 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.22 vs. limit=10.0 2024-09-15 13:25:14,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=292323.5, ans=0.07 2024-09-15 13:25:45,911 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=292380.1666666667, ans=0.2 2024-09-15 13:25:50,403 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.829e+02 2.058e+02 2.193e+02 2.365e+02 3.432e+02, threshold=4.387e+02, percent-clipped=0.0 2024-09-15 13:25:51,824 INFO [train.py:1198] (1/2) Epoch 17, batch 950, loss[loss=0.2768, ctc_loss=0.1945, cr_loss=0.4114, over 18134.00 frames. ], tot_loss[loss=0.2454, ctc_loss=0.168, cr_loss=0.3872, over 4051259.07 frames. ], batch size: 108, lr: 4.98e-03, grad_scale: 32.0 2024-09-15 13:26:11,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=292436.8333333333, ans=0.95 2024-09-15 13:26:26,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=292465.1666666667, ans=0.0 2024-09-15 13:26:58,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=292521.8333333333, ans=0.125 2024-09-15 13:27:06,567 INFO [train.py:1198] (1/2) Epoch 17, batch 1000, loss[loss=0.2113, ctc_loss=0.1411, cr_loss=0.351, over 20399.00 frames. ], tot_loss[loss=0.2458, ctc_loss=0.1682, cr_loss=0.3879, over 4065815.80 frames. 
], batch size: 45, lr: 4.98e-03, grad_scale: 32.0 2024-09-15 13:27:56,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=292635.1666666667, ans=0.2 2024-09-15 13:28:02,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=292635.1666666667, ans=0.2 2024-09-15 13:28:04,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=292635.1666666667, ans=0.125 2024-09-15 13:28:23,169 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.811e+02 2.095e+02 2.212e+02 2.441e+02 3.460e+02, threshold=4.424e+02, percent-clipped=0.0 2024-09-15 13:28:24,683 INFO [train.py:1198] (1/2) Epoch 17, batch 1050, loss[loss=0.2478, ctc_loss=0.1704, cr_loss=0.3872, over 20967.00 frames. ], tot_loss[loss=0.2446, ctc_loss=0.1673, cr_loss=0.3867, over 4083917.97 frames. ], batch size: 58, lr: 4.98e-03, grad_scale: 32.0 2024-09-15 13:28:26,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=292691.8333333333, ans=0.125 2024-09-15 13:28:30,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=292691.8333333333, ans=0.125 2024-09-15 13:28:34,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=292691.8333333333, ans=0.125 2024-09-15 13:28:34,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=292691.8333333333, ans=0.125 2024-09-15 13:28:36,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=292691.8333333333, ans=0.125 2024-09-15 13:29:09,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=292776.8333333333, ans=0.09899494936611666 2024-09-15 13:29:21,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=292776.8333333333, ans=0.125 2024-09-15 13:29:40,503 INFO [train.py:1198] (1/2) Epoch 17, batch 1100, loss[loss=0.2542, ctc_loss=0.1729, cr_loss=0.4068, over 20938.00 frames. ], tot_loss[loss=0.2456, ctc_loss=0.1681, cr_loss=0.3874, over 4084408.93 frames. ], batch size: 60, lr: 4.97e-03, grad_scale: 32.0 2024-09-15 13:30:01,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=292861.8333333333, ans=0.2 2024-09-15 13:30:07,957 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.46 vs. limit=15.0 2024-09-15 13:30:12,680 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.62 vs. 
limit=15.0 2024-09-15 13:30:19,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=292890.1666666667, ans=0.125 2024-09-15 13:30:34,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=292918.5, ans=0.125 2024-09-15 13:30:36,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=292918.5, ans=0.025 2024-09-15 13:30:46,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=292946.8333333333, ans=0.125 2024-09-15 13:30:53,636 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.920e+02 2.097e+02 2.212e+02 2.363e+02 3.389e+02, threshold=4.425e+02, percent-clipped=0.0 2024-09-15 13:30:55,144 INFO [train.py:1198] (1/2) Epoch 17, batch 1150, loss[loss=0.2296, ctc_loss=0.1583, cr_loss=0.3563, over 20999.00 frames. ], tot_loss[loss=0.2448, ctc_loss=0.1676, cr_loss=0.3863, over 4085804.64 frames. ], batch size: 63, lr: 4.97e-03, grad_scale: 32.0 2024-09-15 13:30:55,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=292975.1666666667, ans=0.125 2024-09-15 13:31:16,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=293003.5, ans=0.125 2024-09-15 13:31:28,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=293031.8333333333, ans=0.1 2024-09-15 13:31:29,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=293031.8333333333, ans=0.025 2024-09-15 13:31:44,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=293060.1666666667, ans=0.125 2024-09-15 13:32:13,106 INFO [train.py:1198] (1/2) Epoch 17, batch 1200, loss[loss=0.2609, ctc_loss=0.1815, cr_loss=0.3968, over 20607.00 frames. ], tot_loss[loss=0.2431, ctc_loss=0.1662, cr_loss=0.3845, over 4104854.80 frames. ], batch size: 75, lr: 4.97e-03, grad_scale: 32.0 2024-09-15 13:32:24,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=293116.8333333333, ans=0.0 2024-09-15 13:32:26,128 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.55 vs. limit=15.0 2024-09-15 13:32:29,154 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.25 vs. limit=15.0 2024-09-15 13:32:50,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=293173.5, ans=0.1 2024-09-15 13:33:30,671 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.772e+02 2.094e+02 2.264e+02 2.504e+02 3.447e+02, threshold=4.527e+02, percent-clipped=0.0 2024-09-15 13:33:32,197 INFO [train.py:1198] (1/2) Epoch 17, batch 1250, loss[loss=0.2447, ctc_loss=0.1668, cr_loss=0.3894, over 20990.00 frames. ], tot_loss[loss=0.2435, ctc_loss=0.1666, cr_loss=0.3845, over 4102732.51 frames. 
2024-09-15 13:33:50,915 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.59 vs. limit=15.0
2024-09-15 13:34:35,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=293371.8333333333, ans=0.07
2024-09-15 13:34:42,110 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 13:34:47,581 INFO [train.py:1198] (1/2) Epoch 17, batch 1300, loss[loss=0.2391, ctc_loss=0.1612, cr_loss=0.3892, over 21058.00 frames. ], tot_loss[loss=0.2439, ctc_loss=0.1669, cr_loss=0.3852, over 4092184.98 frames. ], batch size: 56, lr: 4.97e-03, grad_scale: 32.0
2024-09-15 13:35:01,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=293428.5, ans=0.1
2024-09-15 13:35:16,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=293456.8333333333, ans=0.0
2024-09-15 13:36:01,436 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.758e+02 2.095e+02 2.247e+02 2.366e+02 3.947e+02, threshold=4.494e+02, percent-clipped=0.0
2024-09-15 13:36:02,954 INFO [train.py:1198] (1/2) Epoch 17, batch 1350, loss[loss=0.2381, ctc_loss=0.1632, cr_loss=0.3748, over 20955.00 frames. ], tot_loss[loss=0.2436, ctc_loss=0.1666, cr_loss=0.385, over 4098222.04 frames. ], batch size: 58, lr: 4.97e-03, grad_scale: 32.0
2024-09-15 13:36:15,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=293541.8333333333, ans=0.1
2024-09-15 13:36:21,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=293570.1666666667, ans=0.1
2024-09-15 13:36:25,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=293570.1666666667, ans=0.0
2024-09-15 13:37:21,406 INFO [train.py:1198] (1/2) Epoch 17, batch 1400, loss[loss=0.214, ctc_loss=0.1454, cr_loss=0.3431, over 20974.00 frames. ], tot_loss[loss=0.2432, ctc_loss=0.1662, cr_loss=0.3848, over 4095609.72 frames. ], batch size: 48, lr: 4.97e-03, grad_scale: 64.0
2024-09-15 13:38:30,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=293796.8333333333, ans=0.025
2024-09-15 13:38:35,049 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.792e+02 2.045e+02 2.188e+02 2.368e+02 3.462e+02, threshold=4.376e+02, percent-clipped=0.0
2024-09-15 13:38:36,660 INFO [train.py:1198] (1/2) Epoch 17, batch 1450, loss[loss=0.2612, ctc_loss=0.183, cr_loss=0.3913, over 21031.00 frames. ], tot_loss[loss=0.2447, ctc_loss=0.1673, cr_loss=0.3867, over 4093834.75 frames. ], batch size: 63, lr: 4.97e-03, grad_scale: 64.0
2024-09-15 13:38:41,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=293825.1666666667, ans=0.125
2024-09-15 13:38:58,638 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.20 vs. limit=10.0
2024-09-15 13:39:13,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=293881.8333333333, ans=0.0
2024-09-15 13:39:25,758 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.49 vs. limit=15.0
2024-09-15 13:39:28,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=293910.1666666667, ans=0.125
2024-09-15 13:39:34,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=293910.1666666667, ans=0.0
2024-09-15 13:39:55,419 INFO [train.py:1198] (1/2) Epoch 17, batch 1500, loss[loss=0.2287, ctc_loss=0.153, cr_loss=0.3786, over 20967.00 frames. ], tot_loss[loss=0.2444, ctc_loss=0.167, cr_loss=0.387, over 4108905.68 frames. ], batch size: 49, lr: 4.97e-03, grad_scale: 64.0
2024-09-15 13:40:09,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=293995.1666666667, ans=0.04949747468305833
2024-09-15 13:40:09,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=293995.1666666667, ans=0.2
2024-09-15 13:40:42,691 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=294051.8333333333, ans=0.125
2024-09-15 13:40:54,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=294080.1666666667, ans=0.125
2024-09-15 13:41:03,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=294080.1666666667, ans=0.125
2024-09-15 13:41:09,607 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.725e+02 2.010e+02 2.197e+02 2.365e+02 6.314e+02, threshold=4.393e+02, percent-clipped=1.0
2024-09-15 13:41:11,183 INFO [train.py:1198] (1/2) Epoch 17, batch 1550, loss[loss=0.2621, ctc_loss=0.1776, cr_loss=0.4227, over 20665.00 frames. ], tot_loss[loss=0.2455, ctc_loss=0.1678, cr_loss=0.3883, over 4101004.16 frames. ], batch size: 71, lr: 4.96e-03, grad_scale: 64.0
2024-09-15 13:41:17,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=294108.5, ans=0.0
2024-09-15 13:41:17,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=294108.5, ans=0.125
2024-09-15 13:41:22,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=294108.5, ans=0.125
2024-09-15 13:41:58,585 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-15 13:42:05,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=294193.5, ans=0.125
2024-09-15 13:42:19,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=294221.8333333333, ans=0.125
2024-09-15 13:42:20,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=294221.8333333333, ans=0.125
2024-09-15 13:42:29,229 INFO [train.py:1198] (1/2) Epoch 17, batch 1600, loss[loss=0.2523, ctc_loss=0.1729, cr_loss=0.3969, over 20976.00 frames. ], tot_loss[loss=0.2467, ctc_loss=0.1689, cr_loss=0.3891, over 4094541.21 frames. ], batch size: 58, lr: 4.96e-03, grad_scale: 32.0
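
The recurring WARNING [optim.py:487] lines summarize the recent distribution of gradient norms: five quantiles (min, 25%, median, 75%, max), a clipping threshold, and the fraction of recent steps that were clipped. In the numbers above the threshold consistently lands at about Clipping_scale times the median (e.g. 2.0 * 2.212e+02 = 4.424e+02), so a plausible reconstruction of the report looks like the sketch below; it is illustrative, not the library's actual implementation.

```python
import torch

def clipping_report(grad_norms: torch.Tensor, clipping_scale: float = 2.0) -> None:
    """Summarize recent gradient norms in the style of the optim.py warnings.

    A sketch: grad_norms is assumed to be a 1-D tensor of gradient norms
    collected over recent steps; the real optimizer may track them
    differently. The threshold is taken as clipping_scale * median, which
    matches the logged numbers.
    """
    qs = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * qs[2]
    pct = 100.0 * (grad_norms > threshold).float().mean()
    print(
        f"Clipping_scale={clipping_scale}, grad-norm quartiles "
        + " ".join(f"{q.item():.3e}" for q in qs)
        + f", threshold={threshold.item():.3e}, percent-clipped={pct.item():.1f}"
    )
```
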
2024-09-15 13:42:49,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=294278.5, ans=0.0
2024-09-15 13:42:50,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=294278.5, ans=0.1
2024-09-15 13:43:04,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=294306.8333333333, ans=0.0
2024-09-15 13:43:12,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=294306.8333333333, ans=0.125
2024-09-15 13:43:12,558 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.89 vs. limit=12.0
2024-09-15 13:43:14,199 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.99 vs. limit=15.0
2024-09-15 13:43:30,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=294363.5, ans=0.125
2024-09-15 13:43:45,105 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.768e+02 2.071e+02 2.192e+02 2.418e+02 4.463e+02, threshold=4.384e+02, percent-clipped=1.0
2024-09-15 13:43:45,124 INFO [train.py:1198] (1/2) Epoch 17, batch 1650, loss[loss=0.2402, ctc_loss=0.1619, cr_loss=0.3918, over 21016.00 frames. ], tot_loss[loss=0.2459, ctc_loss=0.1683, cr_loss=0.3882, over 4104082.77 frames. ], batch size: 63, lr: 4.96e-03, grad_scale: 32.0
2024-09-15 13:43:58,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=294420.1666666667, ans=0.125
2024-09-15 13:44:12,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=294420.1666666667, ans=0.125
2024-09-15 13:44:12,461 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.84 vs. limit=15.0
2024-09-15 13:44:34,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=294476.8333333333, ans=0.0
2024-09-15 13:45:02,971 INFO [train.py:1198] (1/2) Epoch 17, batch 1700, loss[loss=0.257, ctc_loss=0.1754, cr_loss=0.4083, over 20678.00 frames. ], tot_loss[loss=0.2467, ctc_loss=0.1689, cr_loss=0.389, over 4102707.42 frames. ], batch size: 71, lr: 4.96e-03, grad_scale: 32.0
2024-09-15 13:45:03,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=294533.5, ans=0.0
2024-09-15 13:45:13,130 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.63 vs. limit=22.5
2024-09-15 13:46:19,471 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.710e+02 2.064e+02 2.202e+02 2.336e+02 7.186e+02, threshold=4.404e+02, percent-clipped=1.0
2024-09-15 13:46:19,491 INFO [train.py:1198] (1/2) Epoch 17, batch 1750, loss[loss=0.236, ctc_loss=0.1615, cr_loss=0.3725, over 20881.00 frames. ], tot_loss[loss=0.2449, ctc_loss=0.1675, cr_loss=0.3871, over 4111697.78 frames. ], batch size: 57, lr: 4.96e-03, grad_scale: 32.0
2024-09-15 13:46:44,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=294703.5, ans=0.125
2024-09-15 13:47:19,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=294788.5, ans=0.2
2024-09-15 13:47:33,894 INFO [train.py:1198] (1/2) Epoch 17, batch 1800, loss[loss=0.221, ctc_loss=0.148, cr_loss=0.365, over 21043.00 frames. ], tot_loss[loss=0.2446, ctc_loss=0.1671, cr_loss=0.3871, over 4113369.33 frames. ], batch size: 56, lr: 4.96e-03, grad_scale: 32.0
2024-09-15 13:47:59,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=294845.1666666667, ans=0.125
2024-09-15 13:48:04,445 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.02 vs. limit=12.0
2024-09-15 13:48:51,342 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.766e+02 2.008e+02 2.119e+02 2.283e+02 3.531e+02, threshold=4.239e+02, percent-clipped=0.0
2024-09-15 13:48:51,361 INFO [train.py:1198] (1/2) Epoch 17, batch 1850, loss[loss=0.2605, ctc_loss=0.1768, cr_loss=0.4181, over 20726.00 frames. ], tot_loss[loss=0.243, ctc_loss=0.1659, cr_loss=0.3854, over 4119339.15 frames. ], batch size: 71, lr: 4.96e-03, grad_scale: 32.0
2024-09-15 13:49:05,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=294986.8333333333, ans=0.125
2024-09-15 13:49:05,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=294986.8333333333, ans=0.125
2024-09-15 13:49:06,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=294986.8333333333, ans=0.125
2024-09-15 13:49:22,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=295015.1666666667, ans=0.2
2024-09-15 13:49:35,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=295043.5, ans=0.125
2024-09-15 13:50:05,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=295100.1666666667, ans=0.125
2024-09-15 13:50:06,364 INFO [train.py:1198] (1/2) Epoch 17, batch 1900, loss[loss=0.2217, ctc_loss=0.1503, cr_loss=0.3567, over 20965.00 frames. ], tot_loss[loss=0.2442, ctc_loss=0.1668, cr_loss=0.3871, over 4111492.99 frames. ], batch size: 48, lr: 4.96e-03, grad_scale: 16.0
2024-09-15 13:50:18,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=295100.1666666667, ans=0.1
2024-09-15 13:50:51,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=295156.8333333333, ans=0.0
2024-09-15 13:51:03,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=295185.1666666667, ans=0.125
2024-09-15 13:51:24,533 INFO [train.py:1198] (1/2) Epoch 17, batch 1950, loss[loss=0.2271, ctc_loss=0.154, cr_loss=0.3652, over 20946.00 frames. ], tot_loss[loss=0.2436, ctc_loss=0.1664, cr_loss=0.3862, over 4112538.27 frames. ], batch size: 49, lr: 4.95e-03, grad_scale: 16.0
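
The ScheduledFloat: name=..., batch_count=..., ans=... lines record hyperparameters (dropout rates, balancer probabilities, skip rates, scale floors) that are functions of the training step rather than constants: ans is the value the schedule yields at the given batch_count. A piecewise-linear schedule like the sketch below reproduces that behavior; the class name and the breakpoints are invented for illustration, and the real ScheduledFloat lives in icefall's scaling.py.

```python
class ScheduledFloatSketch:
    """Piecewise-linear schedule over batch_count (illustrative only)."""

    def __init__(self, *points):
        # points: (batch_count, value) breakpoints, e.g. (0.0, 0.5), (20000.0, 0.07)
        self.points = sorted(points)

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                w = (batch_count - x0) / (x1 - x0)
                return y0 + w * (y1 - y0)
        raise AssertionError("unreachable")

# This deep into training (batch_count ~ 2.9e5) every schedule has long since
# reached its final value, which is why the logged `ans` values are stable.
skip_rate = ScheduledFloatSketch((0.0, 0.5), (20000.0, 0.07))
assert skip_rate(294703.5) == 0.07
```
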
2024-09-15 13:51:26,035 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.835e+02 2.021e+02 2.155e+02 2.379e+02 3.372e+02, threshold=4.309e+02, percent-clipped=0.0
2024-09-15 13:52:25,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=295355.1666666667, ans=0.2
2024-09-15 13:52:37,580 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.68 vs. limit=15.0
2024-09-15 13:52:38,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=295383.5, ans=10.0
2024-09-15 13:52:39,746 INFO [train.py:1198] (1/2) Epoch 17, batch 2000, loss[loss=0.2526, ctc_loss=0.1748, cr_loss=0.3888, over 20670.00 frames. ], tot_loss[loss=0.2434, ctc_loss=0.1664, cr_loss=0.3854, over 4115090.64 frames. ], batch size: 66, lr: 4.95e-03, grad_scale: 16.0
2024-09-15 13:52:46,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=295383.5, ans=0.125
2024-09-15 13:52:56,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=295411.8333333333, ans=0.0
2024-09-15 13:52:58,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=295411.8333333333, ans=0.125
2024-09-15 13:53:29,999 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.79 vs. limit=15.0
2024-09-15 13:53:47,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=295496.8333333333, ans=0.125
2024-09-15 13:53:54,884 INFO [train.py:1198] (1/2) Epoch 17, batch 2050, loss[loss=0.2517, ctc_loss=0.1705, cr_loss=0.4059, over 20959.00 frames. ], tot_loss[loss=0.2448, ctc_loss=0.1676, cr_loss=0.3861, over 4090736.87 frames. ], batch size: 58, lr: 4.95e-03, grad_scale: 16.0
2024-09-15 13:54:00,726 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.823e+02 2.099e+02 2.233e+02 2.487e+02 4.359e+02, threshold=4.466e+02, percent-clipped=2.0
2024-09-15 13:55:07,242 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.66 vs. limit=12.0
2024-09-15 13:55:08,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=295638.5, ans=0.09899494936611666
2024-09-15 13:55:12,296 INFO [train.py:1198] (1/2) Epoch 17, batch 2100, loss[loss=0.2163, ctc_loss=0.1452, cr_loss=0.3555, over 20939.00 frames. ], tot_loss[loss=0.2434, ctc_loss=0.1665, cr_loss=0.3845, over 4092692.51 frames. ], batch size: 49, lr: 4.95e-03, grad_scale: 16.0
2024-09-15 13:55:18,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=295666.8333333333, ans=0.125
2024-09-15 13:55:21,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=295666.8333333333, ans=0.07
2024-09-15 13:55:24,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=295666.8333333333, ans=0.125
2024-09-15 13:55:30,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=295695.1666666667, ans=0.2
2024-09-15 13:55:48,785 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=295723.5, ans=0.0
2024-09-15 13:56:03,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=295751.8333333333, ans=0.2
2024-09-15 13:56:23,121 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.73 vs. limit=15.0
2024-09-15 13:56:24,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=295780.1666666667, ans=0.2
2024-09-15 13:56:27,482 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-15 13:56:27,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=295780.1666666667, ans=0.1
2024-09-15 13:56:29,973 INFO [train.py:1198] (1/2) Epoch 17, batch 2150, loss[loss=0.2272, ctc_loss=0.1544, cr_loss=0.3641, over 20975.00 frames. ], tot_loss[loss=0.2431, ctc_loss=0.1662, cr_loss=0.3842, over 4088780.21 frames. ], batch size: 55, lr: 4.95e-03, grad_scale: 16.0
2024-09-15 13:56:30,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=295808.5, ans=0.0
2024-09-15 13:56:32,989 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.801e+02 2.015e+02 2.167e+02 2.319e+02 3.201e+02, threshold=4.334e+02, percent-clipped=0.0
2024-09-15 13:56:58,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=295865.1666666667, ans=0.95
2024-09-15 13:57:45,069 INFO [train.py:1198] (1/2) Epoch 17, batch 2200, loss[loss=0.2022, ctc_loss=0.1325, cr_loss=0.3483, over 20987.00 frames. ], tot_loss[loss=0.2426, ctc_loss=0.1659, cr_loss=0.3836, over 4091553.01 frames. ], batch size: 49, lr: 4.95e-03, grad_scale: 16.0
2024-09-15 13:58:03,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=295978.5, ans=0.05
2024-09-15 13:58:10,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=295978.5, ans=0.1
2024-09-15 13:58:21,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=296006.8333333333, ans=0.2
2024-09-15 13:59:00,524 INFO [train.py:1198] (1/2) Epoch 17, batch 2250, loss[loss=0.2789, ctc_loss=0.1987, cr_loss=0.4007, over 18348.00 frames. ], tot_loss[loss=0.2433, ctc_loss=0.1664, cr_loss=0.3842, over 4086490.93 frames. ], batch size: 108, lr: 4.95e-03, grad_scale: 16.0
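
Each batch line reports two loss views: loss[... over N frames] for the current batch and tot_loss[... over ~4.1M frames] for a running summary whose frame count stays roughly constant, which suggests a frame-weighted average with exponential forgetting. One way such a summary could be kept is sketched below; the class and the decay constant are assumptions, not the trainer's actual bookkeeping.

```python
class FrameWeightedAverage:
    """Frame-weighted running loss with exponential forgetting (a sketch)."""

    def __init__(self, decay: float = 0.999):
        self.decay = decay
        self.weighted_loss = 0.0
        self.frames = 0.0

    def update(self, batch_mean_loss: float, batch_frames: float) -> None:
        # Old batches decay away, so the effective frame count hovers
        # around a constant instead of growing without bound.
        self.weighted_loss = self.decay * self.weighted_loss + batch_mean_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames

    @property
    def value(self) -> float:
        return self.weighted_loss / max(self.frames, 1.0)
```
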
2024-09-15 13:59:03,332 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.780e+02 2.051e+02 2.236e+02 2.524e+02 3.269e+02, threshold=4.473e+02, percent-clipped=0.0
2024-09-15 13:59:46,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=296176.8333333333, ans=0.125
2024-09-15 13:59:52,783 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.98 vs. limit=15.0
2024-09-15 14:00:17,369 INFO [train.py:1198] (1/2) Epoch 17, batch 2300, loss[loss=0.2369, ctc_loss=0.1608, cr_loss=0.3806, over 20964.00 frames. ], tot_loss[loss=0.2446, ctc_loss=0.1675, cr_loss=0.3858, over 4084292.31 frames. ], batch size: 55, lr: 4.95e-03, grad_scale: 16.0
2024-09-15 14:00:26,938 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 14:01:19,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=296346.8333333333, ans=0.0
2024-09-15 14:01:32,215 INFO [train.py:1198] (1/2) Epoch 17, batch 2350, loss[loss=0.2253, ctc_loss=0.153, cr_loss=0.3616, over 21060.00 frames. ], tot_loss[loss=0.2447, ctc_loss=0.1675, cr_loss=0.3859, over 4084219.90 frames. ], batch size: 53, lr: 4.94e-03, grad_scale: 16.0
2024-09-15 14:01:35,174 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.824e+02 2.035e+02 2.207e+02 2.444e+02 3.534e+02, threshold=4.415e+02, percent-clipped=0.0
2024-09-15 14:02:21,521 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-15 14:02:24,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=296460.1666666667, ans=0.0
2024-09-15 14:02:43,299 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.23 vs. limit=15.0
2024-09-15 14:02:47,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=296488.5, ans=0.125
2024-09-15 14:02:49,866 INFO [train.py:1198] (1/2) Epoch 17, batch 2400, loss[loss=0.2258, ctc_loss=0.1522, cr_loss=0.3678, over 20799.00 frames. ], tot_loss[loss=0.2438, ctc_loss=0.1668, cr_loss=0.3853, over 4098507.93 frames. ], batch size: 53, lr: 4.94e-03, grad_scale: 32.0
2024-09-15 14:02:59,784 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.74 vs. limit=15.0
2024-09-15 14:03:19,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=296573.5, ans=0.1
2024-09-15 14:03:44,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=296601.8333333333, ans=0.2
2024-09-15 14:04:02,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=296630.1666666667, ans=10.0
2024-09-15 14:04:04,799 INFO [train.py:1198] (1/2) Epoch 17, batch 2450, loss[loss=0.1813, ctc_loss=0.1179, cr_loss=0.3174, over 20971.00 frames. ], tot_loss[loss=0.2437, ctc_loss=0.1665, cr_loss=0.3856, over 4096191.81 frames. ], batch size: 49, lr: 4.94e-03, grad_scale: 32.0
2024-09-15 14:04:06,718 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 14:04:07,772 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.676e+02 2.073e+02 2.225e+02 2.492e+02 4.389e+02, threshold=4.450e+02, percent-clipped=0.0
2024-09-15 14:04:18,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=296686.8333333333, ans=0.02
2024-09-15 14:04:18,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=296686.8333333333, ans=0.1
2024-09-15 14:04:52,513 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 14:05:23,774 INFO [train.py:1198] (1/2) Epoch 17, batch 2500, loss[loss=0.298, ctc_loss=0.2108, cr_loss=0.4364, over 18369.00 frames. ], tot_loss[loss=0.2426, ctc_loss=0.1657, cr_loss=0.3842, over 4095293.03 frames. ], batch size: 108, lr: 4.94e-03, grad_scale: 32.0
2024-09-15 14:06:38,416 INFO [train.py:1198] (1/2) Epoch 17, batch 2550, loss[loss=0.2358, ctc_loss=0.16, cr_loss=0.379, over 20830.00 frames. ], tot_loss[loss=0.2434, ctc_loss=0.1663, cr_loss=0.3852, over 4080823.70 frames. ], batch size: 59, lr: 4.94e-03, grad_scale: 32.0
2024-09-15 14:06:41,551 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.771e+02 2.099e+02 2.244e+02 2.410e+02 3.985e+02, threshold=4.488e+02, percent-clipped=0.0
2024-09-15 14:06:41,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=296941.8333333333, ans=0.125
2024-09-15 14:06:41,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=296941.8333333333, ans=10.0
2024-09-15 14:07:22,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=297026.8333333333, ans=0.125
2024-09-15 14:07:36,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=297026.8333333333, ans=0.125
2024-09-15 14:07:53,576 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-15 14:07:56,078 INFO [train.py:1198] (1/2) Epoch 17, batch 2600, loss[loss=0.2525, ctc_loss=0.1696, cr_loss=0.4149, over 20690.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.1655, cr_loss=0.3843, over 4096522.72 frames. ], batch size: 68, lr: 4.94e-03, grad_scale: 32.0
2024-09-15 14:08:18,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=297111.8333333333, ans=0.125
2024-09-15 14:08:48,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=297168.5, ans=0.1
2024-09-15 14:09:06,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=297196.8333333333, ans=0.125
2024-09-15 14:09:10,695 INFO [train.py:1198] (1/2) Epoch 17, batch 2650, loss[loss=0.2407, ctc_loss=0.1599, cr_loss=0.4039, over 20858.00 frames. ], tot_loss[loss=0.2428, ctc_loss=0.1659, cr_loss=0.3846, over 4093113.16 frames. ], batch size: 65, lr: 4.94e-03, grad_scale: 32.0
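
grad_scale in the batch lines moves in powers of two (16.0, 32.0, 64.0) over this stretch, which is the signature of dynamic fp16 loss scaling: the scale is halved when a step produces inf/nan gradients and grown back after a run of clean steps. The stock torch.cuda.amp pattern below shows the mechanism; the run's own scaler may be customized, and the init_scale/growth_interval values here are illustrative.

```python
import torch

# Stock dynamic loss-scaling pattern; not this run's exact trainer code.
scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

def fp16_train_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = loss_fn(model, batch)
    scaler.scale(loss).backward()   # gradients are computed at `scale * loss`
    scaler.step(optimizer)          # skipped internally if inf/nan gradients appear
    scaler.update()                 # halves the scale on overflow, grows it later
    return loss.detach(), scaler.get_scale()
```
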
2024-09-15 14:09:13,921 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.746e+02 2.064e+02 2.236e+02 2.470e+02 3.144e+02, threshold=4.471e+02, percent-clipped=0.0
2024-09-15 14:09:27,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=297253.5, ans=0.0
2024-09-15 14:10:08,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=297310.1666666667, ans=0.125
2024-09-15 14:10:11,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=297338.5, ans=0.1
2024-09-15 14:10:21,763 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.09 vs. limit=15.0
2024-09-15 14:10:24,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=297366.8333333333, ans=0.125
2024-09-15 14:10:25,552 INFO [train.py:1198] (1/2) Epoch 17, batch 2700, loss[loss=0.1779, ctc_loss=0.116, cr_loss=0.3093, over 20984.00 frames. ], tot_loss[loss=0.2411, ctc_loss=0.1646, cr_loss=0.3829, over 4103609.55 frames. ], batch size: 50, lr: 4.94e-03, grad_scale: 32.0
2024-09-15 14:11:24,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=297451.8333333333, ans=0.125
2024-09-15 14:11:29,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=297480.1666666667, ans=0.0
2024-09-15 14:11:38,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=297480.1666666667, ans=0.125
2024-09-15 14:11:39,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=297480.1666666667, ans=0.125
2024-09-15 14:11:44,311 INFO [train.py:1198] (1/2) Epoch 17, batch 2750, loss[loss=0.2345, ctc_loss=0.1597, cr_loss=0.3739, over 21034.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.1643, cr_loss=0.3828, over 4115305.06 frames. ], batch size: 63, lr: 4.94e-03, grad_scale: 32.0
2024-09-15 14:11:47,250 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.772e+02 2.036e+02 2.144e+02 2.342e+02 3.425e+02, threshold=4.287e+02, percent-clipped=0.0
2024-09-15 14:11:58,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=297536.8333333333, ans=0.0
2024-09-15 14:12:10,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=297536.8333333333, ans=0.2
2024-09-15 14:12:20,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=297565.1666666667, ans=0.125
2024-09-15 14:12:23,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=297565.1666666667, ans=0.125
2024-09-15 14:12:32,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=297593.5, ans=0.125
2024-09-15 14:12:59,312 INFO [train.py:1198] (1/2) Epoch 17, batch 2800, loss[loss=0.2125, ctc_loss=0.1398, cr_loss=0.3633, over 20955.00 frames. ], tot_loss[loss=0.2418, ctc_loss=0.165, cr_loss=0.3839, over 4114101.50 frames. ], batch size: 48, lr: 4.93e-03, grad_scale: 32.0
2024-09-15 14:13:01,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=297650.1666666667, ans=0.125
2024-09-15 14:13:19,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=297678.5, ans=0.0
2024-09-15 14:14:02,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=297763.5, ans=0.125
2024-09-15 14:14:16,949 INFO [train.py:1198] (1/2) Epoch 17, batch 2850, loss[loss=0.236, ctc_loss=0.1648, cr_loss=0.3559, over 20973.00 frames. ], tot_loss[loss=0.2428, ctc_loss=0.1659, cr_loss=0.3843, over 4099614.79 frames. ], batch size: 55, lr: 4.93e-03, grad_scale: 32.0
2024-09-15 14:14:17,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=297791.8333333333, ans=0.2
2024-09-15 14:14:19,829 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.779e+02 2.046e+02 2.142e+02 2.377e+02 3.016e+02, threshold=4.283e+02, percent-clipped=0.0
2024-09-15 14:14:41,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=297820.1666666667, ans=0.1
2024-09-15 14:15:32,261 INFO [train.py:1198] (1/2) Epoch 17, batch 2900, loss[loss=0.2499, ctc_loss=0.1675, cr_loss=0.4124, over 20960.00 frames. ], tot_loss[loss=0.2435, ctc_loss=0.1665, cr_loss=0.3853, over 4099162.07 frames. ], batch size: 58, lr: 4.93e-03, grad_scale: 32.0
2024-09-15 14:15:34,522 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.24 vs. limit=10.0
2024-09-15 14:15:58,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=297961.8333333333, ans=0.125
2024-09-15 14:15:58,636 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.02 vs. limit=15.0
2024-09-15 14:16:11,109 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.95 vs. limit=6.0
2024-09-15 14:16:17,892 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 14:16:50,966 INFO [train.py:1198] (1/2) Epoch 17, batch 2950, loss[loss=0.2753, ctc_loss=0.1908, cr_loss=0.4225, over 20824.00 frames. ], tot_loss[loss=0.2437, ctc_loss=0.1667, cr_loss=0.3854, over 4094407.58 frames. ], batch size: 65, lr: 4.93e-03, grad_scale: 32.0
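
Many ScheduledFloat names above end in balancer parameters: prob, min_positive, max_positive, min_abs, max_abs. These constrain per-channel activation statistics; when a channel drifts outside its configured range, the balancer nudges gradients back toward it (applied with probability prob). The sketch below only computes the statistics being constrained; the gradient-correction machinery itself lives in scaling.py and is not reproduced here.

```python
import torch

def balancer_stats(x: torch.Tensor, channel_dim: int = -1):
    """Per-channel statistics of the kind a balancer constrains (a sketch).

    Returns the fraction of positive activations and the mean absolute
    value per channel; the real balancer corrects gradients whenever these
    leave the configured [min_positive, max_positive] / [min_abs, max_abs]
    ranges, while this function only measures.
    """
    num_channels = x.shape[channel_dim]
    flat = x.transpose(channel_dim, -1).reshape(-1, num_channels)
    frac_positive = (flat > 0).float().mean(dim=0)
    mean_abs = flat.abs().mean(dim=0)
    return frac_positive, mean_abs
```
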
2024-09-15 14:16:53,920 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.733e+02 2.050e+02 2.156e+02 2.312e+02 2.932e+02, threshold=4.311e+02, percent-clipped=0.0
2024-09-15 14:17:16,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=298103.5, ans=0.2
2024-09-15 14:17:30,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=298131.8333333333, ans=0.0
2024-09-15 14:17:43,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=298160.1666666667, ans=0.125
2024-09-15 14:17:51,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=298188.5, ans=0.125
2024-09-15 14:17:58,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=298188.5, ans=0.125
2024-09-15 14:18:05,984 INFO [train.py:1198] (1/2) Epoch 17, batch 3000, loss[loss=0.2493, ctc_loss=0.1689, cr_loss=0.4022, over 20832.00 frames. ], tot_loss[loss=0.2453, ctc_loss=0.168, cr_loss=0.3867, over 4091191.64 frames. ], batch size: 59, lr: 4.93e-03, grad_scale: 32.0
2024-09-15 14:18:05,984 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-15 14:18:20,483 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([6.3206, 6.0551, 5.8349, 5.4496], device='cuda:1')
2024-09-15 14:18:25,261 INFO [train.py:1230] (1/2) Epoch 17, validation: loss=0.04582, ctc_loss=0.04582, cr_loss=1.047e-14, over 944034.00 frames.
2024-09-15 14:18:25,262 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB
2024-09-15 14:18:47,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=298245.1666666667, ans=0.0
2024-09-15 14:19:16,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=298301.8333333333, ans=0.125
2024-09-15 14:19:41,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=298358.5, ans=0.125
2024-09-15 14:19:42,957 INFO [train.py:1198] (1/2) Epoch 17, batch 3050, loss[loss=0.2372, ctc_loss=0.1619, cr_loss=0.3763, over 20781.00 frames. ], tot_loss[loss=0.2456, ctc_loss=0.1682, cr_loss=0.3872, over 4089909.99 frames. ], batch size: 56, lr: 4.93e-03, grad_scale: 32.0
2024-09-15 14:19:45,773 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.827e+02 2.109e+02 2.235e+02 2.405e+02 3.205e+02, threshold=4.469e+02, percent-clipped=0.0
2024-09-15 14:20:40,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=298443.5, ans=0.04949747468305833
2024-09-15 14:20:49,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=298471.8333333333, ans=0.1
2024-09-15 14:20:58,663 INFO [train.py:1198] (1/2) Epoch 17, batch 3100, loss[loss=0.2807, ctc_loss=0.1947, cr_loss=0.4297, over 18198.00 frames. ], tot_loss[loss=0.2456, ctc_loss=0.1682, cr_loss=0.3868, over 4089406.60 frames. ], batch size: 108, lr: 4.93e-03, grad_scale: 32.0
2024-09-15 14:21:04,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=298500.1666666667, ans=0.0
2024-09-15 14:21:42,187 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.84 vs. limit=10.0
2024-09-15 14:21:49,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=298585.1666666667, ans=0.5
2024-09-15 14:21:56,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=298585.1666666667, ans=0.09899494936611666
2024-09-15 14:22:17,351 INFO [train.py:1198] (1/2) Epoch 17, batch 3150, loss[loss=0.2266, ctc_loss=0.1539, cr_loss=0.3637, over 20784.00 frames. ], tot_loss[loss=0.2452, ctc_loss=0.1679, cr_loss=0.3863, over 4097965.37 frames. ], batch size: 56, lr: 4.93e-03, grad_scale: 32.0
2024-09-15 14:22:20,405 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.781e+02 2.066e+02 2.183e+02 2.374e+02 4.735e+02, threshold=4.367e+02, percent-clipped=2.0
2024-09-15 14:22:31,655 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-15 14:22:50,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=298698.5, ans=0.1
2024-09-15 14:23:00,042 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.13 vs. limit=22.5
2024-09-15 14:23:16,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=298755.1666666667, ans=0.125
2024-09-15 14:23:32,715 INFO [train.py:1198] (1/2) Epoch 17, batch 3200, loss[loss=0.2126, ctc_loss=0.1424, cr_loss=0.3512, over 20990.00 frames. ], tot_loss[loss=0.2447, ctc_loss=0.1674, cr_loss=0.3864, over 4104578.50 frames. ], batch size: 51, lr: 4.93e-03, grad_scale: 32.0
2024-09-15 14:24:21,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=298868.5, ans=0.02
2024-09-15 14:24:28,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=298868.5, ans=0.1
2024-09-15 14:24:30,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=298868.5, ans=0.125
2024-09-15 14:24:50,509 INFO [train.py:1198] (1/2) Epoch 17, batch 3250, loss[loss=0.2751, ctc_loss=0.1914, cr_loss=0.4189, over 20834.00 frames. ], tot_loss[loss=0.2454, ctc_loss=0.168, cr_loss=0.3869, over 4094353.77 frames. ], batch size: 59, lr: 4.92e-03, grad_scale: 32.0
2024-09-15 14:24:54,971 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.790e+02 2.089e+02 2.223e+02 2.425e+02 3.392e+02, threshold=4.446e+02, percent-clipped=0.0
2024-09-15 14:24:55,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=298925.1666666667, ans=0.125
2024-09-15 14:25:52,973 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.13 vs. limit=15.0
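
At batch 3000 the trainer pauses for validation ("Computing validation loss" followed by "Epoch 17, validation: loss=..."), reporting a frame-weighted average over the dev set (944034.00 frames) and the peak GPU memory. A minimal version of that loop is sketched below; the function name is hypothetical and loss_fn is assumed to return a batch's mean per-frame loss together with its frame count.

```python
import torch

@torch.no_grad()
def compute_validation_loss(model, valid_loader, loss_fn):
    """Frame-weighted validation loss (a sketch of the logged behavior)."""
    model.eval()
    weighted, frames = 0.0, 0.0
    for batch in valid_loader:
        loss, n = loss_fn(model, batch)   # mean per-frame loss, frame count
        weighted += float(loss) * n
        frames += n
    model.train()
    return weighted / max(frames, 1.0)
```
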
2024-09-15 14:26:05,594 INFO [train.py:1198] (1/2) Epoch 17, batch 3300, loss[loss=0.2788, ctc_loss=0.1932, cr_loss=0.4281, over 20953.00 frames. ], tot_loss[loss=0.2452, ctc_loss=0.1677, cr_loss=0.3872, over 4098531.13 frames. ], batch size: 64, lr: 4.92e-03, grad_scale: 32.0
2024-09-15 14:27:11,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=299180.1666666667, ans=0.125
2024-09-15 14:27:20,655 INFO [train.py:1198] (1/2) Epoch 17, batch 3350, loss[loss=0.2668, ctc_loss=0.1818, cr_loss=0.4251, over 20634.00 frames. ], tot_loss[loss=0.2455, ctc_loss=0.1681, cr_loss=0.3872, over 4076312.27 frames. ], batch size: 66, lr: 4.92e-03, grad_scale: 32.0
2024-09-15 14:27:25,269 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.888e+02 2.045e+02 2.190e+02 2.326e+02 4.997e+02, threshold=4.380e+02, percent-clipped=1.0
2024-09-15 14:28:00,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=299265.1666666667, ans=0.125
2024-09-15 14:28:22,065 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.00 vs. limit=15.0
2024-09-15 14:28:39,382 INFO [train.py:1198] (1/2) Epoch 17, batch 3400, loss[loss=0.2613, ctc_loss=0.1818, cr_loss=0.3978, over 20965.00 frames. ], tot_loss[loss=0.2453, ctc_loss=0.168, cr_loss=0.3863, over 4079769.77 frames. ], batch size: 64, lr: 4.92e-03, grad_scale: 32.0
2024-09-15 14:28:47,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=299350.1666666667, ans=0.2
2024-09-15 14:29:44,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=299463.5, ans=0.125
2024-09-15 14:29:50,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=299463.5, ans=0.125
2024-09-15 14:29:54,513 INFO [train.py:1198] (1/2) Epoch 17, batch 3450, loss[loss=0.212, ctc_loss=0.1431, cr_loss=0.344, over 21011.00 frames. ], tot_loss[loss=0.2434, ctc_loss=0.1666, cr_loss=0.3842, over 4084101.34 frames. ], batch size: 51, lr: 4.92e-03, grad_scale: 32.0
2024-09-15 14:29:57,904 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 14:29:58,880 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.743e+02 2.049e+02 2.191e+02 2.372e+02 3.489e+02, threshold=4.383e+02, percent-clipped=0.0
2024-09-15 14:30:43,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=299576.8333333333, ans=0.1
2024-09-15 14:31:12,339 INFO [train.py:1198] (1/2) Epoch 17, batch 3500, loss[loss=0.2577, ctc_loss=0.1768, cr_loss=0.4048, over 21008.00 frames. ], tot_loss[loss=0.2437, ctc_loss=0.1667, cr_loss=0.3849, over 4082374.14 frames. ], batch size: 67, lr: 4.92e-03, grad_scale: 32.0
2024-09-15 14:31:26,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=299661.8333333333, ans=0.0
2024-09-15 14:31:47,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=299690.1666666667, ans=0.0
2024-09-15 14:31:53,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=299690.1666666667, ans=0.125
2024-09-15 14:32:03,771 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.88 vs. limit=12.0
2024-09-15 14:32:29,025 INFO [train.py:1198] (1/2) Epoch 17, batch 3550, loss[loss=0.2372, ctc_loss=0.1605, cr_loss=0.3833, over 20827.00 frames. ], tot_loss[loss=0.2441, ctc_loss=0.167, cr_loss=0.3856, over 4087888.27 frames. ], batch size: 59, lr: 4.92e-03, grad_scale: 32.0
2024-09-15 14:32:33,407 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.715e+02 2.077e+02 2.267e+02 2.390e+02 4.371e+02, threshold=4.534e+02, percent-clipped=0.0
2024-09-15 14:32:35,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=299775.1666666667, ans=0.025
2024-09-15 14:32:41,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=299775.1666666667, ans=0.1
2024-09-15 14:32:42,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=299803.5, ans=0.0
2024-09-15 14:32:53,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=299803.5, ans=0.125
2024-09-15 14:33:35,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=299888.5, ans=0.0
2024-09-15 14:33:35,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=299888.5, ans=0.0
2024-09-15 14:33:36,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=299888.5, ans=0.025
2024-09-15 14:33:45,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=299916.8333333333, ans=0.2
2024-09-15 14:33:46,982 INFO [train.py:1198] (1/2) Epoch 17, batch 3600, loss[loss=0.2102, ctc_loss=0.1423, cr_loss=0.3394, over 20991.00 frames. ], tot_loss[loss=0.2439, ctc_loss=0.1668, cr_loss=0.3854, over 4100806.38 frames. ], batch size: 52, lr: 4.92e-03, grad_scale: 32.0
2024-09-15 14:34:16,619 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.90 vs. limit=10.0
2024-09-15 14:34:36,140 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.24 vs. limit=15.0
2024-09-15 14:35:02,595 INFO [train.py:1198] (1/2) Epoch 17, batch 3650, loss[loss=0.2259, ctc_loss=0.1524, cr_loss=0.3675, over 20871.00 frames. ], tot_loss[loss=0.2434, ctc_loss=0.1664, cr_loss=0.3848, over 4108453.75 frames. ], batch size: 54, lr: 4.91e-03, grad_scale: 32.0
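
The logged learning rate drifts slowly downward across this section (4.98e-03 at batch 1050 to 4.90e-03 by batch 4400), consistent with an Eden-style schedule that decays polynomially in both the step index and the epoch. A sketch of that schedule follows; it is written from the published recipe description, and the lr_batches/lr_epochs constants are typical defaults assumed for illustration rather than values read from this run.

```python
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    """Eden-style learning-rate schedule (sketch, assumed constants)."""
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor
```
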
2024-09-15 14:35:06,849 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.709e+02 2.087e+02 2.181e+02 2.335e+02 3.955e+02, threshold=4.362e+02, percent-clipped=0.0
2024-09-15 14:35:13,573 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.67 vs. limit=22.5
2024-09-15 14:35:43,397 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.67 vs. limit=15.0
2024-09-15 14:36:07,358 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.67 vs. limit=22.5
2024-09-15 14:36:20,299 INFO [train.py:1198] (1/2) Epoch 17, batch 3700, loss[loss=0.261, ctc_loss=0.1795, cr_loss=0.4076, over 20959.00 frames. ], tot_loss[loss=0.2441, ctc_loss=0.167, cr_loss=0.3853, over 4111608.11 frames. ], batch size: 60, lr: 4.91e-03, grad_scale: 32.0
2024-09-15 14:36:20,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=300200.1666666667, ans=0.125
2024-09-15 14:36:59,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=300256.8333333333, ans=0.0
2024-09-15 14:37:08,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=300285.1666666667, ans=0.125
2024-09-15 14:37:28,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=300313.5, ans=0.125
2024-09-15 14:37:35,945 INFO [train.py:1198] (1/2) Epoch 17, batch 3750, loss[loss=0.264, ctc_loss=0.1816, cr_loss=0.4118, over 20873.00 frames. ], tot_loss[loss=0.2435, ctc_loss=0.1665, cr_loss=0.3848, over 4107807.05 frames. ], batch size: 65, lr: 4.91e-03, grad_scale: 32.0
2024-09-15 14:37:40,548 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.762e+02 2.107e+02 2.281e+02 2.521e+02 4.388e+02, threshold=4.561e+02, percent-clipped=1.0
2024-09-15 14:37:48,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=300341.8333333333, ans=0.025
2024-09-15 14:38:18,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=300398.5, ans=0.1
2024-09-15 14:38:51,439 INFO [train.py:1198] (1/2) Epoch 17, batch 3800, loss[loss=0.2653, ctc_loss=0.1822, cr_loss=0.4154, over 20173.00 frames. ], tot_loss[loss=0.243, ctc_loss=0.1663, cr_loss=0.3837, over 4105013.53 frames. ], batch size: 80, lr: 4.91e-03, grad_scale: 32.0
2024-09-15 14:39:14,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=300511.8333333333, ans=0.0
2024-09-15 14:39:27,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=300540.1666666667, ans=0.0
2024-09-15 14:39:28,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=300540.1666666667, ans=0.07
2024-09-15 14:39:39,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=300568.5, ans=0.125
2024-09-15 14:39:40,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=300568.5, ans=0.0
2024-09-15 14:40:08,987 INFO [train.py:1198] (1/2) Epoch 17, batch 3850, loss[loss=0.2255, ctc_loss=0.1538, cr_loss=0.3587, over 21009.00 frames. ], tot_loss[loss=0.244, ctc_loss=0.167, cr_loss=0.3853, over 4104865.30 frames. ], batch size: 61, lr: 4.91e-03, grad_scale: 16.0
2024-09-15 14:40:14,842 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.764e+02 2.070e+02 2.233e+02 2.370e+02 4.826e+02, threshold=4.467e+02, percent-clipped=1.0
2024-09-15 14:40:45,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=300681.8333333333, ans=0.0
2024-09-15 14:40:59,142 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.72 vs. limit=15.0
2024-09-15 14:41:06,683 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.32 vs. limit=10.0
2024-09-15 14:41:23,837 INFO [train.py:1198] (1/2) Epoch 17, batch 3900, loss[loss=0.2768, ctc_loss=0.1929, cr_loss=0.4192, over 20684.00 frames. ], tot_loss[loss=0.2451, ctc_loss=0.1678, cr_loss=0.3865, over 4102096.86 frames. ], batch size: 68, lr: 4.91e-03, grad_scale: 16.0
2024-09-15 14:42:20,066 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.33 vs. limit=15.0
2024-09-15 14:42:42,035 INFO [train.py:1198] (1/2) Epoch 17, batch 3950, loss[loss=0.2658, ctc_loss=0.1824, cr_loss=0.4168, over 20832.00 frames. ], tot_loss[loss=0.2445, ctc_loss=0.1674, cr_loss=0.3853, over 4090172.56 frames. ], batch size: 59, lr: 4.91e-03, grad_scale: 16.0
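
The Whitening: ... metric=X vs. limit=Y lines compare a measured statistic of a module's output covariance against a scheduled ceiling; values well under the limit (e.g. metric=3.39 vs. limit=15.0) mean no correction is being applied. One illustrative way to define such a metric, equal to 1.0 for perfectly white (identity-covariance) features and growing as variance concentrates in a few directions, is sketched below; this is an assumed definition, not necessarily the library's exact formula.

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """Whiteness proxy: mean squared eigenvalue of the per-group feature
    covariance divided by the squared mean eigenvalue (illustrative)."""
    flat = x.reshape(-1, x.shape[-1])
    n, c = flat.shape
    grouped = flat.reshape(n, num_groups, c // num_groups)
    metrics = []
    for g in range(num_groups):
        f = grouped[:, g, :]
        f = f - f.mean(dim=0)
        cov = (f.T @ f) / max(n - 1, 1)
        eigs = torch.linalg.eigvalsh(cov)
        metrics.append(((eigs ** 2).mean() / eigs.mean().clamp(min=1e-20) ** 2).item())
    return sum(metrics) / num_groups
```
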
2024-09-15 14:42:48,156 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.732e+02 2.106e+02 2.234e+02 2.468e+02 5.301e+02, threshold=4.468e+02, percent-clipped=1.0
2024-09-15 14:43:24,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=300965.1666666667, ans=0.125
2024-09-15 14:43:29,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=300993.5, ans=0.1
2024-09-15 14:43:35,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=300993.5, ans=0.5
2024-09-15 14:43:39,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=300993.5, ans=0.0
2024-09-15 14:43:57,770 INFO [train.py:1198] (1/2) Epoch 17, batch 4000, loss[loss=0.2661, ctc_loss=0.1826, cr_loss=0.4174, over 21028.00 frames. ], tot_loss[loss=0.2443, ctc_loss=0.1673, cr_loss=0.3854, over 4076529.61 frames. ], batch size: 61, lr: 4.91e-03, grad_scale: 32.0
2024-09-15 14:44:48,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=301135.1666666667, ans=0.0
2024-09-15 14:45:15,374 INFO [train.py:1198] (1/2) Epoch 17, batch 4050, loss[loss=0.2549, ctc_loss=0.1747, cr_loss=0.401, over 21030.00 frames. ], tot_loss[loss=0.2444, ctc_loss=0.1673, cr_loss=0.3854, over 4069092.46 frames. ], batch size: 62, lr: 4.91e-03, grad_scale: 32.0
2024-09-15 14:45:17,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=301191.8333333333, ans=0.1
2024-09-15 14:45:20,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=301191.8333333333, ans=15.0
2024-09-15 14:45:21,231 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.826e+02 2.106e+02 2.228e+02 2.386e+02 4.544e+02, threshold=4.456e+02, percent-clipped=1.0
2024-09-15 14:45:37,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=301220.1666666667, ans=0.125
2024-09-15 14:45:48,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=301248.5, ans=0.125
2024-09-15 14:46:09,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=301276.8333333333, ans=0.125
2024-09-15 14:46:14,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=301305.1666666667, ans=15.0
2024-09-15 14:46:20,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=301305.1666666667, ans=0.0
2024-09-15 14:46:30,136 INFO [train.py:1198] (1/2) Epoch 17, batch 4100, loss[loss=0.2595, ctc_loss=0.1762, cr_loss=0.4165, over 19333.00 frames. ], tot_loss[loss=0.2439, ctc_loss=0.167, cr_loss=0.3847, over 4068079.28 frames. ], batch size: 90, lr: 4.90e-03, grad_scale: 32.0
2024-09-15 14:46:46,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=301361.8333333333, ans=0.125
2024-09-15 14:47:06,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=301390.1666666667, ans=0.125
2024-09-15 14:47:48,004 INFO [train.py:1198] (1/2) Epoch 17, batch 4150, loss[loss=0.2381, ctc_loss=0.1617, cr_loss=0.3823, over 21007.00 frames. ], tot_loss[loss=0.2436, ctc_loss=0.1666, cr_loss=0.3852, over 4081471.77 frames. ], batch size: 61, lr: 4.90e-03, grad_scale: 32.0
2024-09-15 14:47:53,854 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.680e+02 2.056e+02 2.176e+02 2.386e+02 3.783e+02, threshold=4.352e+02, percent-clipped=0.0
2024-09-15 14:48:00,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=301475.1666666667, ans=0.125
2024-09-15 14:48:31,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=301560.1666666667, ans=0.125
2024-09-15 14:48:32,003 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.00 vs. limit=22.5
2024-09-15 14:48:38,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=301560.1666666667, ans=0.125
2024-09-15 14:48:42,412 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0
2024-09-15 14:49:02,741 INFO [train.py:1198] (1/2) Epoch 17, batch 4200, loss[loss=0.2433, ctc_loss=0.1651, cr_loss=0.3913, over 20969.00 frames. ], tot_loss[loss=0.2434, ctc_loss=0.1663, cr_loss=0.3853, over 4080579.29 frames. ], batch size: 64, lr: 4.90e-03, grad_scale: 32.0
2024-09-15 14:49:16,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=301645.1666666667, ans=0.125
2024-09-15 14:49:36,456 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.39 vs. limit=15.0
2024-09-15 14:50:17,616 INFO [train.py:1198] (1/2) Epoch 17, batch 4250, loss[loss=0.2458, ctc_loss=0.1674, cr_loss=0.3922, over 20682.00 frames. ], tot_loss[loss=0.2435, ctc_loss=0.1665, cr_loss=0.3852, over 4080859.04 frames. ], batch size: 68, lr: 4.90e-03, grad_scale: 32.0
2024-09-15 14:50:23,476 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.788e+02 2.062e+02 2.250e+02 2.443e+02 4.732e+02, threshold=4.501e+02, percent-clipped=1.0
2024-09-15 14:50:24,163 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.88 vs. limit=6.0
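
Names like bypass.skip_rate, conv_skip_rate, attention_skip_rate, and ff3_skip_rate above schedule how often whole sub-branches are stochastically dropped during training, leaving the residual path intact. A sketch of that mechanism is below; the real bypass modules additionally learn per-channel interpolation weights whose floor is the scheduled scale_min, which this sketch omits.

```python
import torch

def bypass_step(residual: torch.Tensor, branch_out: torch.Tensor,
                skip_rate: float, training: bool) -> torch.Tensor:
    """Stochastic branch skipping (a sketch of the *_skip_rate behavior):
    with probability skip_rate during training, the sub-module's
    contribution is dropped and the residual passes through unchanged.
    """
    if training and torch.rand(()).item() < skip_rate:
        return residual
    return residual + branch_out
```
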
limit=6.0 2024-09-15 14:50:39,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=301786.8333333333, ans=0.0 2024-09-15 14:50:49,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=301815.1666666667, ans=0.125 2024-09-15 14:50:57,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=301815.1666666667, ans=0.125 2024-09-15 14:51:35,478 INFO [train.py:1198] (1/2) Epoch 17, batch 4300, loss[loss=0.278, ctc_loss=0.1919, cr_loss=0.4309, over 18333.00 frames. ], tot_loss[loss=0.2435, ctc_loss=0.1663, cr_loss=0.3861, over 4086411.78 frames. ], batch size: 108, lr: 4.90e-03, grad_scale: 32.0 2024-09-15 14:51:35,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=301900.1666666667, ans=0.0 2024-09-15 14:51:43,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=301900.1666666667, ans=0.125 2024-09-15 14:52:38,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=302013.5, ans=0.1 2024-09-15 14:52:38,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=302013.5, ans=0.0 2024-09-15 14:52:49,694 INFO [train.py:1198] (1/2) Epoch 17, batch 4350, loss[loss=0.2315, ctc_loss=0.1594, cr_loss=0.3607, over 20986.00 frames. ], tot_loss[loss=0.2434, ctc_loss=0.1661, cr_loss=0.3861, over 4089544.97 frames. ], batch size: 51, lr: 4.90e-03, grad_scale: 32.0 2024-09-15 14:52:55,792 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.805e+02 2.097e+02 2.181e+02 2.314e+02 2.892e+02, threshold=4.362e+02, percent-clipped=0.0 2024-09-15 14:53:07,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=302070.1666666667, ans=0.125 2024-09-15 14:53:14,620 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.54 vs. limit=15.0 2024-09-15 14:53:15,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=302070.1666666667, ans=0.125 2024-09-15 14:53:30,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=302098.5, ans=0.125 2024-09-15 14:53:31,005 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.23 vs. limit=22.5 2024-09-15 14:54:07,500 INFO [train.py:1198] (1/2) Epoch 17, batch 4400, loss[loss=0.2784, ctc_loss=0.1924, cr_loss=0.4303, over 20020.00 frames. ], tot_loss[loss=0.2436, ctc_loss=0.1663, cr_loss=0.3868, over 4095775.74 frames. 
], batch size: 80, lr: 4.90e-03, grad_scale: 32.0 2024-09-15 14:54:18,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=302183.5, ans=0.0 2024-09-15 14:54:21,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=302211.8333333333, ans=0.035 2024-09-15 14:54:25,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=302211.8333333333, ans=0.125 2024-09-15 14:54:32,070 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.62 vs. limit=10.0 2024-09-15 14:54:45,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=302240.1666666667, ans=0.125 2024-09-15 14:54:49,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=302240.1666666667, ans=0.125 2024-09-15 14:55:03,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=302268.5, ans=0.2 2024-09-15 14:55:04,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=302268.5, ans=0.5 2024-09-15 14:55:22,733 INFO [train.py:1198] (1/2) Epoch 17, batch 4450, loss[loss=0.1985, ctc_loss=0.1328, cr_loss=0.3286, over 20982.00 frames. ], tot_loss[loss=0.2441, ctc_loss=0.1667, cr_loss=0.3867, over 4080218.17 frames. ], batch size: 49, lr: 4.90e-03, grad_scale: 32.0 2024-09-15 14:55:28,633 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.726e+02 2.075e+02 2.240e+02 2.483e+02 6.482e+02, threshold=4.480e+02, percent-clipped=1.0 2024-09-15 14:55:33,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=302325.1666666667, ans=0.125 2024-09-15 14:55:34,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=302325.1666666667, ans=0.0 2024-09-15 14:55:45,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=302353.5, ans=0.0 2024-09-15 14:55:47,633 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.61 vs. limit=5.0 2024-09-15 14:55:57,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=302381.8333333333, ans=0.2 2024-09-15 14:56:02,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=302381.8333333333, ans=0.0 2024-09-15 14:56:08,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=302410.1666666667, ans=0.1 2024-09-15 14:56:27,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=302438.5, ans=0.125 2024-09-15 14:56:39,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=302466.8333333333, ans=0.1 2024-09-15 14:56:40,625 INFO [train.py:1198] (1/2) Epoch 17, batch 4500, loss[loss=0.2465, ctc_loss=0.1678, cr_loss=0.3933, over 21060.00 frames. 
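
Every loss[...] and tot_loss[...] field in these entries is consistent with the weighted sum implied by the startup config (ctc_loss_scale: 1.0, cr_loss_scale: 0.2): for instance the batch 4400 running total above gives 0.1663 + 0.2 * 0.3868 ≈ 0.2436. A sketch of that combination; the function name is illustrative, not the actual train.py code:

    def combined_loss(ctc_loss, cr_loss, ctc_scale=1.0, cr_scale=0.2):
        # loss ~= 1.0 * ctc_loss + 0.2 * cr_loss throughout this log.
        return ctc_scale * ctc_loss + cr_scale * cr_loss

    assert abs(combined_loss(0.1663, 0.3868) - 0.2436) < 1e-3
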
], tot_loss[loss=0.243, ctc_loss=0.166, cr_loss=0.3853, over 4097956.88 frames. ], batch size: 59, lr: 4.90e-03, grad_scale: 32.0 2024-09-15 14:56:57,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=302495.1666666667, ans=0.125 2024-09-15 14:57:24,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=302551.8333333333, ans=0.125 2024-09-15 14:57:25,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=302551.8333333333, ans=0.1 2024-09-15 14:57:36,918 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.85 vs. limit=15.0 2024-09-15 14:57:39,635 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.99 vs. limit=15.0 2024-09-15 14:57:55,589 INFO [train.py:1198] (1/2) Epoch 17, batch 4550, loss[loss=0.2095, ctc_loss=0.1437, cr_loss=0.3289, over 20991.00 frames. ], tot_loss[loss=0.2443, ctc_loss=0.167, cr_loss=0.3867, over 4087890.99 frames. ], batch size: 48, lr: 4.89e-03, grad_scale: 32.0 2024-09-15 14:58:01,623 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.849e+02 2.070e+02 2.214e+02 2.404e+02 5.625e+02, threshold=4.427e+02, percent-clipped=1.0 2024-09-15 14:58:46,062 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.76 vs. limit=15.0 2024-09-15 14:59:13,620 INFO [train.py:1198] (1/2) Epoch 17, batch 4600, loss[loss=0.2781, ctc_loss=0.1933, cr_loss=0.4244, over 18195.00 frames. ], tot_loss[loss=0.2445, ctc_loss=0.1671, cr_loss=0.3869, over 4085501.37 frames. ], batch size: 108, lr: 4.89e-03, grad_scale: 32.0 2024-09-15 14:59:29,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=302778.5, ans=0.125 2024-09-15 14:59:44,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=302806.8333333333, ans=0.125 2024-09-15 14:59:59,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=302835.1666666667, ans=0.125 2024-09-15 15:00:06,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=302835.1666666667, ans=0.1 2024-09-15 15:00:25,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=302863.5, ans=0.5 2024-09-15 15:00:28,629 INFO [train.py:1198] (1/2) Epoch 17, batch 4650, loss[loss=0.2702, ctc_loss=0.1892, cr_loss=0.4047, over 20672.00 frames. ], tot_loss[loss=0.2426, ctc_loss=0.1657, cr_loss=0.3847, over 4095440.65 frames. 
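
In each train.py:1198 entry, loss[...] describes the current batch while tot_loss[...] is a smoothed statistic: its frame count hovers around 4.08e6 rather than growing without bound, which points to an average over a window of recent batches (about 200 batches of roughly 20k frames, matching reset_interval: 200 from the startup config). A sketch of such bookkeeping, assumed rather than taken from train.py:

    from collections import deque

    class RunningLoss:
        # Frame-weighted average over a sliding window of recent batches;
        # 200 batches of ~20k frames reproduces the ~4.08e6-frame totals.
        def __init__(self, max_batches=200):
            self.window = deque(maxlen=max_batches)

        def update(self, batch_loss, num_frames):
            self.window.append((batch_loss * num_frames, num_frames))
            weighted = sum(w for w, _ in self.window)
            frames = sum(f for _, f in self.window)
            return weighted / frames, frames  # tot_loss, 'over N frames'
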
], batch size: 68, lr: 4.89e-03, grad_scale: 32.0 2024-09-15 15:00:34,726 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.816e+02 2.052e+02 2.194e+02 2.380e+02 3.361e+02, threshold=4.387e+02, percent-clipped=0.0 2024-09-15 15:01:10,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=302948.5, ans=0.0 2024-09-15 15:01:33,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=303005.1666666667, ans=0.125 2024-09-15 15:01:44,086 INFO [train.py:1198] (1/2) Epoch 17, batch 4700, loss[loss=0.2052, ctc_loss=0.1374, cr_loss=0.3391, over 20976.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.1655, cr_loss=0.3845, over 4099736.62 frames. ], batch size: 51, lr: 4.89e-03, grad_scale: 32.0 2024-09-15 15:01:44,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=303033.5, ans=0.025 2024-09-15 15:02:17,352 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.88 vs. limit=22.5 2024-09-15 15:02:30,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=303118.5, ans=0.2 2024-09-15 15:02:31,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=303118.5, ans=0.0 2024-09-15 15:03:02,018 INFO [train.py:1198] (1/2) Epoch 17, batch 4750, loss[loss=0.2069, ctc_loss=0.1406, cr_loss=0.3311, over 19934.00 frames. ], tot_loss[loss=0.2439, ctc_loss=0.1668, cr_loss=0.3853, over 4078168.04 frames. ], batch size: 44, lr: 4.89e-03, grad_scale: 32.0 2024-09-15 15:03:08,003 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.666e+02 2.045e+02 2.168e+02 2.315e+02 2.890e+02, threshold=4.337e+02, percent-clipped=0.0 2024-09-15 15:03:51,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=303260.1666666667, ans=0.125 2024-09-15 15:04:16,719 INFO [train.py:1198] (1/2) Epoch 17, batch 4800, loss[loss=0.2279, ctc_loss=0.1565, cr_loss=0.3571, over 20972.00 frames. ], tot_loss[loss=0.2446, ctc_loss=0.1675, cr_loss=0.3856, over 4068503.64 frames. ], batch size: 58, lr: 4.89e-03, grad_scale: 32.0 2024-09-15 15:04:27,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=303316.8333333333, ans=0.0 2024-09-15 15:04:48,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=303373.5, ans=0.2 2024-09-15 15:04:54,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=303373.5, ans=0.125 2024-09-15 15:05:32,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=303430.1666666667, ans=0.125 2024-09-15 15:05:34,740 INFO [train.py:1198] (1/2) Epoch 17, batch 4850, loss[loss=0.2413, ctc_loss=0.1665, cr_loss=0.3743, over 20814.00 frames. ], tot_loss[loss=0.2446, ctc_loss=0.1674, cr_loss=0.3857, over 4071510.34 frames. 
], batch size: 59, lr: 4.89e-03, grad_scale: 32.0 2024-09-15 15:05:40,693 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.835e+02 2.079e+02 2.185e+02 2.356e+02 3.985e+02, threshold=4.370e+02, percent-clipped=0.0 2024-09-15 15:05:56,017 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 15:06:14,627 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.24 vs. limit=15.0 2024-09-15 15:06:26,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=303543.5, ans=0.02 2024-09-15 15:06:49,733 INFO [train.py:1198] (1/2) Epoch 17, batch 4900, loss[loss=0.2863, ctc_loss=0.1949, cr_loss=0.457, over 20310.00 frames. ], tot_loss[loss=0.2448, ctc_loss=0.1676, cr_loss=0.386, over 4065224.09 frames. ], batch size: 74, lr: 4.89e-03, grad_scale: 32.0 2024-09-15 15:06:57,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=303600.1666666667, ans=0.1 2024-09-15 15:06:57,631 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.64 vs. limit=12.0 2024-09-15 15:07:04,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=303628.5, ans=0.125 2024-09-15 15:07:13,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=303628.5, ans=0.125 2024-09-15 15:07:16,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=303628.5, ans=0.125 2024-09-15 15:07:54,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=303713.5, ans=0.125 2024-09-15 15:08:06,990 INFO [train.py:1198] (1/2) Epoch 17, batch 4950, loss[loss=0.2654, ctc_loss=0.1859, cr_loss=0.3973, over 20709.00 frames. ], tot_loss[loss=0.2453, ctc_loss=0.1679, cr_loss=0.3871, over 4074074.54 frames. ], batch size: 71, lr: 4.88e-03, grad_scale: 32.0 2024-09-15 15:08:12,734 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.745e+02 2.056e+02 2.198e+02 2.367e+02 3.342e+02, threshold=4.395e+02, percent-clipped=0.0 2024-09-15 15:08:26,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=303770.1666666667, ans=0.0 2024-09-15 15:08:43,349 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.87 vs. limit=15.0 2024-09-15 15:09:03,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=303826.8333333333, ans=0.0 2024-09-15 15:09:20,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=303883.5, ans=10.0 2024-09-15 15:09:21,191 INFO [train.py:1198] (1/2) Epoch 17, batch 5000, loss[loss=0.2157, ctc_loss=0.1468, cr_loss=0.3441, over 20963.00 frames. ], tot_loss[loss=0.2457, ctc_loss=0.1682, cr_loss=0.3876, over 4081183.25 frames. 
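
Most of the scaling.py:214 traffic above is ScheduledFloat values: module hyperparameters (dropout probabilities, skip rates, balancer bounds) that change with training progress and are logged with the current batch_count. By this point (batch_count around 3e5) the printed ans values look like schedule endpoints, e.g. dropout_p at 0.1 and the various *_skip_rate entries at 0.0. A stand-in sketch assuming piecewise-linear interpolation between (batch_count, value) breakpoints; this captures the idea, not the actual scaling.py class:

    class PiecewiseScheduledFloat:
        # A float interpolated between (batch_count, value) breakpoints
        # and held constant past the last one.
        def __init__(self, *points):
            self.points = sorted(points)

        def value(self, batch_count):
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)
            return pts[-1][1]

    # A dropout annealed from 0.3 to 0.1 over the first 20k batches would
    # print ans=0.1 at the batch_counts seen here:
    drop = PiecewiseScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    assert drop.value(303600.0) == 0.1
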
], batch size: 51, lr: 4.88e-03, grad_scale: 32.0 2024-09-15 15:09:52,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=303940.1666666667, ans=0.035 2024-09-15 15:10:32,941 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 15:10:35,442 INFO [train.py:1198] (1/2) Epoch 17, batch 5050, loss[loss=0.2807, ctc_loss=0.2004, cr_loss=0.4019, over 18231.00 frames. ], tot_loss[loss=0.2463, ctc_loss=0.1685, cr_loss=0.3887, over 4079742.43 frames. ], batch size: 108, lr: 4.88e-03, grad_scale: 32.0 2024-09-15 15:10:38,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=304025.1666666667, ans=0.04949747468305833 2024-09-15 15:10:41,521 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.846e+02 2.082e+02 2.226e+02 2.461e+02 3.152e+02, threshold=4.452e+02, percent-clipped=0.0 2024-09-15 15:11:23,725 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 15:11:43,212 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.69 vs. limit=15.0 2024-09-15 15:11:50,192 INFO [train.py:1198] (1/2) Epoch 17, batch 5100, loss[loss=0.2297, ctc_loss=0.1534, cr_loss=0.3813, over 21064.00 frames. ], tot_loss[loss=0.2442, ctc_loss=0.1669, cr_loss=0.3862, over 4084481.65 frames. ], batch size: 56, lr: 4.88e-03, grad_scale: 32.0 2024-09-15 15:11:53,786 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.43 vs. limit=15.0 2024-09-15 15:12:06,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=304195.1666666667, ans=0.1 2024-09-15 15:12:22,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=304223.5, ans=0.0 2024-09-15 15:12:27,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=304223.5, ans=0.05 2024-09-15 15:12:33,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=304251.8333333333, ans=0.1 2024-09-15 15:12:33,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=304251.8333333333, ans=0.1 2024-09-15 15:12:43,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=304251.8333333333, ans=0.0 2024-09-15 15:12:50,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=304280.1666666667, ans=0.125 2024-09-15 15:12:59,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=304280.1666666667, ans=0.0 2024-09-15 15:13:06,609 INFO [train.py:1198] (1/2) Epoch 17, batch 5150, loss[loss=0.2105, ctc_loss=0.1422, cr_loss=0.3416, over 20958.00 frames. ], tot_loss[loss=0.2434, ctc_loss=0.1663, cr_loss=0.3853, over 4081124.22 frames. 
], batch size: 50, lr: 4.88e-03, grad_scale: 32.0 2024-09-15 15:13:12,282 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.774e+02 2.104e+02 2.325e+02 2.609e+02 4.325e+02, threshold=4.650e+02, percent-clipped=0.0 2024-09-15 15:13:18,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=304308.5, ans=0.1 2024-09-15 15:13:20,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=304336.8333333333, ans=0.0 2024-09-15 15:13:26,479 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.96 vs. limit=22.5 2024-09-15 15:13:40,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=304365.1666666667, ans=0.0 2024-09-15 15:14:02,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=304393.5, ans=0.125 2024-09-15 15:14:02,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=304393.5, ans=0.1 2024-09-15 15:14:20,509 INFO [train.py:1198] (1/2) Epoch 17, batch 5200, loss[loss=0.2115, ctc_loss=0.143, cr_loss=0.3426, over 20969.00 frames. ], tot_loss[loss=0.2423, ctc_loss=0.1656, cr_loss=0.3836, over 4081347.90 frames. ], batch size: 49, lr: 4.88e-03, grad_scale: 32.0 2024-09-15 15:14:30,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=304450.1666666667, ans=0.125 2024-09-15 15:15:08,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=304535.1666666667, ans=0.035 2024-09-15 15:15:34,866 INFO [train.py:1198] (1/2) Epoch 17, batch 5250, loss[loss=0.2179, ctc_loss=0.1501, cr_loss=0.339, over 21000.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.1645, cr_loss=0.382, over 4094640.39 frames. ], batch size: 52, lr: 4.88e-03, grad_scale: 32.0 2024-09-15 15:15:40,794 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.688e+02 2.000e+02 2.130e+02 2.333e+02 2.745e+02, threshold=4.260e+02, percent-clipped=0.0 2024-09-15 15:15:47,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=304591.8333333333, ans=0.125 2024-09-15 15:16:17,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=304676.8333333333, ans=0.0 2024-09-15 15:16:48,834 INFO [train.py:1198] (1/2) Epoch 17, batch 5300, loss[loss=0.2422, ctc_loss=0.1622, cr_loss=0.4002, over 20955.00 frames. ], tot_loss[loss=0.2419, ctc_loss=0.1653, cr_loss=0.3832, over 4095953.90 frames. ], batch size: 50, lr: 4.88e-03, grad_scale: 32.0 2024-09-15 15:17:48,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=304846.8333333333, ans=0.2 2024-09-15 15:18:01,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=304846.8333333333, ans=0.125 2024-09-15 15:18:05,675 INFO [train.py:1198] (1/2) Epoch 17, batch 5350, loss[loss=0.1989, ctc_loss=0.1309, cr_loss=0.3401, over 21010.00 frames. ], tot_loss[loss=0.2433, ctc_loss=0.1663, cr_loss=0.3851, over 4097073.98 frames. 
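
The scaling.py:1024 Whitening lines report a per-module statistic against a per-module bound, e.g. metric=11.96 vs. limit=22.5 above. A plausible reading is that the metric measures how far a module's channel covariance is from white (isotropic): 1.0 for perfectly decorrelated, equal-variance channels, growing as a few directions dominate, with the whitening penalty engaging only as the metric approaches or exceeds its whitening_limit. The formula below illustrates such an eccentricity measure; it is not the project's exact computation:

    import torch

    def whitening_metric(x):
        # x: (frames, channels). 1.0 when the channel covariance is
        # proportional to the identity; larger when it is lopsided.
        x = x - x.mean(dim=0)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return float((eigs ** 2).mean() / eigs.mean() ** 2)
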
], batch size: 52, lr: 4.88e-03, grad_scale: 32.0 2024-09-15 15:18:11,485 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.780e+02 2.104e+02 2.238e+02 2.500e+02 4.002e+02, threshold=4.476e+02, percent-clipped=0.0 2024-09-15 15:18:29,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=304903.5, ans=0.125 2024-09-15 15:18:41,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=304931.8333333333, ans=0.2 2024-09-15 15:19:13,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=304988.5, ans=10.0 2024-09-15 15:19:19,546 INFO [train.py:1198] (1/2) Epoch 17, batch 5400, loss[loss=0.2461, ctc_loss=0.1655, cr_loss=0.4033, over 21006.00 frames. ], tot_loss[loss=0.2435, ctc_loss=0.1664, cr_loss=0.3854, over 4096459.03 frames. ], batch size: 61, lr: 4.87e-03, grad_scale: 32.0 2024-09-15 15:19:41,042 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.58 vs. limit=15.0 2024-09-15 15:19:49,478 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=305073.5, ans=0.0 2024-09-15 15:20:29,702 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.75 vs. limit=15.0 2024-09-15 15:20:33,466 INFO [train.py:1198] (1/2) Epoch 17, batch 5450, loss[loss=0.2636, ctc_loss=0.1806, cr_loss=0.415, over 20778.00 frames. ], tot_loss[loss=0.2448, ctc_loss=0.1675, cr_loss=0.3866, over 4084892.53 frames. ], batch size: 56, lr: 4.87e-03, grad_scale: 32.0 2024-09-15 15:20:39,476 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.872e+02 2.064e+02 2.229e+02 2.388e+02 4.472e+02, threshold=4.457e+02, percent-clipped=0.0 2024-09-15 15:20:45,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=305158.5, ans=0.125 2024-09-15 15:21:16,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=305243.5, ans=0.125 2024-09-15 15:21:47,593 INFO [train.py:1198] (1/2) Epoch 17, batch 5500, loss[loss=0.2034, ctc_loss=0.1368, cr_loss=0.3332, over 20947.00 frames. ], tot_loss[loss=0.2441, ctc_loss=0.1668, cr_loss=0.3865, over 4091034.83 frames. ], batch size: 51, lr: 4.87e-03, grad_scale: 32.0 2024-09-15 15:21:55,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=305300.1666666667, ans=0.125 2024-09-15 15:21:57,382 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.00 vs. limit=12.0 2024-09-15 15:21:59,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=305300.1666666667, ans=0.2 2024-09-15 15:22:22,574 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.07 vs. 
limit=22.5 2024-09-15 15:22:28,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=305356.8333333333, ans=0.125 2024-09-15 15:22:50,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=305413.5, ans=0.1 2024-09-15 15:23:03,721 INFO [train.py:1198] (1/2) Epoch 17, batch 5550, loss[loss=0.2184, ctc_loss=0.1467, cr_loss=0.3582, over 21008.00 frames. ], tot_loss[loss=0.2429, ctc_loss=0.1658, cr_loss=0.3853, over 4097986.58 frames. ], batch size: 52, lr: 4.87e-03, grad_scale: 32.0 2024-09-15 15:23:09,672 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.799e+02 1.985e+02 2.152e+02 2.317e+02 7.450e+02, threshold=4.304e+02, percent-clipped=1.0 2024-09-15 15:23:22,257 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.70 vs. limit=22.5 2024-09-15 15:23:36,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=305498.5, ans=0.0 2024-09-15 15:23:42,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=305498.5, ans=0.0 2024-09-15 15:23:53,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=305526.8333333333, ans=0.0 2024-09-15 15:23:55,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=305526.8333333333, ans=0.125 2024-09-15 15:24:17,563 INFO [train.py:1198] (1/2) Epoch 17, batch 5600, loss[loss=0.2144, ctc_loss=0.1451, cr_loss=0.3465, over 20965.00 frames. ], tot_loss[loss=0.2438, ctc_loss=0.1666, cr_loss=0.3864, over 4097052.15 frames. ], batch size: 50, lr: 4.87e-03, grad_scale: 32.0 2024-09-15 15:24:35,718 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.43 vs. limit=15.0 2024-09-15 15:24:48,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=305640.1666666667, ans=0.125 2024-09-15 15:24:59,436 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.40 vs. limit=15.0 2024-09-15 15:25:04,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=305668.5, ans=0.125 2024-09-15 15:25:05,101 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.12 vs. limit=22.5 2024-09-15 15:25:10,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=305668.5, ans=0.0 2024-09-15 15:25:16,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=305696.8333333333, ans=0.125 2024-09-15 15:25:30,832 INFO [train.py:1198] (1/2) Epoch 17, batch 5650, loss[loss=0.2538, ctc_loss=0.1732, cr_loss=0.4031, over 20910.00 frames. ], tot_loss[loss=0.243, ctc_loss=0.166, cr_loss=0.385, over 4091999.96 frames. 
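
The balancer fields scattered through these entries (balancer.prob, balancer2.min_abs, balancer.max_positive, and so on) read as activation-statistics constraints: bounds on per-channel quantities such as the fraction of positive activations or the mean absolute value, applied stochastically. Treating prob (typically 0.125 here) as the chance the machinery runs on a given batch is an assumption about scaling.Balancer, sketched below, not its actual code:

    import torch

    def balancer_should_run(prob=0.125):
        # Logged balancer.prob, read as an application probability.
        return bool(torch.rand(()) < prob)

    def stats_out_of_bounds(x, min_positive=0.05, max_positive=0.95,
                            min_abs=0.2):
        # x: (frames, channels). Flags channels whose statistics have
        # drifted outside logged-style bounds; the real module would
        # push gradients back toward the allowed range.
        pos_frac = (x > 0).float().mean(dim=0)
        mean_abs = x.abs().mean(dim=0)
        return ((pos_frac < min_positive) | (pos_frac > max_positive)
                | (mean_abs < min_abs))
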
], batch size: 54, lr: 4.87e-03, grad_scale: 32.0 2024-09-15 15:25:36,883 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.815e+02 2.098e+02 2.192e+02 2.395e+02 3.412e+02, threshold=4.383e+02, percent-clipped=0.0 2024-09-15 15:25:54,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=305753.5, ans=0.125 2024-09-15 15:26:09,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=305781.8333333333, ans=0.0 2024-09-15 15:26:11,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=305781.8333333333, ans=0.035 2024-09-15 15:26:24,802 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.37 vs. limit=15.0 2024-09-15 15:26:45,018 INFO [train.py:1198] (1/2) Epoch 17, batch 5700, loss[loss=0.2636, ctc_loss=0.182, cr_loss=0.4084, over 21066.00 frames. ], tot_loss[loss=0.2432, ctc_loss=0.1662, cr_loss=0.3854, over 4099940.62 frames. ], batch size: 59, lr: 4.87e-03, grad_scale: 32.0 2024-09-15 15:26:54,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=305866.8333333333, ans=0.0 2024-09-15 15:26:54,826 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.38 vs. limit=15.0 2024-09-15 15:27:02,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=305895.1666666667, ans=0.95 2024-09-15 15:27:15,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=305923.5, ans=0.125 2024-09-15 15:27:52,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=305980.1666666667, ans=0.125 2024-09-15 15:28:02,372 INFO [train.py:1198] (1/2) Epoch 17, batch 5750, loss[loss=0.231, ctc_loss=0.1554, cr_loss=0.3779, over 20825.00 frames. ], tot_loss[loss=0.2435, ctc_loss=0.1663, cr_loss=0.3859, over 4102942.42 frames. ], batch size: 59, lr: 4.87e-03, grad_scale: 32.0 2024-09-15 15:28:08,169 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.747e+02 2.087e+02 2.200e+02 2.353e+02 3.014e+02, threshold=4.401e+02, percent-clipped=0.0 2024-09-15 15:28:17,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=306036.8333333333, ans=0.125 2024-09-15 15:29:14,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=306150.1666666667, ans=0.125 2024-09-15 15:29:16,305 INFO [train.py:1198] (1/2) Epoch 17, batch 5800, loss[loss=0.2646, ctc_loss=0.1821, cr_loss=0.4124, over 21017.00 frames. ], tot_loss[loss=0.2432, ctc_loss=0.1661, cr_loss=0.3853, over 4103469.45 frames. ], batch size: 62, lr: 4.87e-03, grad_scale: 32.0 2024-09-15 15:29:53,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=306206.8333333333, ans=0.125 2024-09-15 15:30:30,313 INFO [train.py:1198] (1/2) Epoch 17, batch 5850, loss[loss=0.2404, ctc_loss=0.1625, cr_loss=0.3892, over 20780.00 frames. ], tot_loss[loss=0.2438, ctc_loss=0.1665, cr_loss=0.3862, over 4093445.66 frames. 
], batch size: 53, lr: 4.86e-03, grad_scale: 64.0 2024-09-15 15:30:36,199 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.769e+02 2.073e+02 2.235e+02 2.582e+02 3.439e+02, threshold=4.470e+02, percent-clipped=0.0 2024-09-15 15:30:53,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=306320.1666666667, ans=0.025 2024-09-15 15:31:39,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=306405.1666666667, ans=0.1 2024-09-15 15:31:46,450 INFO [train.py:1198] (1/2) Epoch 17, batch 5900, loss[loss=0.2172, ctc_loss=0.1463, cr_loss=0.3546, over 19890.00 frames. ], tot_loss[loss=0.2439, ctc_loss=0.1666, cr_loss=0.3866, over 4083465.75 frames. ], batch size: 44, lr: 4.86e-03, grad_scale: 64.0 2024-09-15 15:32:56,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=306546.8333333333, ans=0.05 2024-09-15 15:33:00,702 INFO [train.py:1198] (1/2) Epoch 17, batch 5950, loss[loss=0.2091, ctc_loss=0.1419, cr_loss=0.3358, over 20238.00 frames. ], tot_loss[loss=0.2437, ctc_loss=0.1665, cr_loss=0.3857, over 4085198.69 frames. ], batch size: 45, lr: 4.86e-03, grad_scale: 64.0 2024-09-15 15:33:05,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=306575.1666666667, ans=0.1 2024-09-15 15:33:06,495 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.784e+02 2.030e+02 2.193e+02 2.359e+02 3.386e+02, threshold=4.387e+02, percent-clipped=0.0 2024-09-15 15:33:20,449 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.56 vs. limit=15.0 2024-09-15 15:33:45,659 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.17 vs. limit=10.0 2024-09-15 15:33:48,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=306660.1666666667, ans=0.0 2024-09-15 15:34:14,898 INFO [train.py:1198] (1/2) Epoch 17, batch 6000, loss[loss=0.2614, ctc_loss=0.1787, cr_loss=0.4138, over 20992.00 frames. ], tot_loss[loss=0.2427, ctc_loss=0.1658, cr_loss=0.3844, over 4084709.09 frames. ], batch size: 63, lr: 4.86e-03, grad_scale: 32.0 2024-09-15 15:34:14,898 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-15 15:34:35,279 INFO [train.py:1230] (1/2) Epoch 17, validation: loss=0.0454, ctc_loss=0.0454, cr_loss=1.047e-14, over 944034.00 frames. 2024-09-15 15:34:35,279 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-15 15:34:45,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=306716.8333333333, ans=0.125 2024-09-15 15:35:43,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=306830.1666666667, ans=0.0 2024-09-15 15:35:44,968 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 15:35:51,719 INFO [train.py:1198] (1/2) Epoch 17, batch 6050, loss[loss=0.2197, ctc_loss=0.1468, cr_loss=0.3649, over 20869.00 frames. ], tot_loss[loss=0.2451, ctc_loss=0.1676, cr_loss=0.3873, over 4075420.44 frames. 
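
The grad_scale field tracks dynamic loss scaling for mixed-precision training (use_fp16=True, AMP enabled at startup): it sits at 32.0 for most of this epoch, doubles to 64.0 around batch 5850, and is back at 32.0 by batch 6000, the usual grow-on-success, halve-on-overflow behaviour of a gradient scaler. A generic torch.cuda.amp usage sketch with standard PyTorch API calls, not the project's training loop:

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=32.0,     # comparable to the grad_scale values here
        growth_factor=2.0,   # doubled after enough overflow-free steps
        backoff_factor=0.5,  # halved when inf/nan gradients appear
    )

    def training_step(model, optimizer, batch, loss_fn):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = loss_fn(model(batch))
        scaler.scale(loss).backward()  # backward on the scaled loss
        scaler.step(optimizer)         # skips the update on overflow
        scaler.update()                # grows or backs off the scale
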
], batch size: 54, lr: 4.86e-03, grad_scale: 32.0 2024-09-15 15:35:55,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=306858.5, ans=0.0 2024-09-15 15:35:59,137 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.702e+02 2.077e+02 2.264e+02 2.512e+02 4.569e+02, threshold=4.528e+02, percent-clipped=1.0 2024-09-15 15:36:38,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=306943.5, ans=0.0 2024-09-15 15:37:04,016 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.58 vs. limit=15.0 2024-09-15 15:37:05,858 INFO [train.py:1198] (1/2) Epoch 17, batch 6100, loss[loss=0.2576, ctc_loss=0.1736, cr_loss=0.4201, over 20694.00 frames. ], tot_loss[loss=0.2452, ctc_loss=0.1677, cr_loss=0.3879, over 4080168.60 frames. ], batch size: 71, lr: 4.86e-03, grad_scale: 32.0 2024-09-15 15:37:12,333 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.28 vs. limit=12.0 2024-09-15 15:37:14,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=307000.1666666667, ans=0.125 2024-09-15 15:38:02,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=307085.1666666667, ans=0.125 2024-09-15 15:38:17,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=307113.5, ans=0.125 2024-09-15 15:38:20,265 INFO [train.py:1198] (1/2) Epoch 17, batch 6150, loss[loss=0.3038, ctc_loss=0.2194, cr_loss=0.4223, over 14209.00 frames. ], tot_loss[loss=0.2471, ctc_loss=0.1692, cr_loss=0.3896, over 4056835.87 frames. ], batch size: 150, lr: 4.86e-03, grad_scale: 32.0 2024-09-15 15:38:27,361 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.843e+02 2.113e+02 2.324e+02 2.544e+02 3.191e+02, threshold=4.649e+02, percent-clipped=0.0 2024-09-15 15:38:49,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=307198.5, ans=0.0 2024-09-15 15:38:58,420 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.90 vs. limit=22.5 2024-09-15 15:39:01,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=307198.5, ans=0.04949747468305833 2024-09-15 15:39:14,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=307226.8333333333, ans=0.125 2024-09-15 15:39:34,450 INFO [train.py:1198] (1/2) Epoch 17, batch 6200, loss[loss=0.2337, ctc_loss=0.1612, cr_loss=0.3628, over 20977.00 frames. ], tot_loss[loss=0.2476, ctc_loss=0.1698, cr_loss=0.3892, over 4032977.25 frames. ], batch size: 52, lr: 4.86e-03, grad_scale: 32.0 2024-09-15 15:39:58,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=307311.8333333333, ans=0.125 2024-09-15 15:40:30,418 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.02 vs. 
limit=15.0 2024-09-15 15:40:36,337 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.90 vs. limit=22.5 2024-09-15 15:40:48,468 INFO [train.py:1198] (1/2) Epoch 17, batch 6250, loss[loss=0.2586, ctc_loss=0.1763, cr_loss=0.4115, over 21031.00 frames. ], tot_loss[loss=0.2479, ctc_loss=0.1701, cr_loss=0.3892, over 4017869.77 frames. ], batch size: 62, lr: 4.86e-03, grad_scale: 16.0 2024-09-15 15:40:57,543 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.806e+02 2.078e+02 2.212e+02 2.395e+02 3.360e+02, threshold=4.424e+02, percent-clipped=0.0 2024-09-15 15:41:25,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=307481.8333333333, ans=0.2 2024-09-15 15:41:58,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=307538.5, ans=0.125 2024-09-15 15:42:01,387 INFO [train.py:1198] (1/2) Epoch 17, batch 6300, loss[loss=0.2884, ctc_loss=0.2099, cr_loss=0.3925, over 13991.00 frames. ], tot_loss[loss=0.249, ctc_loss=0.1711, cr_loss=0.3894, over 3982280.63 frames. ], batch size: 150, lr: 4.85e-03, grad_scale: 16.0 2024-09-15 15:42:26,958 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.55 vs. limit=15.0 2024-09-15 15:42:44,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=307651.8333333333, ans=0.125 2024-09-15 15:43:12,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=307708.5, ans=0.1 2024-09-15 15:43:13,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=307708.5, ans=0.0 2024-09-15 15:43:13,984 INFO [train.py:1198] (1/2) Epoch 17, batch 6350, loss[loss=0.2814, ctc_loss=0.2026, cr_loss=0.3937, over 15014.00 frames. ], tot_loss[loss=0.2508, ctc_loss=0.1731, cr_loss=0.3885, over 3844868.68 frames. ], batch size: 149, lr: 4.85e-03, grad_scale: 16.0 2024-09-15 15:43:22,977 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.772e+02 2.182e+02 2.341e+02 2.627e+02 4.942e+02, threshold=4.681e+02, percent-clipped=1.0 2024-09-15 15:43:44,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=307765.1666666667, ans=0.0 2024-09-15 15:43:47,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=307765.1666666667, ans=0.2 2024-09-15 15:43:52,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=307765.1666666667, ans=0.1 2024-09-15 15:43:58,759 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.62 vs. limit=15.0 2024-09-15 15:44:57,827 INFO [train.py:1198] (1/2) Epoch 18, batch 0, loss[loss=0.2472, ctc_loss=0.1644, cr_loss=0.4142, over 20942.00 frames. ], tot_loss[loss=0.2472, ctc_loss=0.1644, cr_loss=0.4142, over 20942.00 frames. 
], batch size: 60, lr: 4.71e-03, grad_scale: 32.0 2024-09-15 15:44:57,827 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-15 15:45:16,138 INFO [train.py:1230] (1/2) Epoch 18, validation: loss=0.04502, ctc_loss=0.04502, cr_loss=1.051e-14, over 944034.00 frames. 2024-09-15 15:45:16,139 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-15 15:45:22,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=307824.6666666667, ans=0.025 2024-09-15 15:45:29,060 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.68 vs. limit=22.5 2024-09-15 15:45:48,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=307881.3333333333, ans=0.0 2024-09-15 15:46:30,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=307938.0, ans=0.0 2024-09-15 15:46:36,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=307966.3333333333, ans=0.125 2024-09-15 15:46:37,543 INFO [train.py:1198] (1/2) Epoch 18, batch 50, loss[loss=0.2586, ctc_loss=0.1734, cr_loss=0.4262, over 21080.00 frames. ], tot_loss[loss=0.2386, ctc_loss=0.1625, cr_loss=0.3805, over 930183.47 frames. ], batch size: 56, lr: 4.71e-03, grad_scale: 32.0 2024-09-15 15:46:59,889 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.704e+02 2.121e+02 2.402e+02 2.678e+02 4.542e+02, threshold=4.805e+02, percent-clipped=0.0 2024-09-15 15:47:16,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=308023.0, ans=0.0 2024-09-15 15:47:30,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=308051.3333333333, ans=0.125 2024-09-15 15:47:52,394 INFO [train.py:1198] (1/2) Epoch 18, batch 100, loss[loss=0.2251, ctc_loss=0.153, cr_loss=0.3603, over 20969.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1612, cr_loss=0.3787, over 1617618.02 frames. ], batch size: 48, lr: 4.71e-03, grad_scale: 32.0 2024-09-15 15:48:00,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=308108.0, ans=0.125 2024-09-15 15:48:31,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=308164.6666666667, ans=0.125 2024-09-15 15:48:34,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=308164.6666666667, ans=0.2 2024-09-15 15:48:36,275 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.83 vs. limit=15.0 2024-09-15 15:49:06,919 INFO [train.py:1198] (1/2) Epoch 18, batch 150, loss[loss=0.2276, ctc_loss=0.1533, cr_loss=0.3717, over 20980.00 frames. ], tot_loss[loss=0.2392, ctc_loss=0.1628, cr_loss=0.3818, over 2177391.41 frames. 
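
Both validation passes in this stretch (Epoch 17 batch 6000: cr_loss=1.047e-14; Epoch 18 batch 0: cr_loss=1.051e-14) show the consistency-regularization term collapsing to floating-point noise while loss equals ctc_loss. That is what one would expect if the CR term compares the model's outputs on two differently time-masked copies of each utterance and masking is disabled at validation time, leaving two identical views. The divergence below is chosen for illustration, not taken from the recipe:

    import torch
    import torch.nn.functional as F

    logp = torch.randn(10, 500).log_softmax(dim=-1)  # one view's log-probs
    # Identical views give exactly zero divergence; the ~1e-14 values in
    # the log are accumulated rounding noise at reduced precision.
    cr = F.kl_div(logp, logp.exp(), reduction="batchmean")
    print(cr.item())  # 0.0
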
], batch size: 52, lr: 4.71e-03, grad_scale: 32.0 2024-09-15 15:49:16,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=308249.6666666667, ans=0.125 2024-09-15 15:49:29,671 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.853e+02 2.042e+02 2.190e+02 2.337e+02 5.128e+02, threshold=4.380e+02, percent-clipped=1.0 2024-09-15 15:49:40,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=308306.3333333333, ans=0.07 2024-09-15 15:49:45,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=308306.3333333333, ans=0.025 2024-09-15 15:50:00,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.whiten.whitening_limit, batch_count=308334.6666666667, ans=12.0 2024-09-15 15:50:16,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=308363.0, ans=0.1 2024-09-15 15:50:21,907 INFO [train.py:1198] (1/2) Epoch 18, batch 200, loss[loss=0.2494, ctc_loss=0.1702, cr_loss=0.3956, over 21014.00 frames. ], tot_loss[loss=0.2421, ctc_loss=0.165, cr_loss=0.3854, over 2610191.11 frames. ], batch size: 63, lr: 4.71e-03, grad_scale: 32.0 2024-09-15 15:50:23,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=308391.3333333333, ans=0.0 2024-09-15 15:50:43,250 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 15:50:52,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=308448.0, ans=0.125 2024-09-15 15:51:04,745 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.51 vs. limit=15.0 2024-09-15 15:51:18,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=308476.3333333333, ans=0.125 2024-09-15 15:51:35,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=308504.6666666667, ans=0.1 2024-09-15 15:51:39,665 INFO [train.py:1198] (1/2) Epoch 18, batch 250, loss[loss=0.302, ctc_loss=0.2178, cr_loss=0.4209, over 14405.00 frames. ], tot_loss[loss=0.2425, ctc_loss=0.1654, cr_loss=0.3856, over 2939493.16 frames. ], batch size: 150, lr: 4.71e-03, grad_scale: 32.0 2024-09-15 15:52:04,957 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.696e+02 2.067e+02 2.174e+02 2.368e+02 3.453e+02, threshold=4.349e+02, percent-clipped=0.0 2024-09-15 15:52:15,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=308589.6666666667, ans=0.07 2024-09-15 15:52:20,636 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.62 vs. 
limit=10.0 2024-09-15 15:52:23,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=308589.6666666667, ans=0.125 2024-09-15 15:52:35,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=308618.0, ans=0.1 2024-09-15 15:52:57,531 INFO [train.py:1198] (1/2) Epoch 18, batch 300, loss[loss=0.2786, ctc_loss=0.1972, cr_loss=0.4069, over 19391.00 frames. ], tot_loss[loss=0.243, ctc_loss=0.1658, cr_loss=0.3856, over 3193497.88 frames. ], batch size: 90, lr: 4.71e-03, grad_scale: 32.0 2024-09-15 15:54:07,463 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.56 vs. limit=15.0 2024-09-15 15:54:12,482 INFO [train.py:1198] (1/2) Epoch 18, batch 350, loss[loss=0.2695, ctc_loss=0.1875, cr_loss=0.4097, over 19446.00 frames. ], tot_loss[loss=0.2432, ctc_loss=0.1662, cr_loss=0.3854, over 3385807.12 frames. ], batch size: 90, lr: 4.71e-03, grad_scale: 32.0 2024-09-15 15:54:12,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=308816.3333333333, ans=0.1 2024-09-15 15:54:27,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=308844.6666666667, ans=0.125 2024-09-15 15:54:35,067 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.729e+02 2.073e+02 2.207e+02 2.411e+02 3.269e+02, threshold=4.415e+02, percent-clipped=0.0 2024-09-15 15:54:35,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=308844.6666666667, ans=0.125 2024-09-15 15:55:20,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=308929.6666666667, ans=0.2 2024-09-15 15:55:27,573 INFO [train.py:1198] (1/2) Epoch 18, batch 400, loss[loss=0.2583, ctc_loss=0.1771, cr_loss=0.4062, over 21055.00 frames. ], tot_loss[loss=0.242, ctc_loss=0.1651, cr_loss=0.3844, over 3550489.04 frames. ], batch size: 56, lr: 4.71e-03, grad_scale: 32.0 2024-09-15 15:55:33,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=308958.0, ans=0.1 2024-09-15 15:55:40,164 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.58 vs. limit=22.5 2024-09-15 15:55:45,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=308986.3333333333, ans=0.0 2024-09-15 15:56:42,493 INFO [train.py:1198] (1/2) Epoch 18, batch 450, loss[loss=0.2763, ctc_loss=0.1882, cr_loss=0.4403, over 20074.00 frames. ], tot_loss[loss=0.2418, ctc_loss=0.1649, cr_loss=0.3842, over 3661978.11 frames. ], batch size: 80, lr: 4.70e-03, grad_scale: 32.0 2024-09-15 15:57:04,836 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.770e+02 2.013e+02 2.139e+02 2.345e+02 2.699e+02, threshold=4.277e+02, percent-clipped=0.0 2024-09-15 15:57:42,641 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.22 vs. 
limit=15.0 2024-09-15 15:57:56,284 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.31 vs. limit=22.5 2024-09-15 15:58:03,045 INFO [train.py:1198] (1/2) Epoch 18, batch 500, loss[loss=0.2373, ctc_loss=0.1618, cr_loss=0.3776, over 20666.00 frames. ], tot_loss[loss=0.2415, ctc_loss=0.1647, cr_loss=0.3839, over 3764600.54 frames. ], batch size: 71, lr: 4.70e-03, grad_scale: 32.0 2024-09-15 15:58:04,768 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=309241.3333333333, ans=0.125 2024-09-15 15:58:24,843 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.58 vs. limit=15.0 2024-09-15 15:58:53,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=309326.3333333333, ans=0.125 2024-09-15 15:58:56,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=309326.3333333333, ans=0.125 2024-09-15 15:59:02,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=309354.6666666667, ans=0.125 2024-09-15 15:59:18,615 INFO [train.py:1198] (1/2) Epoch 18, batch 550, loss[loss=0.2387, ctc_loss=0.1631, cr_loss=0.3782, over 20926.00 frames. ], tot_loss[loss=0.2413, ctc_loss=0.1646, cr_loss=0.3833, over 3836964.10 frames. ], batch size: 60, lr: 4.70e-03, grad_scale: 32.0 2024-09-15 15:59:19,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=309383.0, ans=0.2 2024-09-15 15:59:41,171 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.843e+02 2.062e+02 2.174e+02 2.381e+02 3.894e+02, threshold=4.348e+02, percent-clipped=0.0 2024-09-15 15:59:50,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=309439.6666666667, ans=0.125 2024-09-15 16:00:11,870 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.23 vs. limit=6.0 2024-09-15 16:00:23,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=309496.3333333333, ans=0.0 2024-09-15 16:00:33,722 INFO [train.py:1198] (1/2) Epoch 18, batch 600, loss[loss=0.2065, ctc_loss=0.1391, cr_loss=0.3368, over 21055.00 frames. ], tot_loss[loss=0.2423, ctc_loss=0.1653, cr_loss=0.385, over 3883382.15 frames. ], batch size: 53, lr: 4.70e-03, grad_scale: 32.0 2024-09-15 16:00:36,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=309524.6666666667, ans=0.125 2024-09-15 16:00:46,469 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.22 vs. 
limit=15.0 2024-09-15 16:01:08,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=309581.3333333333, ans=0.0 2024-09-15 16:01:17,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=309609.6666666667, ans=0.0 2024-09-15 16:01:33,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=309638.0, ans=0.5 2024-09-15 16:01:48,733 INFO [train.py:1198] (1/2) Epoch 18, batch 650, loss[loss=0.2704, ctc_loss=0.1889, cr_loss=0.4075, over 21049.00 frames. ], tot_loss[loss=0.2426, ctc_loss=0.1655, cr_loss=0.3856, over 3935640.39 frames. ], batch size: 62, lr: 4.70e-03, grad_scale: 32.0 2024-09-15 16:02:01,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=309666.3333333333, ans=0.125 2024-09-15 16:02:10,780 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.780e+02 2.077e+02 2.206e+02 2.355e+02 2.917e+02, threshold=4.411e+02, percent-clipped=0.0 2024-09-15 16:02:34,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=309751.3333333333, ans=0.1 2024-09-15 16:02:43,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=309751.3333333333, ans=0.125 2024-09-15 16:03:05,973 INFO [train.py:1198] (1/2) Epoch 18, batch 700, loss[loss=0.2168, ctc_loss=0.1446, cr_loss=0.3609, over 20942.00 frames. ], tot_loss[loss=0.2441, ctc_loss=0.1664, cr_loss=0.3882, over 3974833.27 frames. ], batch size: 49, lr: 4.70e-03, grad_scale: 32.0 2024-09-15 16:03:25,989 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.58 vs. limit=22.5 2024-09-15 16:03:44,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=309864.6666666667, ans=0.125 2024-09-15 16:04:01,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=309893.0, ans=0.125 2024-09-15 16:04:05,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=309893.0, ans=0.125 2024-09-15 16:04:23,082 INFO [train.py:1198] (1/2) Epoch 18, batch 750, loss[loss=0.2245, ctc_loss=0.15, cr_loss=0.3725, over 20950.00 frames. ], tot_loss[loss=0.2438, ctc_loss=0.1663, cr_loss=0.3875, over 3989583.19 frames. ], batch size: 49, lr: 4.70e-03, grad_scale: 32.0 2024-09-15 16:04:29,910 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.22 vs. 
limit=10.0 2024-09-15 16:04:33,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=309949.6666666667, ans=0.125 2024-09-15 16:04:41,691 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=309978.0, ans=0.2 2024-09-15 16:04:45,824 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.823e+02 2.063e+02 2.207e+02 2.386e+02 3.897e+02, threshold=4.413e+02, percent-clipped=0.0 2024-09-15 16:05:10,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=310034.6666666667, ans=0.1 2024-09-15 16:05:19,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=310034.6666666667, ans=0.025 2024-09-15 16:05:38,503 INFO [train.py:1198] (1/2) Epoch 18, batch 800, loss[loss=0.2128, ctc_loss=0.1442, cr_loss=0.3434, over 20764.00 frames. ], tot_loss[loss=0.2437, ctc_loss=0.1662, cr_loss=0.3873, over 4012186.04 frames. ], batch size: 56, lr: 4.70e-03, grad_scale: 32.0 2024-09-15 16:05:58,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=310119.6666666667, ans=0.0 2024-09-15 16:06:04,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=310119.6666666667, ans=0.125 2024-09-15 16:06:53,618 INFO [train.py:1198] (1/2) Epoch 18, batch 850, loss[loss=0.2649, ctc_loss=0.1828, cr_loss=0.4105, over 20119.00 frames. ], tot_loss[loss=0.2434, ctc_loss=0.1661, cr_loss=0.3868, over 4034160.04 frames. ], batch size: 80, lr: 4.70e-03, grad_scale: 32.0 2024-09-15 16:07:16,149 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.796e+02 2.068e+02 2.198e+02 2.441e+02 5.210e+02, threshold=4.396e+02, percent-clipped=2.0 2024-09-15 16:07:43,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=310318.0, ans=0.0 2024-09-15 16:08:00,435 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.18 vs. limit=15.0 2024-09-15 16:08:08,628 INFO [train.py:1198] (1/2) Epoch 18, batch 900, loss[loss=0.2443, ctc_loss=0.1666, cr_loss=0.3885, over 21022.00 frames. ], tot_loss[loss=0.2426, ctc_loss=0.1654, cr_loss=0.3857, over 4062140.63 frames. ], batch size: 63, lr: 4.69e-03, grad_scale: 16.0 2024-09-15 16:08:39,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=310403.0, ans=0.1 2024-09-15 16:08:45,452 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.03 vs. limit=15.0 2024-09-15 16:08:52,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=310431.3333333333, ans=0.125 2024-09-15 16:09:08,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=310459.6666666667, ans=0.125 2024-09-15 16:09:13,732 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.69 vs. 
limit=15.0 2024-09-15 16:09:29,114 INFO [train.py:1198] (1/2) Epoch 18, batch 950, loss[loss=0.2499, ctc_loss=0.1689, cr_loss=0.4049, over 20872.00 frames. ], tot_loss[loss=0.2425, ctc_loss=0.1653, cr_loss=0.3858, over 4077830.73 frames. ], batch size: 65, lr: 4.69e-03, grad_scale: 16.0 2024-09-15 16:09:53,065 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.847e+02 2.094e+02 2.208e+02 2.373e+02 3.927e+02, threshold=4.417e+02, percent-clipped=0.0 2024-09-15 16:09:54,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=310544.6666666667, ans=0.125 2024-09-15 16:10:06,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=310573.0, ans=0.125 2024-09-15 16:10:13,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=310601.3333333333, ans=0.125 2024-09-15 16:10:35,772 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.12 vs. limit=15.0 2024-09-15 16:10:44,163 INFO [train.py:1198] (1/2) Epoch 18, batch 1000, loss[loss=0.2534, ctc_loss=0.1709, cr_loss=0.4125, over 20975.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.1647, cr_loss=0.385, over 4089906.30 frames. ], batch size: 58, lr: 4.69e-03, grad_scale: 16.0 2024-09-15 16:11:16,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=310714.6666666667, ans=0.0 2024-09-15 16:11:17,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=310714.6666666667, ans=0.1 2024-09-15 16:11:36,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=310743.0, ans=0.95 2024-09-15 16:11:58,492 INFO [train.py:1198] (1/2) Epoch 18, batch 1050, loss[loss=0.2687, ctc_loss=0.191, cr_loss=0.3888, over 20022.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.1654, cr_loss=0.3852, over 4085231.32 frames. ], batch size: 80, lr: 4.69e-03, grad_scale: 16.0 2024-09-15 16:12:06,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=310799.6666666667, ans=0.0 2024-09-15 16:12:22,481 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.762e+02 2.080e+02 2.212e+02 2.362e+02 3.415e+02, threshold=4.425e+02, percent-clipped=0.0 2024-09-15 16:12:57,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=310913.0, ans=0.0 2024-09-15 16:12:57,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=310913.0, ans=0.125 2024-09-15 16:12:59,565 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.66 vs. limit=10.0 2024-09-15 16:13:07,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=310913.0, ans=0.0 2024-09-15 16:13:14,277 INFO [train.py:1198] (1/2) Epoch 18, batch 1100, loss[loss=0.2443, ctc_loss=0.1692, cr_loss=0.3755, over 20433.00 frames. ], tot_loss[loss=0.2426, ctc_loss=0.1656, cr_loss=0.3852, over 4094988.43 frames. 
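
Note on the loss fields in the batch entries above: the logged figures satisfy loss = ctc_loss + 0.2 * cr_loss; e.g. the batch-1000 entry gives 0.1709 + 0.2 * 0.4125 = 0.2534, exactly the logged loss. A minimal sketch of that combination follows; the helper name combine_losses and the 0.2 weight are inferred from the logged values, not quoted from train.py.

    import torch

    def combine_losses(ctc_loss: torch.Tensor,
                       cr_loss: torch.Tensor,
                       cr_loss_scale: float = 0.2) -> torch.Tensor:
        # Weighted sum matching the relation the logged numbers satisfy:
        # loss = ctc_loss + cr_loss_scale * cr_loss
        return ctc_loss + cr_loss_scale * cr_loss

    # Check against the batch-1000 entry above:
    # 0.1709 + 0.2 * 0.4125 = 0.2534, logged as loss=0.2534.
    assert abs(combine_losses(torch.tensor(0.1709),
                              torch.tensor(0.4125)).item() - 0.2534) < 1e-4
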
], batch size: 74, lr: 4.69e-03, grad_scale: 16.0 2024-09-15 16:13:19,817 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.53 vs. limit=22.5 2024-09-15 16:13:46,643 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.07 vs. limit=12.0 2024-09-15 16:14:26,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=311054.6666666667, ans=0.125 2024-09-15 16:14:35,166 INFO [train.py:1198] (1/2) Epoch 18, batch 1150, loss[loss=0.2804, ctc_loss=0.1899, cr_loss=0.4526, over 20849.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.1653, cr_loss=0.3851, over 4100047.70 frames. ], batch size: 65, lr: 4.69e-03, grad_scale: 16.0 2024-09-15 16:14:38,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=311083.0, ans=0.125 2024-09-15 16:14:40,496 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.02 vs. limit=22.5 2024-09-15 16:14:50,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=311111.3333333333, ans=0.0 2024-09-15 16:14:52,636 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.74 vs. limit=12.0 2024-09-15 16:14:59,153 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.724e+02 2.051e+02 2.187e+02 2.330e+02 3.028e+02, threshold=4.375e+02, percent-clipped=0.0 2024-09-15 16:15:22,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=311168.0, ans=0.125 2024-09-15 16:15:24,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=311168.0, ans=0.0 2024-09-15 16:15:50,839 INFO [train.py:1198] (1/2) Epoch 18, batch 1200, loss[loss=0.2235, ctc_loss=0.1475, cr_loss=0.3801, over 20955.00 frames. ], tot_loss[loss=0.2415, ctc_loss=0.1646, cr_loss=0.3845, over 4099442.94 frames. ], batch size: 50, lr: 4.69e-03, grad_scale: 32.0 2024-09-15 16:16:07,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=311253.0, ans=0.1 2024-09-15 16:16:12,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=311253.0, ans=0.0 2024-09-15 16:16:24,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=311281.3333333333, ans=0.1 2024-09-15 16:16:30,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=311281.3333333333, ans=0.0 2024-09-15 16:17:06,274 INFO [train.py:1198] (1/2) Epoch 18, batch 1250, loss[loss=0.2059, ctc_loss=0.1394, cr_loss=0.3322, over 20800.00 frames. ], tot_loss[loss=0.2422, ctc_loss=0.1653, cr_loss=0.3846, over 4086659.45 frames. 
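
Note on the Clipping_scale warnings: the reported threshold consistently equals clipping_scale times the middle of the five grad-norm quartile values (min, q25, median, q75, max), e.g. 2.0 * 2.222e+02 = 4.444e+02 in the batch-1250 warning above, so the clipping threshold tracks the median gradient norm of recent batches. A sketch of that rule under those assumptions; the function name, the recent_grad_norms buffer, and the exact quantile handling are guesses, not the actual optim.py code.

    import torch

    def clip_by_median_norm(params, recent_grad_norms, clipping_scale=2.0):
        # Threshold tracks the median of recently observed gradient norms,
        # reproducing the logged relation threshold = clipping_scale * median.
        threshold = clipping_scale * float(torch.tensor(recent_grad_norms).median())
        total_norm = torch.nn.utils.clip_grad_norm_(params, max_norm=threshold)
        return total_norm, threshold
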
], batch size: 53, lr: 4.69e-03, grad_scale: 32.0 2024-09-15 16:17:27,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=311394.6666666667, ans=0.125 2024-09-15 16:17:30,506 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.814e+02 2.106e+02 2.222e+02 2.491e+02 4.661e+02, threshold=4.444e+02, percent-clipped=1.0 2024-09-15 16:17:47,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=311423.0, ans=0.125 2024-09-15 16:18:21,540 INFO [train.py:1198] (1/2) Epoch 18, batch 1300, loss[loss=0.2053, ctc_loss=0.1397, cr_loss=0.328, over 20267.00 frames. ], tot_loss[loss=0.2413, ctc_loss=0.1647, cr_loss=0.383, over 4084348.25 frames. ], batch size: 45, lr: 4.69e-03, grad_scale: 32.0 2024-09-15 16:18:45,265 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.87 vs. limit=15.0 2024-09-15 16:19:02,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=311564.6666666667, ans=0.015 2024-09-15 16:19:20,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=311621.3333333333, ans=0.125 2024-09-15 16:19:36,700 INFO [train.py:1198] (1/2) Epoch 18, batch 1350, loss[loss=0.2435, ctc_loss=0.1664, cr_loss=0.3854, over 19981.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1647, cr_loss=0.3834, over 4090894.28 frames. ], batch size: 80, lr: 4.68e-03, grad_scale: 32.0 2024-09-15 16:19:37,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=311649.6666666667, ans=0.0 2024-09-15 16:20:03,699 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.756e+02 2.043e+02 2.226e+02 2.470e+02 3.858e+02, threshold=4.451e+02, percent-clipped=0.0 2024-09-15 16:20:22,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=311706.3333333333, ans=0.07 2024-09-15 16:20:31,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=311734.6666666667, ans=0.05 2024-09-15 16:20:49,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=311763.0, ans=0.1 2024-09-15 16:20:51,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=311763.0, ans=0.125 2024-09-15 16:20:58,327 INFO [train.py:1198] (1/2) Epoch 18, batch 1400, loss[loss=0.2514, ctc_loss=0.173, cr_loss=0.392, over 21018.00 frames. ], tot_loss[loss=0.2413, ctc_loss=0.1647, cr_loss=0.3832, over 4098163.29 frames. ], batch size: 61, lr: 4.68e-03, grad_scale: 32.0 2024-09-15 16:21:01,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=311791.3333333333, ans=0.125 2024-09-15 16:21:28,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=311848.0, ans=0.125 2024-09-15 16:22:05,129 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. 
limit=6.0 2024-09-15 16:22:13,518 INFO [train.py:1198] (1/2) Epoch 18, batch 1450, loss[loss=0.2669, ctc_loss=0.1843, cr_loss=0.4134, over 21041.00 frames. ], tot_loss[loss=0.241, ctc_loss=0.1643, cr_loss=0.3832, over 4100395.01 frames. ], batch size: 62, lr: 4.68e-03, grad_scale: 32.0 2024-09-15 16:22:25,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=311933.0, ans=0.0 2024-09-15 16:22:37,790 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.678e+02 2.072e+02 2.254e+02 2.499e+02 3.931e+02, threshold=4.508e+02, percent-clipped=0.0 2024-09-15 16:23:12,941 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.63 vs. limit=15.0 2024-09-15 16:23:28,488 INFO [train.py:1198] (1/2) Epoch 18, batch 1500, loss[loss=0.2347, ctc_loss=0.1565, cr_loss=0.391, over 20982.00 frames. ], tot_loss[loss=0.2416, ctc_loss=0.1648, cr_loss=0.3839, over 4099802.70 frames. ], batch size: 55, lr: 4.68e-03, grad_scale: 32.0 2024-09-15 16:23:51,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=312103.0, ans=0.125 2024-09-15 16:24:02,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=312131.3333333333, ans=0.125 2024-09-15 16:24:09,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=312131.3333333333, ans=0.0 2024-09-15 16:24:26,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=312159.6666666667, ans=0.0 2024-09-15 16:24:35,404 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs. limit=6.0 2024-09-15 16:24:43,767 INFO [train.py:1198] (1/2) Epoch 18, batch 1550, loss[loss=0.2047, ctc_loss=0.1386, cr_loss=0.3307, over 21067.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.1655, cr_loss=0.3845, over 4091031.30 frames. ], batch size: 53, lr: 4.68e-03, grad_scale: 32.0 2024-09-15 16:24:49,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=312216.3333333333, ans=0.0 2024-09-15 16:25:07,414 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.764e+02 2.060e+02 2.181e+02 2.357e+02 4.136e+02, threshold=4.362e+02, percent-clipped=0.0 2024-09-15 16:25:10,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=312244.6666666667, ans=0.125 2024-09-15 16:26:03,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=312358.0, ans=0.2 2024-09-15 16:26:04,473 INFO [train.py:1198] (1/2) Epoch 18, batch 1600, loss[loss=0.2166, ctc_loss=0.1447, cr_loss=0.3596, over 20983.00 frames. ], tot_loss[loss=0.2421, ctc_loss=0.1653, cr_loss=0.3838, over 4091826.66 frames. 
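
Note on the ScheduledFloat entries: each named hyperparameter (skip rates, balancer probabilities, bypass scale_min, const_attention_rate) is logged as a value (ans=...) that is a function of batch_count rather than a fixed constant. A plausible minimal reimplementation, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the class name and the breakpoints shown are made-up examples, not the recipe's actual schedules.

    import numpy as np

    class ScheduledFloatSketch:
        """Piecewise-linear value schedule keyed on the training batch count."""

        def __init__(self, *points):
            # points: (batch_count, value) pairs in increasing batch_count order.
            self.xs = np.array([p[0] for p in points])
            self.ys = np.array([p[1] for p in points])

        def value(self, batch_count: float) -> float:
            # Flat extrapolation outside the given breakpoints.
            return float(np.interp(batch_count, self.xs, self.ys))

    # Hypothetical schedule: a skip rate that decays early in training and
    # has long been flat by batch_count ~ 3.1e5 as in the entries above.
    skip_rate = ScheduledFloatSketch((0.0, 0.5), (4000.0, 0.05), (16000.0, 0.0))
    print(skip_rate.value(309383.0))  # -> 0.0
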
], batch size: 55, lr: 4.68e-03, grad_scale: 32.0 2024-09-15 16:26:04,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=312358.0, ans=0.0 2024-09-15 16:26:36,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=312414.6666666667, ans=0.04949747468305833 2024-09-15 16:26:39,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=312414.6666666667, ans=0.0 2024-09-15 16:26:40,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=312414.6666666667, ans=0.125 2024-09-15 16:26:45,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=312414.6666666667, ans=0.0 2024-09-15 16:26:54,973 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.18 vs. limit=22.5 2024-09-15 16:26:58,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=312443.0, ans=0.125 2024-09-15 16:27:19,987 INFO [train.py:1198] (1/2) Epoch 18, batch 1650, loss[loss=0.2971, ctc_loss=0.2156, cr_loss=0.4075, over 14359.00 frames. ], tot_loss[loss=0.2412, ctc_loss=0.1645, cr_loss=0.3835, over 4097191.51 frames. ], batch size: 149, lr: 4.68e-03, grad_scale: 32.0 2024-09-15 16:27:29,857 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.42 vs. limit=15.0 2024-09-15 16:27:39,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=312528.0, ans=0.015 2024-09-15 16:27:42,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=312528.0, ans=0.1 2024-09-15 16:27:43,819 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.724e+02 2.047e+02 2.189e+02 2.321e+02 6.898e+02, threshold=4.378e+02, percent-clipped=1.0 2024-09-15 16:27:48,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=312556.3333333333, ans=0.05 2024-09-15 16:27:56,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=312556.3333333333, ans=0.1 2024-09-15 16:28:15,935 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 16:28:17,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=312584.6666666667, ans=0.125 2024-09-15 16:28:17,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=312584.6666666667, ans=0.125 2024-09-15 16:28:35,064 INFO [train.py:1198] (1/2) Epoch 18, batch 1700, loss[loss=0.2374, ctc_loss=0.16, cr_loss=0.387, over 21043.00 frames. ], tot_loss[loss=0.2413, ctc_loss=0.1645, cr_loss=0.3839, over 4105608.17 frames. 
], batch size: 63, lr: 4.68e-03, grad_scale: 32.0 2024-09-15 16:28:38,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=312641.3333333333, ans=0.125 2024-09-15 16:28:52,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=312669.6666666667, ans=0.2 2024-09-15 16:28:56,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=312669.6666666667, ans=0.0 2024-09-15 16:29:02,833 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.63 vs. limit=15.0 2024-09-15 16:29:50,090 INFO [train.py:1198] (1/2) Epoch 18, batch 1750, loss[loss=0.2202, ctc_loss=0.1504, cr_loss=0.3489, over 21067.00 frames. ], tot_loss[loss=0.2416, ctc_loss=0.1647, cr_loss=0.3844, over 4109162.98 frames. ], batch size: 56, lr: 4.68e-03, grad_scale: 32.0 2024-09-15 16:30:02,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=312783.0, ans=0.125 2024-09-15 16:30:14,272 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.819e+02 2.047e+02 2.172e+02 2.427e+02 3.153e+02, threshold=4.345e+02, percent-clipped=0.0 2024-09-15 16:30:23,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=312839.6666666667, ans=0.125 2024-09-15 16:30:31,766 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.62 vs. limit=15.0 2024-09-15 16:30:40,465 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.54 vs. limit=12.0 2024-09-15 16:30:43,508 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.61 vs. limit=10.0 2024-09-15 16:30:47,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=312868.0, ans=0.09899494936611666 2024-09-15 16:31:05,122 INFO [train.py:1198] (1/2) Epoch 18, batch 1800, loss[loss=0.2828, ctc_loss=0.1994, cr_loss=0.4169, over 14178.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.1648, cr_loss=0.3844, over 4102978.91 frames. ], batch size: 149, lr: 4.68e-03, grad_scale: 32.0 2024-09-15 16:31:05,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=312924.6666666667, ans=0.1 2024-09-15 16:31:29,157 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.56 vs. limit=22.5 2024-09-15 16:31:30,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=312953.0, ans=0.0 2024-09-15 16:31:32,071 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.63 vs. 
limit=15.0 2024-09-15 16:31:34,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=312953.0, ans=0.125 2024-09-15 16:31:51,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=312981.3333333333, ans=0.125 2024-09-15 16:32:08,121 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.11 vs. limit=15.0 2024-09-15 16:32:25,480 INFO [train.py:1198] (1/2) Epoch 18, batch 1850, loss[loss=0.2491, ctc_loss=0.1694, cr_loss=0.3985, over 20301.00 frames. ], tot_loss[loss=0.2415, ctc_loss=0.1647, cr_loss=0.3843, over 4096284.37 frames. ], batch size: 74, lr: 4.67e-03, grad_scale: 16.0 2024-09-15 16:32:36,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=313066.3333333333, ans=0.0 2024-09-15 16:32:50,907 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.803e+02 2.044e+02 2.184e+02 2.333e+02 4.056e+02, threshold=4.367e+02, percent-clipped=0.0 2024-09-15 16:33:17,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=313151.3333333333, ans=0.025 2024-09-15 16:33:25,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=313179.6666666667, ans=0.1 2024-09-15 16:33:27,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=313179.6666666667, ans=0.025 2024-09-15 16:33:40,756 INFO [train.py:1198] (1/2) Epoch 18, batch 1900, loss[loss=0.2014, ctc_loss=0.1349, cr_loss=0.3324, over 21006.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.164, cr_loss=0.383, over 4111805.67 frames. ], batch size: 48, lr: 4.67e-03, grad_scale: 16.0 2024-09-15 16:33:45,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=313208.0, ans=0.1 2024-09-15 16:34:24,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=313293.0, ans=0.125 2024-09-15 16:34:33,951 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.35 vs. limit=15.0 2024-09-15 16:34:55,952 INFO [train.py:1198] (1/2) Epoch 18, batch 1950, loss[loss=0.2176, ctc_loss=0.1458, cr_loss=0.3591, over 20993.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1636, cr_loss=0.382, over 4102201.95 frames. ], batch size: 50, lr: 4.67e-03, grad_scale: 16.0 2024-09-15 16:35:12,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=313378.0, ans=0.0 2024-09-15 16:35:21,625 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.736e+02 2.121e+02 2.301e+02 2.467e+02 3.179e+02, threshold=4.602e+02, percent-clipped=0.0 2024-09-15 16:35:36,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=313406.3333333333, ans=0.1 2024-09-15 16:35:56,890 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.44 vs. 
limit=15.0 2024-09-15 16:36:11,248 INFO [train.py:1198] (1/2) Epoch 18, batch 2000, loss[loss=0.3019, ctc_loss=0.2122, cr_loss=0.4488, over 19871.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.1643, cr_loss=0.3833, over 4103100.60 frames. ], batch size: 80, lr: 4.67e-03, grad_scale: 32.0 2024-09-15 16:36:49,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=313548.0, ans=0.125 2024-09-15 16:37:31,715 INFO [train.py:1198] (1/2) Epoch 18, batch 2050, loss[loss=0.3003, ctc_loss=0.2143, cr_loss=0.43, over 15007.00 frames. ], tot_loss[loss=0.241, ctc_loss=0.1643, cr_loss=0.3834, over 4102607.51 frames. ], batch size: 149, lr: 4.67e-03, grad_scale: 32.0 2024-09-15 16:37:57,593 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.769e+02 2.080e+02 2.224e+02 2.431e+02 3.023e+02, threshold=4.448e+02, percent-clipped=0.0 2024-09-15 16:38:16,700 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.16 vs. limit=15.0 2024-09-15 16:38:47,196 INFO [train.py:1198] (1/2) Epoch 18, batch 2100, loss[loss=0.2751, ctc_loss=0.1898, cr_loss=0.4262, over 20707.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1646, cr_loss=0.3838, over 4090907.26 frames. ], batch size: 71, lr: 4.67e-03, grad_scale: 32.0 2024-09-15 16:39:07,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=313803.0, ans=0.0 2024-09-15 16:39:19,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=313831.3333333333, ans=0.0 2024-09-15 16:40:02,537 INFO [train.py:1198] (1/2) Epoch 18, batch 2150, loss[loss=0.2289, ctc_loss=0.1572, cr_loss=0.3583, over 20701.00 frames. ], tot_loss[loss=0.2416, ctc_loss=0.1648, cr_loss=0.3842, over 4097603.86 frames. ], batch size: 71, lr: 4.67e-03, grad_scale: 32.0 2024-09-15 16:40:28,255 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.748e+02 2.075e+02 2.196e+02 2.387e+02 4.503e+02, threshold=4.393e+02, percent-clipped=1.0 2024-09-15 16:40:42,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=313973.0, ans=0.0 2024-09-15 16:40:48,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=314001.3333333333, ans=0.2 2024-09-15 16:40:55,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=314001.3333333333, ans=0.025 2024-09-15 16:40:58,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=314001.3333333333, ans=0.125 2024-09-15 16:41:01,478 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=314029.6666666667, ans=0.0 2024-09-15 16:41:04,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=314029.6666666667, ans=0.125 2024-09-15 16:41:07,922 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.51 vs. limit=22.5 2024-09-15 16:41:17,738 INFO [train.py:1198] (1/2) Epoch 18, batch 2200, loss[loss=0.2392, ctc_loss=0.1627, cr_loss=0.3826, over 20777.00 frames. 
], tot_loss[loss=0.2405, ctc_loss=0.164, cr_loss=0.3826, over 4091198.98 frames. ], batch size: 56, lr: 4.67e-03, grad_scale: 32.0 2024-09-15 16:41:19,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=314058.0, ans=0.2 2024-09-15 16:41:23,133 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.25 vs. limit=6.0 2024-09-15 16:41:47,668 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.38 vs. limit=22.5 2024-09-15 16:41:51,883 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.37 vs. limit=22.5 2024-09-15 16:42:01,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=314143.0, ans=0.125 2024-09-15 16:42:03,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=314143.0, ans=0.2 2024-09-15 16:42:12,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=314143.0, ans=0.05 2024-09-15 16:42:30,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=314171.3333333333, ans=0.125 2024-09-15 16:42:33,356 INFO [train.py:1198] (1/2) Epoch 18, batch 2250, loss[loss=0.2722, ctc_loss=0.1862, cr_loss=0.4299, over 20790.00 frames. ], tot_loss[loss=0.2413, ctc_loss=0.1646, cr_loss=0.3834, over 4083262.59 frames. ], batch size: 53, lr: 4.67e-03, grad_scale: 32.0 2024-09-15 16:42:36,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=314199.6666666667, ans=0.02 2024-09-15 16:43:01,886 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.799e+02 2.087e+02 2.211e+02 2.436e+02 5.064e+02, threshold=4.422e+02, percent-clipped=1.0 2024-09-15 16:43:21,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=314256.3333333333, ans=0.125 2024-09-15 16:43:54,109 INFO [train.py:1198] (1/2) Epoch 18, batch 2300, loss[loss=0.1976, ctc_loss=0.1322, cr_loss=0.3267, over 20965.00 frames. ], tot_loss[loss=0.2425, ctc_loss=0.1654, cr_loss=0.3858, over 4090752.47 frames. ], batch size: 48, lr: 4.66e-03, grad_scale: 32.0 2024-09-15 16:44:16,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=314369.6666666667, ans=0.1 2024-09-15 16:44:27,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=314398.0, ans=0.125 2024-09-15 16:44:51,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=314426.3333333333, ans=0.125 2024-09-15 16:45:03,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=314454.6666666667, ans=0.125 2024-09-15 16:45:09,217 INFO [train.py:1198] (1/2) Epoch 18, batch 2350, loss[loss=0.2754, ctc_loss=0.1905, cr_loss=0.4241, over 20073.00 frames. ], tot_loss[loss=0.2421, ctc_loss=0.1651, cr_loss=0.3852, over 4095842.22 frames. 
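
Note on the Whitening entries: each compares a metric against a limit (e.g. metric=3.25 vs. limit=6.0 in the whiten_keys entry above); the metric measures how far the group's feature covariance C is from isotropic, and the module only intervenes once the limit is crossed. One scale-invariant quantity with that behavior is tr(C^2) * d / tr(C)^2, which equals 1 exactly when all eigenvalues of C are equal (fully white) and grows with anisotropy. This is a hedged reading of what scaling.py computes, not a quote of it.

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels) activations for one whitening group.
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.t() @ x) / x.shape[0]           # (d, d) sample covariance
        d = cov.shape[0]
        # tr(C^2) * d / tr(C)^2 >= 1 by Cauchy-Schwarz on the eigenvalues,
        # with equality iff C is proportional to the identity.
        return (cov @ cov).diagonal().sum() * d / cov.diagonal().sum() ** 2

    x = torch.randn(1000, 256)
    print(whitening_metric(x))  # modestly above 1 for near-white input
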
], batch size: 80, lr: 4.66e-03, grad_scale: 32.0 2024-09-15 16:45:11,397 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.69 vs. limit=15.0 2024-09-15 16:45:18,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=314483.0, ans=0.125 2024-09-15 16:45:35,112 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.845e+02 2.108e+02 2.281e+02 2.436e+02 3.272e+02, threshold=4.562e+02, percent-clipped=0.0 2024-09-15 16:45:41,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=314539.6666666667, ans=0.2 2024-09-15 16:46:23,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=314624.6666666667, ans=0.125 2024-09-15 16:46:24,678 INFO [train.py:1198] (1/2) Epoch 18, batch 2400, loss[loss=0.1919, ctc_loss=0.1306, cr_loss=0.3065, over 20938.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.1653, cr_loss=0.3858, over 4101471.93 frames. ], batch size: 49, lr: 4.66e-03, grad_scale: 32.0 2024-09-15 16:47:21,098 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.15 vs. limit=15.0 2024-09-15 16:47:36,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=314738.0, ans=0.0 2024-09-15 16:47:39,954 INFO [train.py:1198] (1/2) Epoch 18, batch 2450, loss[loss=0.2769, ctc_loss=0.1942, cr_loss=0.4134, over 18474.00 frames. ], tot_loss[loss=0.243, ctc_loss=0.1657, cr_loss=0.3865, over 4104799.20 frames. ], batch size: 108, lr: 4.66e-03, grad_scale: 32.0 2024-09-15 16:48:05,805 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.686e+02 2.060e+02 2.224e+02 2.494e+02 4.847e+02, threshold=4.449e+02, percent-clipped=1.0 2024-09-15 16:48:15,668 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.77 vs. limit=15.0 2024-09-15 16:48:40,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=314851.3333333333, ans=0.1 2024-09-15 16:48:54,612 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.66 vs. limit=15.0 2024-09-15 16:48:55,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=314879.6666666667, ans=0.1 2024-09-15 16:49:01,449 INFO [train.py:1198] (1/2) Epoch 18, batch 2500, loss[loss=0.2406, ctc_loss=0.1587, cr_loss=0.4093, over 20969.00 frames. ], tot_loss[loss=0.2433, ctc_loss=0.1659, cr_loss=0.387, over 4095437.46 frames. ], batch size: 55, lr: 4.66e-03, grad_scale: 32.0 2024-09-15 16:49:05,432 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.38 vs. limit=15.0 2024-09-15 16:49:57,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=314993.0, ans=0.2 2024-09-15 16:50:16,193 INFO [train.py:1198] (1/2) Epoch 18, batch 2550, loss[loss=0.2341, ctc_loss=0.16, cr_loss=0.3704, over 20994.00 frames. 
], tot_loss[loss=0.2436, ctc_loss=0.1661, cr_loss=0.3873, over 4093325.62 frames. ], batch size: 55, lr: 4.66e-03, grad_scale: 32.0 2024-09-15 16:50:41,733 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.781e+02 2.119e+02 2.271e+02 2.541e+02 5.593e+02, threshold=4.541e+02, percent-clipped=2.0 2024-09-15 16:50:45,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=315106.3333333333, ans=0.0 2024-09-15 16:51:00,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=315134.6666666667, ans=0.125 2024-09-15 16:51:10,311 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.29 vs. limit=15.0 2024-09-15 16:51:31,768 INFO [train.py:1198] (1/2) Epoch 18, batch 2600, loss[loss=0.2371, ctc_loss=0.1599, cr_loss=0.3862, over 20975.00 frames. ], tot_loss[loss=0.242, ctc_loss=0.165, cr_loss=0.3854, over 4107125.33 frames. ], batch size: 58, lr: 4.66e-03, grad_scale: 32.0 2024-09-15 16:51:51,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=315219.6666666667, ans=0.1 2024-09-15 16:52:08,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=315248.0, ans=0.125 2024-09-15 16:52:33,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=315304.6666666667, ans=0.125 2024-09-15 16:52:46,969 INFO [train.py:1198] (1/2) Epoch 18, batch 2650, loss[loss=0.2132, ctc_loss=0.1431, cr_loss=0.3507, over 20981.00 frames. ], tot_loss[loss=0.2423, ctc_loss=0.1651, cr_loss=0.3856, over 4118935.29 frames. ], batch size: 55, lr: 4.66e-03, grad_scale: 32.0 2024-09-15 16:53:12,650 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.840e+02 2.091e+02 2.227e+02 2.454e+02 3.694e+02, threshold=4.454e+02, percent-clipped=0.0 2024-09-15 16:53:59,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=315446.3333333333, ans=0.2 2024-09-15 16:54:02,259 INFO [train.py:1198] (1/2) Epoch 18, batch 2700, loss[loss=0.2348, ctc_loss=0.1631, cr_loss=0.3584, over 21066.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.1654, cr_loss=0.3851, over 4113651.18 frames. ], batch size: 53, lr: 4.66e-03, grad_scale: 32.0 2024-09-15 16:54:05,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=315474.6666666667, ans=0.1 2024-09-15 16:55:04,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=315559.6666666667, ans=0.1 2024-09-15 16:55:23,344 INFO [train.py:1198] (1/2) Epoch 18, batch 2750, loss[loss=0.2354, ctc_loss=0.1567, cr_loss=0.3939, over 20791.00 frames. ], tot_loss[loss=0.2422, ctc_loss=0.1651, cr_loss=0.3854, over 4120952.87 frames. 
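
Note on the grad_scale field: it is the running loss scale of fp16 mixed-precision training, and the log shows it moving (16.0 around batches 900-1150 earlier in the epoch, back to 32.0 by batch 1200): the scaler halves the scale when a step overflows and grows it again after a run of clean steps. A generic torch.cuda.amp sketch of that mechanism; the intervals and factors shown are PyTorch defaults, not values inferred from this recipe, whose update policy may differ.

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=2.0**15,    # starting loss scale
        growth_factor=2.0,     # doubled after a run of overflow-free steps...
        backoff_factor=0.5,    # ...halved immediately when a step overflows
        growth_interval=2000,  # overflow-free steps required before growing
    )

    def train_step(model, optimizer, criterion, inputs, targets):
        optimizer.zero_grad(set_to_none=True)
        with torch.cuda.amp.autocast():          # fp16 forward pass
            loss = criterion(model(inputs), targets)
        scaler.scale(loss).backward()            # scaled backward pass
        scaler.step(optimizer)                   # skipped if grads overflowed
        scaler.update()                          # adjusts the logged grad_scale
        return loss.detach()
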
], batch size: 56, lr: 4.66e-03, grad_scale: 32.0 2024-09-15 16:55:34,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=315616.3333333333, ans=0.1 2024-09-15 16:55:35,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=315616.3333333333, ans=0.025 2024-09-15 16:55:48,904 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.817e+02 2.045e+02 2.157e+02 2.327e+02 3.592e+02, threshold=4.314e+02, percent-clipped=0.0 2024-09-15 16:55:52,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=315673.0, ans=0.5 2024-09-15 16:55:57,244 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.75 vs. limit=15.0 2024-09-15 16:56:38,855 INFO [train.py:1198] (1/2) Epoch 18, batch 2800, loss[loss=0.2855, ctc_loss=0.1973, cr_loss=0.441, over 20969.00 frames. ], tot_loss[loss=0.2426, ctc_loss=0.1655, cr_loss=0.3857, over 4118020.87 frames. ], batch size: 64, lr: 4.65e-03, grad_scale: 32.0 2024-09-15 16:57:18,156 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.07 vs. limit=15.0 2024-09-15 16:57:24,161 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.27 vs. limit=15.0 2024-09-15 16:57:34,891 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.89 vs. limit=6.0 2024-09-15 16:57:36,424 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.08 vs. limit=12.0 2024-09-15 16:57:53,767 INFO [train.py:1198] (1/2) Epoch 18, batch 2850, loss[loss=0.2275, ctc_loss=0.1515, cr_loss=0.3801, over 20973.00 frames. ], tot_loss[loss=0.2428, ctc_loss=0.1656, cr_loss=0.3861, over 4110805.86 frames. ], batch size: 55, lr: 4.65e-03, grad_scale: 32.0 2024-09-15 16:57:57,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=315899.6666666667, ans=0.125 2024-09-15 16:58:12,577 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.60 vs. limit=15.0 2024-09-15 16:58:19,296 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.788e+02 2.062e+02 2.184e+02 2.305e+02 4.302e+02, threshold=4.368e+02, percent-clipped=0.0 2024-09-15 16:59:04,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=316013.0, ans=0.2 2024-09-15 16:59:05,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=316013.0, ans=0.1 2024-09-15 16:59:09,221 INFO [train.py:1198] (1/2) Epoch 18, batch 2900, loss[loss=0.2639, ctc_loss=0.1816, cr_loss=0.4115, over 20963.00 frames. ], tot_loss[loss=0.2426, ctc_loss=0.1654, cr_loss=0.3859, over 4107483.95 frames. 
], batch size: 64, lr: 4.65e-03, grad_scale: 32.0 2024-09-15 16:59:48,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=316098.0, ans=0.04949747468305833 2024-09-15 16:59:51,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=316098.0, ans=0.125 2024-09-15 17:00:29,912 INFO [train.py:1198] (1/2) Epoch 18, batch 2950, loss[loss=0.259, ctc_loss=0.1796, cr_loss=0.3973, over 18174.00 frames. ], tot_loss[loss=0.2428, ctc_loss=0.1657, cr_loss=0.3854, over 4093318.56 frames. ], batch size: 108, lr: 4.65e-03, grad_scale: 32.0 2024-09-15 17:00:51,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=316211.3333333333, ans=0.125 2024-09-15 17:00:55,875 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.736e+02 2.049e+02 2.209e+02 2.438e+02 3.333e+02, threshold=4.419e+02, percent-clipped=0.0 2024-09-15 17:01:45,439 INFO [train.py:1198] (1/2) Epoch 18, batch 3000, loss[loss=0.2274, ctc_loss=0.1542, cr_loss=0.3659, over 20997.00 frames. ], tot_loss[loss=0.2416, ctc_loss=0.1647, cr_loss=0.3844, over 4099974.43 frames. ], batch size: 48, lr: 4.65e-03, grad_scale: 32.0 2024-09-15 17:01:45,440 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-15 17:02:10,947 INFO [train.py:1230] (1/2) Epoch 18, validation: loss=0.0454, ctc_loss=0.0454, cr_loss=1.042e-14, over 944034.00 frames. 2024-09-15 17:02:10,947 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-15 17:02:15,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=316324.6666666667, ans=0.125 2024-09-15 17:02:56,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=316409.6666666667, ans=0.2 2024-09-15 17:03:03,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=316409.6666666667, ans=0.0 2024-09-15 17:03:26,203 INFO [train.py:1198] (1/2) Epoch 18, batch 3050, loss[loss=0.2069, ctc_loss=0.14, cr_loss=0.3344, over 20974.00 frames. ], tot_loss[loss=0.2415, ctc_loss=0.1645, cr_loss=0.3849, over 4105668.93 frames. ], batch size: 48, lr: 4.65e-03, grad_scale: 32.0 2024-09-15 17:03:43,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=316494.6666666667, ans=0.125 2024-09-15 17:03:51,727 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.654e+02 1.989e+02 2.136e+02 2.266e+02 3.019e+02, threshold=4.272e+02, percent-clipped=0.0 2024-09-15 17:04:10,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=316551.3333333333, ans=0.125 2024-09-15 17:04:41,475 INFO [train.py:1198] (1/2) Epoch 18, batch 3100, loss[loss=0.2369, ctc_loss=0.1614, cr_loss=0.3776, over 21063.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.164, cr_loss=0.3844, over 4094571.22 frames. ], batch size: 53, lr: 4.65e-03, grad_scale: 32.0 2024-09-15 17:04:49,602 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.02 vs. 
limit=15.0 2024-09-15 17:04:52,735 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.27 vs. limit=10.0 2024-09-15 17:05:10,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=316664.6666666667, ans=0.125 2024-09-15 17:05:33,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=316693.0, ans=0.125 2024-09-15 17:05:33,854 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.10 vs. limit=10.0 2024-09-15 17:05:36,967 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.68 vs. limit=15.0 2024-09-15 17:06:00,323 INFO [train.py:1198] (1/2) Epoch 18, batch 3150, loss[loss=0.246, ctc_loss=0.1681, cr_loss=0.3896, over 20878.00 frames. ], tot_loss[loss=0.2412, ctc_loss=0.1642, cr_loss=0.3846, over 4101186.15 frames. ], batch size: 57, lr: 4.65e-03, grad_scale: 32.0 2024-09-15 17:06:28,893 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.847e+02 2.103e+02 2.241e+02 2.472e+02 6.402e+02, threshold=4.482e+02, percent-clipped=1.0 2024-09-15 17:06:29,273 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 17:06:40,037 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=316806.3333333333, ans=0.1 2024-09-15 17:07:08,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=316863.0, ans=0.1 2024-09-15 17:07:18,995 INFO [train.py:1198] (1/2) Epoch 18, batch 3200, loss[loss=0.2201, ctc_loss=0.146, cr_loss=0.3701, over 21060.00 frames. ], tot_loss[loss=0.2412, ctc_loss=0.1644, cr_loss=0.384, over 4097588.30 frames. ], batch size: 53, lr: 4.65e-03, grad_scale: 32.0 2024-09-15 17:07:22,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=316891.3333333333, ans=0.1 2024-09-15 17:07:41,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=316919.6666666667, ans=0.125 2024-09-15 17:08:09,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=316976.3333333333, ans=0.0 2024-09-15 17:08:21,664 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.84 vs. limit=15.0 2024-09-15 17:08:34,679 INFO [train.py:1198] (1/2) Epoch 18, batch 3250, loss[loss=0.2887, ctc_loss=0.2095, cr_loss=0.3963, over 14169.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1638, cr_loss=0.3832, over 4096846.60 frames. 
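
Note on the batch-3000 validation entry a little above (loss=0.0454, ctc_loss=0.0454, cr_loss=1.042e-14): cr_loss collapses to numerical zero at validation while sitting near 0.38 during training. Consistency regularization compares frame-level CTC posteriors from two differently augmented views of each utterance; with no augmentation at validation the two views coincide and the divergence vanishes. A minimal sketch of such a symmetric consistency term; this is a generic formulation, and the recipe's actual cr loss may differ in masking and reduction details.

    import torch
    import torch.nn.functional as F

    def cr_loss(log_probs_a: torch.Tensor, log_probs_b: torch.Tensor) -> torch.Tensor:
        # log_probs_*: (num_frames, vocab) frame-level CTC log-posteriors
        # from two augmented views of the same utterance.
        kl_ab = F.kl_div(log_probs_a, log_probs_b, log_target=True, reduction="batchmean")
        kl_ba = F.kl_div(log_probs_b, log_probs_a, log_target=True, reduction="batchmean")
        return 0.5 * (kl_ab + kl_ba)

    # With identical views (no augmentation, as at validation) the loss is
    # zero up to floating-point noise -- matching cr_loss ~ 1e-14 above.
    lp = torch.randn(100, 500).log_softmax(dim=-1)
    print(cr_loss(lp, lp))  # ~0
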
], batch size: 150, lr: 4.65e-03, grad_scale: 32.0 2024-09-15 17:08:36,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=317033.0, ans=0.0 2024-09-15 17:08:58,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=317061.3333333333, ans=0.1 2024-09-15 17:09:00,096 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.864e+02 2.008e+02 2.115e+02 2.277e+02 3.221e+02, threshold=4.229e+02, percent-clipped=0.0 2024-09-15 17:09:01,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=317061.3333333333, ans=0.125 2024-09-15 17:09:06,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=317089.6666666667, ans=0.1 2024-09-15 17:09:07,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=317089.6666666667, ans=0.125 2024-09-15 17:09:20,625 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.55 vs. limit=22.5 2024-09-15 17:09:49,967 INFO [train.py:1198] (1/2) Epoch 18, batch 3300, loss[loss=0.2588, ctc_loss=0.1784, cr_loss=0.4021, over 20245.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1646, cr_loss=0.3841, over 4071342.67 frames. ], batch size: 74, lr: 4.64e-03, grad_scale: 32.0 2024-09-15 17:09:59,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=317174.6666666667, ans=0.125 2024-09-15 17:10:31,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=317231.3333333333, ans=0.125 2024-09-15 17:11:05,380 INFO [train.py:1198] (1/2) Epoch 18, batch 3350, loss[loss=0.2545, ctc_loss=0.1737, cr_loss=0.4039, over 20777.00 frames. ], tot_loss[loss=0.2419, ctc_loss=0.165, cr_loss=0.3846, over 4083839.27 frames. ], batch size: 56, lr: 4.64e-03, grad_scale: 32.0 2024-09-15 17:11:11,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=317316.3333333333, ans=0.1 2024-09-15 17:11:13,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=317316.3333333333, ans=0.0 2024-09-15 17:11:28,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=317344.6666666667, ans=0.0 2024-09-15 17:11:32,303 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.765e+02 2.047e+02 2.168e+02 2.320e+02 5.001e+02, threshold=4.336e+02, percent-clipped=1.0 2024-09-15 17:11:52,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=317373.0, ans=0.125 2024-09-15 17:11:58,391 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=317401.3333333333, ans=0.125 2024-09-15 17:12:09,391 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.38 vs. 
limit=15.0 2024-09-15 17:12:28,329 INFO [train.py:1198] (1/2) Epoch 18, batch 3400, loss[loss=0.2623, ctc_loss=0.1794, cr_loss=0.4145, over 21072.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.1641, cr_loss=0.3829, over 4089686.66 frames. ], batch size: 59, lr: 4.64e-03, grad_scale: 32.0 2024-09-15 17:12:36,594 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.65 vs. limit=15.0 2024-09-15 17:12:56,146 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.08 vs. limit=15.0 2024-09-15 17:13:07,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=317514.6666666667, ans=0.0 2024-09-15 17:13:21,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=317543.0, ans=0.1 2024-09-15 17:13:44,005 INFO [train.py:1198] (1/2) Epoch 18, batch 3450, loss[loss=0.2251, ctc_loss=0.1516, cr_loss=0.3677, over 21034.00 frames. ], tot_loss[loss=0.2415, ctc_loss=0.1647, cr_loss=0.384, over 4085054.06 frames. ], batch size: 63, lr: 4.64e-03, grad_scale: 32.0 2024-09-15 17:14:09,863 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.713e+02 2.047e+02 2.154e+02 2.265e+02 3.567e+02, threshold=4.307e+02, percent-clipped=0.0 2024-09-15 17:14:22,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=317656.3333333333, ans=0.2 2024-09-15 17:14:42,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=317684.6666666667, ans=0.125 2024-09-15 17:14:55,691 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=317713.0, ans=0.125 2024-09-15 17:14:59,829 INFO [train.py:1198] (1/2) Epoch 18, batch 3500, loss[loss=0.2009, ctc_loss=0.1331, cr_loss=0.3393, over 19903.00 frames. ], tot_loss[loss=0.2407, ctc_loss=0.1641, cr_loss=0.383, over 4082242.28 frames. ], batch size: 44, lr: 4.64e-03, grad_scale: 32.0 2024-09-15 17:15:16,130 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.30 vs. limit=15.0 2024-09-15 17:15:35,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=317798.0, ans=0.125 2024-09-15 17:15:59,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=317854.6666666667, ans=0.125 2024-09-15 17:16:01,366 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 17:16:08,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=317854.6666666667, ans=0.1 2024-09-15 17:16:11,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=317854.6666666667, ans=0.125 2024-09-15 17:16:16,078 INFO [train.py:1198] (1/2) Epoch 18, batch 3550, loss[loss=0.2658, ctc_loss=0.1828, cr_loss=0.415, over 21078.00 frames. ], tot_loss[loss=0.2407, ctc_loss=0.164, cr_loss=0.3831, over 4095329.27 frames. 
], batch size: 59, lr: 4.64e-03, grad_scale: 32.0 2024-09-15 17:16:25,962 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.97 vs. limit=15.0 2024-09-15 17:16:33,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=317911.3333333333, ans=0.0 2024-09-15 17:16:41,904 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.802e+02 2.046e+02 2.196e+02 2.393e+02 3.616e+02, threshold=4.392e+02, percent-clipped=0.0 2024-09-15 17:17:35,285 INFO [train.py:1198] (1/2) Epoch 18, batch 3600, loss[loss=0.2466, ctc_loss=0.1685, cr_loss=0.3903, over 20830.00 frames. ], tot_loss[loss=0.2402, ctc_loss=0.1638, cr_loss=0.3822, over 4101607.25 frames. ], batch size: 59, lr: 4.64e-03, grad_scale: 32.0 2024-09-15 17:17:38,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=318024.6666666667, ans=0.2 2024-09-15 17:18:10,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=318081.3333333333, ans=0.1 2024-09-15 17:18:30,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=318109.6666666667, ans=0.2 2024-09-15 17:18:31,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=318109.6666666667, ans=0.125 2024-09-15 17:18:36,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=318138.0, ans=0.0 2024-09-15 17:18:41,199 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.28 vs. limit=12.0 2024-09-15 17:18:50,679 INFO [train.py:1198] (1/2) Epoch 18, batch 3650, loss[loss=0.2988, ctc_loss=0.2184, cr_loss=0.4022, over 14065.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.1638, cr_loss=0.3823, over 4091107.56 frames. ], batch size: 149, lr: 4.64e-03, grad_scale: 32.0 2024-09-15 17:18:52,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=318166.3333333333, ans=0.035 2024-09-15 17:18:55,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=318166.3333333333, ans=0.125 2024-09-15 17:18:57,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=318166.3333333333, ans=0.125 2024-09-15 17:19:03,453 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.86 vs. limit=10.0 2024-09-15 17:19:16,276 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.799e+02 2.068e+02 2.198e+02 2.370e+02 5.605e+02, threshold=4.396e+02, percent-clipped=2.0 2024-09-15 17:19:24,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=318223.0, ans=0.125 2024-09-15 17:19:39,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=318251.3333333333, ans=0.125 2024-09-15 17:20:06,245 INFO [train.py:1198] (1/2) Epoch 18, batch 3700, loss[loss=0.2517, ctc_loss=0.1722, cr_loss=0.3973, over 20880.00 frames. 
], tot_loss[loss=0.2406, ctc_loss=0.164, cr_loss=0.3832, over 4094344.70 frames. ], batch size: 54, lr: 4.64e-03, grad_scale: 32.0 2024-09-15 17:20:33,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=318336.3333333333, ans=10.0 2024-09-15 17:21:03,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=318393.0, ans=0.04949747468305833 2024-09-15 17:21:21,444 INFO [train.py:1198] (1/2) Epoch 18, batch 3750, loss[loss=0.2027, ctc_loss=0.1364, cr_loss=0.3317, over 20963.00 frames. ], tot_loss[loss=0.241, ctc_loss=0.1643, cr_loss=0.3833, over 4089299.17 frames. ], batch size: 52, lr: 4.63e-03, grad_scale: 32.0 2024-09-15 17:21:47,596 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.787e+02 2.069e+02 2.194e+02 2.394e+02 5.256e+02, threshold=4.387e+02, percent-clipped=1.0 2024-09-15 17:22:37,397 INFO [train.py:1198] (1/2) Epoch 18, batch 3800, loss[loss=0.2073, ctc_loss=0.1386, cr_loss=0.3435, over 21071.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.1648, cr_loss=0.3846, over 4086818.33 frames. ], batch size: 53, lr: 4.63e-03, grad_scale: 32.0 2024-09-15 17:23:16,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=318648.0, ans=0.125 2024-09-15 17:23:24,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=318648.0, ans=0.125 2024-09-15 17:23:58,665 INFO [train.py:1198] (1/2) Epoch 18, batch 3850, loss[loss=0.2832, ctc_loss=0.1947, cr_loss=0.4426, over 21024.00 frames. ], tot_loss[loss=0.2415, ctc_loss=0.1647, cr_loss=0.3839, over 4088527.62 frames. ], batch size: 63, lr: 4.63e-03, grad_scale: 64.0 2024-09-15 17:24:24,628 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.775e+02 2.050e+02 2.198e+02 2.435e+02 4.376e+02, threshold=4.397e+02, percent-clipped=0.0 2024-09-15 17:24:28,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=318789.6666666667, ans=0.1 2024-09-15 17:24:52,973 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.26 vs. limit=22.5 2024-09-15 17:24:58,355 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=318846.3333333333, ans=0.0 2024-09-15 17:25:03,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=318846.3333333333, ans=0.0 2024-09-15 17:25:14,643 INFO [train.py:1198] (1/2) Epoch 18, batch 3900, loss[loss=0.2307, ctc_loss=0.1567, cr_loss=0.3699, over 20946.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.1648, cr_loss=0.3843, over 4101987.65 frames. ], batch size: 50, lr: 4.63e-03, grad_scale: 64.0 2024-09-15 17:25:16,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=318874.6666666667, ans=0.025 2024-09-15 17:25:31,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=318903.0, ans=0.0 2024-09-15 17:25:32,012 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.76 vs. 
limit=22.5 2024-09-15 17:25:32,138 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.74 vs. limit=12.0 2024-09-15 17:25:39,607 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.93 vs. limit=6.0 2024-09-15 17:25:57,778 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.99 vs. limit=15.0 2024-09-15 17:26:03,480 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 17:26:17,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=318988.0, ans=0.0 2024-09-15 17:26:17,595 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.19 vs. limit=15.0 2024-09-15 17:26:26,765 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.83 vs. limit=15.0 2024-09-15 17:26:30,413 INFO [train.py:1198] (1/2) Epoch 18, batch 3950, loss[loss=0.2527, ctc_loss=0.1751, cr_loss=0.3878, over 19994.00 frames. ], tot_loss[loss=0.2411, ctc_loss=0.1644, cr_loss=0.3833, over 4088650.49 frames. ], batch size: 80, lr: 4.63e-03, grad_scale: 64.0 2024-09-15 17:26:46,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=319044.6666666667, ans=0.025 2024-09-15 17:26:56,411 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.792e+02 2.046e+02 2.210e+02 2.454e+02 3.010e+02, threshold=4.421e+02, percent-clipped=0.0 2024-09-15 17:27:46,765 INFO [train.py:1198] (1/2) Epoch 18, batch 4000, loss[loss=0.2732, ctc_loss=0.1937, cr_loss=0.3972, over 18253.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.1641, cr_loss=0.3825, over 4089852.51 frames. ], batch size: 108, lr: 4.63e-03, grad_scale: 64.0 2024-09-15 17:28:02,613 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.41 vs. limit=22.5 2024-09-15 17:28:20,929 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.21 vs. limit=15.0 2024-09-15 17:29:05,354 INFO [train.py:1198] (1/2) Epoch 18, batch 4050, loss[loss=0.2414, ctc_loss=0.1621, cr_loss=0.3964, over 20764.00 frames. ], tot_loss[loss=0.2411, ctc_loss=0.1644, cr_loss=0.3834, over 4087025.94 frames. ], batch size: 56, lr: 4.63e-03, grad_scale: 64.0 2024-09-15 17:29:11,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=319299.6666666667, ans=0.035 2024-09-15 17:29:22,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=319328.0, ans=0.125 2024-09-15 17:29:32,996 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.26 vs. 
limit=12.0 2024-09-15 17:29:33,536 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.737e+02 2.092e+02 2.204e+02 2.412e+02 3.530e+02, threshold=4.407e+02, percent-clipped=0.0 2024-09-15 17:30:10,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=319413.0, ans=0.1 2024-09-15 17:30:20,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=319413.0, ans=0.2 2024-09-15 17:30:23,479 INFO [train.py:1198] (1/2) Epoch 18, batch 4100, loss[loss=0.2373, ctc_loss=0.1639, cr_loss=0.367, over 21047.00 frames. ], tot_loss[loss=0.2407, ctc_loss=0.164, cr_loss=0.3833, over 4092926.28 frames. ], batch size: 62, lr: 4.63e-03, grad_scale: 64.0 2024-09-15 17:30:37,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=319469.6666666667, ans=0.2 2024-09-15 17:30:49,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=319469.6666666667, ans=0.125 2024-09-15 17:31:13,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=319526.3333333333, ans=0.1 2024-09-15 17:31:39,232 INFO [train.py:1198] (1/2) Epoch 18, batch 4150, loss[loss=0.2324, ctc_loss=0.1587, cr_loss=0.3684, over 20893.00 frames. ], tot_loss[loss=0.2412, ctc_loss=0.1643, cr_loss=0.3845, over 4101505.35 frames. ], batch size: 54, lr: 4.63e-03, grad_scale: 32.0 2024-09-15 17:32:06,866 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.747e+02 2.065e+02 2.197e+02 2.386e+02 3.146e+02, threshold=4.395e+02, percent-clipped=0.0 2024-09-15 17:32:10,640 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.02 vs. limit=15.0 2024-09-15 17:32:14,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=319639.6666666667, ans=0.125 2024-09-15 17:32:28,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=319668.0, ans=0.2 2024-09-15 17:32:30,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=319668.0, ans=0.125 2024-09-15 17:32:30,565 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.02 vs. limit=15.0 2024-09-15 17:32:37,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=319668.0, ans=0.125 2024-09-15 17:32:41,066 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.00 vs. limit=6.0 2024-09-15 17:32:55,194 INFO [train.py:1198] (1/2) Epoch 18, batch 4200, loss[loss=0.2243, ctc_loss=0.1501, cr_loss=0.3713, over 20885.00 frames. ], tot_loss[loss=0.2413, ctc_loss=0.1643, cr_loss=0.3846, over 4105343.83 frames. 
], batch size: 54, lr: 4.63e-03, grad_scale: 32.0 2024-09-15 17:33:03,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=319724.6666666667, ans=0.125 2024-09-15 17:33:10,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=319753.0, ans=0.0 2024-09-15 17:33:18,842 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.05 vs. limit=15.0 2024-09-15 17:33:28,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=319781.3333333333, ans=0.125 2024-09-15 17:33:30,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=319781.3333333333, ans=0.125 2024-09-15 17:33:33,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=319781.3333333333, ans=0.125 2024-09-15 17:33:48,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=319809.6666666667, ans=0.2 2024-09-15 17:34:10,820 INFO [train.py:1198] (1/2) Epoch 18, batch 4250, loss[loss=0.199, ctc_loss=0.1322, cr_loss=0.3341, over 20970.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.1646, cr_loss=0.3851, over 4104729.77 frames. ], batch size: 49, lr: 4.62e-03, grad_scale: 32.0 2024-09-15 17:34:39,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=319894.6666666667, ans=0.1 2024-09-15 17:34:40,974 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.794e+02 2.078e+02 2.251e+02 2.430e+02 5.083e+02, threshold=4.502e+02, percent-clipped=1.0 2024-09-15 17:35:09,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=319951.3333333333, ans=0.125 2024-09-15 17:35:13,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=319951.3333333333, ans=0.125 2024-09-15 17:35:27,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=319979.6666666667, ans=0.2 2024-09-15 17:35:33,157 INFO [train.py:1198] (1/2) Epoch 18, batch 4300, loss[loss=0.2412, ctc_loss=0.1601, cr_loss=0.4052, over 20973.00 frames. ], tot_loss[loss=0.241, ctc_loss=0.1641, cr_loss=0.3845, over 4104698.15 frames. 
], batch size: 58, lr: 4.62e-03, grad_scale: 32.0 2024-09-15 17:35:33,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=320008.0, ans=0.125 2024-09-15 17:35:33,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=320008.0, ans=0.125 2024-09-15 17:35:39,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=320008.0, ans=0.1 2024-09-15 17:35:50,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=320036.3333333333, ans=0.0 2024-09-15 17:36:00,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=320036.3333333333, ans=0.0 2024-09-15 17:36:19,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=320093.0, ans=0.125 2024-09-15 17:36:44,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=320121.3333333333, ans=0.125 2024-09-15 17:36:49,834 INFO [train.py:1198] (1/2) Epoch 18, batch 4350, loss[loss=0.2062, ctc_loss=0.14, cr_loss=0.3306, over 20322.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.164, cr_loss=0.3842, over 4105031.59 frames. ], batch size: 45, lr: 4.62e-03, grad_scale: 32.0 2024-09-15 17:36:51,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=320149.6666666667, ans=0.2 2024-09-15 17:37:17,379 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.708e+02 2.021e+02 2.201e+02 2.329e+02 4.676e+02, threshold=4.402e+02, percent-clipped=1.0 2024-09-15 17:37:33,226 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.97 vs. limit=22.5 2024-09-15 17:37:34,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=320234.6666666667, ans=0.05 2024-09-15 17:37:53,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=320263.0, ans=0.125 2024-09-15 17:38:05,849 INFO [train.py:1198] (1/2) Epoch 18, batch 4400, loss[loss=0.225, ctc_loss=0.1518, cr_loss=0.3664, over 20964.00 frames. ], tot_loss[loss=0.2415, ctc_loss=0.1646, cr_loss=0.3846, over 4098177.70 frames. ], batch size: 55, lr: 4.62e-03, grad_scale: 32.0 2024-09-15 17:38:26,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=320319.6666666667, ans=0.09899494936611666 2024-09-15 17:38:26,496 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.71 vs. limit=22.5 2024-09-15 17:38:33,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=320319.6666666667, ans=0.1 2024-09-15 17:38:33,987 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.69 vs. limit=15.0 2024-09-15 17:39:21,940 INFO [train.py:1198] (1/2) Epoch 18, batch 4450, loss[loss=0.2551, ctc_loss=0.1729, cr_loss=0.4109, over 20652.00 frames. 
], tot_loss[loss=0.2421, ctc_loss=0.1649, cr_loss=0.3861, over 4103628.80 frames. ], batch size: 68, lr: 4.62e-03, grad_scale: 32.0 2024-09-15 17:39:25,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=320433.0, ans=0.5 2024-09-15 17:39:42,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=320461.3333333333, ans=0.125 2024-09-15 17:39:49,091 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.814e+02 2.090e+02 2.230e+02 2.433e+02 3.407e+02, threshold=4.459e+02, percent-clipped=0.0 2024-09-15 17:39:58,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=320489.6666666667, ans=0.07 2024-09-15 17:40:42,501 INFO [train.py:1198] (1/2) Epoch 18, batch 4500, loss[loss=0.207, ctc_loss=0.1398, cr_loss=0.3361, over 20883.00 frames. ], tot_loss[loss=0.2418, ctc_loss=0.1648, cr_loss=0.385, over 4089378.63 frames. ], batch size: 54, lr: 4.62e-03, grad_scale: 16.0 2024-09-15 17:40:54,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=320574.6666666667, ans=0.025 2024-09-15 17:41:26,691 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=320631.3333333333, ans=0.125 2024-09-15 17:41:43,965 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.69 vs. limit=15.0 2024-09-15 17:42:00,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=320688.0, ans=0.125 2024-09-15 17:42:03,596 INFO [train.py:1198] (1/2) Epoch 18, batch 4550, loss[loss=0.2239, ctc_loss=0.1502, cr_loss=0.3687, over 21024.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1645, cr_loss=0.3847, over 4097441.75 frames. ], batch size: 63, lr: 4.62e-03, grad_scale: 16.0 2024-09-15 17:42:32,601 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.829e+02 2.098e+02 2.267e+02 2.519e+02 7.236e+02, threshold=4.534e+02, percent-clipped=1.0 2024-09-15 17:43:20,492 INFO [train.py:1198] (1/2) Epoch 18, batch 4600, loss[loss=0.2792, ctc_loss=0.1882, cr_loss=0.4547, over 20665.00 frames. ], tot_loss[loss=0.2416, ctc_loss=0.1647, cr_loss=0.3843, over 4082937.35 frames. ], batch size: 66, lr: 4.62e-03, grad_scale: 16.0 2024-09-15 17:43:22,503 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 17:43:42,835 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.85 vs. limit=12.0 2024-09-15 17:43:52,010 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.51 vs. 
limit=15.0 2024-09-15 17:44:13,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=320943.0, ans=0.125 2024-09-15 17:44:16,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=320943.0, ans=0.125 2024-09-15 17:44:19,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=320971.3333333333, ans=0.125 2024-09-15 17:44:36,244 INFO [train.py:1198] (1/2) Epoch 18, batch 4650, loss[loss=0.2378, ctc_loss=0.1604, cr_loss=0.3867, over 20879.00 frames. ], tot_loss[loss=0.2411, ctc_loss=0.1642, cr_loss=0.3841, over 4098635.45 frames. ], batch size: 54, lr: 4.62e-03, grad_scale: 16.0 2024-09-15 17:45:05,583 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.843e+02 2.058e+02 2.194e+02 2.426e+02 3.072e+02, threshold=4.388e+02, percent-clipped=0.0 2024-09-15 17:45:22,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=321084.6666666667, ans=0.125 2024-09-15 17:45:53,106 INFO [train.py:1198] (1/2) Epoch 18, batch 4700, loss[loss=0.2577, ctc_loss=0.1756, cr_loss=0.4106, over 21079.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.164, cr_loss=0.3842, over 4108101.02 frames. ], batch size: 59, lr: 4.62e-03, grad_scale: 16.0 2024-09-15 17:46:19,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=321169.6666666667, ans=0.2 2024-09-15 17:46:20,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=321169.6666666667, ans=0.125 2024-09-15 17:46:23,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=321169.6666666667, ans=0.0 2024-09-15 17:46:51,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=321226.3333333333, ans=0.2 2024-09-15 17:47:16,773 INFO [train.py:1198] (1/2) Epoch 18, batch 4750, loss[loss=0.1849, ctc_loss=0.1228, cr_loss=0.3106, over 19845.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.1639, cr_loss=0.3835, over 4089146.19 frames. ], batch size: 44, lr: 4.61e-03, grad_scale: 16.0 2024-09-15 17:47:17,502 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.40 vs. 
limit=15.0 2024-09-15 17:47:28,133 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 17:47:31,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=321311.3333333333, ans=0.0 2024-09-15 17:47:38,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=321311.3333333333, ans=0.125 2024-09-15 17:47:45,709 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.781e+02 2.061e+02 2.212e+02 2.449e+02 5.359e+02, threshold=4.423e+02, percent-clipped=1.0 2024-09-15 17:47:46,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=321339.6666666667, ans=0.1 2024-09-15 17:47:55,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=321339.6666666667, ans=0.025 2024-09-15 17:48:13,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=321368.0, ans=0.125 2024-09-15 17:48:32,913 INFO [train.py:1198] (1/2) Epoch 18, batch 4800, loss[loss=0.2408, ctc_loss=0.1636, cr_loss=0.3856, over 20979.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.1649, cr_loss=0.3843, over 4080363.63 frames. ], batch size: 58, lr: 4.61e-03, grad_scale: 32.0 2024-09-15 17:48:49,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=321453.0, ans=0.1 2024-09-15 17:48:51,977 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.06 vs. limit=12.0 2024-09-15 17:49:06,508 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 17:49:08,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=321481.3333333333, ans=0.1 2024-09-15 17:49:22,234 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.44 vs. limit=15.0 2024-09-15 17:49:35,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=321538.0, ans=0.125 2024-09-15 17:49:48,166 INFO [train.py:1198] (1/2) Epoch 18, batch 4850, loss[loss=0.2547, ctc_loss=0.173, cr_loss=0.4086, over 20962.00 frames. ], tot_loss[loss=0.2426, ctc_loss=0.1655, cr_loss=0.3855, over 4087797.87 frames. 
], batch size: 58, lr: 4.61e-03, grad_scale: 32.0 2024-09-15 17:49:59,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=321566.3333333333, ans=0.125 2024-09-15 17:50:17,130 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.825e+02 2.095e+02 2.202e+02 2.306e+02 3.759e+02, threshold=4.404e+02, percent-clipped=0.0 2024-09-15 17:50:26,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=321623.0, ans=0.2 2024-09-15 17:50:41,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=321651.3333333333, ans=22.5 2024-09-15 17:50:44,399 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=321651.3333333333, ans=0.0 2024-09-15 17:50:50,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=321679.6666666667, ans=0.125 2024-09-15 17:51:03,725 INFO [train.py:1198] (1/2) Epoch 18, batch 4900, loss[loss=0.2315, ctc_loss=0.1585, cr_loss=0.3653, over 20962.00 frames. ], tot_loss[loss=0.242, ctc_loss=0.165, cr_loss=0.385, over 4088583.02 frames. ], batch size: 49, lr: 4.61e-03, grad_scale: 32.0 2024-09-15 17:51:07,296 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.54 vs. limit=15.0 2024-09-15 17:51:23,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=321736.3333333333, ans=0.125 2024-09-15 17:51:28,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=321736.3333333333, ans=0.05 2024-09-15 17:51:32,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=321764.6666666667, ans=0.125 2024-09-15 17:51:41,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=321764.6666666667, ans=0.125 2024-09-15 17:51:58,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=321793.0, ans=0.025 2024-09-15 17:52:07,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=321821.3333333333, ans=0.1 2024-09-15 17:52:17,134 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.03 vs. limit=10.0 2024-09-15 17:52:19,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=321821.3333333333, ans=0.1 2024-09-15 17:52:22,513 INFO [train.py:1198] (1/2) Epoch 18, batch 4950, loss[loss=0.2292, ctc_loss=0.1536, cr_loss=0.3781, over 20951.00 frames. ], tot_loss[loss=0.2429, ctc_loss=0.1657, cr_loss=0.386, over 4086626.73 frames. 
], batch size: 55, lr: 4.61e-03, grad_scale: 32.0 2024-09-15 17:52:50,492 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.718e+02 2.140e+02 2.303e+02 2.502e+02 3.680e+02, threshold=4.606e+02, percent-clipped=0.0 2024-09-15 17:52:59,785 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 17:53:05,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=321934.6666666667, ans=0.1 2024-09-15 17:53:14,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=321934.6666666667, ans=0.0 2024-09-15 17:53:39,624 INFO [train.py:1198] (1/2) Epoch 18, batch 5000, loss[loss=0.2182, ctc_loss=0.1478, cr_loss=0.3522, over 21040.00 frames. ], tot_loss[loss=0.2439, ctc_loss=0.1664, cr_loss=0.3872, over 4096195.06 frames. ], batch size: 63, lr: 4.61e-03, grad_scale: 32.0 2024-09-15 17:53:42,176 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.02 vs. limit=15.0 2024-09-15 17:54:06,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=322019.6666666667, ans=0.2 2024-09-15 17:54:29,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=322076.3333333333, ans=0.09899494936611666 2024-09-15 17:54:40,872 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=322104.6666666667, ans=0.0 2024-09-15 17:54:48,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=322104.6666666667, ans=0.07 2024-09-15 17:54:53,785 INFO [train.py:1198] (1/2) Epoch 18, batch 5050, loss[loss=0.269, ctc_loss=0.1878, cr_loss=0.406, over 20840.00 frames. ], tot_loss[loss=0.2438, ctc_loss=0.1665, cr_loss=0.3867, over 4082147.25 frames. ], batch size: 65, lr: 4.61e-03, grad_scale: 32.0 2024-09-15 17:55:22,043 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.817e+02 2.069e+02 2.175e+02 2.371e+02 4.721e+02, threshold=4.351e+02, percent-clipped=2.0 2024-09-15 17:55:26,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=322189.6666666667, ans=0.125 2024-09-15 17:55:28,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=322189.6666666667, ans=0.125 2024-09-15 17:55:48,244 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.87 vs. limit=10.0 2024-09-15 17:56:07,964 INFO [train.py:1198] (1/2) Epoch 18, batch 5100, loss[loss=0.218, ctc_loss=0.1466, cr_loss=0.3571, over 20878.00 frames. ], tot_loss[loss=0.2427, ctc_loss=0.1656, cr_loss=0.3858, over 4088169.39 frames. 
], batch size: 54, lr: 4.61e-03, grad_scale: 32.0 2024-09-15 17:56:46,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=322331.3333333333, ans=0.05 2024-09-15 17:57:03,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=322359.6666666667, ans=0.125 2024-09-15 17:57:10,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=322388.0, ans=0.125 2024-09-15 17:57:22,192 INFO [train.py:1198] (1/2) Epoch 18, batch 5150, loss[loss=0.2652, ctc_loss=0.178, cr_loss=0.4362, over 21078.00 frames. ], tot_loss[loss=0.2419, ctc_loss=0.1651, cr_loss=0.3843, over 4091297.99 frames. ], batch size: 59, lr: 4.61e-03, grad_scale: 32.0 2024-09-15 17:57:32,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=322416.3333333333, ans=0.0 2024-09-15 17:57:44,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=322444.6666666667, ans=0.025 2024-09-15 17:57:50,332 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.733e+02 2.046e+02 2.252e+02 2.431e+02 5.879e+02, threshold=4.504e+02, percent-clipped=1.0 2024-09-15 17:58:28,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=322529.6666666667, ans=0.2 2024-09-15 17:58:31,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=322529.6666666667, ans=0.125 2024-09-15 17:58:35,912 INFO [train.py:1198] (1/2) Epoch 18, batch 5200, loss[loss=0.2351, ctc_loss=0.1628, cr_loss=0.3616, over 21027.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.165, cr_loss=0.3837, over 4088484.53 frames. ], batch size: 63, lr: 4.61e-03, grad_scale: 32.0 2024-09-15 17:58:36,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=322558.0, ans=0.125 2024-09-15 17:58:51,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=322586.3333333333, ans=0.0 2024-09-15 17:59:50,801 INFO [train.py:1198] (1/2) Epoch 18, batch 5250, loss[loss=0.2692, ctc_loss=0.1904, cr_loss=0.3942, over 20101.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.1651, cr_loss=0.3834, over 4088417.71 frames. 
], batch size: 80, lr: 4.60e-03, grad_scale: 32.0 2024-09-15 17:59:52,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=322699.6666666667, ans=0.2 2024-09-15 17:59:55,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=322699.6666666667, ans=0.0 2024-09-15 18:00:13,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=322728.0, ans=0.0 2024-09-15 18:00:17,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=322728.0, ans=0.025 2024-09-15 18:00:19,016 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.863e+02 2.084e+02 2.230e+02 2.413e+02 7.928e+02, threshold=4.461e+02, percent-clipped=1.0 2024-09-15 18:00:44,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=322784.6666666667, ans=0.2 2024-09-15 18:00:47,518 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 18:00:59,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=322813.0, ans=0.0 2024-09-15 18:01:04,749 INFO [train.py:1198] (1/2) Epoch 18, batch 5300, loss[loss=0.2104, ctc_loss=0.1394, cr_loss=0.3553, over 20955.00 frames. ], tot_loss[loss=0.2415, ctc_loss=0.1648, cr_loss=0.3833, over 4085706.93 frames. ], batch size: 48, lr: 4.60e-03, grad_scale: 32.0 2024-09-15 18:01:20,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=322869.6666666667, ans=0.2 2024-09-15 18:02:06,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=322954.6666666667, ans=0.1 2024-09-15 18:02:07,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=322954.6666666667, ans=0.1 2024-09-15 18:02:22,296 INFO [train.py:1198] (1/2) Epoch 18, batch 5350, loss[loss=0.2136, ctc_loss=0.142, cr_loss=0.3581, over 20890.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1648, cr_loss=0.3833, over 4076625.48 frames. 
], batch size: 54, lr: 4.60e-03, grad_scale: 32.0 2024-09-15 18:02:27,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=322983.0, ans=15.0 2024-09-15 18:02:28,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=322983.0, ans=0.0 2024-09-15 18:02:30,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=322983.0, ans=0.0 2024-09-15 18:02:34,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=322983.0, ans=0.125 2024-09-15 18:02:50,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=323011.3333333333, ans=0.1 2024-09-15 18:02:52,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=323011.3333333333, ans=6.0 2024-09-15 18:02:53,329 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.797e+02 2.130e+02 2.257e+02 2.438e+02 3.200e+02, threshold=4.515e+02, percent-clipped=0.0 2024-09-15 18:03:08,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=323068.0, ans=0.5 2024-09-15 18:03:16,536 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.44 vs. limit=15.0 2024-09-15 18:03:30,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=323096.3333333333, ans=0.0 2024-09-15 18:03:39,160 INFO [train.py:1198] (1/2) Epoch 18, batch 5400, loss[loss=0.2576, ctc_loss=0.1783, cr_loss=0.397, over 20852.00 frames. ], tot_loss[loss=0.2427, ctc_loss=0.1658, cr_loss=0.3845, over 4058624.79 frames. ], batch size: 65, lr: 4.60e-03, grad_scale: 32.0 2024-09-15 18:03:48,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=323124.6666666667, ans=0.125 2024-09-15 18:04:11,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=323181.3333333333, ans=0.02 2024-09-15 18:04:13,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=323181.3333333333, ans=0.2 2024-09-15 18:04:53,229 INFO [train.py:1198] (1/2) Epoch 18, batch 5450, loss[loss=0.199, ctc_loss=0.1323, cr_loss=0.3338, over 21071.00 frames. ], tot_loss[loss=0.243, ctc_loss=0.1659, cr_loss=0.3853, over 4071451.48 frames. 
], batch size: 53, lr: 4.60e-03, grad_scale: 32.0 2024-09-15 18:05:14,699 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 18:05:21,698 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.763e+02 2.113e+02 2.285e+02 2.447e+02 3.054e+02, threshold=4.570e+02, percent-clipped=0.0 2024-09-15 18:05:26,582 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=323323.0, ans=0.125 2024-09-15 18:05:44,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=323351.3333333333, ans=0.125 2024-09-15 18:05:48,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=323351.3333333333, ans=0.1 2024-09-15 18:05:50,547 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.54 vs. limit=22.5 2024-09-15 18:05:57,813 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.78 vs. limit=15.0 2024-09-15 18:06:07,393 INFO [train.py:1198] (1/2) Epoch 18, batch 5500, loss[loss=0.2236, ctc_loss=0.1503, cr_loss=0.3665, over 20937.00 frames. ], tot_loss[loss=0.243, ctc_loss=0.166, cr_loss=0.3854, over 4077668.82 frames. ], batch size: 60, lr: 4.60e-03, grad_scale: 32.0 2024-09-15 18:06:15,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=323408.0, ans=0.125 2024-09-15 18:07:05,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=323521.3333333333, ans=0.2 2024-09-15 18:07:19,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=323521.3333333333, ans=0.125 2024-09-15 18:07:21,621 INFO [train.py:1198] (1/2) Epoch 18, batch 5550, loss[loss=0.263, ctc_loss=0.177, cr_loss=0.4299, over 20836.00 frames. ], tot_loss[loss=0.2426, ctc_loss=0.1657, cr_loss=0.3847, over 4075287.01 frames. ], batch size: 65, lr: 4.60e-03, grad_scale: 32.0 2024-09-15 18:07:34,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=323549.6666666667, ans=0.125 2024-09-15 18:07:50,108 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.776e+02 2.151e+02 2.305e+02 2.497e+02 4.741e+02, threshold=4.611e+02, percent-clipped=1.0 2024-09-15 18:08:36,722 INFO [train.py:1198] (1/2) Epoch 18, batch 5600, loss[loss=0.2415, ctc_loss=0.167, cr_loss=0.3725, over 20836.00 frames. ], tot_loss[loss=0.242, ctc_loss=0.1651, cr_loss=0.3848, over 4090778.54 frames. 
], batch size: 59, lr: 4.60e-03, grad_scale: 32.0 2024-09-15 18:08:53,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=323719.6666666667, ans=0.1 2024-09-15 18:08:53,503 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 18:08:56,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=323719.6666666667, ans=0.125 2024-09-15 18:09:28,322 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.33 vs. limit=6.0 2024-09-15 18:09:51,287 INFO [train.py:1198] (1/2) Epoch 18, batch 5650, loss[loss=0.2823, ctc_loss=0.1944, cr_loss=0.4399, over 20017.00 frames. ], tot_loss[loss=0.2434, ctc_loss=0.1661, cr_loss=0.3865, over 4089671.39 frames. ], batch size: 80, lr: 4.60e-03, grad_scale: 32.0 2024-09-15 18:09:56,123 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.62 vs. limit=22.5 2024-09-15 18:10:01,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=323833.0, ans=0.07 2024-09-15 18:10:19,125 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.860e+02 2.101e+02 2.246e+02 2.405e+02 3.438e+02, threshold=4.492e+02, percent-clipped=0.0 2024-09-15 18:10:19,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=323889.6666666667, ans=0.0 2024-09-15 18:10:36,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=323918.0, ans=0.0 2024-09-15 18:11:07,564 INFO [train.py:1198] (1/2) Epoch 18, batch 5700, loss[loss=0.2456, ctc_loss=0.1691, cr_loss=0.3827, over 21018.00 frames. ], tot_loss[loss=0.2427, ctc_loss=0.1656, cr_loss=0.3858, over 4092192.36 frames. ], batch size: 63, lr: 4.60e-03, grad_scale: 32.0 2024-09-15 18:11:58,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=324059.6666666667, ans=0.125 2024-09-15 18:12:19,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=324088.0, ans=0.125 2024-09-15 18:12:25,116 INFO [train.py:1198] (1/2) Epoch 18, batch 5750, loss[loss=0.2481, ctc_loss=0.1692, cr_loss=0.3946, over 20827.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.1653, cr_loss=0.3853, over 4084861.83 frames. 
], batch size: 59, lr: 4.59e-03, grad_scale: 32.0 2024-09-15 18:12:48,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=324144.6666666667, ans=0.125 2024-09-15 18:12:53,584 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.687e+02 2.077e+02 2.203e+02 2.459e+02 3.812e+02, threshold=4.407e+02, percent-clipped=0.0 2024-09-15 18:13:07,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=324173.0, ans=0.0 2024-09-15 18:13:08,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=324201.3333333333, ans=0.0 2024-09-15 18:13:08,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=324201.3333333333, ans=0.125 2024-09-15 18:13:39,862 INFO [train.py:1198] (1/2) Epoch 18, batch 5800, loss[loss=0.2567, ctc_loss=0.175, cr_loss=0.4084, over 19614.00 frames. ], tot_loss[loss=0.2425, ctc_loss=0.1653, cr_loss=0.3859, over 4094881.80 frames. ], batch size: 90, lr: 4.59e-03, grad_scale: 32.0 2024-09-15 18:13:49,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=324258.0, ans=0.0 2024-09-15 18:13:59,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=324286.3333333333, ans=0.0 2024-09-15 18:14:02,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=324286.3333333333, ans=0.1 2024-09-15 18:14:41,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=324371.3333333333, ans=0.125 2024-09-15 18:14:48,922 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0 2024-09-15 18:14:53,897 INFO [train.py:1198] (1/2) Epoch 18, batch 5850, loss[loss=0.2296, ctc_loss=0.1516, cr_loss=0.3896, over 20777.00 frames. ], tot_loss[loss=0.2416, ctc_loss=0.1646, cr_loss=0.3847, over 4093746.80 frames. ], batch size: 53, lr: 4.59e-03, grad_scale: 32.0 2024-09-15 18:15:12,597 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.86 vs. limit=22.5 2024-09-15 18:15:22,147 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.778e+02 2.055e+02 2.169e+02 2.308e+02 4.486e+02, threshold=4.338e+02, percent-clipped=1.0 2024-09-15 18:15:29,125 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.62 vs. 
limit=22.5 2024-09-15 18:15:59,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=324513.0, ans=0.125 2024-09-15 18:15:59,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=324513.0, ans=0.05 2024-09-15 18:16:05,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=324513.0, ans=0.1 2024-09-15 18:16:07,901 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.38 vs. limit=15.0 2024-09-15 18:16:08,528 INFO [train.py:1198] (1/2) Epoch 18, batch 5900, loss[loss=0.2849, ctc_loss=0.2058, cr_loss=0.3956, over 14073.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1634, cr_loss=0.3831, over 4100927.34 frames. ], batch size: 149, lr: 4.59e-03, grad_scale: 32.0 2024-09-15 18:16:26,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=324569.6666666667, ans=0.2 2024-09-15 18:17:23,095 INFO [train.py:1198] (1/2) Epoch 18, batch 5950, loss[loss=0.227, ctc_loss=0.1509, cr_loss=0.3805, over 20943.00 frames. ], tot_loss[loss=0.2405, ctc_loss=0.1638, cr_loss=0.3837, over 4094862.48 frames. ], batch size: 49, lr: 4.59e-03, grad_scale: 32.0 2024-09-15 18:17:24,019 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.15 vs. limit=12.0 2024-09-15 18:17:27,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=324683.0, ans=0.125 2024-09-15 18:17:51,458 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.671e+02 2.069e+02 2.193e+02 2.423e+02 7.278e+02, threshold=4.387e+02, percent-clipped=1.0 2024-09-15 18:18:03,768 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=324739.6666666667, ans=0.0 2024-09-15 18:18:08,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=324768.0, ans=0.125 2024-09-15 18:18:14,349 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.70 vs. limit=15.0 2024-09-15 18:18:37,508 INFO [train.py:1198] (1/2) Epoch 18, batch 6000, loss[loss=0.2621, ctc_loss=0.1818, cr_loss=0.4018, over 21032.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.1639, cr_loss=0.3836, over 4090819.19 frames. ], batch size: 62, lr: 4.59e-03, grad_scale: 32.0 2024-09-15 18:18:37,508 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-15 18:19:01,526 INFO [train.py:1230] (1/2) Epoch 18, validation: loss=0.04538, ctc_loss=0.04538, cr_loss=1.038e-14, over 944034.00 frames. 2024-09-15 18:19:01,526 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-15 18:19:03,555 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.52 vs. 
limit=15.0 2024-09-15 18:19:54,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=324909.6666666667, ans=0.2 2024-09-15 18:20:19,146 INFO [train.py:1198] (1/2) Epoch 18, batch 6050, loss[loss=0.2984, ctc_loss=0.2144, cr_loss=0.4201, over 14177.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1634, cr_loss=0.3826, over 4080659.56 frames. ], batch size: 149, lr: 4.59e-03, grad_scale: 32.0 2024-09-15 18:20:48,624 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.860e+02 2.068e+02 2.218e+02 2.388e+02 4.560e+02, threshold=4.436e+02, percent-clipped=1.0 2024-09-15 18:20:55,393 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.86 vs. limit=10.0 2024-09-15 18:21:08,561 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.95 vs. limit=15.0 2024-09-15 18:21:34,057 INFO [train.py:1198] (1/2) Epoch 18, batch 6100, loss[loss=0.2288, ctc_loss=0.1549, cr_loss=0.3691, over 20893.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.1641, cr_loss=0.3837, over 4088112.85 frames. ], batch size: 54, lr: 4.59e-03, grad_scale: 32.0 2024-09-15 18:21:55,372 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.94 vs. limit=12.0 2024-09-15 18:22:29,818 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.32 vs. limit=15.0 2024-09-15 18:22:42,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=325221.3333333333, ans=0.125 2024-09-15 18:22:48,267 INFO [train.py:1198] (1/2) Epoch 18, batch 6150, loss[loss=0.2276, ctc_loss=0.1567, cr_loss=0.3549, over 20972.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.164, cr_loss=0.3828, over 4086012.82 frames. ], batch size: 58, lr: 4.59e-03, grad_scale: 32.0 2024-09-15 18:23:16,410 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.820e+02 2.030e+02 2.241e+02 2.538e+02 4.663e+02, threshold=4.483e+02, percent-clipped=1.0 2024-09-15 18:23:46,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=325363.0, ans=0.125 2024-09-15 18:24:01,774 INFO [train.py:1198] (1/2) Epoch 18, batch 6200, loss[loss=0.2671, ctc_loss=0.1848, cr_loss=0.4114, over 21033.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.1649, cr_loss=0.3838, over 4049537.94 frames. ], batch size: 63, lr: 4.59e-03, grad_scale: 32.0 2024-09-15 18:24:06,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=325391.3333333333, ans=0.125 2024-09-15 18:24:09,780 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.14 vs. 
limit=22.5 2024-09-15 18:24:34,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=325448.0, ans=0.0 2024-09-15 18:24:34,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=325448.0, ans=0.125 2024-09-15 18:25:07,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=325504.6666666667, ans=0.2 2024-09-15 18:25:14,419 INFO [train.py:1198] (1/2) Epoch 18, batch 6250, loss[loss=0.2615, ctc_loss=0.1824, cr_loss=0.3954, over 19282.00 frames. ], tot_loss[loss=0.2438, ctc_loss=0.1667, cr_loss=0.3855, over 4024513.69 frames. ], batch size: 90, lr: 4.58e-03, grad_scale: 32.0 2024-09-15 18:25:16,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=325533.0, ans=0.125 2024-09-15 18:25:19,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=325533.0, ans=0.1 2024-09-15 18:25:22,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=325533.0, ans=0.125 2024-09-15 18:25:41,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=325561.3333333333, ans=0.2 2024-09-15 18:25:42,110 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.794e+02 2.110e+02 2.222e+02 2.392e+02 5.306e+02, threshold=4.444e+02, percent-clipped=1.0 2024-09-15 18:25:44,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.whiten.whitening_limit, batch_count=325589.6666666667, ans=12.0 2024-09-15 18:25:58,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=325618.0, ans=0.125 2024-09-15 18:26:04,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=325618.0, ans=0.025 2024-09-15 18:26:08,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=325618.0, ans=0.0 2024-09-15 18:26:08,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=325618.0, ans=0.0 2024-09-15 18:26:11,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=325646.3333333333, ans=0.125 2024-09-15 18:26:27,895 INFO [train.py:1198] (1/2) Epoch 18, batch 6300, loss[loss=0.3286, ctc_loss=0.2405, cr_loss=0.4403, over 14258.00 frames. ], tot_loss[loss=0.2456, ctc_loss=0.1684, cr_loss=0.3861, over 3979438.41 frames. ], batch size: 150, lr: 4.58e-03, grad_scale: 16.0 2024-09-15 18:26:30,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=325674.6666666667, ans=0.0 2024-09-15 18:26:31,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=325674.6666666667, ans=0.125 2024-09-15 18:27:15,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=325759.6666666667, ans=0.125 2024-09-15 18:27:40,246 INFO [train.py:1198] (1/2) Epoch 18, batch 6350, loss[loss=0.3091, ctc_loss=0.2214, cr_loss=0.4388, over 14334.00 frames. 
], tot_loss[loss=0.2507, ctc_loss=0.1729, cr_loss=0.3886, over 3806915.29 frames. ], batch size: 149, lr: 4.58e-03, grad_scale: 16.0 2024-09-15 18:28:08,799 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.731e+02 2.268e+02 2.509e+02 2.765e+02 3.819e+02, threshold=5.018e+02, percent-clipped=0.0 2024-09-15 18:28:25,032 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.78 vs. limit=22.5 2024-09-15 18:28:34,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=325901.3333333333, ans=10.0 2024-09-15 18:29:30,273 INFO [train.py:1198] (1/2) Epoch 19, batch 0, loss[loss=0.2865, ctc_loss=0.1999, cr_loss=0.4331, over 19982.00 frames. ], tot_loss[loss=0.2865, ctc_loss=0.1999, cr_loss=0.4331, over 19982.00 frames. ], batch size: 80, lr: 4.46e-03, grad_scale: 32.0 2024-09-15 18:29:30,273 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-15 18:29:38,162 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.4268, 4.0679, 4.1198, 4.1597], device='cuda:1') 2024-09-15 18:29:48,457 INFO [train.py:1230] (1/2) Epoch 19, validation: loss=0.04489, ctc_loss=0.04489, cr_loss=1.016e-14, over 944034.00 frames. 2024-09-15 18:29:48,458 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-15 18:29:54,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=325932.5, ans=0.125 2024-09-15 18:30:13,648 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.28 vs. limit=15.0 2024-09-15 18:30:43,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=326017.5, ans=0.1 2024-09-15 18:31:04,828 INFO [train.py:1198] (1/2) Epoch 19, batch 50, loss[loss=0.2001, ctc_loss=0.1339, cr_loss=0.3312, over 20985.00 frames. ], tot_loss[loss=0.2407, ctc_loss=0.1637, cr_loss=0.3849, over 934097.98 frames. ], batch size: 51, lr: 4.46e-03, grad_scale: 32.0 2024-09-15 18:31:08,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=326074.1666666667, ans=0.0 2024-09-15 18:31:20,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=326102.5, ans=0.0 2024-09-15 18:31:36,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=326102.5, ans=0.125 2024-09-15 18:31:37,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=326130.8333333333, ans=0.125 2024-09-15 18:31:52,775 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.786e+02 2.067e+02 2.315e+02 2.668e+02 3.599e+02, threshold=4.629e+02, percent-clipped=0.0 2024-09-15 18:32:19,173 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.28 vs. limit=22.5 2024-09-15 18:32:24,507 INFO [train.py:1198] (1/2) Epoch 19, batch 100, loss[loss=0.2383, ctc_loss=0.1627, cr_loss=0.3784, over 20938.00 frames. ], tot_loss[loss=0.2386, ctc_loss=0.1622, cr_loss=0.3819, over 1639589.30 frames. 
], batch size: 60, lr: 4.46e-03, grad_scale: 32.0 2024-09-15 18:32:27,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=326215.8333333333, ans=0.0 2024-09-15 18:32:33,166 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.67 vs. limit=22.5 2024-09-15 18:32:36,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=326215.8333333333, ans=0.125 2024-09-15 18:32:40,150 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 18:32:44,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=326244.1666666667, ans=0.0 2024-09-15 18:32:47,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=326244.1666666667, ans=0.05 2024-09-15 18:32:50,967 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.97 vs. limit=15.0 2024-09-15 18:33:34,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=326329.1666666667, ans=0.2 2024-09-15 18:33:35,227 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.41 vs. limit=15.0 2024-09-15 18:33:40,087 INFO [train.py:1198] (1/2) Epoch 19, batch 150, loss[loss=0.2672, ctc_loss=0.181, cr_loss=0.4307, over 20792.00 frames. ], tot_loss[loss=0.2392, ctc_loss=0.1626, cr_loss=0.3832, over 2196512.54 frames. ], batch size: 56, lr: 4.45e-03, grad_scale: 32.0 2024-09-15 18:33:49,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=326357.5, ans=0.1 2024-09-15 18:34:01,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=326385.8333333333, ans=0.2 2024-09-15 18:34:23,782 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.717e+02 2.070e+02 2.223e+02 2.435e+02 4.334e+02, threshold=4.445e+02, percent-clipped=0.0 2024-09-15 18:34:30,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=326442.5, ans=0.125 2024-09-15 18:34:37,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=326442.5, ans=0.125 2024-09-15 18:34:42,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=326470.8333333333, ans=0.0 2024-09-15 18:34:52,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=326470.8333333333, ans=0.125 2024-09-15 18:34:52,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=326470.8333333333, ans=0.125 2024-09-15 18:34:54,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=326499.1666666667, ans=0.035 2024-09-15 18:34:55,680 INFO [train.py:1198] (1/2) Epoch 19, batch 200, loss[loss=0.2161, ctc_loss=0.1422, cr_loss=0.3696, over 20457.00 frames. 
], tot_loss[loss=0.2397, ctc_loss=0.1629, cr_loss=0.3839, over 2618158.11 frames. ], batch size: 45, lr: 4.45e-03, grad_scale: 32.0 2024-09-15 18:34:57,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=326499.1666666667, ans=0.125 2024-09-15 18:35:11,739 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.66 vs. limit=6.0 2024-09-15 18:35:43,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=326584.1666666667, ans=0.2 2024-09-15 18:35:44,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=326584.1666666667, ans=0.1 2024-09-15 18:35:50,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=326584.1666666667, ans=0.1 2024-09-15 18:36:14,689 INFO [train.py:1198] (1/2) Epoch 19, batch 250, loss[loss=0.2392, ctc_loss=0.162, cr_loss=0.3863, over 20991.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1626, cr_loss=0.3823, over 2949486.47 frames. ], batch size: 55, lr: 4.45e-03, grad_scale: 32.0 2024-09-15 18:36:31,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=326669.1666666667, ans=0.0 2024-09-15 18:36:59,100 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.753e+02 2.070e+02 2.225e+02 2.386e+02 3.637e+02, threshold=4.451e+02, percent-clipped=0.0 2024-09-15 18:37:12,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=326725.8333333333, ans=0.0 2024-09-15 18:37:34,111 INFO [train.py:1198] (1/2) Epoch 19, batch 300, loss[loss=0.2228, ctc_loss=0.1497, cr_loss=0.3654, over 20977.00 frames. ], tot_loss[loss=0.2387, ctc_loss=0.1624, cr_loss=0.3817, over 3210204.53 frames. ], batch size: 49, lr: 4.45e-03, grad_scale: 32.0 2024-09-15 18:37:37,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=326782.5, ans=0.125 2024-09-15 18:37:51,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=326810.8333333333, ans=0.025 2024-09-15 18:38:17,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=326867.5, ans=0.1 2024-09-15 18:38:30,425 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.09 vs. limit=15.0 2024-09-15 18:38:48,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=326924.1666666667, ans=0.125 2024-09-15 18:38:49,191 INFO [train.py:1198] (1/2) Epoch 19, batch 350, loss[loss=0.2161, ctc_loss=0.1441, cr_loss=0.3599, over 20965.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.1619, cr_loss=0.3801, over 3411734.27 frames. ], batch size: 48, lr: 4.45e-03, grad_scale: 32.0 2024-09-15 18:39:06,245 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.57 vs. 
limit=15.0 2024-09-15 18:39:32,842 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.856e+02 2.064e+02 2.242e+02 2.483e+02 3.467e+02, threshold=4.484e+02, percent-clipped=0.0 2024-09-15 18:39:42,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=327009.1666666667, ans=0.125 2024-09-15 18:40:03,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=327065.8333333333, ans=0.125 2024-09-15 18:40:04,367 INFO [train.py:1198] (1/2) Epoch 19, batch 400, loss[loss=0.2503, ctc_loss=0.1733, cr_loss=0.3854, over 20333.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.1627, cr_loss=0.381, over 3565545.10 frames. ], batch size: 74, lr: 4.45e-03, grad_scale: 32.0 2024-09-15 18:40:13,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=327065.8333333333, ans=0.1 2024-09-15 18:40:27,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=327094.1666666667, ans=0.025 2024-09-15 18:40:46,006 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=7.95 vs. limit=22.5 2024-09-15 18:41:07,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=327179.1666666667, ans=0.0 2024-09-15 18:41:08,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=327179.1666666667, ans=0.125 2024-09-15 18:41:17,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=327179.1666666667, ans=0.125 2024-09-15 18:41:23,249 INFO [train.py:1198] (1/2) Epoch 19, batch 450, loss[loss=0.272, ctc_loss=0.1973, cr_loss=0.3734, over 14490.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.1633, cr_loss=0.3821, over 3675753.97 frames. ], batch size: 149, lr: 4.45e-03, grad_scale: 32.0 2024-09-15 18:41:25,105 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 18:41:37,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=327235.8333333333, ans=0.125 2024-09-15 18:41:42,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=327235.8333333333, ans=0.0 2024-09-15 18:42:01,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=327264.1666666667, ans=0.04949747468305833 2024-09-15 18:42:03,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=327264.1666666667, ans=0.1 2024-09-15 18:42:06,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=327264.1666666667, ans=0.125 2024-09-15 18:42:07,421 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.850e+02 2.072e+02 2.186e+02 2.368e+02 4.597e+02, threshold=4.373e+02, percent-clipped=1.0 2024-09-15 18:42:11,340 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.34 vs. 
limit=22.5 2024-09-15 18:42:18,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=327292.5, ans=0.125 2024-09-15 18:42:39,553 INFO [train.py:1198] (1/2) Epoch 19, batch 500, loss[loss=0.2447, ctc_loss=0.1677, cr_loss=0.3851, over 21063.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1635, cr_loss=0.3823, over 3776060.99 frames. ], batch size: 53, lr: 4.45e-03, grad_scale: 32.0 2024-09-15 18:42:58,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=327377.5, ans=0.0 2024-09-15 18:43:06,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=327377.5, ans=0.0 2024-09-15 18:43:58,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=327490.8333333333, ans=0.125 2024-09-15 18:44:00,205 INFO [train.py:1198] (1/2) Epoch 19, batch 550, loss[loss=0.2487, ctc_loss=0.171, cr_loss=0.3884, over 20898.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1618, cr_loss=0.38, over 3858603.41 frames. ], batch size: 54, lr: 4.45e-03, grad_scale: 32.0 2024-09-15 18:44:43,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=327547.5, ans=0.0 2024-09-15 18:44:44,583 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.803e+02 2.086e+02 2.199e+02 2.384e+02 3.572e+02, threshold=4.399e+02, percent-clipped=0.0 2024-09-15 18:45:06,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=327604.1666666667, ans=0.0 2024-09-15 18:45:14,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=327604.1666666667, ans=0.0 2024-09-15 18:45:16,774 INFO [train.py:1198] (1/2) Epoch 19, batch 600, loss[loss=0.2191, ctc_loss=0.1491, cr_loss=0.3499, over 21060.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1629, cr_loss=0.3828, over 3911011.08 frames. ], batch size: 53, lr: 4.45e-03, grad_scale: 32.0 2024-09-15 18:45:20,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=327632.5, ans=0.09899494936611666 2024-09-15 18:46:32,680 INFO [train.py:1198] (1/2) Epoch 19, batch 650, loss[loss=0.2306, ctc_loss=0.1548, cr_loss=0.3792, over 21055.00 frames. ], tot_loss[loss=0.2392, ctc_loss=0.1627, cr_loss=0.3826, over 3951599.56 frames. ], batch size: 56, lr: 4.44e-03, grad_scale: 32.0 2024-09-15 18:46:45,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=327774.1666666667, ans=0.05 2024-09-15 18:46:57,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=327802.5, ans=0.0 2024-09-15 18:47:12,000 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.88 vs. 
limit=10.0 2024-09-15 18:47:12,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=327830.8333333333, ans=0.0 2024-09-15 18:47:19,944 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.797e+02 2.059e+02 2.175e+02 2.355e+02 3.259e+02, threshold=4.350e+02, percent-clipped=0.0 2024-09-15 18:47:29,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=327859.1666666667, ans=0.0 2024-09-15 18:47:52,145 INFO [train.py:1198] (1/2) Epoch 19, batch 700, loss[loss=0.2341, ctc_loss=0.1592, cr_loss=0.3745, over 20827.00 frames. ], tot_loss[loss=0.2413, ctc_loss=0.1643, cr_loss=0.3848, over 3962637.92 frames. ], batch size: 59, lr: 4.44e-03, grad_scale: 32.0 2024-09-15 18:48:35,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=327972.5, ans=0.125 2024-09-15 18:48:56,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=328029.1666666667, ans=0.1 2024-09-15 18:49:02,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=328029.1666666667, ans=0.125 2024-09-15 18:49:12,595 INFO [train.py:1198] (1/2) Epoch 19, batch 750, loss[loss=0.2073, ctc_loss=0.14, cr_loss=0.3366, over 20960.00 frames. ], tot_loss[loss=0.2411, ctc_loss=0.1641, cr_loss=0.3847, over 3996847.05 frames. ], batch size: 50, lr: 4.44e-03, grad_scale: 32.0 2024-09-15 18:49:32,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=328085.8333333333, ans=0.1 2024-09-15 18:49:44,094 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.02 vs. limit=15.0 2024-09-15 18:49:52,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=328114.1666666667, ans=0.2 2024-09-15 18:49:56,858 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.768e+02 2.055e+02 2.153e+02 2.349e+02 2.781e+02, threshold=4.305e+02, percent-clipped=0.0 2024-09-15 18:50:28,566 INFO [train.py:1198] (1/2) Epoch 19, batch 800, loss[loss=0.26, ctc_loss=0.1811, cr_loss=0.3944, over 20925.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1637, cr_loss=0.3838, over 4021371.42 frames. ], batch size: 60, lr: 4.44e-03, grad_scale: 32.0 2024-09-15 18:50:43,129 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.13 vs. limit=10.0 2024-09-15 18:51:00,631 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 18:51:32,900 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 18:51:37,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=328312.5, ans=0.125 2024-09-15 18:51:44,705 INFO [train.py:1198] (1/2) Epoch 19, batch 850, loss[loss=0.2592, ctc_loss=0.1783, cr_loss=0.4044, over 21035.00 frames. ], tot_loss[loss=0.2405, ctc_loss=0.1637, cr_loss=0.3843, over 4040294.92 frames. 
], batch size: 62, lr: 4.44e-03, grad_scale: 32.0 2024-09-15 18:51:51,953 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.21 vs. limit=12.0 2024-09-15 18:52:31,518 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.817e+02 2.077e+02 2.181e+02 2.349e+02 4.241e+02, threshold=4.361e+02, percent-clipped=0.0 2024-09-15 18:52:39,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=328425.8333333333, ans=0.0 2024-09-15 18:53:03,091 INFO [train.py:1198] (1/2) Epoch 19, batch 900, loss[loss=0.2462, ctc_loss=0.1694, cr_loss=0.3837, over 20969.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1645, cr_loss=0.3845, over 4044421.79 frames. ], batch size: 58, lr: 4.44e-03, grad_scale: 32.0 2024-09-15 18:53:17,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=328510.8333333333, ans=0.0 2024-09-15 18:53:21,872 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=328510.8333333333, ans=0.125 2024-09-15 18:53:22,621 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.51 vs. limit=15.0 2024-09-15 18:53:29,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=328510.8333333333, ans=0.125 2024-09-15 18:53:29,585 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.54 vs. limit=12.0 2024-09-15 18:53:38,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=328539.1666666667, ans=0.2 2024-09-15 18:53:41,986 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.24 vs. limit=12.0 2024-09-15 18:54:19,094 INFO [train.py:1198] (1/2) Epoch 19, batch 950, loss[loss=0.2518, ctc_loss=0.1721, cr_loss=0.3986, over 20872.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.1639, cr_loss=0.3842, over 4060457.33 frames. ], batch size: 65, lr: 4.44e-03, grad_scale: 32.0 2024-09-15 18:54:30,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=328624.1666666667, ans=0.0 2024-09-15 18:54:41,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=328652.5, ans=0.0 2024-09-15 18:54:45,374 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2024-09-15 18:55:08,290 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.860e+02 2.082e+02 2.206e+02 2.383e+02 4.044e+02, threshold=4.412e+02, percent-clipped=0.0 2024-09-15 18:55:40,829 INFO [train.py:1198] (1/2) Epoch 19, batch 1000, loss[loss=0.2471, ctc_loss=0.1686, cr_loss=0.3922, over 20673.00 frames. ], tot_loss[loss=0.2415, ctc_loss=0.1645, cr_loss=0.3853, over 4072013.85 frames. 
], batch size: 71, lr: 4.44e-03, grad_scale: 32.0 2024-09-15 18:55:55,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=328794.1666666667, ans=0.2 2024-09-15 18:56:07,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=328794.1666666667, ans=0.125 2024-09-15 18:56:22,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=328822.5, ans=0.125 2024-09-15 18:56:57,469 INFO [train.py:1198] (1/2) Epoch 19, batch 1050, loss[loss=0.2379, ctc_loss=0.1625, cr_loss=0.3773, over 21063.00 frames. ], tot_loss[loss=0.2416, ctc_loss=0.1646, cr_loss=0.385, over 4072832.83 frames. ], batch size: 56, lr: 4.44e-03, grad_scale: 32.0 2024-09-15 18:57:41,544 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.821e+02 2.069e+02 2.181e+02 2.330e+02 6.080e+02, threshold=4.361e+02, percent-clipped=1.0 2024-09-15 18:57:46,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=328992.5, ans=0.1 2024-09-15 18:58:08,257 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.52 vs. limit=15.0 2024-09-15 18:58:12,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=329020.8333333333, ans=0.0 2024-09-15 18:58:15,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=329049.1666666667, ans=0.125 2024-09-15 18:58:16,780 INFO [train.py:1198] (1/2) Epoch 19, batch 1100, loss[loss=0.2035, ctc_loss=0.139, cr_loss=0.323, over 20976.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.1636, cr_loss=0.3832, over 4078444.06 frames. ], batch size: 48, lr: 4.44e-03, grad_scale: 32.0 2024-09-15 18:58:29,892 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.89 vs. limit=10.0 2024-09-15 18:58:52,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=329105.8333333333, ans=0.125 2024-09-15 18:59:30,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=329162.5, ans=0.025 2024-09-15 18:59:33,456 INFO [train.py:1198] (1/2) Epoch 19, batch 1150, loss[loss=0.2704, ctc_loss=0.186, cr_loss=0.4218, over 21005.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.1635, cr_loss=0.3829, over 4092003.82 frames. ], batch size: 63, lr: 4.44e-03, grad_scale: 32.0 2024-09-15 19:00:18,126 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.697e+02 2.059e+02 2.192e+02 2.368e+02 3.140e+02, threshold=4.385e+02, percent-clipped=0.0 2024-09-15 19:00:28,936 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.35 vs. limit=15.0 2024-09-15 19:00:43,960 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.23 vs. limit=15.0 2024-09-15 19:00:53,977 INFO [train.py:1198] (1/2) Epoch 19, batch 1200, loss[loss=0.2365, ctc_loss=0.1613, cr_loss=0.3761, over 21051.00 frames. 
], tot_loss[loss=0.2406, ctc_loss=0.1639, cr_loss=0.3834, over 4086994.36 frames. ], batch size: 56, lr: 4.43e-03, grad_scale: 32.0 2024-09-15 19:00:54,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=329332.5, ans=0.125 2024-09-15 19:00:57,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=329332.5, ans=0.04949747468305833 2024-09-15 19:01:20,755 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.21 vs. limit=15.0 2024-09-15 19:01:29,861 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.00 vs. limit=15.0 2024-09-15 19:01:49,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=329417.5, ans=0.125 2024-09-15 19:01:52,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=329417.5, ans=0.0 2024-09-15 19:02:10,751 INFO [train.py:1198] (1/2) Epoch 19, batch 1250, loss[loss=0.2273, ctc_loss=0.1538, cr_loss=0.3677, over 21011.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.1626, cr_loss=0.3815, over 4090375.01 frames. ], batch size: 61, lr: 4.43e-03, grad_scale: 32.0 2024-09-15 19:02:54,871 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.787e+02 2.051e+02 2.171e+02 2.356e+02 4.459e+02, threshold=4.341e+02, percent-clipped=1.0 2024-09-15 19:03:01,737 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.18 vs. limit=15.0 2024-09-15 19:03:02,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=329559.1666666667, ans=0.2 2024-09-15 19:03:26,441 INFO [train.py:1198] (1/2) Epoch 19, batch 1300, loss[loss=0.2211, ctc_loss=0.147, cr_loss=0.3703, over 19898.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.1619, cr_loss=0.3808, over 4099632.97 frames. ], batch size: 44, lr: 4.43e-03, grad_scale: 32.0 2024-09-15 19:04:04,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=329672.5, ans=0.125 2024-09-15 19:04:45,380 INFO [train.py:1198] (1/2) Epoch 19, batch 1350, loss[loss=0.2228, ctc_loss=0.1524, cr_loss=0.3521, over 20927.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.162, cr_loss=0.3819, over 4104939.27 frames. ], batch size: 60, lr: 4.43e-03, grad_scale: 32.0 2024-09-15 19:04:58,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=329757.5, ans=0.1 2024-09-15 19:05:11,906 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.12 vs. 
limit=15.0 2024-09-15 19:05:26,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=329814.1666666667, ans=0.0 2024-09-15 19:05:29,551 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.787e+02 2.015e+02 2.192e+02 2.414e+02 3.289e+02, threshold=4.384e+02, percent-clipped=0.0 2024-09-15 19:05:40,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=329842.5, ans=0.0 2024-09-15 19:05:57,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=329870.8333333333, ans=0.125 2024-09-15 19:06:01,866 INFO [train.py:1198] (1/2) Epoch 19, batch 1400, loss[loss=0.2787, ctc_loss=0.1921, cr_loss=0.4329, over 20087.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1625, cr_loss=0.3827, over 4112288.80 frames. ], batch size: 80, lr: 4.43e-03, grad_scale: 32.0 2024-09-15 19:07:08,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=330012.5, ans=0.1 2024-09-15 19:07:19,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=330040.8333333333, ans=0.125 2024-09-15 19:07:20,984 INFO [train.py:1198] (1/2) Epoch 19, batch 1450, loss[loss=0.2551, ctc_loss=0.1772, cr_loss=0.3894, over 20095.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.162, cr_loss=0.3817, over 4119153.13 frames. ], batch size: 80, lr: 4.43e-03, grad_scale: 32.0 2024-09-15 19:07:30,740 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.55 vs. limit=22.5 2024-09-15 19:07:41,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=330069.1666666667, ans=0.125 2024-09-15 19:08:04,729 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.770e+02 2.034e+02 2.159e+02 2.290e+02 3.110e+02, threshold=4.319e+02, percent-clipped=0.0 2024-09-15 19:08:29,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=330154.1666666667, ans=0.0 2024-09-15 19:08:35,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=330182.5, ans=0.125 2024-09-15 19:08:36,557 INFO [train.py:1198] (1/2) Epoch 19, batch 1500, loss[loss=0.2639, ctc_loss=0.1811, cr_loss=0.4144, over 19350.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1626, cr_loss=0.3834, over 4127956.13 frames. 
], batch size: 90, lr: 4.43e-03, grad_scale: 32.0 2024-09-15 19:08:44,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=330182.5, ans=0.0 2024-09-15 19:09:25,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=330267.5, ans=0.0 2024-09-15 19:09:26,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=330267.5, ans=0.2 2024-09-15 19:09:42,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=330295.8333333333, ans=0.0 2024-09-15 19:09:49,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=330295.8333333333, ans=0.125 2024-09-15 19:09:55,204 INFO [train.py:1198] (1/2) Epoch 19, batch 1550, loss[loss=0.244, ctc_loss=0.1632, cr_loss=0.4038, over 20987.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.1638, cr_loss=0.3849, over 4125023.77 frames. ], batch size: 55, lr: 4.43e-03, grad_scale: 32.0 2024-09-15 19:10:39,144 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.907e+02 2.059e+02 2.184e+02 2.376e+02 3.606e+02, threshold=4.368e+02, percent-clipped=0.0 2024-09-15 19:11:11,095 INFO [train.py:1198] (1/2) Epoch 19, batch 1600, loss[loss=0.2547, ctc_loss=0.1746, cr_loss=0.4003, over 19536.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.1636, cr_loss=0.3838, over 4109422.22 frames. ], batch size: 90, lr: 4.43e-03, grad_scale: 32.0 2024-09-15 19:11:44,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=330522.5, ans=0.125 2024-09-15 19:12:17,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=330579.1666666667, ans=0.125 2024-09-15 19:12:24,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=330579.1666666667, ans=0.125 2024-09-15 19:12:29,777 INFO [train.py:1198] (1/2) Epoch 19, batch 1650, loss[loss=0.24, ctc_loss=0.1626, cr_loss=0.3869, over 20894.00 frames. ], tot_loss[loss=0.2405, ctc_loss=0.1637, cr_loss=0.3842, over 4106310.32 frames. 
], batch size: 57, lr: 4.43e-03, grad_scale: 32.0 2024-09-15 19:12:50,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=330635.8333333333, ans=0.0 2024-09-15 19:13:14,195 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.774e+02 2.089e+02 2.262e+02 2.450e+02 6.551e+02, threshold=4.523e+02, percent-clipped=1.0 2024-09-15 19:13:14,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=330692.5, ans=10.0 2024-09-15 19:13:16,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=330692.5, ans=0.1 2024-09-15 19:13:31,135 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 19:13:35,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=330720.8333333333, ans=0.2 2024-09-15 19:13:45,978 INFO [train.py:1198] (1/2) Epoch 19, batch 1700, loss[loss=0.2543, ctc_loss=0.1717, cr_loss=0.4128, over 19279.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1632, cr_loss=0.3842, over 4118458.05 frames. ], batch size: 90, lr: 4.42e-03, grad_scale: 32.0 2024-09-15 19:14:16,243 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=330805.8333333333, ans=0.125 2024-09-15 19:14:17,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=330805.8333333333, ans=0.125 2024-09-15 19:14:19,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=330805.8333333333, ans=0.125 2024-09-15 19:14:31,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=330834.1666666667, ans=0.1 2024-09-15 19:14:56,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=330862.5, ans=0.035 2024-09-15 19:15:00,653 INFO [train.py:1198] (1/2) Epoch 19, batch 1750, loss[loss=0.1993, ctc_loss=0.1356, cr_loss=0.3187, over 20996.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1624, cr_loss=0.3832, over 4120137.58 frames. ], batch size: 48, lr: 4.42e-03, grad_scale: 32.0 2024-09-15 19:15:16,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=330919.1666666667, ans=0.2 2024-09-15 19:15:19,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=330919.1666666667, ans=0.2 2024-09-15 19:15:47,975 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.727e+02 2.058e+02 2.227e+02 2.416e+02 3.846e+02, threshold=4.455e+02, percent-clipped=0.0 2024-09-15 19:15:57,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=330975.8333333333, ans=0.0 2024-09-15 19:16:04,719 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=331004.1666666667, ans=0.0 2024-09-15 19:16:19,290 INFO [train.py:1198] (1/2) Epoch 19, batch 1800, loss[loss=0.2119, ctc_loss=0.1425, cr_loss=0.3466, over 21044.00 frames. ], tot_loss[loss=0.2402, ctc_loss=0.1633, cr_loss=0.3848, over 4100986.11 frames. 
], batch size: 53, lr: 4.42e-03, grad_scale: 32.0 2024-09-15 19:16:30,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=331032.5, ans=0.125 2024-09-15 19:16:54,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=331089.1666666667, ans=0.125 2024-09-15 19:17:13,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=331117.5, ans=0.0 2024-09-15 19:17:15,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=331117.5, ans=0.125 2024-09-15 19:17:24,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=331145.8333333333, ans=0.125 2024-09-15 19:17:37,448 INFO [train.py:1198] (1/2) Epoch 19, batch 1850, loss[loss=0.2388, ctc_loss=0.1625, cr_loss=0.3819, over 20899.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1635, cr_loss=0.3842, over 4092578.47 frames. ], batch size: 54, lr: 4.42e-03, grad_scale: 32.0 2024-09-15 19:17:37,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=331174.1666666667, ans=0.1 2024-09-15 19:17:42,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=331174.1666666667, ans=0.0 2024-09-15 19:17:46,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=331174.1666666667, ans=0.125 2024-09-15 19:18:08,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=331230.8333333333, ans=0.1 2024-09-15 19:18:09,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=331230.8333333333, ans=0.1 2024-09-15 19:18:21,368 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.775e+02 2.060e+02 2.239e+02 2.481e+02 5.048e+02, threshold=4.478e+02, percent-clipped=2.0 2024-09-15 19:18:53,318 INFO [train.py:1198] (1/2) Epoch 19, batch 1900, loss[loss=0.2961, ctc_loss=0.2124, cr_loss=0.4183, over 14413.00 frames. ], tot_loss[loss=0.2402, ctc_loss=0.1635, cr_loss=0.3837, over 4090842.25 frames. ], batch size: 150, lr: 4.42e-03, grad_scale: 64.0 2024-09-15 19:19:16,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=331344.1666666667, ans=0.0 2024-09-15 19:19:35,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=331372.5, ans=0.1 2024-09-15 19:19:35,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=331372.5, ans=0.1 2024-09-15 19:19:38,184 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.46 vs. limit=15.0 2024-09-15 19:19:47,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=331400.8333333333, ans=0.125 2024-09-15 19:20:08,992 INFO [train.py:1198] (1/2) Epoch 19, batch 1950, loss[loss=0.2521, ctc_loss=0.1728, cr_loss=0.3967, over 20959.00 frames. 
], tot_loss[loss=0.2403, ctc_loss=0.1636, cr_loss=0.3834, over 4075829.29 frames. ], batch size: 64, lr: 4.42e-03, grad_scale: 64.0 2024-09-15 19:20:18,980 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.10 vs. limit=12.0 2024-09-15 19:20:52,594 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.904e+02 2.129e+02 2.298e+02 2.458e+02 3.297e+02, threshold=4.596e+02, percent-clipped=0.0 2024-09-15 19:20:56,456 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.05 vs. limit=22.5 2024-09-15 19:21:04,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=331542.5, ans=0.2 2024-09-15 19:21:27,170 INFO [train.py:1198] (1/2) Epoch 19, batch 2000, loss[loss=0.2476, ctc_loss=0.1682, cr_loss=0.3971, over 20940.00 frames. ], tot_loss[loss=0.2411, ctc_loss=0.1642, cr_loss=0.3846, over 4082733.38 frames. ], batch size: 60, lr: 4.42e-03, grad_scale: 64.0 2024-09-15 19:21:48,993 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.65 vs. limit=12.0 2024-09-15 19:21:57,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=331655.8333333333, ans=0.1 2024-09-15 19:22:17,691 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=331684.1666666667, ans=0.0 2024-09-15 19:22:34,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=331712.5, ans=0.125 2024-09-15 19:22:43,052 INFO [train.py:1198] (1/2) Epoch 19, batch 2050, loss[loss=0.2283, ctc_loss=0.1525, cr_loss=0.379, over 20981.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1646, cr_loss=0.3844, over 4076914.37 frames. ], batch size: 52, lr: 4.42e-03, grad_scale: 64.0 2024-09-15 19:23:21,359 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.36 vs. limit=22.5 2024-09-15 19:23:29,664 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.766e+02 2.010e+02 2.163e+02 2.371e+02 3.289e+02, threshold=4.327e+02, percent-clipped=0.0 2024-09-15 19:23:39,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=331825.8333333333, ans=0.0 2024-09-15 19:23:55,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=331854.1666666667, ans=0.125 2024-09-15 19:24:01,342 INFO [train.py:1198] (1/2) Epoch 19, batch 2100, loss[loss=0.2923, ctc_loss=0.2004, cr_loss=0.4594, over 20630.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.1641, cr_loss=0.384, over 4080880.99 frames. ], batch size: 66, lr: 4.42e-03, grad_scale: 64.0 2024-09-15 19:24:18,297 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 19:24:40,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=331939.1666666667, ans=0.1 2024-09-15 19:25:13,121 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.12 vs. 
limit=15.0 2024-09-15 19:25:15,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=332024.1666666667, ans=0.1 2024-09-15 19:25:16,813 INFO [train.py:1198] (1/2) Epoch 19, batch 2150, loss[loss=0.2434, ctc_loss=0.1646, cr_loss=0.3942, over 20821.00 frames. ], tot_loss[loss=0.2402, ctc_loss=0.1637, cr_loss=0.3827, over 4078314.07 frames. ], batch size: 59, lr: 4.42e-03, grad_scale: 64.0 2024-09-15 19:26:00,347 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.732e+02 2.023e+02 2.181e+02 2.338e+02 3.449e+02, threshold=4.361e+02, percent-clipped=0.0 2024-09-15 19:26:03,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=332109.1666666667, ans=0.0 2024-09-15 19:26:23,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=332137.5, ans=0.0 2024-09-15 19:26:26,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=332137.5, ans=0.0 2024-09-15 19:26:32,079 INFO [train.py:1198] (1/2) Epoch 19, batch 2200, loss[loss=0.2594, ctc_loss=0.178, cr_loss=0.4073, over 20864.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.1638, cr_loss=0.3828, over 4078679.60 frames. ], batch size: 65, lr: 4.42e-03, grad_scale: 64.0 2024-09-15 19:26:36,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=332165.8333333333, ans=0.125 2024-09-15 19:27:05,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=332222.5, ans=0.0 2024-09-15 19:27:11,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=332222.5, ans=0.125 2024-09-15 19:27:16,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=332222.5, ans=0.1 2024-09-15 19:27:29,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=332250.8333333333, ans=0.125 2024-09-15 19:27:45,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=332279.1666666667, ans=0.125 2024-09-15 19:27:50,695 INFO [train.py:1198] (1/2) Epoch 19, batch 2250, loss[loss=0.2112, ctc_loss=0.1377, cr_loss=0.3672, over 20774.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.1639, cr_loss=0.3837, over 4090637.85 frames. 
], batch size: 53, lr: 4.41e-03, grad_scale: 64.0 2024-09-15 19:28:29,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=332364.1666666667, ans=0.125 2024-09-15 19:28:35,079 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.777e+02 2.053e+02 2.220e+02 2.394e+02 3.283e+02, threshold=4.440e+02, percent-clipped=0.0 2024-09-15 19:28:52,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=332420.8333333333, ans=0.0 2024-09-15 19:28:53,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=332420.8333333333, ans=0.0 2024-09-15 19:29:09,773 INFO [train.py:1198] (1/2) Epoch 19, batch 2300, loss[loss=0.2417, ctc_loss=0.1654, cr_loss=0.3817, over 20883.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1633, cr_loss=0.383, over 4090607.58 frames. ], batch size: 57, lr: 4.41e-03, grad_scale: 64.0 2024-09-15 19:30:05,951 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.89 vs. limit=10.0 2024-09-15 19:30:25,019 INFO [train.py:1198] (1/2) Epoch 19, batch 2350, loss[loss=0.2559, ctc_loss=0.1751, cr_loss=0.4039, over 21072.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.1631, cr_loss=0.3827, over 4097010.70 frames. ], batch size: 59, lr: 4.41e-03, grad_scale: 64.0 2024-09-15 19:30:49,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=332619.1666666667, ans=0.2 2024-09-15 19:30:57,843 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. limit=6.0 2024-09-15 19:31:09,047 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.813e+02 2.073e+02 2.181e+02 2.410e+02 3.114e+02, threshold=4.363e+02, percent-clipped=0.0 2024-09-15 19:31:20,346 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.52 vs. limit=15.0 2024-09-15 19:31:40,779 INFO [train.py:1198] (1/2) Epoch 19, batch 2400, loss[loss=0.2402, ctc_loss=0.1627, cr_loss=0.387, over 20673.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.1639, cr_loss=0.3838, over 4085158.25 frames. ], batch size: 66, lr: 4.41e-03, grad_scale: 64.0 2024-09-15 19:31:47,224 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 19:32:15,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=332789.1666666667, ans=0.125 2024-09-15 19:32:43,194 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 19:32:59,210 INFO [train.py:1198] (1/2) Epoch 19, batch 2450, loss[loss=0.2833, ctc_loss=0.1967, cr_loss=0.4331, over 20036.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.164, cr_loss=0.3842, over 4086835.59 frames. 
], batch size: 80, lr: 4.41e-03, grad_scale: 32.0 2024-09-15 19:33:04,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=332874.1666666667, ans=0.125 2024-09-15 19:33:05,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=332874.1666666667, ans=0.125 2024-09-15 19:33:18,066 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.62 vs. limit=15.0 2024-09-15 19:33:38,889 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.25 vs. limit=12.0 2024-09-15 19:33:44,346 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.670e+02 2.073e+02 2.210e+02 2.379e+02 3.922e+02, threshold=4.420e+02, percent-clipped=0.0 2024-09-15 19:33:49,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=332959.1666666667, ans=0.1 2024-09-15 19:34:04,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=332987.5, ans=0.125 2024-09-15 19:34:04,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=332987.5, ans=10.0 2024-09-15 19:34:11,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=332987.5, ans=0.07 2024-09-15 19:34:14,671 INFO [train.py:1198] (1/2) Epoch 19, batch 2500, loss[loss=0.2487, ctc_loss=0.1712, cr_loss=0.3875, over 20630.00 frames. ], tot_loss[loss=0.2423, ctc_loss=0.1651, cr_loss=0.3862, over 4082386.50 frames. ], batch size: 68, lr: 4.41e-03, grad_scale: 32.0 2024-09-15 19:34:56,824 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.68 vs. limit=10.0 2024-09-15 19:35:10,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=333100.8333333333, ans=0.0 2024-09-15 19:35:13,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=333100.8333333333, ans=0.125 2024-09-15 19:35:34,245 INFO [train.py:1198] (1/2) Epoch 19, batch 2550, loss[loss=0.2361, ctc_loss=0.1625, cr_loss=0.368, over 20844.00 frames. ], tot_loss[loss=0.2412, ctc_loss=0.1643, cr_loss=0.3847, over 4086255.95 frames. 
], batch size: 59, lr: 4.41e-03, grad_scale: 32.0 2024-09-15 19:35:46,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=333157.5, ans=0.125 2024-09-15 19:36:04,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=333214.1666666667, ans=0.2 2024-09-15 19:36:21,322 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.890e+02 2.108e+02 2.254e+02 2.456e+02 3.449e+02, threshold=4.509e+02, percent-clipped=0.0 2024-09-15 19:36:23,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=333242.5, ans=0.125 2024-09-15 19:36:35,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=333270.8333333333, ans=0.125 2024-09-15 19:36:50,588 INFO [train.py:1198] (1/2) Epoch 19, batch 2600, loss[loss=0.2475, ctc_loss=0.1688, cr_loss=0.3935, over 21007.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.1634, cr_loss=0.3834, over 4096193.51 frames. ], batch size: 63, lr: 4.41e-03, grad_scale: 16.0 2024-09-15 19:37:09,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=333327.5, ans=0.2 2024-09-15 19:37:49,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=333412.5, ans=0.0 2024-09-15 19:38:08,873 INFO [train.py:1198] (1/2) Epoch 19, batch 2650, loss[loss=0.294, ctc_loss=0.21, cr_loss=0.4201, over 14169.00 frames. ], tot_loss[loss=0.2387, ctc_loss=0.1624, cr_loss=0.3814, over 4084454.82 frames. ], batch size: 149, lr: 4.41e-03, grad_scale: 16.0 2024-09-15 19:38:48,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=333497.5, ans=10.0 2024-09-15 19:38:50,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=333497.5, ans=0.5 2024-09-15 19:38:55,731 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.803e+02 2.099e+02 2.211e+02 2.321e+02 3.833e+02, threshold=4.421e+02, percent-clipped=0.0 2024-09-15 19:39:22,043 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.64 vs. limit=15.0 2024-09-15 19:39:24,365 INFO [train.py:1198] (1/2) Epoch 19, batch 2700, loss[loss=0.2507, ctc_loss=0.1691, cr_loss=0.4079, over 21039.00 frames. ], tot_loss[loss=0.2387, ctc_loss=0.1624, cr_loss=0.3816, over 4088244.34 frames. ], batch size: 56, lr: 4.41e-03, grad_scale: 16.0 2024-09-15 19:39:47,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=333610.8333333333, ans=0.125 2024-09-15 19:39:49,000 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.51 vs. limit=15.0 2024-09-15 19:40:20,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=333667.5, ans=0.2 2024-09-15 19:40:40,899 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.09 vs. 
limit=6.0 2024-09-15 19:40:43,007 INFO [train.py:1198] (1/2) Epoch 19, batch 2750, loss[loss=0.232, ctc_loss=0.1568, cr_loss=0.3763, over 20987.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1613, cr_loss=0.3799, over 4091874.40 frames. ], batch size: 55, lr: 4.41e-03, grad_scale: 16.0 2024-09-15 19:41:10,981 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.91 vs. limit=10.0 2024-09-15 19:41:29,867 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.790e+02 2.091e+02 2.189e+02 2.423e+02 3.049e+02, threshold=4.378e+02, percent-clipped=0.0 2024-09-15 19:41:30,344 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 19:41:42,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=333837.5, ans=0.125 2024-09-15 19:41:58,691 INFO [train.py:1198] (1/2) Epoch 19, batch 2800, loss[loss=0.2294, ctc_loss=0.155, cr_loss=0.3716, over 20986.00 frames. ], tot_loss[loss=0.2386, ctc_loss=0.1623, cr_loss=0.3814, over 4091966.37 frames. ], batch size: 64, lr: 4.40e-03, grad_scale: 32.0 2024-09-15 19:42:56,778 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.30 vs. limit=15.0 2024-09-15 19:43:14,233 INFO [train.py:1198] (1/2) Epoch 19, batch 2850, loss[loss=0.2945, ctc_loss=0.1998, cr_loss=0.4738, over 20855.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1612, cr_loss=0.3799, over 4103061.96 frames. ], batch size: 65, lr: 4.40e-03, grad_scale: 32.0 2024-09-15 19:44:04,419 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.798e+02 2.046e+02 2.163e+02 2.323e+02 4.362e+02, threshold=4.326e+02, percent-clipped=0.0 2024-09-15 19:44:12,649 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.71 vs. limit=12.0 2024-09-15 19:44:19,814 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 19:44:27,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=334120.8333333333, ans=0.0 2024-09-15 19:44:32,934 INFO [train.py:1198] (1/2) Epoch 19, batch 2900, loss[loss=0.2132, ctc_loss=0.1431, cr_loss=0.3507, over 21069.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1609, cr_loss=0.3797, over 4109609.95 frames. ], batch size: 53, lr: 4.40e-03, grad_scale: 32.0 2024-09-15 19:44:34,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=334149.1666666667, ans=0.1 2024-09-15 19:44:58,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=334177.5, ans=0.125 2024-09-15 19:45:29,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=334234.1666666667, ans=0.125 2024-09-15 19:45:38,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=334262.5, ans=0.125 2024-09-15 19:45:48,750 INFO [train.py:1198] (1/2) Epoch 19, batch 2950, loss[loss=0.2366, ctc_loss=0.1605, cr_loss=0.3804, over 20971.00 frames. 
], tot_loss[loss=0.2381, ctc_loss=0.1619, cr_loss=0.3814, over 4101052.25 frames. ], batch size: 58, lr: 4.40e-03, grad_scale: 32.0 2024-09-15 19:46:00,269 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.94 vs. limit=10.0 2024-09-15 19:46:04,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=334319.1666666667, ans=0.2 2024-09-15 19:46:36,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=334375.8333333333, ans=0.125 2024-09-15 19:46:38,689 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.815e+02 2.074e+02 2.223e+02 2.386e+02 3.259e+02, threshold=4.445e+02, percent-clipped=0.0 2024-09-15 19:47:07,108 INFO [train.py:1198] (1/2) Epoch 19, batch 3000, loss[loss=0.2341, ctc_loss=0.157, cr_loss=0.3857, over 20980.00 frames. ], tot_loss[loss=0.2392, ctc_loss=0.1626, cr_loss=0.383, over 4096706.90 frames. ], batch size: 58, lr: 4.40e-03, grad_scale: 32.0 2024-09-15 19:47:07,109 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-15 19:47:28,547 INFO [train.py:1230] (1/2) Epoch 19, validation: loss=0.04462, ctc_loss=0.04462, cr_loss=1.059e-14, over 944034.00 frames. 2024-09-15 19:47:28,547 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-15 19:48:30,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=334545.8333333333, ans=0.0 2024-09-15 19:48:43,608 INFO [train.py:1198] (1/2) Epoch 19, batch 3050, loss[loss=0.2151, ctc_loss=0.147, cr_loss=0.3407, over 20964.00 frames. ], tot_loss[loss=0.2396, ctc_loss=0.163, cr_loss=0.3833, over 4100625.44 frames. ], batch size: 51, lr: 4.40e-03, grad_scale: 32.0 2024-09-15 19:49:06,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=334602.5, ans=0.1 2024-09-15 19:49:08,375 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.90 vs. limit=15.0 2024-09-15 19:49:33,498 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.860e+02 2.080e+02 2.248e+02 2.510e+02 8.196e+02, threshold=4.496e+02, percent-clipped=1.0 2024-09-15 19:49:41,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=334659.1666666667, ans=0.0 2024-09-15 19:50:02,717 INFO [train.py:1198] (1/2) Epoch 19, batch 3100, loss[loss=0.194, ctc_loss=0.1284, cr_loss=0.3283, over 20954.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1632, cr_loss=0.3833, over 4101272.86 frames. ], batch size: 49, lr: 4.40e-03, grad_scale: 32.0 2024-09-15 19:50:04,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=334715.8333333333, ans=0.2 2024-09-15 19:50:16,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=334744.1666666667, ans=0.0 2024-09-15 19:50:24,779 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.07 vs. 
limit=15.0 2024-09-15 19:50:49,362 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.19 vs. limit=15.0 2024-09-15 19:51:07,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=334829.1666666667, ans=0.0 2024-09-15 19:51:19,285 INFO [train.py:1198] (1/2) Epoch 19, batch 3150, loss[loss=0.1993, ctc_loss=0.1336, cr_loss=0.3281, over 20960.00 frames. ], tot_loss[loss=0.2411, ctc_loss=0.1642, cr_loss=0.3845, over 4095562.38 frames. ], batch size: 49, lr: 4.40e-03, grad_scale: 32.0 2024-09-15 19:52:08,802 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.849e+02 2.084e+02 2.200e+02 2.399e+02 3.213e+02, threshold=4.401e+02, percent-clipped=0.0 2024-09-15 19:52:20,117 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.81 vs. limit=15.0 2024-09-15 19:52:37,491 INFO [train.py:1198] (1/2) Epoch 19, batch 3200, loss[loss=0.2592, ctc_loss=0.1795, cr_loss=0.3988, over 19963.00 frames. ], tot_loss[loss=0.2415, ctc_loss=0.1644, cr_loss=0.3854, over 4090690.05 frames. ], batch size: 80, lr: 4.40e-03, grad_scale: 32.0 2024-09-15 19:52:52,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=335027.5, ans=0.125 2024-09-15 19:53:06,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=335055.8333333333, ans=0.1 2024-09-15 19:53:53,186 INFO [train.py:1198] (1/2) Epoch 19, batch 3250, loss[loss=0.2962, ctc_loss=0.2121, cr_loss=0.4208, over 14708.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.1646, cr_loss=0.3854, over 4084828.80 frames. ], batch size: 152, lr: 4.40e-03, grad_scale: 32.0 2024-09-15 19:54:12,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=335169.1666666667, ans=0.125 2024-09-15 19:54:40,015 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.836e+02 2.017e+02 2.171e+02 2.442e+02 9.883e+02, threshold=4.342e+02, percent-clipped=1.0 2024-09-15 19:54:52,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=335254.1666666667, ans=0.125 2024-09-15 19:55:08,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=335254.1666666667, ans=0.025 2024-09-15 19:55:11,520 INFO [train.py:1198] (1/2) Epoch 19, batch 3300, loss[loss=0.2703, ctc_loss=0.1869, cr_loss=0.4169, over 18646.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.164, cr_loss=0.3846, over 4081833.08 frames. ], batch size: 108, lr: 4.40e-03, grad_scale: 32.0 2024-09-15 19:55:14,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=335282.5, ans=0.125 2024-09-15 19:55:53,719 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=335339.1666666667, ans=0.0 2024-09-15 19:55:58,616 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.18 vs. 
limit=22.5 2024-09-15 19:56:16,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=335395.8333333333, ans=0.2 2024-09-15 19:56:26,128 INFO [train.py:1198] (1/2) Epoch 19, batch 3350, loss[loss=0.2866, ctc_loss=0.1974, cr_loss=0.4462, over 20140.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.164, cr_loss=0.3846, over 4083818.75 frames. ], batch size: 80, lr: 4.39e-03, grad_scale: 32.0 2024-09-15 19:56:32,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=335424.1666666667, ans=0.0 2024-09-15 19:56:36,119 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.48 vs. limit=15.0 2024-09-15 19:56:50,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=335452.5, ans=0.0 2024-09-15 19:57:04,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=335480.8333333333, ans=0.0 2024-09-15 19:57:13,074 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.824e+02 2.093e+02 2.200e+02 2.394e+02 8.592e+02, threshold=4.400e+02, percent-clipped=2.0 2024-09-15 19:57:44,882 INFO [train.py:1198] (1/2) Epoch 19, batch 3400, loss[loss=0.2118, ctc_loss=0.1435, cr_loss=0.3411, over 20981.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.1633, cr_loss=0.3841, over 4098191.40 frames. ], batch size: 48, lr: 4.39e-03, grad_scale: 32.0 2024-09-15 19:59:00,740 INFO [train.py:1198] (1/2) Epoch 19, batch 3450, loss[loss=0.2539, ctc_loss=0.1762, cr_loss=0.3886, over 19547.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1632, cr_loss=0.3837, over 4105308.49 frames. ], batch size: 90, lr: 4.39e-03, grad_scale: 32.0 2024-09-15 19:59:22,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=335735.8333333333, ans=0.0 2024-09-15 19:59:23,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=335735.8333333333, ans=0.125 2024-09-15 19:59:28,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=335735.8333333333, ans=0.0 2024-09-15 19:59:49,284 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.741e+02 2.053e+02 2.152e+02 2.370e+02 2.979e+02, threshold=4.304e+02, percent-clipped=0.0 2024-09-15 20:00:07,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=335820.8333333333, ans=0.1 2024-09-15 20:00:16,498 INFO [train.py:1198] (1/2) Epoch 19, batch 3500, loss[loss=0.2536, ctc_loss=0.172, cr_loss=0.4081, over 21016.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1636, cr_loss=0.3841, over 4099680.01 frames. 
], batch size: 63, lr: 4.39e-03, grad_scale: 16.0 2024-09-15 20:00:18,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=335849.1666666667, ans=0.2 2024-09-15 20:00:49,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=335905.8333333333, ans=0.2 2024-09-15 20:01:11,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=335934.1666666667, ans=0.0 2024-09-15 20:01:18,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=335962.5, ans=0.2 2024-09-15 20:01:25,415 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.68 vs. limit=22.5 2024-09-15 20:01:35,115 INFO [train.py:1198] (1/2) Epoch 19, batch 3550, loss[loss=0.2846, ctc_loss=0.1988, cr_loss=0.429, over 19987.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1632, cr_loss=0.3833, over 4111926.94 frames. ], batch size: 80, lr: 4.39e-03, grad_scale: 16.0 2024-09-15 20:01:44,881 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.28 vs. limit=15.0 2024-09-15 20:02:23,854 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.869e+02 2.054e+02 2.146e+02 2.355e+02 6.585e+02, threshold=4.293e+02, percent-clipped=1.0 2024-09-15 20:02:41,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=336104.1666666667, ans=0.125 2024-09-15 20:02:51,197 INFO [train.py:1198] (1/2) Epoch 19, batch 3600, loss[loss=0.2609, ctc_loss=0.1751, cr_loss=0.4289, over 20976.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.163, cr_loss=0.3829, over 4103442.11 frames. ], batch size: 64, lr: 4.39e-03, grad_scale: 32.0 2024-09-15 20:03:25,215 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=336189.1666666667, ans=0.05 2024-09-15 20:03:26,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=336189.1666666667, ans=0.125 2024-09-15 20:04:08,086 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.43 vs. limit=12.0 2024-09-15 20:04:10,440 INFO [train.py:1198] (1/2) Epoch 19, batch 3650, loss[loss=0.2545, ctc_loss=0.1751, cr_loss=0.3969, over 20945.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.1631, cr_loss=0.383, over 4105979.71 frames. ], batch size: 60, lr: 4.39e-03, grad_scale: 16.0 2024-09-15 20:04:44,540 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.64 vs. limit=6.0 2024-09-15 20:05:00,553 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.767e+02 2.085e+02 2.200e+02 2.356e+02 3.013e+02, threshold=4.400e+02, percent-clipped=0.0 2024-09-15 20:05:21,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=336387.5, ans=0.0 2024-09-15 20:05:25,984 INFO [train.py:1198] (1/2) Epoch 19, batch 3700, loss[loss=0.2491, ctc_loss=0.1701, cr_loss=0.3954, over 20660.00 frames. ], tot_loss[loss=0.2394, ctc_loss=0.163, cr_loss=0.3824, over 4097935.56 frames. 
], batch size: 68, lr: 4.39e-03, grad_scale: 16.0 2024-09-15 20:05:30,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=336415.8333333333, ans=0.0 2024-09-15 20:06:37,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=336529.1666666667, ans=0.125 2024-09-15 20:06:40,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=336529.1666666667, ans=0.0 2024-09-15 20:06:44,473 INFO [train.py:1198] (1/2) Epoch 19, batch 3750, loss[loss=0.2773, ctc_loss=0.1903, cr_loss=0.4352, over 20870.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1629, cr_loss=0.3829, over 4102556.24 frames. ], batch size: 65, lr: 4.39e-03, grad_scale: 16.0 2024-09-15 20:06:49,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=336557.5, ans=0.125 2024-09-15 20:07:04,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=336585.8333333333, ans=0.1 2024-09-15 20:07:24,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=336614.1666666667, ans=0.1 2024-09-15 20:07:28,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=336642.5, ans=0.1 2024-09-15 20:07:34,498 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.760e+02 2.024e+02 2.132e+02 2.275e+02 4.276e+02, threshold=4.264e+02, percent-clipped=0.0 2024-09-15 20:07:36,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=336642.5, ans=0.125 2024-09-15 20:08:00,520 INFO [train.py:1198] (1/2) Epoch 19, batch 3800, loss[loss=0.2231, ctc_loss=0.149, cr_loss=0.3705, over 20804.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1629, cr_loss=0.383, over 4084365.40 frames. ], batch size: 53, lr: 4.39e-03, grad_scale: 16.0 2024-09-15 20:08:09,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=336699.1666666667, ans=0.1 2024-09-15 20:08:14,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=336727.5, ans=0.2 2024-09-15 20:08:27,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=336727.5, ans=0.1 2024-09-15 20:08:43,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=336755.8333333333, ans=0.2 2024-09-15 20:09:19,770 INFO [train.py:1198] (1/2) Epoch 19, batch 3850, loss[loss=0.2288, ctc_loss=0.1571, cr_loss=0.3583, over 20957.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.162, cr_loss=0.3817, over 4092520.49 frames. 
], batch size: 51, lr: 4.38e-03, grad_scale: 16.0 2024-09-15 20:09:20,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=336840.8333333333, ans=0.05 2024-09-15 20:09:35,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=336869.1666666667, ans=0.2 2024-09-15 20:10:09,466 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.796e+02 2.013e+02 2.230e+02 2.392e+02 3.805e+02, threshold=4.460e+02, percent-clipped=0.0 2024-09-15 20:10:26,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=336954.1666666667, ans=0.2 2024-09-15 20:10:32,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=336954.1666666667, ans=0.125 2024-09-15 20:10:35,114 INFO [train.py:1198] (1/2) Epoch 19, batch 3900, loss[loss=0.2297, ctc_loss=0.1551, cr_loss=0.3729, over 20884.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1627, cr_loss=0.3829, over 4080120.32 frames. ], batch size: 54, lr: 4.38e-03, grad_scale: 16.0 2024-09-15 20:10:35,660 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.03 vs. limit=22.5 2024-09-15 20:10:39,124 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.63 vs. limit=10.0 2024-09-15 20:10:42,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=336982.5, ans=0.125 2024-09-15 20:10:43,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=336982.5, ans=0.125 2024-09-15 20:11:19,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=337067.5, ans=0.125 2024-09-15 20:11:50,981 INFO [train.py:1198] (1/2) Epoch 19, batch 3950, loss[loss=0.2451, ctc_loss=0.1668, cr_loss=0.3915, over 20836.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.163, cr_loss=0.3835, over 4082995.65 frames. 
], batch size: 65, lr: 4.38e-03, grad_scale: 16.0 2024-09-15 20:11:54,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=337124.1666666667, ans=0.0 2024-09-15 20:11:57,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=337124.1666666667, ans=0.125 2024-09-15 20:12:16,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=337152.5, ans=0.125 2024-09-15 20:12:28,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=337180.8333333333, ans=0.05 2024-09-15 20:12:43,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=337209.1666666667, ans=0.1 2024-09-15 20:12:44,665 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.856e+02 2.059e+02 2.164e+02 2.382e+02 4.370e+02, threshold=4.329e+02, percent-clipped=0.0 2024-09-15 20:12:50,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=337209.1666666667, ans=0.1 2024-09-15 20:12:51,381 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.80 vs. limit=15.0 2024-09-15 20:13:01,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=337237.5, ans=0.2 2024-09-15 20:13:10,473 INFO [train.py:1198] (1/2) Epoch 19, batch 4000, loss[loss=0.2512, ctc_loss=0.1755, cr_loss=0.3786, over 20926.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1624, cr_loss=0.3832, over 4087370.02 frames. ], batch size: 64, lr: 4.38e-03, grad_scale: 32.0 2024-09-15 20:13:33,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=337294.1666666667, ans=0.025 2024-09-15 20:13:42,894 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.57 vs. limit=22.5 2024-09-15 20:13:51,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=337322.5, ans=0.07 2024-09-15 20:13:54,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=337350.8333333333, ans=0.125 2024-09-15 20:13:56,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=337350.8333333333, ans=0.1 2024-09-15 20:13:57,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=337350.8333333333, ans=0.2 2024-09-15 20:13:59,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=337350.8333333333, ans=0.1 2024-09-15 20:13:59,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=337350.8333333333, ans=0.125 2024-09-15 20:14:26,249 INFO [train.py:1198] (1/2) Epoch 19, batch 4050, loss[loss=0.2487, ctc_loss=0.17, cr_loss=0.3934, over 21051.00 frames. ], tot_loss[loss=0.2387, ctc_loss=0.1622, cr_loss=0.3826, over 4096398.27 frames. 
], batch size: 62, lr: 4.38e-03, grad_scale: 32.0 2024-09-15 20:14:34,603 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. limit=6.0 2024-09-15 20:14:52,243 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=337435.8333333333, ans=0.0 2024-09-15 20:14:59,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=337464.1666666667, ans=0.125 2024-09-15 20:15:14,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=337492.5, ans=0.07 2024-09-15 20:15:18,793 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.754e+02 2.052e+02 2.195e+02 2.416e+02 4.050e+02, threshold=4.390e+02, percent-clipped=0.0 2024-09-15 20:15:44,746 INFO [train.py:1198] (1/2) Epoch 19, batch 4100, loss[loss=0.1946, ctc_loss=0.13, cr_loss=0.3229, over 19820.00 frames. ], tot_loss[loss=0.2394, ctc_loss=0.1628, cr_loss=0.383, over 4103099.29 frames. ], batch size: 44, lr: 4.38e-03, grad_scale: 32.0 2024-09-15 20:16:05,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=337577.5, ans=0.2 2024-09-15 20:16:37,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=337634.1666666667, ans=0.04949747468305833 2024-09-15 20:16:51,340 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.08 vs. limit=15.0 2024-09-15 20:16:52,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=337662.5, ans=0.1 2024-09-15 20:16:59,345 INFO [train.py:1198] (1/2) Epoch 19, batch 4150, loss[loss=0.295, ctc_loss=0.205, cr_loss=0.4501, over 18274.00 frames. ], tot_loss[loss=0.2407, ctc_loss=0.1638, cr_loss=0.3846, over 4097325.30 frames. ], batch size: 108, lr: 4.38e-03, grad_scale: 32.0 2024-09-15 20:16:59,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=337690.8333333333, ans=0.1 2024-09-15 20:17:49,310 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.901e+02 2.125e+02 2.270e+02 2.446e+02 3.860e+02, threshold=4.540e+02, percent-clipped=0.0 2024-09-15 20:17:50,402 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.87 vs. limit=12.0 2024-09-15 20:18:17,821 INFO [train.py:1198] (1/2) Epoch 19, batch 4200, loss[loss=0.2488, ctc_loss=0.1675, cr_loss=0.4065, over 20887.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.1633, cr_loss=0.3836, over 4094246.50 frames. ], batch size: 57, lr: 4.38e-03, grad_scale: 32.0 2024-09-15 20:18:19,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=337832.5, ans=0.125 2024-09-15 20:19:33,302 INFO [train.py:1198] (1/2) Epoch 19, batch 4250, loss[loss=0.2069, ctc_loss=0.138, cr_loss=0.3444, over 20992.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.1635, cr_loss=0.3837, over 4087445.21 frames. 
], batch size: 51, lr: 4.38e-03, grad_scale: 32.0 2024-09-15 20:19:41,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=337974.1666666667, ans=0.2 2024-09-15 20:19:42,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=337974.1666666667, ans=0.125 2024-09-15 20:19:44,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=337974.1666666667, ans=0.0 2024-09-15 20:19:51,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=338002.5, ans=0.125 2024-09-15 20:19:55,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=338002.5, ans=0.1 2024-09-15 20:20:09,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=338030.8333333333, ans=0.0 2024-09-15 20:20:25,996 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.776e+02 2.078e+02 2.214e+02 2.366e+02 5.564e+02, threshold=4.427e+02, percent-clipped=2.0 2024-09-15 20:20:51,784 INFO [train.py:1198] (1/2) Epoch 19, batch 4300, loss[loss=0.2514, ctc_loss=0.1726, cr_loss=0.394, over 21077.00 frames. ], tot_loss[loss=0.2405, ctc_loss=0.1637, cr_loss=0.3842, over 4086300.45 frames. ], batch size: 59, lr: 4.38e-03, grad_scale: 32.0 2024-09-15 20:20:56,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=338115.8333333333, ans=15.0 2024-09-15 20:21:14,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=338144.1666666667, ans=0.0 2024-09-15 20:21:19,982 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.79 vs. limit=22.5 2024-09-15 20:21:36,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=338200.8333333333, ans=0.04949747468305833 2024-09-15 20:22:06,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=338257.5, ans=0.0 2024-09-15 20:22:07,758 INFO [train.py:1198] (1/2) Epoch 19, batch 4350, loss[loss=0.2215, ctc_loss=0.1505, cr_loss=0.3549, over 21037.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1637, cr_loss=0.3835, over 4086480.41 frames. ], batch size: 62, lr: 4.38e-03, grad_scale: 32.0 2024-09-15 20:22:18,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=338257.5, ans=0.0 2024-09-15 20:22:50,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=338314.1666666667, ans=0.2 2024-09-15 20:22:57,848 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.795e+02 2.078e+02 2.191e+02 2.349e+02 5.066e+02, threshold=4.382e+02, percent-clipped=2.0 2024-09-15 20:23:05,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=338342.5, ans=0.125 2024-09-15 20:23:23,868 INFO [train.py:1198] (1/2) Epoch 19, batch 4400, loss[loss=0.2358, ctc_loss=0.1612, cr_loss=0.3728, over 21042.00 frames. 
], tot_loss[loss=0.2405, ctc_loss=0.1639, cr_loss=0.3833, over 4076213.87 frames. ], batch size: 62, lr: 4.37e-03, grad_scale: 32.0 2024-09-15 20:23:24,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=338399.1666666667, ans=0.2 2024-09-15 20:23:24,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=338399.1666666667, ans=0.125 2024-09-15 20:23:28,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=338399.1666666667, ans=0.125 2024-09-15 20:23:30,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=338399.1666666667, ans=0.0 2024-09-15 20:23:32,353 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.05 vs. limit=15.0 2024-09-15 20:24:11,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=338484.1666666667, ans=0.125 2024-09-15 20:24:37,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=338512.5, ans=0.2 2024-09-15 20:24:39,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=338512.5, ans=0.0 2024-09-15 20:24:43,499 INFO [train.py:1198] (1/2) Epoch 19, batch 4450, loss[loss=0.1915, ctc_loss=0.1273, cr_loss=0.3209, over 20233.00 frames. ], tot_loss[loss=0.2398, ctc_loss=0.1633, cr_loss=0.3826, over 4081350.85 frames. ], batch size: 45, lr: 4.37e-03, grad_scale: 32.0 2024-09-15 20:24:43,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=338540.8333333333, ans=0.0 2024-09-15 20:25:00,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=338569.1666666667, ans=0.0 2024-09-15 20:25:08,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=338569.1666666667, ans=0.0 2024-09-15 20:25:12,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=338597.5, ans=0.125 2024-09-15 20:25:33,367 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.782e+02 2.098e+02 2.200e+02 2.365e+02 3.307e+02, threshold=4.399e+02, percent-clipped=0.0 2024-09-15 20:25:59,148 INFO [train.py:1198] (1/2) Epoch 19, batch 4500, loss[loss=0.2514, ctc_loss=0.172, cr_loss=0.3966, over 20569.00 frames. ], tot_loss[loss=0.2415, ctc_loss=0.1646, cr_loss=0.3846, over 4082391.19 frames. ], batch size: 75, lr: 4.37e-03, grad_scale: 32.0 2024-09-15 20:27:15,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=338824.1666666667, ans=0.125 2024-09-15 20:27:16,822 INFO [train.py:1198] (1/2) Epoch 19, batch 4550, loss[loss=0.2132, ctc_loss=0.1447, cr_loss=0.3423, over 20900.00 frames. ], tot_loss[loss=0.2419, ctc_loss=0.1649, cr_loss=0.3854, over 4088810.45 frames. ], batch size: 54, lr: 4.37e-03, grad_scale: 32.0 2024-09-15 20:27:28,500 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.50 vs. 
limit=22.5 2024-09-15 20:28:03,666 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.19 vs. limit=15.0 2024-09-15 20:28:07,410 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.869e+02 2.072e+02 2.177e+02 2.383e+02 3.306e+02, threshold=4.353e+02, percent-clipped=0.0 2024-09-15 20:28:20,456 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.71 vs. limit=15.0 2024-09-15 20:28:24,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=338937.5, ans=0.0 2024-09-15 20:28:31,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=338965.8333333333, ans=0.2 2024-09-15 20:28:33,216 INFO [train.py:1198] (1/2) Epoch 19, batch 4600, loss[loss=0.2451, ctc_loss=0.1667, cr_loss=0.392, over 20671.00 frames. ], tot_loss[loss=0.2422, ctc_loss=0.1651, cr_loss=0.3855, over 4079538.35 frames. ], batch size: 71, lr: 4.37e-03, grad_scale: 32.0 2024-09-15 20:29:08,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=339022.5, ans=0.0 2024-09-15 20:29:29,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=339050.8333333333, ans=0.125 2024-09-15 20:29:37,493 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2024-09-15 20:29:39,128 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.95 vs. limit=6.0 2024-09-15 20:29:52,174 INFO [train.py:1198] (1/2) Epoch 19, batch 4650, loss[loss=0.2257, ctc_loss=0.1535, cr_loss=0.3609, over 20953.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1645, cr_loss=0.3842, over 4083414.15 frames. ], batch size: 51, lr: 4.37e-03, grad_scale: 32.0 2024-09-15 20:29:54,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=339107.5, ans=0.125 2024-09-15 20:30:05,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=339135.8333333333, ans=0.125 2024-09-15 20:30:36,380 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 20:30:41,952 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.815e+02 2.070e+02 2.191e+02 2.427e+02 4.412e+02, threshold=4.382e+02, percent-clipped=1.0 2024-09-15 20:31:07,521 INFO [train.py:1198] (1/2) Epoch 19, batch 4700, loss[loss=0.1962, ctc_loss=0.133, cr_loss=0.3161, over 20008.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.1641, cr_loss=0.3837, over 4077862.55 frames. ], batch size: 44, lr: 4.37e-03, grad_scale: 32.0 2024-09-15 20:31:48,893 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0 2024-09-15 20:31:56,405 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.84 vs. 
limit=15.0 2024-09-15 20:32:14,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=339362.5, ans=0.125 2024-09-15 20:32:25,781 INFO [train.py:1198] (1/2) Epoch 19, batch 4750, loss[loss=0.2852, ctc_loss=0.2048, cr_loss=0.4019, over 14557.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.164, cr_loss=0.3824, over 4065413.97 frames. ], batch size: 149, lr: 4.37e-03, grad_scale: 32.0 2024-09-15 20:32:30,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=339390.8333333333, ans=0.0 2024-09-15 20:33:08,529 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=339447.5, ans=0.0 2024-09-15 20:33:15,254 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.19 vs. limit=15.0 2024-09-15 20:33:16,051 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.794e+02 2.051e+02 2.213e+02 2.424e+02 3.685e+02, threshold=4.425e+02, percent-clipped=0.0 2024-09-15 20:33:39,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=339504.1666666667, ans=0.0 2024-09-15 20:33:41,845 INFO [train.py:1198] (1/2) Epoch 19, batch 4800, loss[loss=0.2333, ctc_loss=0.1547, cr_loss=0.3933, over 20812.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.164, cr_loss=0.3831, over 4064434.23 frames. ], batch size: 59, lr: 4.37e-03, grad_scale: 32.0 2024-09-15 20:33:47,786 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.07 vs. limit=6.0 2024-09-15 20:33:54,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=339532.5, ans=10.0 2024-09-15 20:34:11,462 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.82 vs. limit=15.0 2024-09-15 20:34:27,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=339617.5, ans=0.125 2024-09-15 20:34:27,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=339617.5, ans=0.0 2024-09-15 20:34:30,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=339617.5, ans=0.125 2024-09-15 20:34:44,564 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.94 vs. limit=15.0 2024-09-15 20:34:57,322 INFO [train.py:1198] (1/2) Epoch 19, batch 4850, loss[loss=0.2329, ctc_loss=0.1583, cr_loss=0.3728, over 20881.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1634, cr_loss=0.3829, over 4069745.56 frames. 
], batch size: 54, lr: 4.37e-03, grad_scale: 16.0 2024-09-15 20:35:33,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=339730.8333333333, ans=0.2 2024-09-15 20:35:38,722 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 20:35:45,115 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.79 vs. limit=15.0 2024-09-15 20:35:51,694 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.792e+02 2.055e+02 2.168e+02 2.309e+02 3.263e+02, threshold=4.337e+02, percent-clipped=0.0 2024-09-15 20:35:58,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=339759.1666666667, ans=0.0 2024-09-15 20:36:03,172 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.86 vs. limit=15.0 2024-09-15 20:36:15,796 INFO [train.py:1198] (1/2) Epoch 19, batch 4900, loss[loss=0.2651, ctc_loss=0.1838, cr_loss=0.4065, over 20856.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.1624, cr_loss=0.3823, over 4084999.85 frames. ], batch size: 65, lr: 4.37e-03, grad_scale: 16.0 2024-09-15 20:36:51,085 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.42 vs. limit=22.5 2024-09-15 20:37:25,322 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.42 vs. limit=15.0 2024-09-15 20:37:30,215 INFO [train.py:1198] (1/2) Epoch 19, batch 4950, loss[loss=0.2958, ctc_loss=0.2139, cr_loss=0.4098, over 14441.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1632, cr_loss=0.3836, over 4082535.05 frames. ], batch size: 149, lr: 4.36e-03, grad_scale: 16.0 2024-09-15 20:37:30,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=339957.5, ans=0.025 2024-09-15 20:38:22,260 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.830e+02 2.082e+02 2.229e+02 2.461e+02 3.381e+02, threshold=4.459e+02, percent-clipped=0.0 2024-09-15 20:38:29,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=340070.8333333333, ans=0.2 2024-09-15 20:38:48,742 INFO [train.py:1198] (1/2) Epoch 19, batch 5000, loss[loss=0.2556, ctc_loss=0.1755, cr_loss=0.401, over 20843.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.1635, cr_loss=0.3842, over 4093480.85 frames. ], batch size: 65, lr: 4.36e-03, grad_scale: 16.0 2024-09-15 20:38:57,719 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=340099.1666666667, ans=0.125 2024-09-15 20:39:07,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=340127.5, ans=0.125 2024-09-15 20:39:52,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=340212.5, ans=0.0 2024-09-15 20:40:03,421 INFO [train.py:1198] (1/2) Epoch 19, batch 5050, loss[loss=0.2395, ctc_loss=0.1633, cr_loss=0.3807, over 20984.00 frames. 
], tot_loss[loss=0.241, ctc_loss=0.164, cr_loss=0.3848, over 4097560.38 frames. ], batch size: 55, lr: 4.36e-03, grad_scale: 16.0 2024-09-15 20:40:03,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=340240.8333333333, ans=0.125 2024-09-15 20:40:12,506 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 20:40:15,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=340240.8333333333, ans=0.125 2024-09-15 20:40:32,649 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.07 vs. limit=15.0 2024-09-15 20:40:34,893 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 20:40:44,776 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.61 vs. limit=5.0 2024-09-15 20:40:45,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=340297.5, ans=0.125 2024-09-15 20:40:46,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=340325.8333333333, ans=0.0 2024-09-15 20:40:53,660 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.760e+02 2.148e+02 2.300e+02 2.499e+02 8.274e+02, threshold=4.601e+02, percent-clipped=1.0 2024-09-15 20:41:01,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=340354.1666666667, ans=0.0 2024-09-15 20:41:17,162 INFO [train.py:1198] (1/2) Epoch 19, batch 5100, loss[loss=0.2744, ctc_loss=0.1912, cr_loss=0.4161, over 20087.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1637, cr_loss=0.3836, over 4104553.81 frames. ], batch size: 80, lr: 4.36e-03, grad_scale: 16.0 2024-09-15 20:41:17,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=340382.5, ans=0.09899494936611666 2024-09-15 20:41:33,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=340410.8333333333, ans=0.0 2024-09-15 20:41:41,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=340410.8333333333, ans=0.125 2024-09-15 20:41:51,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=340439.1666666667, ans=0.025 2024-09-15 20:42:06,946 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.21 vs. limit=15.0 2024-09-15 20:42:11,260 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.81 vs. limit=15.0 2024-09-15 20:42:26,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=340495.8333333333, ans=15.0 2024-09-15 20:42:31,496 INFO [train.py:1198] (1/2) Epoch 19, batch 5150, loss[loss=0.3305, ctc_loss=0.2411, cr_loss=0.4467, over 14204.00 frames. 
], tot_loss[loss=0.2409, ctc_loss=0.1641, cr_loss=0.3842, over 4098796.80 frames. ], batch size: 149, lr: 4.36e-03, grad_scale: 16.0 2024-09-15 20:43:16,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=340609.1666666667, ans=0.1 2024-09-15 20:43:16,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=340609.1666666667, ans=0.025 2024-09-15 20:43:21,911 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.810e+02 2.071e+02 2.213e+02 2.468e+02 3.137e+02, threshold=4.426e+02, percent-clipped=0.0 2024-09-15 20:43:23,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=340609.1666666667, ans=0.125 2024-09-15 20:43:45,845 INFO [train.py:1198] (1/2) Epoch 19, batch 5200, loss[loss=0.2481, ctc_loss=0.1663, cr_loss=0.409, over 21031.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.1634, cr_loss=0.3833, over 4101259.72 frames. ], batch size: 61, lr: 4.36e-03, grad_scale: 32.0 2024-09-15 20:44:03,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=340694.1666666667, ans=0.1 2024-09-15 20:44:16,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=340722.5, ans=0.0 2024-09-15 20:44:27,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=340722.5, ans=0.0 2024-09-15 20:44:40,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=340750.8333333333, ans=0.07 2024-09-15 20:45:02,568 INFO [train.py:1198] (1/2) Epoch 19, batch 5250, loss[loss=0.25, ctc_loss=0.1647, cr_loss=0.4262, over 20679.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1627, cr_loss=0.3827, over 4105322.90 frames. ], batch size: 66, lr: 4.36e-03, grad_scale: 32.0 2024-09-15 20:45:53,433 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.821e+02 2.075e+02 2.257e+02 2.468e+02 3.104e+02, threshold=4.513e+02, percent-clipped=0.0 2024-09-15 20:46:00,151 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.83 vs. limit=10.0 2024-09-15 20:46:15,185 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.75 vs. limit=22.5 2024-09-15 20:46:17,500 INFO [train.py:1198] (1/2) Epoch 19, batch 5300, loss[loss=0.2413, ctc_loss=0.1652, cr_loss=0.3803, over 19997.00 frames. ], tot_loss[loss=0.2398, ctc_loss=0.1631, cr_loss=0.3835, over 4106309.82 frames. ], batch size: 80, lr: 4.36e-03, grad_scale: 32.0 2024-09-15 20:46:17,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=340949.1666666667, ans=0.0 2024-09-15 20:46:33,129 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.98 vs. limit=15.0 2024-09-15 20:46:54,225 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.18 vs. 
limit=15.0 2024-09-15 20:47:05,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=341034.1666666667, ans=10.0 2024-09-15 20:47:13,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=341034.1666666667, ans=0.0 2024-09-15 20:47:26,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=341062.5, ans=0.125 2024-09-15 20:47:32,528 INFO [train.py:1198] (1/2) Epoch 19, batch 5350, loss[loss=0.1999, ctc_loss=0.135, cr_loss=0.3241, over 20767.00 frames. ], tot_loss[loss=0.2392, ctc_loss=0.1626, cr_loss=0.3828, over 4110676.18 frames. ], batch size: 53, lr: 4.36e-03, grad_scale: 32.0 2024-09-15 20:47:46,621 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.70 vs. limit=12.0 2024-09-15 20:48:09,130 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.24 vs. limit=15.0 2024-09-15 20:48:09,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=341147.5, ans=15.0 2024-09-15 20:48:25,964 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.796e+02 2.063e+02 2.144e+02 2.322e+02 3.198e+02, threshold=4.287e+02, percent-clipped=0.0 2024-09-15 20:48:49,519 INFO [train.py:1198] (1/2) Epoch 19, batch 5400, loss[loss=0.2154, ctc_loss=0.1452, cr_loss=0.3511, over 21050.00 frames. ], tot_loss[loss=0.2392, ctc_loss=0.1627, cr_loss=0.3821, over 4104000.58 frames. ], batch size: 56, lr: 4.36e-03, grad_scale: 32.0 2024-09-15 20:49:26,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=341289.1666666667, ans=0.125 2024-09-15 20:50:00,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=341345.8333333333, ans=0.0 2024-09-15 20:50:03,418 INFO [train.py:1198] (1/2) Epoch 19, batch 5450, loss[loss=0.2418, ctc_loss=0.1616, cr_loss=0.4011, over 20977.00 frames. ], tot_loss[loss=0.2396, ctc_loss=0.1631, cr_loss=0.3829, over 4103041.10 frames. ], batch size: 58, lr: 4.36e-03, grad_scale: 32.0 2024-09-15 20:50:23,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=341402.5, ans=0.0 2024-09-15 20:50:42,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=341430.8333333333, ans=0.2 2024-09-15 20:50:53,814 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.744e+02 2.025e+02 2.193e+02 2.354e+02 3.816e+02, threshold=4.387e+02, percent-clipped=0.0 2024-09-15 20:50:57,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=341459.1666666667, ans=0.0 2024-09-15 20:51:00,770 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.55 vs. limit=6.0 2024-09-15 20:51:17,813 INFO [train.py:1198] (1/2) Epoch 19, batch 5500, loss[loss=0.2453, ctc_loss=0.1665, cr_loss=0.3939, over 20797.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1633, cr_loss=0.3839, over 4107487.40 frames. 
], batch size: 53, lr: 4.35e-03, grad_scale: 32.0 2024-09-15 20:51:38,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=341544.1666666667, ans=0.0 2024-09-15 20:51:45,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=341544.1666666667, ans=0.125 2024-09-15 20:52:19,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=341629.1666666667, ans=0.0 2024-09-15 20:52:25,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=341629.1666666667, ans=0.125 2024-09-15 20:52:31,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=341657.5, ans=0.025 2024-09-15 20:52:32,628 INFO [train.py:1198] (1/2) Epoch 19, batch 5550, loss[loss=0.2277, ctc_loss=0.1535, cr_loss=0.3707, over 20889.00 frames. ], tot_loss[loss=0.2387, ctc_loss=0.1623, cr_loss=0.3818, over 4106086.01 frames. ], batch size: 57, lr: 4.35e-03, grad_scale: 32.0 2024-09-15 20:53:05,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=341714.1666666667, ans=0.0 2024-09-15 20:53:23,649 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.811e+02 2.077e+02 2.214e+02 2.415e+02 5.498e+02, threshold=4.427e+02, percent-clipped=0.0 2024-09-15 20:53:37,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=341770.8333333333, ans=0.0 2024-09-15 20:53:44,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=341770.8333333333, ans=0.1 2024-09-15 20:53:50,192 INFO [train.py:1198] (1/2) Epoch 19, batch 5600, loss[loss=0.1936, ctc_loss=0.1331, cr_loss=0.3028, over 20968.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.1617, cr_loss=0.3814, over 4118413.59 frames. ], batch size: 49, lr: 4.35e-03, grad_scale: 32.0 2024-09-15 20:54:45,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=341884.1666666667, ans=0.5 2024-09-15 20:55:04,695 INFO [train.py:1198] (1/2) Epoch 19, batch 5650, loss[loss=0.2152, ctc_loss=0.1478, cr_loss=0.3373, over 21005.00 frames. ], tot_loss[loss=0.2377, ctc_loss=0.1615, cr_loss=0.3807, over 4105817.31 frames. ], batch size: 52, lr: 4.35e-03, grad_scale: 32.0 2024-09-15 20:55:11,326 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.04 vs. limit=15.0 2024-09-15 20:55:18,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=341969.1666666667, ans=0.125 2024-09-15 20:55:55,193 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.752e+02 2.088e+02 2.206e+02 2.423e+02 3.823e+02, threshold=4.412e+02, percent-clipped=1.0 2024-09-15 20:55:56,004 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.07 vs. 
limit=15.0 2024-09-15 20:56:05,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=342054.1666666667, ans=0.025 2024-09-15 20:56:11,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=342054.1666666667, ans=0.2 2024-09-15 20:56:17,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=342082.5, ans=0.125 2024-09-15 20:56:18,849 INFO [train.py:1198] (1/2) Epoch 19, batch 5700, loss[loss=0.2154, ctc_loss=0.1418, cr_loss=0.3677, over 21004.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1613, cr_loss=0.3804, over 4108807.23 frames. ], batch size: 52, lr: 4.35e-03, grad_scale: 32.0 2024-09-15 20:56:31,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=342082.5, ans=0.125 2024-09-15 20:56:31,800 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.89 vs. limit=15.0 2024-09-15 20:56:34,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=342110.8333333333, ans=0.1 2024-09-15 20:57:15,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=342167.5, ans=0.1 2024-09-15 20:57:35,916 INFO [train.py:1198] (1/2) Epoch 19, batch 5750, loss[loss=0.2429, ctc_loss=0.1635, cr_loss=0.3968, over 21075.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1612, cr_loss=0.3807, over 4104098.17 frames. ], batch size: 59, lr: 4.35e-03, grad_scale: 32.0 2024-09-15 20:58:26,447 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.726e+02 2.039e+02 2.192e+02 2.375e+02 4.646e+02, threshold=4.384e+02, percent-clipped=1.0 2024-09-15 20:58:38,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=342337.5, ans=0.125 2024-09-15 20:58:40,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=342337.5, ans=0.2 2024-09-15 20:58:46,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=342337.5, ans=0.0 2024-09-15 20:58:50,191 INFO [train.py:1198] (1/2) Epoch 19, batch 5800, loss[loss=0.2391, ctc_loss=0.1619, cr_loss=0.3865, over 20783.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.1608, cr_loss=0.3795, over 4103095.33 frames. ], batch size: 56, lr: 4.35e-03, grad_scale: 32.0 2024-09-15 20:58:53,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=342365.8333333333, ans=0.2 2024-09-15 20:58:55,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=342365.8333333333, ans=0.025 2024-09-15 20:59:40,080 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.06 vs. limit=22.5 2024-09-15 21:00:04,427 INFO [train.py:1198] (1/2) Epoch 19, batch 5850, loss[loss=0.2351, ctc_loss=0.1625, cr_loss=0.3628, over 20667.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1617, cr_loss=0.3807, over 4108324.82 frames. 
], batch size: 66, lr: 4.35e-03, grad_scale: 32.0 2024-09-15 21:00:47,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=342592.5, ans=0.2 2024-09-15 21:00:54,969 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.848e+02 2.054e+02 2.231e+02 2.392e+02 5.027e+02, threshold=4.462e+02, percent-clipped=1.0 2024-09-15 21:00:56,737 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 21:01:01,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=342592.5, ans=0.0 2024-09-15 21:01:16,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=342620.8333333333, ans=0.2 2024-09-15 21:01:18,922 INFO [train.py:1198] (1/2) Epoch 19, batch 5900, loss[loss=0.2015, ctc_loss=0.1355, cr_loss=0.3298, over 18980.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1613, cr_loss=0.381, over 4114163.84 frames. ], batch size: 42, lr: 4.35e-03, grad_scale: 32.0 2024-09-15 21:01:25,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=342649.1666666667, ans=0.2 2024-09-15 21:01:28,680 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.47 vs. limit=22.5 2024-09-15 21:01:39,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=342677.5, ans=0.125 2024-09-15 21:01:48,695 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 21:02:28,571 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 21:02:34,920 INFO [train.py:1198] (1/2) Epoch 19, batch 5950, loss[loss=0.2338, ctc_loss=0.1595, cr_loss=0.3713, over 21014.00 frames. ], tot_loss[loss=0.2382, ctc_loss=0.1619, cr_loss=0.3816, over 4106638.80 frames. ], batch size: 63, lr: 4.35e-03, grad_scale: 32.0 2024-09-15 21:02:54,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=342819.1666666667, ans=0.125 2024-09-15 21:03:07,174 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.29 vs. limit=15.0 2024-09-15 21:03:25,550 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.747e+02 2.091e+02 2.197e+02 2.312e+02 3.561e+02, threshold=4.393e+02, percent-clipped=0.0 2024-09-15 21:03:49,374 INFO [train.py:1198] (1/2) Epoch 19, batch 6000, loss[loss=0.2352, ctc_loss=0.1602, cr_loss=0.3751, over 20867.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.1614, cr_loss=0.381, over 4107562.86 frames. ], batch size: 54, lr: 4.35e-03, grad_scale: 32.0 2024-09-15 21:03:49,374 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-15 21:04:13,867 INFO [train.py:1230] (1/2) Epoch 19, validation: loss=0.04428, ctc_loss=0.04428, cr_loss=1.047e-14, over 944034.00 frames. 
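Note on the loss fields in these records: the logged total consistently fits loss = ctc_loss + 0.2 * cr_loss (for example 0.1641 + 0.2 * 0.3842 = 0.2409 in the batch records above), and in the validation records, where cr_loss is on the order of 1e-14, the total collapses to the CTC term alone. A minimal sketch of that combination, with the 0.2 weight read off from the logged numbers rather than taken from the recipe's code, and with an invented function name:

import torch

# Hedged sketch: the CR-CTC objective here combines the CTC loss with a
# consistency-regularization (CR) term. The 0.2 weight is inferred from the
# logged values (loss ~= ctc_loss + 0.2 * cr_loss); names are illustrative.
def combine_losses(ctc_loss: torch.Tensor,
                   cr_loss: torch.Tensor,
                   cr_loss_scale: float = 0.2) -> torch.Tensor:
    return ctc_loss + cr_loss_scale * cr_loss

# Reproducing one training record and one validation record from the log:
print(combine_losses(torch.tensor(0.1641), torch.tensor(0.3842)))      # ~0.2409
print(combine_losses(torch.tensor(0.04428), torch.tensor(1.047e-14)))  # ~0.04428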
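The tot_loss fields are running frame-weighted summaries rather than single-batch values, and their fractional frame totals (e.g. "over 4101259.72 frames") indicate the accumulation is decayed, since a plain sum of integer per-batch frame counts would stay integral. A hedged sketch of one decayed frame-weighted average that would behave this way; the decay constant is a made-up example, not a value taken from train.py:

# Hedged sketch: exponentially decayed, frame-weighted running loss. Each
# update down-weights history by `decay`, which is why the logged frame
# totals come out fractional. The 0.999 constant is illustrative only.
class RunningLoss:
    def __init__(self, decay: float = 0.999):
        self.decay = decay
        self.loss_sum = 0.0
        self.frame_sum = 0.0

    def update(self, batch_loss: float, num_frames: float) -> float:
        self.loss_sum = self.decay * self.loss_sum + batch_loss * num_frames
        self.frame_sum = self.decay * self.frame_sum + num_frames
        return self.loss_sum / self.frame_sum  # the logged tot_loss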
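The recurring optim.py warnings report five grad-norm quantiles (min, 25%, 50%, 75%, max) together with a threshold that consistently equals Clipping_scale = 2.0 times the logged median (for example 2 x 2.213e+02 = 4.426e+02 in the first warning above), and percent-clipped counts how often the norm exceeded it. A hedged sketch of bookkeeping that would produce such lines; the window length and the rule of scaling gradients down to the threshold are assumptions, not the exact optim.py logic:

import torch

class GradNormStats:
    # Track recent per-batch gradient norms; clip against 2x the recent median.
    def __init__(self, clipping_scale: float = 2.0, window: int = 200):
        self.clipping_scale = clipping_scale
        self.window = window
        self.norms = []
        self.num_clipped = 0

    def update(self, grad_norm: float) -> float:
        """Record one batch's gradient norm; return the scale to apply to grads."""
        self.norms = (self.norms + [grad_norm])[-self.window:]
        q = torch.quantile(torch.tensor(self.norms),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()  # 2.0 x recent median
        # A warning line would print q as "grad-norm quartiles ..." plus threshold.
        if grad_norm > threshold:
            self.num_clipped += 1
            return threshold / grad_norm
        return 1.0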
2024-09-15 21:04:13,867 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-15 21:05:23,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=343045.8333333333, ans=0.125 2024-09-15 21:05:27,867 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.71 vs. limit=15.0 2024-09-15 21:05:30,110 INFO [train.py:1198] (1/2) Epoch 19, batch 6050, loss[loss=0.2016, ctc_loss=0.1333, cr_loss=0.3415, over 20963.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1612, cr_loss=0.3804, over 4101802.61 frames. ], batch size: 50, lr: 4.35e-03, grad_scale: 32.0 2024-09-15 21:05:32,446 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.70 vs. limit=15.0 2024-09-15 21:05:43,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=343074.1666666667, ans=0.125 2024-09-15 21:05:44,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=343102.5, ans=0.1 2024-09-15 21:05:50,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=343102.5, ans=0.125 2024-09-15 21:06:21,589 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.782e+02 2.107e+02 2.319e+02 2.569e+02 3.285e+02, threshold=4.638e+02, percent-clipped=0.0 2024-09-15 21:06:22,263 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.23 vs. limit=15.0 2024-09-15 21:06:27,785 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=343159.1666666667, ans=0.0 2024-09-15 21:06:29,708 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.20 vs. limit=15.0 2024-09-15 21:06:45,486 INFO [train.py:1198] (1/2) Epoch 19, batch 6100, loss[loss=0.2529, ctc_loss=0.1696, cr_loss=0.4162, over 21057.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.1623, cr_loss=0.3827, over 4099272.62 frames. ], batch size: 59, lr: 4.34e-03, grad_scale: 32.0 2024-09-15 21:06:47,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=343215.8333333333, ans=0.1 2024-09-15 21:07:09,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=343244.1666666667, ans=0.125 2024-09-15 21:07:18,447 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.77 vs. limit=15.0 2024-09-15 21:07:58,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=343357.5, ans=0.125 2024-09-15 21:07:59,224 INFO [train.py:1198] (1/2) Epoch 19, batch 6150, loss[loss=0.2497, ctc_loss=0.1671, cr_loss=0.413, over 21017.00 frames. ], tot_loss[loss=0.2402, ctc_loss=0.1633, cr_loss=0.3843, over 4086161.48 frames. 
], batch size: 63, lr: 4.34e-03, grad_scale: 32.0 2024-09-15 21:08:02,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=343357.5, ans=0.0 2024-09-15 21:08:49,330 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.856e+02 2.125e+02 2.258e+02 2.510e+02 3.321e+02, threshold=4.517e+02, percent-clipped=0.0 2024-09-15 21:08:51,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=343442.5, ans=0.0 2024-09-15 21:08:58,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=343470.8333333333, ans=0.0 2024-09-15 21:09:02,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=343470.8333333333, ans=0.125 2024-09-15 21:09:12,849 INFO [train.py:1198] (1/2) Epoch 19, batch 6200, loss[loss=0.2443, ctc_loss=0.1631, cr_loss=0.4061, over 21032.00 frames. ], tot_loss[loss=0.2422, ctc_loss=0.1651, cr_loss=0.3857, over 4049623.49 frames. ], batch size: 62, lr: 4.34e-03, grad_scale: 32.0 2024-09-15 21:09:25,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=343499.1666666667, ans=0.0 2024-09-15 21:09:32,617 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 21:09:35,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=343527.5, ans=0.0 2024-09-15 21:09:35,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=343527.5, ans=0.2 2024-09-15 21:09:57,833 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.24 vs. limit=10.0 2024-09-15 21:09:58,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=343584.1666666667, ans=0.0 2024-09-15 21:10:22,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=343612.5, ans=0.125 2024-09-15 21:10:25,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=343640.8333333333, ans=0.2 2024-09-15 21:10:26,981 INFO [train.py:1198] (1/2) Epoch 19, batch 6250, loss[loss=0.2139, ctc_loss=0.1437, cr_loss=0.3512, over 21018.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1634, cr_loss=0.3828, over 4049289.57 frames. ], batch size: 52, lr: 4.34e-03, grad_scale: 32.0 2024-09-15 21:10:52,460 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.87 vs. 
limit=15.0 2024-09-15 21:10:56,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=343697.5, ans=10.0 2024-09-15 21:11:02,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=343697.5, ans=0.0 2024-09-15 21:11:15,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=343725.8333333333, ans=0.0 2024-09-15 21:11:17,911 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.772e+02 2.073e+02 2.286e+02 2.465e+02 3.594e+02, threshold=4.573e+02, percent-clipped=0.0 2024-09-15 21:11:41,750 INFO [train.py:1198] (1/2) Epoch 19, batch 6300, loss[loss=0.2088, ctc_loss=0.1376, cr_loss=0.3558, over 20968.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.164, cr_loss=0.3819, over 3990009.39 frames. ], batch size: 52, lr: 4.34e-03, grad_scale: 32.0 2024-09-15 21:11:47,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=343782.5, ans=0.0 2024-09-15 21:11:55,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=343810.8333333333, ans=0.1 2024-09-15 21:12:08,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=343810.8333333333, ans=0.2 2024-09-15 21:12:28,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=343867.5, ans=0.125 2024-09-15 21:12:31,694 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 21:12:51,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=343895.8333333333, ans=0.0 2024-09-15 21:12:54,193 INFO [train.py:1198] (1/2) Epoch 19, batch 6350, loss[loss=0.2938, ctc_loss=0.2102, cr_loss=0.4179, over 14290.00 frames. ], tot_loss[loss=0.2458, ctc_loss=0.1689, cr_loss=0.3846, over 3830754.93 frames. ], batch size: 150, lr: 4.34e-03, grad_scale: 32.0 2024-09-15 21:12:57,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=343924.1666666667, ans=0.125 2024-09-15 21:13:36,822 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.95 vs. limit=6.0 2024-09-15 21:13:43,332 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.916e+02 2.320e+02 2.521e+02 2.765e+02 3.309e+02, threshold=5.042e+02, percent-clipped=0.0 2024-09-15 21:14:41,660 INFO [train.py:1198] (1/2) Epoch 20, batch 0, loss[loss=0.2828, ctc_loss=0.1973, cr_loss=0.4276, over 18383.00 frames. ], tot_loss[loss=0.2828, ctc_loss=0.1973, cr_loss=0.4276, over 18383.00 frames. ], batch size: 108, lr: 4.23e-03, grad_scale: 32.0 2024-09-15 21:14:41,661 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-15 21:15:01,252 INFO [train.py:1230] (1/2) Epoch 20, validation: loss=0.04431, ctc_loss=0.04431, cr_loss=1.055e-14, over 944034.00 frames. 
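Most of the scaling.py INFO lines track ScheduledFloat values: regularization knobs such as dropout_p, the various *_skip_rate entries, balancer probabilities, and const_attention_rate, whose current value (the logged `ans`) is a function of batch_count, which is why the same parameter names keep reappearing with updated numbers as training advances. As a hedged illustration of the idea only (the real ScheduledFloat in scaling.py carries more machinery), a piecewise-linear schedule over batch count:

import bisect

# Hedged sketch: a float hyperparameter scheduled by piecewise-linear
# interpolation over (batch_count, value) breakpoints. The breakpoints in the
# example are invented; only the mechanism is being illustrated.
class ScheduledFloatSketch:
    def __init__(self, *points):
        self.points = sorted(points)  # (batch_count, value) pairs

    def value_at(self, batch_count: float) -> float:
        xs = [p[0] for p in self.points]
        i = bisect.bisect_right(xs, batch_count)
        if i == 0:
            return self.points[0][1]
        if i == len(self.points):
            return self.points[-1][1]
        (x0, y0), (x1, y1) = self.points[i - 1], self.points[i]
        return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

# A dropout probability decaying from 0.3 to 0.1 over the first 20k batches:
dropout_p = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
print(dropout_p.value_at(343102.5))  # -> 0.1, like the ans=0.1 entries above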
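The Whitening lines compare a per-module whiteness metric against a limit (for example metric=7.87 vs. limit=15.0 just above); the metric measures how far the covariance of a module's activations is from a multiple of the identity, and modules beyond the limit get nudged back toward white. One natural metric for illustration, equal to 1.0 for a perfectly white per-group covariance and larger otherwise; this is a hedged sketch, not necessarily the exact formula in scaling.py:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (num_frames, num_channels); channels split into num_groups groups.
    n, c = x.shape
    assert c % num_groups == 0
    d = c // num_groups
    xg = x.reshape(n, num_groups, d).transpose(0, 1)      # (groups, n, d)
    cov = torch.matmul(xg.transpose(1, 2), xg) / n        # (groups, d, d)
    num = (cov ** 2).sum(dim=(1, 2)) * d                  # ~ d * sum(eig^2)
    den = cov.diagonal(dim1=1, dim2=2).sum(dim=1) ** 2    # (sum of eig)^2
    return (num / den).mean().item()                      # 1.0 when white

x = torch.randn(1000, 256)                 # nearly white features
print(whitening_metric(x, num_groups=1))   # ~1.0, well under typical limits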
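The grad_scale field in the batch records, flipping between 16.0 and 32.0 here, is the dynamic loss scale used for mixed-precision training: PyTorch's GradScaler doubles the scale after a long run of overflow-free steps and halves it when an inf/nan gradient appears, which is what keeps the logged values at powers of two. A minimal usage sketch with placeholder model, optimizer, and data (requires a CUDA device):

import torch

model = torch.nn.Linear(80, 500).cuda()                   # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()                      # dynamic loss scale

for step in range(100):
    x = torch.randn(8, 80, device="cuda")
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(x).pow(2).mean()
    optimizer.zero_grad()
    scaler.scale(loss).backward()   # backward pass on the scaled loss
    scaler.step(optimizer)          # unscales grads; skips step on inf/nan
    scaler.update()                 # grows/shrinks the scale by factors of 2
    # scaler.get_scale() yields values like the 16.0 / 32.0 seen in the log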
2024-09-15 21:15:01,252 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-15 21:15:19,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=344068.6666666667, ans=0.0 2024-09-15 21:16:11,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=344153.6666666667, ans=0.05 2024-09-15 21:16:17,097 INFO [train.py:1198] (1/2) Epoch 20, batch 50, loss[loss=0.2682, ctc_loss=0.1832, cr_loss=0.4247, over 20674.00 frames. ], tot_loss[loss=0.2402, ctc_loss=0.1632, cr_loss=0.3853, over 923709.66 frames. ], batch size: 66, lr: 4.23e-03, grad_scale: 32.0 2024-09-15 21:16:52,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=344238.6666666667, ans=15.0 2024-09-15 21:17:01,330 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.76 vs. limit=22.5 2024-09-15 21:17:20,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=344295.3333333333, ans=0.2 2024-09-15 21:17:26,291 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.763e+02 2.057e+02 2.202e+02 2.357e+02 3.385e+02, threshold=4.404e+02, percent-clipped=0.0 2024-09-15 21:17:35,249 INFO [train.py:1198] (1/2) Epoch 20, batch 100, loss[loss=0.202, ctc_loss=0.1353, cr_loss=0.3333, over 20958.00 frames. ], tot_loss[loss=0.2432, ctc_loss=0.1655, cr_loss=0.3885, over 1618516.28 frames. ], batch size: 51, lr: 4.23e-03, grad_scale: 16.0 2024-09-15 21:18:10,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=344380.3333333333, ans=0.025 2024-09-15 21:18:25,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=344408.6666666667, ans=0.125 2024-09-15 21:18:37,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=344437.0, ans=0.125 2024-09-15 21:18:45,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=344437.0, ans=0.125 2024-09-15 21:18:52,294 INFO [train.py:1198] (1/2) Epoch 20, batch 150, loss[loss=0.2122, ctc_loss=0.1415, cr_loss=0.3534, over 20875.00 frames. ], tot_loss[loss=0.241, ctc_loss=0.1638, cr_loss=0.3858, over 2174134.13 frames. ], batch size: 57, lr: 4.22e-03, grad_scale: 16.0 2024-09-15 21:18:52,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=344465.3333333333, ans=0.1 2024-09-15 21:19:00,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=344465.3333333333, ans=0.2 2024-09-15 21:19:58,379 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.806e+02 2.034e+02 2.155e+02 2.380e+02 4.137e+02, threshold=4.310e+02, percent-clipped=0.0 2024-09-15 21:19:58,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=344578.6666666667, ans=0.125 2024-09-15 21:20:07,502 INFO [train.py:1198] (1/2) Epoch 20, batch 200, loss[loss=0.2364, ctc_loss=0.1614, cr_loss=0.3754, over 20779.00 frames. 
], tot_loss[loss=0.2422, ctc_loss=0.1648, cr_loss=0.3869, over 2597779.33 frames. ], batch size: 71, lr: 4.22e-03, grad_scale: 16.0 2024-09-15 21:20:29,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=344635.3333333333, ans=0.125 2024-09-15 21:20:38,341 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.84 vs. limit=10.0 2024-09-15 21:20:58,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=344692.0, ans=0.09899494936611666 2024-09-15 21:20:58,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=344692.0, ans=10.0 2024-09-15 21:21:07,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=344720.3333333333, ans=0.125 2024-09-15 21:21:10,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=344720.3333333333, ans=10.0 2024-09-15 21:21:22,488 INFO [train.py:1198] (1/2) Epoch 20, batch 250, loss[loss=0.2368, ctc_loss=0.1607, cr_loss=0.3805, over 20705.00 frames. ], tot_loss[loss=0.2425, ctc_loss=0.1651, cr_loss=0.3872, over 2918842.17 frames. ], batch size: 71, lr: 4.22e-03, grad_scale: 16.0 2024-09-15 21:21:31,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=344748.6666666667, ans=0.125 2024-09-15 21:21:58,512 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.19 vs. limit=15.0 2024-09-15 21:22:00,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=344805.3333333333, ans=0.05 2024-09-15 21:22:17,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=344833.6666666667, ans=0.125 2024-09-15 21:22:28,985 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.744e+02 2.037e+02 2.145e+02 2.263e+02 8.354e+02, threshold=4.291e+02, percent-clipped=1.0 2024-09-15 21:22:37,886 INFO [train.py:1198] (1/2) Epoch 20, batch 300, loss[loss=0.2708, ctc_loss=0.185, cr_loss=0.4291, over 19280.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.1629, cr_loss=0.3841, over 3190772.64 frames. ], batch size: 90, lr: 4.22e-03, grad_scale: 16.0 2024-09-15 21:23:35,947 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.88 vs. limit=15.0 2024-09-15 21:23:46,497 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.68 vs. limit=22.5 2024-09-15 21:23:59,630 INFO [train.py:1198] (1/2) Epoch 20, batch 350, loss[loss=0.2827, ctc_loss=0.2014, cr_loss=0.4064, over 18205.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1635, cr_loss=0.3846, over 3387400.08 frames. 
], batch size: 108, lr: 4.22e-03, grad_scale: 16.0 2024-09-15 21:24:22,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=345060.3333333333, ans=0.0 2024-09-15 21:24:54,529 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=345117.0, ans=0.0 2024-09-15 21:25:00,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=345145.3333333333, ans=0.125 2024-09-15 21:25:06,300 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.784e+02 2.027e+02 2.187e+02 2.410e+02 4.213e+02, threshold=4.375e+02, percent-clipped=0.0 2024-09-15 21:25:15,394 INFO [train.py:1198] (1/2) Epoch 20, batch 400, loss[loss=0.2972, ctc_loss=0.2122, cr_loss=0.4248, over 14567.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.162, cr_loss=0.3821, over 3541540.21 frames. ], batch size: 150, lr: 4.22e-03, grad_scale: 32.0 2024-09-15 21:25:28,075 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.48 vs. limit=22.5 2024-09-15 21:25:35,823 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.44 vs. limit=15.0 2024-09-15 21:25:47,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=345230.3333333333, ans=0.125 2024-09-15 21:25:52,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=345230.3333333333, ans=0.0 2024-09-15 21:25:52,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=345230.3333333333, ans=0.07 2024-09-15 21:26:02,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=345258.6666666667, ans=0.125 2024-09-15 21:26:07,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=345258.6666666667, ans=0.025 2024-09-15 21:26:20,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=345287.0, ans=0.125 2024-09-15 21:26:30,962 INFO [train.py:1198] (1/2) Epoch 20, batch 450, loss[loss=0.2339, ctc_loss=0.1551, cr_loss=0.394, over 19969.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1617, cr_loss=0.383, over 3678701.10 frames. ], batch size: 44, lr: 4.22e-03, grad_scale: 32.0 2024-09-15 21:26:47,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=345343.6666666667, ans=0.2 2024-09-15 21:26:55,956 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.18 vs. limit=15.0 2024-09-15 21:26:56,008 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.89 vs. 
limit=10.0 2024-09-15 21:27:13,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=345372.0, ans=0.125 2024-09-15 21:27:33,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=345428.6666666667, ans=0.1 2024-09-15 21:27:33,950 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.58 vs. limit=15.0 2024-09-15 21:27:37,704 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.798e+02 2.091e+02 2.227e+02 2.463e+02 3.132e+02, threshold=4.453e+02, percent-clipped=0.0 2024-09-15 21:27:41,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=345428.6666666667, ans=0.1 2024-09-15 21:27:46,641 INFO [train.py:1198] (1/2) Epoch 20, batch 500, loss[loss=0.2525, ctc_loss=0.1753, cr_loss=0.3859, over 19293.00 frames. ], tot_loss[loss=0.2385, ctc_loss=0.1619, cr_loss=0.3829, over 3770960.99 frames. ], batch size: 90, lr: 4.22e-03, grad_scale: 32.0 2024-09-15 21:27:48,870 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.46 vs. limit=15.0 2024-09-15 21:28:48,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=345542.0, ans=0.1 2024-09-15 21:28:52,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=345570.3333333333, ans=0.2 2024-09-15 21:29:05,867 INFO [train.py:1198] (1/2) Epoch 20, batch 550, loss[loss=0.248, ctc_loss=0.1649, cr_loss=0.4158, over 20850.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.1618, cr_loss=0.3829, over 3840293.94 frames. ], batch size: 59, lr: 4.22e-03, grad_scale: 32.0 2024-09-15 21:29:06,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=345598.6666666667, ans=0.07 2024-09-15 21:29:15,709 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.60 vs. limit=15.0 2024-09-15 21:29:37,113 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. limit=6.0 2024-09-15 21:30:15,484 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.755e+02 2.033e+02 2.183e+02 2.361e+02 3.780e+02, threshold=4.366e+02, percent-clipped=0.0 2024-09-15 21:30:24,665 INFO [train.py:1198] (1/2) Epoch 20, batch 600, loss[loss=0.1994, ctc_loss=0.1325, cr_loss=0.3348, over 20971.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1608, cr_loss=0.382, over 3907812.41 frames. ], batch size: 48, lr: 4.22e-03, grad_scale: 32.0 2024-09-15 21:31:22,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=345825.3333333333, ans=0.125 2024-09-15 21:31:35,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=345853.6666666667, ans=0.125 2024-09-15 21:31:40,100 INFO [train.py:1198] (1/2) Epoch 20, batch 650, loss[loss=0.2589, ctc_loss=0.1774, cr_loss=0.4074, over 20296.00 frames. 
], tot_loss[loss=0.2378, ctc_loss=0.1613, cr_loss=0.3826, over 3944219.06 frames. ], batch size: 74, lr: 4.22e-03, grad_scale: 32.0 2024-09-15 21:31:46,727 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.32 vs. limit=15.0 2024-09-15 21:31:49,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=345882.0, ans=0.025 2024-09-15 21:32:46,720 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.758e+02 2.018e+02 2.206e+02 2.436e+02 5.404e+02, threshold=4.412e+02, percent-clipped=1.0 2024-09-15 21:32:53,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=345995.3333333333, ans=0.0 2024-09-15 21:32:55,725 INFO [train.py:1198] (1/2) Epoch 20, batch 700, loss[loss=0.269, ctc_loss=0.1821, cr_loss=0.4343, over 20855.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.161, cr_loss=0.3828, over 3987374.78 frames. ], batch size: 65, lr: 4.22e-03, grad_scale: 32.0 2024-09-15 21:33:23,748 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.76 vs. limit=15.0 2024-09-15 21:33:27,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=346080.3333333333, ans=0.125 2024-09-15 21:34:10,923 INFO [train.py:1198] (1/2) Epoch 20, batch 750, loss[loss=0.2131, ctc_loss=0.1422, cr_loss=0.3545, over 20975.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.1614, cr_loss=0.3826, over 4010728.91 frames. ], batch size: 50, lr: 4.21e-03, grad_scale: 32.0 2024-09-15 21:34:11,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=346165.3333333333, ans=0.125 2024-09-15 21:34:26,744 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.99 vs. limit=15.0 2024-09-15 21:35:20,840 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.828e+02 2.018e+02 2.162e+02 2.350e+02 3.556e+02, threshold=4.324e+02, percent-clipped=0.0 2024-09-15 21:35:33,147 INFO [train.py:1198] (1/2) Epoch 20, batch 800, loss[loss=0.2418, ctc_loss=0.1614, cr_loss=0.402, over 21082.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1611, cr_loss=0.3823, over 4033196.43 frames. 
], batch size: 59, lr: 4.21e-03, grad_scale: 32.0 2024-09-15 21:35:37,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=346307.0, ans=0.09899494936611666 2024-09-15 21:35:48,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=346335.3333333333, ans=0.125 2024-09-15 21:35:51,822 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 21:36:05,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=346363.6666666667, ans=0.0 2024-09-15 21:36:05,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=346363.6666666667, ans=0.125 2024-09-15 21:36:25,276 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.33 vs. limit=10.0 2024-09-15 21:36:43,321 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.41 vs. limit=15.0 2024-09-15 21:36:48,782 INFO [train.py:1198] (1/2) Epoch 20, batch 850, loss[loss=0.2622, ctc_loss=0.1794, cr_loss=0.414, over 20651.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.1618, cr_loss=0.3832, over 4055143.79 frames. ], batch size: 66, lr: 4.21e-03, grad_scale: 32.0 2024-09-15 21:37:47,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=346533.6666666667, ans=0.2 2024-09-15 21:37:55,801 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.797e+02 2.103e+02 2.201e+02 2.408e+02 3.477e+02, threshold=4.403e+02, percent-clipped=0.0 2024-09-15 21:38:04,877 INFO [train.py:1198] (1/2) Epoch 20, batch 900, loss[loss=0.2701, ctc_loss=0.1866, cr_loss=0.4177, over 20667.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.1618, cr_loss=0.383, over 4070085.65 frames. ], batch size: 68, lr: 4.21e-03, grad_scale: 32.0 2024-09-15 21:38:06,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=346590.3333333333, ans=0.2 2024-09-15 21:38:43,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=346647.0, ans=0.125 2024-09-15 21:39:18,076 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.78 vs. limit=15.0 2024-09-15 21:39:20,346 INFO [train.py:1198] (1/2) Epoch 20, batch 950, loss[loss=0.2058, ctc_loss=0.1346, cr_loss=0.356, over 21051.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.1616, cr_loss=0.3823, over 4080130.96 frames. 
], batch size: 56, lr: 4.21e-03, grad_scale: 32.0 2024-09-15 21:39:52,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=346788.6666666667, ans=0.125 2024-09-15 21:40:30,276 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.935e+02 2.105e+02 2.209e+02 2.368e+02 5.712e+02, threshold=4.418e+02, percent-clipped=1.0 2024-09-15 21:40:30,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=346845.3333333333, ans=0.125 2024-09-15 21:40:35,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=346845.3333333333, ans=0.09899494936611666 2024-09-15 21:40:38,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=346873.6666666667, ans=0.2 2024-09-15 21:40:39,244 INFO [train.py:1198] (1/2) Epoch 20, batch 1000, loss[loss=0.2738, ctc_loss=0.1902, cr_loss=0.4183, over 18381.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1613, cr_loss=0.3824, over 4090506.20 frames. ], batch size: 108, lr: 4.21e-03, grad_scale: 32.0 2024-09-15 21:41:02,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=346902.0, ans=0.1 2024-09-15 21:41:07,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=346902.0, ans=0.2 2024-09-15 21:41:57,908 INFO [train.py:1198] (1/2) Epoch 20, batch 1050, loss[loss=0.2272, ctc_loss=0.1538, cr_loss=0.3668, over 20775.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.161, cr_loss=0.3824, over 4101331.50 frames. ], batch size: 56, lr: 4.21e-03, grad_scale: 32.0 2024-09-15 21:42:36,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=347072.0, ans=0.2 2024-09-15 21:43:02,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=347128.6666666667, ans=0.2 2024-09-15 21:43:03,767 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.793e+02 2.041e+02 2.158e+02 2.292e+02 3.094e+02, threshold=4.316e+02, percent-clipped=0.0 2024-09-15 21:43:13,069 INFO [train.py:1198] (1/2) Epoch 20, batch 1100, loss[loss=0.26, ctc_loss=0.1753, cr_loss=0.4234, over 20849.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1623, cr_loss=0.3837, over 4077813.12 frames. ], batch size: 65, lr: 4.21e-03, grad_scale: 32.0 2024-09-15 21:43:20,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=347157.0, ans=0.0 2024-09-15 21:43:22,702 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.24 vs. limit=22.5 2024-09-15 21:43:26,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=347185.3333333333, ans=0.2 2024-09-15 21:43:46,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=347213.6666666667, ans=0.0 2024-09-15 21:44:28,345 INFO [train.py:1198] (1/2) Epoch 20, batch 1150, loss[loss=0.2523, ctc_loss=0.1679, cr_loss=0.4223, over 20973.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1623, cr_loss=0.3834, over 4088517.65 frames. 
], batch size: 64, lr: 4.21e-03, grad_scale: 32.0 2024-09-15 21:44:36,783 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.37 vs. limit=22.5 2024-09-15 21:44:36,938 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.82 vs. limit=22.5 2024-09-15 21:45:35,015 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.784e+02 2.079e+02 2.206e+02 2.346e+02 3.281e+02, threshold=4.413e+02, percent-clipped=0.0 2024-09-15 21:45:44,054 INFO [train.py:1198] (1/2) Epoch 20, batch 1200, loss[loss=0.2253, ctc_loss=0.1544, cr_loss=0.3542, over 20977.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.1616, cr_loss=0.3821, over 4094307.21 frames. ], batch size: 51, lr: 4.21e-03, grad_scale: 32.0 2024-09-15 21:46:05,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=347468.6666666667, ans=0.125 2024-09-15 21:46:17,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=347497.0, ans=0.0 2024-09-15 21:46:35,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=347525.3333333333, ans=0.125 2024-09-15 21:46:56,944 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.19 vs. limit=15.0 2024-09-15 21:47:05,218 INFO [train.py:1198] (1/2) Epoch 20, batch 1250, loss[loss=0.2488, ctc_loss=0.1673, cr_loss=0.4075, over 20841.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.1617, cr_loss=0.3821, over 4087885.50 frames. ], batch size: 59, lr: 4.21e-03, grad_scale: 32.0 2024-09-15 21:47:17,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=347582.0, ans=0.125 2024-09-15 21:47:47,116 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 21:47:48,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=347667.0, ans=0.0 2024-09-15 21:48:11,106 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.815e+02 2.092e+02 2.225e+02 2.357e+02 3.190e+02, threshold=4.450e+02, percent-clipped=0.0 2024-09-15 21:48:17,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=347695.3333333333, ans=0.95 2024-09-15 21:48:20,095 INFO [train.py:1198] (1/2) Epoch 20, batch 1300, loss[loss=0.2765, ctc_loss=0.1938, cr_loss=0.4134, over 18158.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1611, cr_loss=0.381, over 4091978.40 frames. 
], batch size: 108, lr: 4.20e-03, grad_scale: 32.0 2024-09-15 21:48:32,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=347723.6666666667, ans=0.04949747468305833 2024-09-15 21:48:58,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=347780.3333333333, ans=0.125 2024-09-15 21:49:16,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=347808.6666666667, ans=0.125 2024-09-15 21:49:35,571 INFO [train.py:1198] (1/2) Epoch 20, batch 1350, loss[loss=0.2092, ctc_loss=0.1403, cr_loss=0.3443, over 20977.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1618, cr_loss=0.3825, over 4096222.88 frames. ], batch size: 55, lr: 4.20e-03, grad_scale: 32.0 2024-09-15 21:49:41,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=347865.3333333333, ans=0.2 2024-09-15 21:50:07,803 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.36 vs. limit=15.0 2024-09-15 21:50:15,512 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.93 vs. limit=15.0 2024-09-15 21:50:40,966 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 21:50:41,950 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.862e+02 2.096e+02 2.229e+02 2.420e+02 3.122e+02, threshold=4.458e+02, percent-clipped=0.0 2024-09-15 21:50:51,225 INFO [train.py:1198] (1/2) Epoch 20, batch 1400, loss[loss=0.2792, ctc_loss=0.1997, cr_loss=0.398, over 14398.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1628, cr_loss=0.3834, over 4092375.90 frames. ], batch size: 149, lr: 4.20e-03, grad_scale: 32.0 2024-09-15 21:51:23,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=348063.6666666667, ans=0.0 2024-09-15 21:51:26,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=348063.6666666667, ans=0.125 2024-09-15 21:51:36,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=348092.0, ans=0.2 2024-09-15 21:51:50,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=348120.3333333333, ans=0.125 2024-09-15 21:52:04,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=348120.3333333333, ans=0.2 2024-09-15 21:52:09,935 INFO [train.py:1198] (1/2) Epoch 20, batch 1450, loss[loss=0.2147, ctc_loss=0.1425, cr_loss=0.3613, over 21001.00 frames. ], tot_loss[loss=0.2391, ctc_loss=0.1625, cr_loss=0.3833, over 4104586.69 frames. 
], batch size: 52, lr: 4.20e-03, grad_scale: 32.0 2024-09-15 21:52:13,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=348148.6666666667, ans=0.125 2024-09-15 21:52:25,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=348177.0, ans=0.2 2024-09-15 21:52:57,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=348233.6666666667, ans=0.125 2024-09-15 21:53:04,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=348233.6666666667, ans=0.1 2024-09-15 21:53:19,469 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.790e+02 2.133e+02 2.247e+02 2.402e+02 3.201e+02, threshold=4.494e+02, percent-clipped=0.0 2024-09-15 21:53:22,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=348262.0, ans=0.0 2024-09-15 21:53:28,788 INFO [train.py:1198] (1/2) Epoch 20, batch 1500, loss[loss=0.2428, ctc_loss=0.1684, cr_loss=0.3719, over 20853.00 frames. ], tot_loss[loss=0.2405, ctc_loss=0.1635, cr_loss=0.3852, over 4108632.17 frames. ], batch size: 65, lr: 4.20e-03, grad_scale: 32.0 2024-09-15 21:53:42,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=348318.6666666667, ans=0.1 2024-09-15 21:53:51,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=348318.6666666667, ans=0.04949747468305833 2024-09-15 21:53:53,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=348318.6666666667, ans=0.125 2024-09-15 21:53:57,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=348347.0, ans=0.0 2024-09-15 21:54:44,302 INFO [train.py:1198] (1/2) Epoch 20, batch 1550, loss[loss=0.2411, ctc_loss=0.1629, cr_loss=0.3907, over 20688.00 frames. ], tot_loss[loss=0.2396, ctc_loss=0.1628, cr_loss=0.3841, over 4104836.48 frames. ], batch size: 68, lr: 4.20e-03, grad_scale: 32.0 2024-09-15 21:55:50,625 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.721e+02 2.021e+02 2.194e+02 2.346e+02 4.155e+02, threshold=4.388e+02, percent-clipped=0.0 2024-09-15 21:55:59,870 INFO [train.py:1198] (1/2) Epoch 20, batch 1600, loss[loss=0.2485, ctc_loss=0.1675, cr_loss=0.4046, over 20969.00 frames. ], tot_loss[loss=0.2387, ctc_loss=0.162, cr_loss=0.3834, over 4116188.48 frames. 
], batch size: 58, lr: 4.20e-03, grad_scale: 32.0
2024-09-15 21:56:17,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=348602.0, ans=0.125
2024-09-15 21:56:21,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=348602.0, ans=0.1
2024-09-15 21:56:35,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=348630.3333333333, ans=0.1
2024-09-15 21:57:08,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=348687.0, ans=0.125
2024-09-15 21:57:16,280 INFO [train.py:1198] (1/2) Epoch 20, batch 1650, loss[loss=0.218, ctc_loss=0.1493, cr_loss=0.3434, over 20833.00 frames. ], tot_loss[loss=0.2402, ctc_loss=0.1632, cr_loss=0.3847, over 4100408.56 frames. ], batch size: 59, lr: 4.20e-03, grad_scale: 32.0
2024-09-15 21:57:25,872 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=348715.3333333333, ans=0.125
2024-09-15 21:57:33,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=348743.6666666667, ans=0.125
2024-09-15 21:57:36,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=348743.6666666667, ans=0.04949747468305833
2024-09-15 21:57:57,674 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.94 vs. limit=15.0
2024-09-15 21:58:03,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=348800.3333333333, ans=0.125
2024-09-15 21:58:25,331 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.788e+02 2.098e+02 2.222e+02 2.396e+02 4.762e+02, threshold=4.443e+02, percent-clipped=1.0
2024-09-15 21:58:37,110 INFO [train.py:1198] (1/2) Epoch 20, batch 1700, loss[loss=0.2485, ctc_loss=0.1709, cr_loss=0.3876, over 20666.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.1632, cr_loss=0.3846, over 4101200.14 frames. ], batch size: 68, lr: 4.20e-03, grad_scale: 32.0
2024-09-15 21:58:40,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=348857.0, ans=0.125
2024-09-15 21:58:42,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=348857.0, ans=0.1
2024-09-15 21:58:58,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=348885.3333333333, ans=0.2
2024-09-15 21:59:37,056 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.74 vs. limit=15.0
2024-09-15 21:59:52,719 INFO [train.py:1198] (1/2) Epoch 20, batch 1750, loss[loss=0.2193, ctc_loss=0.1498, cr_loss=0.3476, over 21053.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1623, cr_loss=0.3836, over 4108133.19 frames. ], batch size: 53, lr: 4.20e-03, grad_scale: 32.0
2024-09-15 21:59:56,302 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.44 vs. limit=15.0
2024-09-15 22:00:03,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=348998.6666666667, ans=0.125
2024-09-15 22:00:06,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=349027.0, ans=0.125
2024-09-15 22:00:09,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=349027.0, ans=0.2
2024-09-15 22:00:34,724 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.60 vs. limit=15.0
2024-09-15 22:00:56,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=349112.0, ans=0.125
2024-09-15 22:00:59,453 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.794e+02 2.078e+02 2.203e+02 2.438e+02 4.193e+02, threshold=4.407e+02, percent-clipped=0.0
2024-09-15 22:01:02,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=349112.0, ans=0.07
2024-09-15 22:01:08,575 INFO [train.py:1198] (1/2) Epoch 20, batch 1800, loss[loss=0.2155, ctc_loss=0.1459, cr_loss=0.3479, over 21004.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1613, cr_loss=0.3824, over 4111882.03 frames. ], batch size: 61, lr: 4.20e-03, grad_scale: 32.0
2024-09-15 22:02:23,950 INFO [train.py:1198] (1/2) Epoch 20, batch 1850, loss[loss=0.2214, ctc_loss=0.1481, cr_loss=0.3664, over 21036.00 frames. ], tot_loss[loss=0.2377, ctc_loss=0.1613, cr_loss=0.3821, over 4114215.36 frames. ], batch size: 63, lr: 4.20e-03, grad_scale: 16.0
2024-09-15 22:02:42,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=349310.3333333333, ans=0.125
2024-09-15 22:02:42,704 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.87 vs. limit=22.5
2024-09-15 22:02:50,364 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.43 vs. limit=15.0
2024-09-15 22:03:00,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=349338.6666666667, ans=0.0
2024-09-15 22:03:07,174 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.12 vs. limit=15.0
2024-09-15 22:03:08,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=349367.0, ans=0.025
2024-09-15 22:03:17,951 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.41 vs. limit=15.0
2024-09-15 22:03:32,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=349395.3333333333, ans=0.125
2024-09-15 22:03:35,445 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.854e+02 2.062e+02 2.190e+02 2.352e+02 6.993e+02, threshold=4.381e+02, percent-clipped=1.0
2024-09-15 22:03:43,253 INFO [train.py:1198] (1/2) Epoch 20, batch 1900, loss[loss=0.2342, ctc_loss=0.1588, cr_loss=0.3774, over 20898.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1605, cr_loss=0.3808, over 4103242.34 frames. ], batch size: 54, lr: 4.19e-03, grad_scale: 16.0
2024-09-15 22:03:47,171 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.16 vs. limit=15.0
2024-09-15 22:03:49,699 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-15 22:04:10,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=349452.0, ans=0.125
2024-09-15 22:04:50,139 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 22:04:54,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=349537.0, ans=0.125
2024-09-15 22:04:58,919 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 22:05:01,569 INFO [train.py:1198] (1/2) Epoch 20, batch 1950, loss[loss=0.1966, ctc_loss=0.13, cr_loss=0.333, over 21068.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1611, cr_loss=0.3813, over 4088795.13 frames. ], batch size: 56, lr: 4.19e-03, grad_scale: 16.0
2024-09-15 22:05:13,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=349565.3333333333, ans=0.1
2024-09-15 22:05:31,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=349622.0, ans=0.125
2024-09-15 22:05:33,603 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.60 vs. limit=10.0
2024-09-15 22:05:34,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=349622.0, ans=0.1
2024-09-15 22:06:08,933 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.793e+02 2.079e+02 2.264e+02 2.474e+02 3.500e+02, threshold=4.527e+02, percent-clipped=0.0
2024-09-15 22:06:16,540 INFO [train.py:1198] (1/2) Epoch 20, batch 2000, loss[loss=0.2231, ctc_loss=0.1479, cr_loss=0.3761, over 20985.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1628, cr_loss=0.3833, over 4065274.44 frames. ], batch size: 52, lr: 4.19e-03, grad_scale: 32.0
2024-09-15 22:07:32,356 INFO [train.py:1198] (1/2) Epoch 20, batch 2050, loss[loss=0.2325, ctc_loss=0.1572, cr_loss=0.3764, over 21045.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.1622, cr_loss=0.3829, over 4076322.87 frames. ], batch size: 53, lr: 4.19e-03, grad_scale: 32.0
2024-09-15 22:07:34,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=349848.6666666667, ans=0.95
2024-09-15 22:07:41,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=349848.6666666667, ans=0.0
2024-09-15 22:07:44,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=349848.6666666667, ans=0.1
2024-09-15 22:07:46,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=349877.0, ans=0.2
2024-09-15 22:08:27,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=349933.6666666667, ans=0.025
2024-09-15 22:08:42,270 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.777e+02 2.026e+02 2.161e+02 2.319e+02 3.736e+02, threshold=4.321e+02, percent-clipped=0.0
2024-09-15 22:08:48,312 INFO [train.py:1198] (1/2) Epoch 20, batch 2100, loss[loss=0.2463, ctc_loss=0.1647, cr_loss=0.4077, over 20644.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.1623, cr_loss=0.383, over 4072920.24 frames. ], batch size: 68, lr: 4.19e-03, grad_scale: 16.0
2024-09-15 22:08:59,603 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.44 vs. limit=22.5
2024-09-15 22:09:09,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=350018.6666666667, ans=0.125
2024-09-15 22:09:17,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=350018.6666666667, ans=0.2
2024-09-15 22:10:11,255 INFO [train.py:1198] (1/2) Epoch 20, batch 2150, loss[loss=0.2432, ctc_loss=0.1637, cr_loss=0.3972, over 20975.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.1622, cr_loss=0.3826, over 4069905.11 frames. ], batch size: 48, lr: 4.19e-03, grad_scale: 16.0
2024-09-15 22:10:37,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=350160.3333333333, ans=0.025
2024-09-15 22:11:12,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=350245.3333333333, ans=0.125
2024-09-15 22:11:21,144 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.765e+02 2.081e+02 2.203e+02 2.427e+02 4.025e+02, threshold=4.407e+02, percent-clipped=0.0
2024-09-15 22:11:27,254 INFO [train.py:1198] (1/2) Epoch 20, batch 2200, loss[loss=0.2583, ctc_loss=0.1747, cr_loss=0.418, over 21070.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.1633, cr_loss=0.3839, over 4073780.21 frames. ], batch size: 59, lr: 4.19e-03, grad_scale: 16.0
2024-09-15 22:11:36,802 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.31 vs. limit=12.0
2024-09-15 22:12:16,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=350358.6666666667, ans=0.04949747468305833
2024-09-15 22:12:37,761 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.59 vs. limit=15.0
2024-09-15 22:12:43,009 INFO [train.py:1198] (1/2) Epoch 20, batch 2250, loss[loss=0.2179, ctc_loss=0.1491, cr_loss=0.3437, over 20970.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1632, cr_loss=0.3842, over 4082375.67 frames. ], batch size: 50, lr: 4.19e-03, grad_scale: 16.0
2024-09-15 22:12:43,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=350415.3333333333, ans=0.125
2024-09-15 22:13:17,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=350472.0, ans=0.2
2024-09-15 22:13:19,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=350472.0, ans=0.125
2024-09-15 22:13:31,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=350500.3333333333, ans=0.1
2024-09-15 22:13:35,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=350500.3333333333, ans=0.025
2024-09-15 22:13:44,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=350528.6666666667, ans=0.09899494936611666
2024-09-15 22:13:46,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=350528.6666666667, ans=0.2
2024-09-15 22:13:51,851 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.704e+02 2.114e+02 2.267e+02 2.461e+02 3.098e+02, threshold=4.533e+02, percent-clipped=0.0
2024-09-15 22:13:57,791 INFO [train.py:1198] (1/2) Epoch 20, batch 2300, loss[loss=0.2048, ctc_loss=0.1361, cr_loss=0.3438, over 20943.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1625, cr_loss=0.3827, over 4090296.73 frames. ], batch size: 49, lr: 4.19e-03, grad_scale: 16.0
2024-09-15 22:14:29,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=350613.6666666667, ans=0.0
2024-09-15 22:15:16,618 INFO [train.py:1198] (1/2) Epoch 20, batch 2350, loss[loss=0.198, ctc_loss=0.1316, cr_loss=0.3321, over 20968.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.1623, cr_loss=0.3827, over 4089208.84 frames. ], batch size: 50, lr: 4.19e-03, grad_scale: 16.0
2024-09-15 22:15:29,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=350698.6666666667, ans=0.2
2024-09-15 22:15:47,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=350727.0, ans=0.07
2024-09-15 22:16:08,523 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 22:16:25,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=350812.0, ans=0.0
2024-09-15 22:16:29,245 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.726e+02 2.106e+02 2.231e+02 2.400e+02 4.033e+02, threshold=4.462e+02, percent-clipped=0.0
2024-09-15 22:16:35,217 INFO [train.py:1198] (1/2) Epoch 20, batch 2400, loss[loss=0.2257, ctc_loss=0.1509, cr_loss=0.374, over 21060.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.1623, cr_loss=0.3825, over 4090955.04 frames. ], batch size: 53, lr: 4.19e-03, grad_scale: 32.0
2024-09-15 22:16:43,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=350840.3333333333, ans=0.125
2024-09-15 22:17:14,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=350897.0, ans=0.125
2024-09-15 22:17:43,913 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.99 vs. limit=15.0
2024-09-15 22:17:50,661 INFO [train.py:1198] (1/2) Epoch 20, batch 2450, loss[loss=0.2488, ctc_loss=0.1703, cr_loss=0.3923, over 20654.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.162, cr_loss=0.3822, over 4098627.38 frames. ], batch size: 66, lr: 4.19e-03, grad_scale: 32.0
2024-09-15 22:17:54,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=350982.0, ans=0.125
2024-09-15 22:17:55,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=350982.0, ans=10.0
2024-09-15 22:17:57,195 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=7.862e-03
2024-09-15 22:18:23,158 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 22:18:24,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=351038.6666666667, ans=0.0
2024-09-15 22:18:48,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=351067.0, ans=0.125
2024-09-15 22:19:00,288 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.787e+02 2.068e+02 2.187e+02 2.375e+02 2.912e+02, threshold=4.373e+02, percent-clipped=0.0
2024-09-15 22:19:00,700 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 22:19:06,318 INFO [train.py:1198] (1/2) Epoch 20, batch 2500, loss[loss=0.2336, ctc_loss=0.1556, cr_loss=0.39, over 21053.00 frames. ], tot_loss[loss=0.2387, ctc_loss=0.1622, cr_loss=0.3825, over 4089515.77 frames. ], batch size: 62, lr: 4.18e-03, grad_scale: 32.0
2024-09-15 22:19:09,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=351123.6666666667, ans=0.0
2024-09-15 22:19:24,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=351152.0, ans=0.125
2024-09-15 22:19:26,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=351152.0, ans=0.125
2024-09-15 22:19:30,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=351152.0, ans=0.125
2024-09-15 22:20:20,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=351265.3333333333, ans=0.2
2024-09-15 22:20:21,721 INFO [train.py:1198] (1/2) Epoch 20, batch 2550, loss[loss=0.2748, ctc_loss=0.1861, cr_loss=0.4434, over 20963.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.1622, cr_loss=0.3832, over 4101471.10 frames. ], batch size: 67, lr: 4.18e-03, grad_scale: 32.0
2024-09-15 22:20:22,462 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.80 vs. limit=6.0
2024-09-15 22:20:25,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=351265.3333333333, ans=0.0
2024-09-15 22:20:28,704 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.22 vs. limit=12.0
2024-09-15 22:21:19,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=351350.3333333333, ans=0.1
2024-09-15 22:21:37,140 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.801e+02 2.078e+02 2.237e+02 2.413e+02 3.960e+02, threshold=4.474e+02, percent-clipped=0.0
2024-09-15 22:21:43,084 INFO [train.py:1198] (1/2) Epoch 20, batch 2600, loss[loss=0.2693, ctc_loss=0.1925, cr_loss=0.3838, over 13803.00 frames. ], tot_loss[loss=0.2392, ctc_loss=0.1626, cr_loss=0.3829, over 4075485.35 frames. ], batch size: 149, lr: 4.18e-03, grad_scale: 32.0
2024-09-15 22:21:49,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=351407.0, ans=0.125
2024-09-15 22:21:50,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=351407.0, ans=0.025
2024-09-15 22:21:53,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=351407.0, ans=0.125
2024-09-15 22:22:01,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=351435.3333333333, ans=0.2
2024-09-15 22:22:08,473 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-15 22:22:14,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=351463.6666666667, ans=10.0
2024-09-15 22:22:25,673 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.25 vs. limit=12.0
2024-09-15 22:22:30,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=351492.0, ans=0.125
2024-09-15 22:22:30,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=351492.0, ans=0.125
2024-09-15 22:22:37,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=351492.0, ans=0.025
2024-09-15 22:22:58,641 INFO [train.py:1198] (1/2) Epoch 20, batch 2650, loss[loss=0.2123, ctc_loss=0.1416, cr_loss=0.3532, over 20977.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.1619, cr_loss=0.3825, over 4090368.34 frames. ], batch size: 48, lr: 4.18e-03, grad_scale: 32.0
2024-09-15 22:23:27,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=351605.3333333333, ans=0.125
2024-09-15 22:23:46,340 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.48 vs. limit=15.0
2024-09-15 22:23:47,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=351633.6666666667, ans=0.1
2024-09-15 22:23:49,189 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-15 22:23:54,040 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. limit=6.0
2024-09-15 22:24:08,408 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.851e+02 2.040e+02 2.198e+02 2.355e+02 3.323e+02, threshold=4.396e+02, percent-clipped=0.0
2024-09-15 22:24:14,472 INFO [train.py:1198] (1/2) Epoch 20, batch 2700, loss[loss=0.2444, ctc_loss=0.1634, cr_loss=0.4051, over 20781.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.1616, cr_loss=0.3819, over 4091453.38 frames. ], batch size: 53, lr: 4.18e-03, grad_scale: 32.0
2024-09-15 22:24:32,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=351718.6666666667, ans=0.0
2024-09-15 22:24:49,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=351747.0, ans=0.0
2024-09-15 22:24:51,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=351747.0, ans=0.125
2024-09-15 22:25:03,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=351775.3333333333, ans=0.1
2024-09-15 22:25:09,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=351775.3333333333, ans=0.025
2024-09-15 22:25:30,152 INFO [train.py:1198] (1/2) Epoch 20, batch 2750, loss[loss=0.1983, ctc_loss=0.1308, cr_loss=0.3375, over 20971.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1609, cr_loss=0.3807, over 4102722.10 frames. ], batch size: 51, lr: 4.18e-03, grad_scale: 16.0
2024-09-15 22:26:26,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=351917.0, ans=0.0
2024-09-15 22:26:33,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=351945.3333333333, ans=0.125
2024-09-15 22:26:33,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=351945.3333333333, ans=0.0
2024-09-15 22:26:35,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=351945.3333333333, ans=0.125
2024-09-15 22:26:47,011 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.837e+02 2.065e+02 2.172e+02 2.321e+02 4.117e+02, threshold=4.344e+02, percent-clipped=0.0
2024-09-15 22:26:51,629 INFO [train.py:1198] (1/2) Epoch 20, batch 2800, loss[loss=0.1931, ctc_loss=0.1268, cr_loss=0.3312, over 19939.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1611, cr_loss=0.3809, over 4089817.32 frames. ], batch size: 44, lr: 4.18e-03, grad_scale: 32.0
2024-09-15 22:27:02,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=351973.6666666667, ans=0.1
2024-09-15 22:27:04,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=351973.6666666667, ans=0.125
2024-09-15 22:27:51,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=352087.0, ans=0.5
2024-09-15 22:28:07,526 INFO [train.py:1198] (1/2) Epoch 20, batch 2850, loss[loss=0.2154, ctc_loss=0.1433, cr_loss=0.3604, over 21095.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1605, cr_loss=0.3795, over 4096152.46 frames. ], batch size: 59, lr: 4.18e-03, grad_scale: 16.0
2024-09-15 22:28:15,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=352115.3333333333, ans=0.1
2024-09-15 22:28:21,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=352143.6666666667, ans=0.04949747468305833
2024-09-15 22:28:25,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=352143.6666666667, ans=15.0
2024-09-15 22:28:49,315 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-15 22:28:55,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=352200.3333333333, ans=0.0
2024-09-15 22:28:56,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=352200.3333333333, ans=0.125
2024-09-15 22:28:56,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=352200.3333333333, ans=0.125
2024-09-15 22:29:11,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=352228.6666666667, ans=0.125
2024-09-15 22:29:20,126 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.848e+02 2.073e+02 2.225e+02 2.423e+02 3.316e+02, threshold=4.451e+02, percent-clipped=0.0
2024-09-15 22:29:20,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=352228.6666666667, ans=0.125
2024-09-15 22:29:22,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=352257.0, ans=0.125
2024-09-15 22:29:23,208 INFO [train.py:1198] (1/2) Epoch 20, batch 2900, loss[loss=0.2371, ctc_loss=0.1595, cr_loss=0.3879, over 20864.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1616, cr_loss=0.381, over 4091238.70 frames. ], batch size: 57, lr: 4.18e-03, grad_scale: 16.0
2024-09-15 22:29:38,961 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.90 vs. limit=15.0
2024-09-15 22:29:50,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=352285.3333333333, ans=0.125
2024-09-15 22:29:58,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=352313.6666666667, ans=0.125
2024-09-15 22:30:31,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=352370.3333333333, ans=0.1
2024-09-15 22:30:38,942 INFO [train.py:1198] (1/2) Epoch 20, batch 2950, loss[loss=0.23, ctc_loss=0.1568, cr_loss=0.3659, over 21001.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.1619, cr_loss=0.3811, over 4079675.97 frames. ], batch size: 63, lr: 4.18e-03, grad_scale: 16.0
2024-09-15 22:30:40,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=352398.6666666667, ans=0.2
2024-09-15 22:30:46,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=352398.6666666667, ans=0.125
2024-09-15 22:30:56,281 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.27 vs. limit=15.0
2024-09-15 22:31:09,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=352455.3333333333, ans=0.125
2024-09-15 22:31:14,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=352455.3333333333, ans=0.125
2024-09-15 22:31:47,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=352512.0, ans=0.0
2024-09-15 22:31:51,725 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.770e+02 2.098e+02 2.284e+02 2.558e+02 3.973e+02, threshold=4.569e+02, percent-clipped=0.0
2024-09-15 22:31:52,343 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.96 vs. limit=15.0
2024-09-15 22:31:57,779 INFO [train.py:1198] (1/2) Epoch 20, batch 3000, loss[loss=0.2507, ctc_loss=0.1695, cr_loss=0.4061, over 20776.00 frames. ], tot_loss[loss=0.2386, ctc_loss=0.1622, cr_loss=0.3822, over 4080992.74 frames. ], batch size: 56, lr: 4.18e-03, grad_scale: 16.0
2024-09-15 22:31:57,780 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-15 22:32:21,362 INFO [train.py:1230] (1/2) Epoch 20, validation: loss=0.04399, ctc_loss=0.04399, cr_loss=1.053e-14, over 944034.00 frames.
2024-09-15 22:32:21,363 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB
2024-09-15 22:32:29,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=352540.3333333333, ans=0.125
2024-09-15 22:32:32,873 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.69 vs. limit=15.0
2024-09-15 22:32:40,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=352568.6666666667, ans=0.125
2024-09-15 22:33:38,323 INFO [train.py:1198] (1/2) Epoch 20, batch 3050, loss[loss=0.2203, ctc_loss=0.1473, cr_loss=0.3647, over 20968.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.1615, cr_loss=0.3808, over 4085251.24 frames. ], batch size: 58, lr: 4.18e-03, grad_scale: 16.0
2024-09-15 22:33:54,173 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 22:34:07,706 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.12 vs. limit=15.0
2024-09-15 22:34:34,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=352767.0, ans=10.0
2024-09-15 22:34:34,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=352767.0, ans=0.2
2024-09-15 22:34:47,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=352795.3333333333, ans=0.95
2024-09-15 22:34:49,315 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.01 vs. limit=15.0
2024-09-15 22:34:50,271 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.714e+02 2.014e+02 2.170e+02 2.369e+02 3.465e+02, threshold=4.339e+02, percent-clipped=0.0
2024-09-15 22:34:53,401 INFO [train.py:1198] (1/2) Epoch 20, batch 3100, loss[loss=0.2356, ctc_loss=0.1607, cr_loss=0.3749, over 20652.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.1621, cr_loss=0.3817, over 4082448.83 frames. ], batch size: 71, lr: 4.17e-03, grad_scale: 16.0
2024-09-15 22:34:59,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=352823.6666666667, ans=0.125
2024-09-15 22:35:18,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=352852.0, ans=0.125
2024-09-15 22:35:18,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=352852.0, ans=0.125
2024-09-15 22:35:42,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=352908.6666666667, ans=0.0
2024-09-15 22:35:45,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=352908.6666666667, ans=0.09899494936611666
2024-09-15 22:35:55,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=352937.0, ans=0.07
2024-09-15 22:35:59,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=352937.0, ans=0.125
2024-09-15 22:36:02,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=352937.0, ans=0.125
2024-09-15 22:36:03,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=352937.0, ans=0.125
2024-09-15 22:36:06,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=352937.0, ans=0.95
2024-09-15 22:36:09,308 INFO [train.py:1198] (1/2) Epoch 20, batch 3150, loss[loss=0.2235, ctc_loss=0.1492, cr_loss=0.3711, over 20784.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1613, cr_loss=0.381, over 4090971.78 frames. ], batch size: 53, lr: 4.17e-03, grad_scale: 16.0
2024-09-15 22:36:31,208 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.23 vs. limit=12.0
2024-09-15 22:36:36,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=352993.6666666667, ans=0.125
2024-09-15 22:36:41,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=353022.0, ans=0.0
2024-09-15 22:37:07,805 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.05 vs. limit=15.0
2024-09-15 22:37:08,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=353078.6666666667, ans=0.2
2024-09-15 22:37:21,988 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.892e+02 2.112e+02 2.247e+02 2.439e+02 2.946e+02, threshold=4.495e+02, percent-clipped=0.0
2024-09-15 22:37:25,139 INFO [train.py:1198] (1/2) Epoch 20, batch 3200, loss[loss=0.2122, ctc_loss=0.1397, cr_loss=0.3624, over 20781.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1609, cr_loss=0.3805, over 4092723.16 frames. ], batch size: 56, lr: 4.17e-03, grad_scale: 32.0
2024-09-15 22:37:57,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=353163.6666666667, ans=0.2
2024-09-15 22:38:06,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=353163.6666666667, ans=0.2
2024-09-15 22:38:07,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=353163.6666666667, ans=0.0
2024-09-15 22:38:24,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=353192.0, ans=0.125
2024-09-15 22:38:33,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=353220.3333333333, ans=0.1
2024-09-15 22:38:46,842 INFO [train.py:1198] (1/2) Epoch 20, batch 3250, loss[loss=0.2467, ctc_loss=0.1655, cr_loss=0.4062, over 20846.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.1611, cr_loss=0.3822, over 4105776.10 frames. ], batch size: 65, lr: 4.17e-03, grad_scale: 32.0
2024-09-15 22:38:59,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=353248.6666666667, ans=0.125
2024-09-15 22:39:11,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=353277.0, ans=0.2
2024-09-15 22:39:29,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=353305.3333333333, ans=0.125
2024-09-15 22:40:00,581 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.830e+02 2.071e+02 2.269e+02 2.505e+02 3.417e+02, threshold=4.537e+02, percent-clipped=0.0
2024-09-15 22:40:02,198 INFO [train.py:1198] (1/2) Epoch 20, batch 3300, loss[loss=0.2361, ctc_loss=0.1598, cr_loss=0.3813, over 21062.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.1612, cr_loss=0.3818, over 4099574.15 frames. ], batch size: 62, lr: 4.17e-03, grad_scale: 16.0
2024-09-15 22:40:35,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=353447.0, ans=0.1
2024-09-15 22:41:17,137 INFO [train.py:1198] (1/2) Epoch 20, batch 3350, loss[loss=0.1952, ctc_loss=0.1303, cr_loss=0.3244, over 20785.00 frames. ], tot_loss[loss=0.2398, ctc_loss=0.1629, cr_loss=0.3845, over 4092826.77 frames. ], batch size: 53, lr: 4.17e-03, grad_scale: 16.0
2024-09-15 22:41:19,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=353532.0, ans=0.125
2024-09-15 22:41:23,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=353532.0, ans=0.1
2024-09-15 22:41:29,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=353532.0, ans=0.125
2024-09-15 22:41:32,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=353560.3333333333, ans=0.0
2024-09-15 22:41:49,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=353588.6666666667, ans=0.025
2024-09-15 22:41:55,478 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=353588.6666666667, ans=0.125
2024-09-15 22:42:01,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=353617.0, ans=0.125
2024-09-15 22:42:31,053 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.824e+02 2.077e+02 2.182e+02 2.343e+02 3.801e+02, threshold=4.364e+02, percent-clipped=0.0
2024-09-15 22:42:32,468 INFO [train.py:1198] (1/2) Epoch 20, batch 3400, loss[loss=0.2029, ctc_loss=0.1367, cr_loss=0.3306, over 20889.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.1637, cr_loss=0.3856, over 4086265.23 frames. ], batch size: 54, lr: 4.17e-03, grad_scale: 16.0
2024-09-15 22:43:16,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=353758.6666666667, ans=0.0
2024-09-15 22:43:28,534 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.29 vs. limit=15.0
2024-09-15 22:43:53,676 INFO [train.py:1198] (1/2) Epoch 20, batch 3450, loss[loss=0.2337, ctc_loss=0.1571, cr_loss=0.3828, over 20960.00 frames. ], tot_loss[loss=0.2394, ctc_loss=0.1627, cr_loss=0.3837, over 4095778.13 frames. ], batch size: 55, lr: 4.17e-03, grad_scale: 16.0
2024-09-15 22:44:06,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=353815.3333333333, ans=0.125
2024-09-15 22:44:18,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=353843.6666666667, ans=0.2
2024-09-15 22:45:02,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=353928.6666666667, ans=0.0
2024-09-15 22:45:08,376 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.851e+02 2.075e+02 2.215e+02 2.345e+02 3.245e+02, threshold=4.430e+02, percent-clipped=0.0
2024-09-15 22:45:08,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=353957.0, ans=0.125
2024-09-15 22:45:09,931 INFO [train.py:1198] (1/2) Epoch 20, batch 3500, loss[loss=0.2249, ctc_loss=0.1544, cr_loss=0.3527, over 20956.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1625, cr_loss=0.384, over 4100392.40 frames. ], batch size: 50, lr: 4.17e-03, grad_scale: 16.0
2024-09-15 22:45:13,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=353957.0, ans=0.0
2024-09-15 22:45:25,777 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. limit=6.0
2024-09-15 22:46:11,263 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 22:46:20,700 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.24 vs. limit=22.5
2024-09-15 22:46:25,917 INFO [train.py:1198] (1/2) Epoch 20, batch 3550, loss[loss=0.1942, ctc_loss=0.1271, cr_loss=0.3358, over 20959.00 frames. ], tot_loss[loss=0.2394, ctc_loss=0.1626, cr_loss=0.3841, over 4086996.42 frames. ], batch size: 51, lr: 4.17e-03, grad_scale: 16.0
2024-09-15 22:46:30,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=354098.6666666667, ans=0.0
2024-09-15 22:46:34,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=354098.6666666667, ans=0.0
2024-09-15 22:47:39,227 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.787e+02 2.109e+02 2.237e+02 2.459e+02 9.433e+02, threshold=4.474e+02, percent-clipped=1.0
2024-09-15 22:47:40,762 INFO [train.py:1198] (1/2) Epoch 20, batch 3600, loss[loss=0.2574, ctc_loss=0.1744, cr_loss=0.4146, over 20888.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1623, cr_loss=0.3836, over 4099358.99 frames. ], batch size: 54, lr: 4.17e-03, grad_scale: 32.0
2024-09-15 22:47:41,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=354240.3333333333, ans=22.5
2024-09-15 22:48:03,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=354268.6666666667, ans=0.125
2024-09-15 22:48:08,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=354268.6666666667, ans=0.2
2024-09-15 22:48:34,657 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.13 vs. limit=15.0
2024-09-15 22:48:47,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=354353.6666666667, ans=0.0
2024-09-15 22:48:56,317 INFO [train.py:1198] (1/2) Epoch 20, batch 3650, loss[loss=0.2223, ctc_loss=0.1475, cr_loss=0.3737, over 20984.00 frames. ], tot_loss[loss=0.2385, ctc_loss=0.1619, cr_loss=0.3834, over 4101270.20 frames. ], batch size: 52, lr: 4.17e-03, grad_scale: 32.0
2024-09-15 22:48:58,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=354382.0, ans=0.025
2024-09-15 22:49:17,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=354410.3333333333, ans=0.2
2024-09-15 22:49:26,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=354410.3333333333, ans=0.125
2024-09-15 22:49:30,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=354438.6666666667, ans=0.025
2024-09-15 22:49:55,008 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0
2024-09-15 22:50:04,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=354495.3333333333, ans=0.0
2024-09-15 22:50:16,619 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.865e+02 2.039e+02 2.213e+02 2.325e+02 4.277e+02, threshold=4.426e+02, percent-clipped=0.0
2024-09-15 22:50:16,637 INFO [train.py:1198] (1/2) Epoch 20, batch 3700, loss[loss=0.2386, ctc_loss=0.1607, cr_loss=0.3894, over 21049.00 frames. ], tot_loss[loss=0.2385, ctc_loss=0.1619, cr_loss=0.3832, over 4106808.60 frames. ], batch size: 56, lr: 4.16e-03, grad_scale: 16.0
2024-09-15 22:51:21,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=354637.0, ans=0.0
2024-09-15 22:51:28,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=354637.0, ans=0.125
2024-09-15 22:51:31,643 INFO [train.py:1198] (1/2) Epoch 20, batch 3750, loss[loss=0.2204, ctc_loss=0.1484, cr_loss=0.3601, over 21071.00 frames. ], tot_loss[loss=0.2382, ctc_loss=0.1616, cr_loss=0.3829, over 4113923.49 frames. ], batch size: 53, lr: 4.16e-03, grad_scale: 16.0
2024-09-15 22:51:35,422 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.52 vs. limit=6.0
2024-09-15 22:51:38,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=354665.3333333333, ans=0.0
2024-09-15 22:52:11,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=354722.0, ans=0.2
2024-09-15 22:52:20,815 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.07 vs. limit=15.0
2024-09-15 22:52:27,185 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.59 vs. limit=12.0
2024-09-15 22:52:34,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=354778.6666666667, ans=0.0
2024-09-15 22:52:47,398 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.768e+02 2.063e+02 2.189e+02 2.332e+02 3.793e+02, threshold=4.378e+02, percent-clipped=0.0
2024-09-15 22:52:47,417 INFO [train.py:1198] (1/2) Epoch 20, batch 3800, loss[loss=0.2404, ctc_loss=0.1625, cr_loss=0.3893, over 21031.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.161, cr_loss=0.3824, over 4121201.75 frames. ], batch size: 62, lr: 4.16e-03, grad_scale: 16.0
2024-09-15 22:52:50,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=354807.0, ans=0.125
2024-09-15 22:53:26,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=354863.6666666667, ans=0.0
2024-09-15 22:53:43,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=354892.0, ans=0.125
2024-09-15 22:53:45,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=354892.0, ans=0.125
2024-09-15 22:53:54,209 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.40 vs. limit=22.5
2024-09-15 22:54:02,842 INFO [train.py:1198] (1/2) Epoch 20, batch 3850, loss[loss=0.2755, ctc_loss=0.1914, cr_loss=0.4204, over 20169.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1612, cr_loss=0.3829, over 4121913.50 frames. ], batch size: 80, lr: 4.16e-03, grad_scale: 16.0
2024-09-15 22:54:13,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=354948.6666666667, ans=0.2
2024-09-15 22:55:24,098 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.816e+02 2.035e+02 2.181e+02 2.287e+02 3.384e+02, threshold=4.363e+02, percent-clipped=0.0
2024-09-15 22:55:24,117 INFO [train.py:1198] (1/2) Epoch 20, batch 3900, loss[loss=0.2564, ctc_loss=0.1735, cr_loss=0.4148, over 21005.00 frames. ], tot_loss[loss=0.2368, ctc_loss=0.1604, cr_loss=0.3819, over 4125780.97 frames. ], batch size: 61, lr: 4.16e-03, grad_scale: 16.0
2024-09-15 22:55:28,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=355090.3333333333, ans=0.125
2024-09-15 22:55:50,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=355118.6666666667, ans=0.04949747468305833
2024-09-15 22:55:54,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=355147.0, ans=0.125
2024-09-15 22:56:39,663 INFO [train.py:1198] (1/2) Epoch 20, batch 3950, loss[loss=0.2009, ctc_loss=0.1364, cr_loss=0.3226, over 20007.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1595, cr_loss=0.3802, over 4120104.59 frames. ], batch size: 44, lr: 4.16e-03, grad_scale: 16.0
2024-09-15 22:57:07,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=355260.3333333333, ans=0.0
2024-09-15 22:57:41,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=355345.3333333333, ans=0.125
2024-09-15 22:57:55,548 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.781e+02 2.069e+02 2.199e+02 2.358e+02 4.159e+02, threshold=4.398e+02, percent-clipped=0.0
2024-09-15 22:57:55,566 INFO [train.py:1198] (1/2) Epoch 20, batch 4000, loss[loss=0.2689, ctc_loss=0.1898, cr_loss=0.3956, over 18053.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1598, cr_loss=0.3802, over 4111077.26 frames. ], batch size: 108, lr: 4.16e-03, grad_scale: 32.0
2024-09-15 22:57:57,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=355373.6666666667, ans=0.2
2024-09-15 22:58:03,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=355373.6666666667, ans=0.125
2024-09-15 22:58:09,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=355402.0, ans=0.125
2024-09-15 22:58:40,277 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.88 vs. limit=15.0
2024-09-15 22:58:56,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=355487.0, ans=0.1
2024-09-15 22:59:11,158 INFO [train.py:1198] (1/2) Epoch 20, batch 4050, loss[loss=0.2409, ctc_loss=0.1632, cr_loss=0.3887, over 20956.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.1606, cr_loss=0.3808, over 4113172.69 frames. ], batch size: 64, lr: 4.16e-03, grad_scale: 32.0
2024-09-15 22:59:41,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=355572.0, ans=0.2
2024-09-15 22:59:56,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=355600.3333333333, ans=0.0
2024-09-15 23:00:05,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=355600.3333333333, ans=0.125
2024-09-15 23:00:15,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=355628.6666666667, ans=15.0
2024-09-15 23:00:27,024 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.844e+02 2.111e+02 2.238e+02 2.395e+02 4.534e+02, threshold=4.475e+02, percent-clipped=1.0
2024-09-15 23:00:27,043 INFO [train.py:1198] (1/2) Epoch 20, batch 4100, loss[loss=0.2452, ctc_loss=0.1665, cr_loss=0.3934, over 20777.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1614, cr_loss=0.3821, over 4099037.55 frames. ], batch size: 53, lr: 4.16e-03, grad_scale: 32.0
2024-09-15 23:00:51,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=355685.3333333333, ans=0.125
2024-09-15 23:01:48,729 INFO [train.py:1198] (1/2) Epoch 20, batch 4150, loss[loss=0.2356, ctc_loss=0.1578, cr_loss=0.3891, over 21071.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1611, cr_loss=0.3818, over 4109232.97 frames. ], batch size: 59, lr: 4.16e-03, grad_scale: 32.0
2024-09-15 23:01:56,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=355798.6666666667, ans=0.0
2024-09-15 23:02:02,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=355827.0, ans=0.125
2024-09-15 23:02:32,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=355855.3333333333, ans=0.125
2024-09-15 23:02:38,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=355883.6666666667, ans=0.125
2024-09-15 23:02:40,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=355883.6666666667, ans=0.125
2024-09-15 23:02:54,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=355912.0, ans=0.2
2024-09-15 23:03:04,820 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.862e+02 2.101e+02 2.241e+02 2.524e+02 3.949e+02, threshold=4.482e+02, percent-clipped=0.0
2024-09-15 23:03:04,848 INFO [train.py:1198] (1/2) Epoch 20, batch 4200, loss[loss=0.2322, ctc_loss=0.1563, cr_loss=0.3792, over 20983.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1609, cr_loss=0.3813, over 4104380.54 frames. ], batch size: 55, lr: 4.16e-03, grad_scale: 32.0
2024-09-15 23:03:56,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=356025.3333333333, ans=0.2
2024-09-15 23:04:20,665 INFO [train.py:1198] (1/2) Epoch 20, batch 4250, loss[loss=0.2227, ctc_loss=0.1513, cr_loss=0.3568, over 20879.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.1605, cr_loss=0.3808, over 4111965.10 frames. ], batch size: 54, lr: 4.16e-03, grad_scale: 32.0
2024-09-15 23:04:22,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=356082.0, ans=0.125
2024-09-15 23:04:23,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=356082.0, ans=0.025
2024-09-15 23:04:53,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=356138.6666666667, ans=0.025
2024-09-15 23:05:04,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=356167.0, ans=0.125
2024-09-15 23:05:21,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=356195.3333333333, ans=0.0
2024-09-15 23:05:34,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=356223.6666666667, ans=10.0
2024-09-15 23:05:35,837 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.797e+02 2.060e+02 2.153e+02 2.296e+02 3.155e+02, threshold=4.306e+02, percent-clipped=0.0
2024-09-15 23:05:35,856 INFO [train.py:1198] (1/2) Epoch 20, batch 4300, loss[loss=0.1947, ctc_loss=0.1286, cr_loss=0.3304, over 20967.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1598, cr_loss=0.379, over 4093681.08 frames. ], batch size: 49, lr: 4.15e-03, grad_scale: 32.0
2024-09-15 23:06:48,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=356337.0, ans=0.07
2024-09-15 23:06:56,895 INFO [train.py:1198] (1/2) Epoch 20, batch 4350, loss[loss=0.2538, ctc_loss=0.1694, cr_loss=0.4221, over 20936.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1595, cr_loss=0.3795, over 4097926.50 frames. ], batch size: 60, lr: 4.15e-03, grad_scale: 32.0
2024-09-15 23:07:15,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=356393.6666666667, ans=0.0
2024-09-15 23:07:17,790 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.73 vs. limit=15.0
2024-09-15 23:07:40,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=356422.0, ans=0.125
2024-09-15 23:07:57,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=356478.6666666667, ans=0.1
2024-09-15 23:08:12,614 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 23:08:13,771 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.741e+02 2.052e+02 2.190e+02 2.351e+02 3.242e+02, threshold=4.381e+02, percent-clipped=0.0
2024-09-15 23:08:13,790 INFO [train.py:1198] (1/2) Epoch 20, batch 4400, loss[loss=0.2475, ctc_loss=0.169, cr_loss=0.3924, over 19579.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.16, cr_loss=0.3802, over 4102455.42 frames. ], batch size: 90, lr: 4.15e-03, grad_scale: 32.0
2024-09-15 23:09:00,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=356592.0, ans=0.07
2024-09-15 23:09:09,454 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 23:09:19,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=356620.3333333333, ans=15.0
2024-09-15 23:09:26,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=356620.3333333333, ans=0.025
2024-09-15 23:09:28,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=356648.6666666667, ans=0.025
2024-09-15 23:09:29,514 INFO [train.py:1198] (1/2) Epoch 20, batch 4450, loss[loss=0.251, ctc_loss=0.168, cr_loss=0.4153, over 20971.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1609, cr_loss=0.3817, over 4083848.19 frames. ], batch size: 55, lr: 4.15e-03, grad_scale: 32.0
2024-09-15 23:09:34,662 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.58 vs. limit=10.0
2024-09-15 23:09:41,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=356648.6666666667, ans=0.125
2024-09-15 23:10:07,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=356705.3333333333, ans=0.125
2024-09-15 23:10:16,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=356733.6666666667, ans=0.1
2024-09-15 23:10:38,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=356762.0, ans=15.0
2024-09-15 23:10:45,522 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.830e+02 2.102e+02 2.235e+02 2.380e+02 3.140e+02, threshold=4.469e+02, percent-clipped=0.0
2024-09-15 23:10:45,540 INFO [train.py:1198] (1/2) Epoch 20, batch 4500, loss[loss=0.2087, ctc_loss=0.1394, cr_loss=0.3463, over 20976.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1607, cr_loss=0.3819, over 4094555.27 frames. ], batch size: 50, lr: 4.15e-03, grad_scale: 32.0
2024-09-15 23:10:47,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=356790.3333333333, ans=0.125
2024-09-15 23:10:47,760 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.99 vs. limit=15.0
2024-09-15 23:11:04,904 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.31 vs. limit=22.5
2024-09-15 23:11:37,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=356875.3333333333, ans=0.125
2024-09-15 23:11:40,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=356875.3333333333, ans=0.0
2024-09-15 23:12:01,856 INFO [train.py:1198] (1/2) Epoch 20, batch 4550, loss[loss=0.1991, ctc_loss=0.1311, cr_loss=0.34, over 20955.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.159, cr_loss=0.3792, over 4100867.08 frames. ], batch size: 48, lr: 4.15e-03, grad_scale: 32.0
2024-09-15 23:12:50,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=356988.6666666667, ans=0.125
2024-09-15 23:12:56,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=357017.0, ans=0.0
2024-09-15 23:13:13,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=357045.3333333333, ans=0.1
2024-09-15 23:13:17,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=357045.3333333333, ans=0.0
2024-09-15 23:13:20,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=357045.3333333333, ans=0.1
2024-09-15 23:13:23,513 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.764e+02 2.075e+02 2.185e+02 2.431e+02 3.897e+02, threshold=4.370e+02, percent-clipped=0.0
2024-09-15 23:13:23,532 INFO [train.py:1198] (1/2) Epoch 20, batch 4600, loss[loss=0.2791, ctc_loss=0.1914, cr_loss=0.4386, over 19444.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1601, cr_loss=0.3811, over 4093627.90 frames. ], batch size: 90, lr: 4.15e-03, grad_scale: 32.0
2024-09-15 23:13:23,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=357073.6666666667, ans=0.2
2024-09-15 23:13:25,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=357073.6666666667, ans=0.125
2024-09-15 23:13:32,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=357073.6666666667, ans=0.0
2024-09-15 23:13:54,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=357130.3333333333, ans=0.1
2024-09-15 23:14:00,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=357130.3333333333, ans=0.1
2024-09-15 23:14:09,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=357158.6666666667, ans=0.125
2024-09-15 23:14:39,541 INFO [train.py:1198] (1/2) Epoch 20, batch 4650, loss[loss=0.2454, ctc_loss=0.1699, cr_loss=0.3777, over 21083.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1608, cr_loss=0.3818, over 4093266.20 frames. ], batch size: 59, lr: 4.15e-03, grad_scale: 32.0
2024-09-15 23:14:49,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=357215.3333333333, ans=0.125
2024-09-15 23:15:08,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=357272.0, ans=0.04949747468305833
2024-09-15 23:15:37,441 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.64 vs. limit=15.0
2024-09-15 23:15:54,932 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.725e+02 2.105e+02 2.240e+02 2.465e+02 5.218e+02, threshold=4.479e+02, percent-clipped=1.0
2024-09-15 23:15:54,951 INFO [train.py:1198] (1/2) Epoch 20, batch 4700, loss[loss=0.2707, ctc_loss=0.1852, cr_loss=0.4279, over 20861.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.161, cr_loss=0.3815, over 4086205.86 frames. ], batch size: 57, lr: 4.15e-03, grad_scale: 32.0
2024-09-15 23:15:58,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=357357.0, ans=0.0
2024-09-15 23:16:21,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=357385.3333333333, ans=0.125
2024-09-15 23:16:37,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=357413.6666666667, ans=0.125
2024-09-15 23:17:10,026 INFO [train.py:1198] (1/2) Epoch 20, batch 4750, loss[loss=0.1967, ctc_loss=0.1317, cr_loss=0.3251, over 20972.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1609, cr_loss=0.3815, over 4090320.36 frames. ], batch size: 49, lr: 4.15e-03, grad_scale: 32.0
2024-09-15 23:17:40,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=357555.3333333333, ans=0.125
2024-09-15 23:17:57,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=357583.6666666667, ans=0.5
2024-09-15 23:17:58,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=357583.6666666667, ans=0.1
2024-09-15 23:18:28,460 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.753e+02 2.067e+02 2.210e+02 2.422e+02 3.253e+02, threshold=4.420e+02, percent-clipped=0.0
2024-09-15 23:18:28,486 INFO [train.py:1198] (1/2) Epoch 20, batch 4800, loss[loss=0.2341, ctc_loss=0.1598, cr_loss=0.3716, over 20983.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1606, cr_loss=0.3824, over 4104652.99 frames.
], batch size: 55, lr: 4.15e-03, grad_scale: 32.0 2024-09-15 23:18:33,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=357640.3333333333, ans=0.025 2024-09-15 23:18:52,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=357668.6666666667, ans=0.125 2024-09-15 23:19:15,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=357725.3333333333, ans=0.1 2024-09-15 23:19:21,508 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.93 vs. limit=15.0 2024-09-15 23:19:34,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=357753.6666666667, ans=0.025 2024-09-15 23:19:43,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=357753.6666666667, ans=0.05 2024-09-15 23:19:46,407 INFO [train.py:1198] (1/2) Epoch 20, batch 4850, loss[loss=0.2749, ctc_loss=0.1888, cr_loss=0.4301, over 20673.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.161, cr_loss=0.3829, over 4108715.70 frames. ], batch size: 68, lr: 4.15e-03, grad_scale: 32.0 2024-09-15 23:19:50,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=357782.0, ans=0.025 2024-09-15 23:20:40,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=357867.0, ans=0.0 2024-09-15 23:21:01,325 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.799e+02 2.102e+02 2.286e+02 2.422e+02 6.933e+02, threshold=4.572e+02, percent-clipped=1.0 2024-09-15 23:21:01,344 INFO [train.py:1198] (1/2) Epoch 20, batch 4900, loss[loss=0.2286, ctc_loss=0.1557, cr_loss=0.3644, over 20639.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.161, cr_loss=0.3821, over 4103549.98 frames. ], batch size: 68, lr: 4.14e-03, grad_scale: 32.0 2024-09-15 23:21:06,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=357923.6666666667, ans=0.1 2024-09-15 23:21:22,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=357952.0, ans=0.0 2024-09-15 23:21:25,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=357952.0, ans=0.2 2024-09-15 23:21:28,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=357952.0, ans=0.125 2024-09-15 23:21:36,186 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.56 vs. limit=15.0 2024-09-15 23:21:55,625 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.12 vs. 
limit=15.0 2024-09-15 23:22:01,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=358037.0, ans=0.125 2024-09-15 23:22:15,467 INFO [train.py:1198] (1/2) Epoch 20, batch 4950, loss[loss=0.2498, ctc_loss=0.1735, cr_loss=0.3813, over 21066.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.1614, cr_loss=0.3823, over 4081348.85 frames. ], batch size: 59, lr: 4.14e-03, grad_scale: 32.0 2024-09-15 23:22:37,087 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.26 vs. limit=10.0 2024-09-15 23:22:47,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=358122.0, ans=0.125 2024-09-15 23:23:09,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=358150.3333333333, ans=0.2 2024-09-15 23:23:19,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=358178.6666666667, ans=0.1 2024-09-15 23:23:30,579 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.768e+02 2.099e+02 2.250e+02 2.616e+02 5.235e+02, threshold=4.499e+02, percent-clipped=1.0 2024-09-15 23:23:30,598 INFO [train.py:1198] (1/2) Epoch 20, batch 5000, loss[loss=0.2473, ctc_loss=0.1685, cr_loss=0.3939, over 20967.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1614, cr_loss=0.3822, over 4078607.41 frames. ], batch size: 64, lr: 4.14e-03, grad_scale: 32.0 2024-09-15 23:23:42,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=358207.0, ans=0.0 2024-09-15 23:23:51,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=358235.3333333333, ans=0.1 2024-09-15 23:24:08,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=358263.6666666667, ans=0.1 2024-09-15 23:24:13,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=358292.0, ans=0.125 2024-09-15 23:24:15,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=358292.0, ans=0.025 2024-09-15 23:24:45,721 INFO [train.py:1198] (1/2) Epoch 20, batch 5050, loss[loss=0.1994, ctc_loss=0.1334, cr_loss=0.3298, over 20963.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1617, cr_loss=0.3831, over 4090062.28 frames. ], batch size: 51, lr: 4.14e-03, grad_scale: 16.0 2024-09-15 23:24:50,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=358348.6666666667, ans=0.0 2024-09-15 23:25:02,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=358377.0, ans=0.125 2024-09-15 23:25:23,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=358405.3333333333, ans=0.125 2024-09-15 23:26:03,407 INFO [train.py:1198] (1/2) Epoch 20, batch 5100, loss[loss=0.2124, ctc_loss=0.1433, cr_loss=0.3455, over 20978.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.161, cr_loss=0.382, over 4083786.14 frames. 
], batch size: 52, lr: 4.14e-03, grad_scale: 16.0 2024-09-15 23:26:04,759 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.829e+02 2.062e+02 2.229e+02 2.442e+02 3.604e+02, threshold=4.458e+02, percent-clipped=0.0 2024-09-15 23:26:26,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=358518.6666666667, ans=0.1 2024-09-15 23:26:32,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=358547.0, ans=0.025 2024-09-15 23:26:32,828 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.89 vs. limit=6.0 2024-09-15 23:26:50,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=358575.3333333333, ans=0.125 2024-09-15 23:27:17,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=358603.6666666667, ans=0.125 2024-09-15 23:27:17,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=358603.6666666667, ans=0.04949747468305833 2024-09-15 23:27:21,177 INFO [train.py:1198] (1/2) Epoch 20, batch 5150, loss[loss=0.2608, ctc_loss=0.1784, cr_loss=0.4116, over 21024.00 frames. ], tot_loss[loss=0.2368, ctc_loss=0.1606, cr_loss=0.3809, over 4086038.46 frames. ], batch size: 63, lr: 4.14e-03, grad_scale: 16.0 2024-09-15 23:27:27,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=358632.0, ans=0.1 2024-09-15 23:27:37,974 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.97 vs. limit=15.0 2024-09-15 23:27:46,807 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.54 vs. limit=22.5 2024-09-15 23:28:02,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=358688.6666666667, ans=0.125 2024-09-15 23:28:10,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=358717.0, ans=0.125 2024-09-15 23:28:10,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=358717.0, ans=0.125 2024-09-15 23:28:34,780 INFO [train.py:1198] (1/2) Epoch 20, batch 5200, loss[loss=0.27, ctc_loss=0.1846, cr_loss=0.427, over 18352.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1611, cr_loss=0.3819, over 4083882.51 frames. 
], batch size: 108, lr: 4.14e-03, grad_scale: 32.0 2024-09-15 23:28:36,258 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.708e+02 2.056e+02 2.149e+02 2.322e+02 4.240e+02, threshold=4.298e+02, percent-clipped=0.0 2024-09-15 23:28:48,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=358802.0, ans=0.1 2024-09-15 23:28:55,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=358802.0, ans=0.2 2024-09-15 23:29:07,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=358830.3333333333, ans=0.0 2024-09-15 23:29:40,607 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.70 vs. limit=15.0 2024-09-15 23:29:46,318 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.65 vs. limit=15.0 2024-09-15 23:29:48,815 INFO [train.py:1198] (1/2) Epoch 20, batch 5250, loss[loss=0.2266, ctc_loss=0.1527, cr_loss=0.3699, over 20969.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1603, cr_loss=0.3802, over 4085834.11 frames. ], batch size: 55, lr: 4.14e-03, grad_scale: 32.0 2024-09-15 23:29:58,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=358915.3333333333, ans=0.125 2024-09-15 23:30:09,837 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 23:30:30,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=358972.0, ans=0.0 2024-09-15 23:30:33,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=359000.3333333333, ans=0.05 2024-09-15 23:30:55,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=359028.6666666667, ans=0.0 2024-09-15 23:31:02,590 INFO [train.py:1198] (1/2) Epoch 20, batch 5300, loss[loss=0.2473, ctc_loss=0.1672, cr_loss=0.4004, over 20974.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.1612, cr_loss=0.3819, over 4081249.81 frames. ], batch size: 64, lr: 4.14e-03, grad_scale: 32.0 2024-09-15 23:31:04,070 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.872e+02 2.060e+02 2.201e+02 2.376e+02 3.191e+02, threshold=4.402e+02, percent-clipped=0.0 2024-09-15 23:31:04,331 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 23:31:16,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=359085.3333333333, ans=0.125 2024-09-15 23:31:52,356 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.12 vs. limit=15.0 2024-09-15 23:32:11,744 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.35 vs. limit=15.0 2024-09-15 23:32:16,989 INFO [train.py:1198] (1/2) Epoch 20, batch 5350, loss[loss=0.2477, ctc_loss=0.1677, cr_loss=0.3999, over 20948.00 frames. 
], tot_loss[loss=0.2382, ctc_loss=0.1618, cr_loss=0.3819, over 4066236.14 frames. ], batch size: 64, lr: 4.14e-03, grad_scale: 32.0 2024-09-15 23:32:33,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=359227.0, ans=0.5 2024-09-15 23:32:45,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=359255.3333333333, ans=0.0 2024-09-15 23:33:00,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=359283.6666666667, ans=0.5 2024-09-15 23:33:32,037 INFO [train.py:1198] (1/2) Epoch 20, batch 5400, loss[loss=0.256, ctc_loss=0.1736, cr_loss=0.4121, over 20847.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1611, cr_loss=0.3811, over 4075653.22 frames. ], batch size: 65, lr: 4.14e-03, grad_scale: 32.0 2024-09-15 23:33:33,548 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.835e+02 2.056e+02 2.189e+02 2.476e+02 3.815e+02, threshold=4.379e+02, percent-clipped=0.0 2024-09-15 23:33:55,755 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 23:34:45,927 INFO [train.py:1198] (1/2) Epoch 20, batch 5450, loss[loss=0.2339, ctc_loss=0.1615, cr_loss=0.3621, over 20963.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1611, cr_loss=0.3807, over 4076810.12 frames. ], batch size: 64, lr: 4.14e-03, grad_scale: 32.0 2024-09-15 23:34:58,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=359482.0, ans=0.1 2024-09-15 23:35:43,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=359567.0, ans=0.125 2024-09-15 23:35:53,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=359595.3333333333, ans=0.1 2024-09-15 23:35:55,724 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.07 vs. limit=15.0 2024-09-15 23:36:03,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=359595.3333333333, ans=0.125 2024-09-15 23:36:06,322 INFO [train.py:1198] (1/2) Epoch 20, batch 5500, loss[loss=0.2433, ctc_loss=0.1633, cr_loss=0.3999, over 20983.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1611, cr_loss=0.3809, over 4076117.47 frames. 
], batch size: 58, lr: 4.13e-03, grad_scale: 16.0 2024-09-15 23:36:08,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=359623.6666666667, ans=0.125 2024-09-15 23:36:09,384 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.859e+02 2.072e+02 2.194e+02 2.346e+02 3.622e+02, threshold=4.387e+02, percent-clipped=0.0 2024-09-15 23:36:48,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=359680.3333333333, ans=0.1 2024-09-15 23:36:57,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=359708.6666666667, ans=0.2 2024-09-15 23:36:59,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=359708.6666666667, ans=0.0 2024-09-15 23:37:17,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=359737.0, ans=0.0 2024-09-15 23:37:21,069 INFO [train.py:1198] (1/2) Epoch 20, batch 5550, loss[loss=0.2455, ctc_loss=0.1689, cr_loss=0.3827, over 20690.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1604, cr_loss=0.3801, over 4079597.41 frames. ], batch size: 71, lr: 4.13e-03, grad_scale: 16.0 2024-09-15 23:38:16,616 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.77 vs. limit=15.0 2024-09-15 23:38:25,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=359878.6666666667, ans=0.2 2024-09-15 23:38:35,409 INFO [train.py:1198] (1/2) Epoch 20, batch 5600, loss[loss=0.2494, ctc_loss=0.1675, cr_loss=0.4093, over 20880.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1605, cr_loss=0.3803, over 4085068.83 frames. ], batch size: 57, lr: 4.13e-03, grad_scale: 32.0 2024-09-15 23:38:38,495 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.841e+02 2.100e+02 2.212e+02 2.372e+02 4.536e+02, threshold=4.424e+02, percent-clipped=1.0 2024-09-15 23:38:54,545 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.41 vs. limit=15.0 2024-09-15 23:39:27,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=359992.0, ans=0.0 2024-09-15 23:39:28,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=359992.0, ans=0.125 2024-09-15 23:39:37,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=360020.3333333333, ans=0.125 2024-09-15 23:39:37,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=360020.3333333333, ans=0.125 2024-09-15 23:39:46,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=360020.3333333333, ans=0.0 2024-09-15 23:39:50,370 INFO [train.py:1198] (1/2) Epoch 20, batch 5650, loss[loss=0.2551, ctc_loss=0.1702, cr_loss=0.4242, over 20841.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1605, cr_loss=0.3803, over 4073430.45 frames. 
], batch size: 59, lr: 4.13e-03, grad_scale: 32.0 2024-09-15 23:40:26,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=360105.3333333333, ans=0.125 2024-09-15 23:40:42,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=360133.6666666667, ans=0.125 2024-09-15 23:40:48,990 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.68 vs. limit=15.0 2024-09-15 23:41:00,656 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.21 vs. limit=22.5 2024-09-15 23:41:04,317 INFO [train.py:1198] (1/2) Epoch 20, batch 5700, loss[loss=0.2531, ctc_loss=0.1722, cr_loss=0.4041, over 20940.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1611, cr_loss=0.3819, over 4089944.01 frames. ], batch size: 64, lr: 4.13e-03, grad_scale: 32.0 2024-09-15 23:41:07,167 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.792e+02 2.066e+02 2.213e+02 2.381e+02 6.070e+02, threshold=4.426e+02, percent-clipped=1.0 2024-09-15 23:41:26,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=360218.6666666667, ans=0.125 2024-09-15 23:41:56,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=360275.3333333333, ans=0.125 2024-09-15 23:42:18,297 INFO [train.py:1198] (1/2) Epoch 20, batch 5750, loss[loss=0.2023, ctc_loss=0.1329, cr_loss=0.3469, over 20286.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1606, cr_loss=0.3811, over 4095317.96 frames. ], batch size: 45, lr: 4.13e-03, grad_scale: 32.0 2024-09-15 23:42:21,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=360332.0, ans=0.125 2024-09-15 23:42:28,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=360332.0, ans=0.2 2024-09-15 23:42:57,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=360388.6666666667, ans=0.125 2024-09-15 23:43:26,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=360445.3333333333, ans=0.1 2024-09-15 23:43:32,359 INFO [train.py:1198] (1/2) Epoch 20, batch 5800, loss[loss=0.2814, ctc_loss=0.1952, cr_loss=0.4311, over 18520.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1603, cr_loss=0.3805, over 4100833.89 frames. 
], batch size: 109, lr: 4.13e-03, grad_scale: 32.0 2024-09-15 23:43:35,324 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.797e+02 2.103e+02 2.226e+02 2.380e+02 4.136e+02, threshold=4.451e+02, percent-clipped=0.0 2024-09-15 23:43:50,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=360502.0, ans=0.125 2024-09-15 23:43:58,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=360502.0, ans=15.0 2024-09-15 23:44:07,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=360530.3333333333, ans=0.0 2024-09-15 23:44:41,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=360587.0, ans=0.125 2024-09-15 23:44:49,839 INFO [train.py:1198] (1/2) Epoch 20, batch 5850, loss[loss=0.2322, ctc_loss=0.1579, cr_loss=0.3715, over 20807.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1608, cr_loss=0.381, over 4092344.12 frames. ], batch size: 53, lr: 4.13e-03, grad_scale: 32.0 2024-09-15 23:45:10,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=360643.6666666667, ans=0.0 2024-09-15 23:45:23,088 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.67 vs. limit=15.0 2024-09-15 23:45:57,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=360728.6666666667, ans=0.125 2024-09-15 23:46:07,110 INFO [train.py:1198] (1/2) Epoch 20, batch 5900, loss[loss=0.3031, ctc_loss=0.2096, cr_loss=0.4676, over 18501.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1605, cr_loss=0.38, over 4084458.85 frames. ], batch size: 108, lr: 4.13e-03, grad_scale: 32.0 2024-09-15 23:46:10,066 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.842e+02 2.107e+02 2.232e+02 2.415e+02 3.409e+02, threshold=4.465e+02, percent-clipped=0.0 2024-09-15 23:46:31,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=360785.3333333333, ans=0.125 2024-09-15 23:46:33,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=360785.3333333333, ans=0.0 2024-09-15 23:47:20,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=360898.6666666667, ans=0.125 2024-09-15 23:47:21,587 INFO [train.py:1198] (1/2) Epoch 20, batch 5950, loss[loss=0.2518, ctc_loss=0.1714, cr_loss=0.4018, over 20968.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1594, cr_loss=0.3782, over 4074010.18 frames. ], batch size: 64, lr: 4.13e-03, grad_scale: 32.0 2024-09-15 23:47:32,664 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.10 vs. 
limit=22.5 2024-09-15 23:47:33,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=360898.6666666667, ans=0.0 2024-09-15 23:47:47,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=360927.0, ans=0.0 2024-09-15 23:47:47,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=360927.0, ans=0.125 2024-09-15 23:48:24,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=361012.0, ans=0.125 2024-09-15 23:48:35,933 INFO [train.py:1198] (1/2) Epoch 20, batch 6000, loss[loss=0.2048, ctc_loss=0.137, cr_loss=0.3387, over 20182.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1598, cr_loss=0.3786, over 4076097.51 frames. ], batch size: 45, lr: 4.13e-03, grad_scale: 32.0 2024-09-15 23:48:35,934 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-15 23:48:55,180 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.7948, 3.5047, 4.2058, 4.1593, 3.5270, 4.1230, 2.9753, 2.9376], device='cuda:1') 2024-09-15 23:48:59,847 INFO [train.py:1230] (1/2) Epoch 20, validation: loss=0.044, ctc_loss=0.044, cr_loss=1.069e-14, over 944034.00 frames. 2024-09-15 23:48:59,848 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-15 23:49:02,721 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.716e+02 2.070e+02 2.162e+02 2.360e+02 3.870e+02, threshold=4.325e+02, percent-clipped=0.0 2024-09-15 23:49:14,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=361068.6666666667, ans=0.125 2024-09-15 23:49:32,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=361097.0, ans=0.0 2024-09-15 23:49:33,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=361097.0, ans=0.0 2024-09-15 23:49:41,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=361097.0, ans=0.1 2024-09-15 23:50:13,699 INFO [train.py:1198] (1/2) Epoch 20, batch 6050, loss[loss=0.2234, ctc_loss=0.1486, cr_loss=0.374, over 20857.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1597, cr_loss=0.3794, over 4089224.74 frames. ], batch size: 57, lr: 4.13e-03, grad_scale: 32.0 2024-09-15 23:50:15,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=361182.0, ans=0.0 2024-09-15 23:51:13,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=361295.3333333333, ans=0.025 2024-09-15 23:51:28,466 INFO [train.py:1198] (1/2) Epoch 20, batch 6100, loss[loss=0.2585, ctc_loss=0.178, cr_loss=0.4026, over 19398.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1603, cr_loss=0.3802, over 4081326.90 frames. 
], batch size: 90, lr: 4.13e-03, grad_scale: 16.0 2024-09-15 23:51:32,771 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.770e+02 2.092e+02 2.238e+02 2.378e+02 3.005e+02, threshold=4.475e+02, percent-clipped=0.0 2024-09-15 23:51:34,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=361323.6666666667, ans=0.1 2024-09-15 23:51:41,229 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.90 vs. limit=6.0 2024-09-15 23:51:52,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=361352.0, ans=0.2 2024-09-15 23:51:56,227 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.77 vs. limit=15.0 2024-09-15 23:52:02,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=361380.3333333333, ans=0.125 2024-09-15 23:52:02,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=361380.3333333333, ans=0.0 2024-09-15 23:52:21,831 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.77 vs. limit=6.0 2024-09-15 23:52:40,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=361437.0, ans=0.125 2024-09-15 23:52:43,175 INFO [train.py:1198] (1/2) Epoch 20, batch 6150, loss[loss=0.2359, ctc_loss=0.1643, cr_loss=0.3583, over 20288.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.16, cr_loss=0.3793, over 4085926.26 frames. ], batch size: 74, lr: 4.12e-03, grad_scale: 16.0 2024-09-15 23:52:47,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=361465.3333333333, ans=0.125 2024-09-15 23:53:59,414 INFO [train.py:1198] (1/2) Epoch 20, batch 6200, loss[loss=0.241, ctc_loss=0.1631, cr_loss=0.3893, over 21063.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.1603, cr_loss=0.3792, over 4078672.46 frames. ], batch size: 56, lr: 4.12e-03, grad_scale: 16.0 2024-09-15 23:54:03,813 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.859e+02 2.088e+02 2.244e+02 2.482e+02 3.656e+02, threshold=4.489e+02, percent-clipped=0.0 2024-09-15 23:54:14,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=361635.3333333333, ans=0.125 2024-09-15 23:54:59,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=361720.3333333333, ans=0.1 2024-09-15 23:55:08,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=361720.3333333333, ans=0.125 2024-09-15 23:55:14,131 INFO [train.py:1198] (1/2) Epoch 20, batch 6250, loss[loss=0.301, ctc_loss=0.2183, cr_loss=0.4133, over 15058.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1603, cr_loss=0.3785, over 4037203.39 frames. 
], batch size: 149, lr: 4.12e-03, grad_scale: 16.0 2024-09-15 23:55:26,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=361748.6666666667, ans=0.025 2024-09-15 23:55:40,031 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.34 vs. limit=22.5 2024-09-15 23:55:54,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=361805.3333333333, ans=0.125 2024-09-15 23:56:05,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=361833.6666666667, ans=0.0 2024-09-15 23:56:18,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=361862.0, ans=0.125 2024-09-15 23:56:26,861 INFO [train.py:1198] (1/2) Epoch 20, batch 6300, loss[loss=0.2865, ctc_loss=0.2009, cr_loss=0.4278, over 14242.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1637, cr_loss=0.3811, over 3950974.20 frames. ], batch size: 150, lr: 4.12e-03, grad_scale: 16.0 2024-09-15 23:56:31,247 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.759e+02 2.191e+02 2.344e+02 2.616e+02 4.767e+02, threshold=4.689e+02, percent-clipped=1.0 2024-09-15 23:56:36,243 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.82 vs. limit=15.0 2024-09-15 23:56:37,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=361890.3333333333, ans=0.125 2024-09-15 23:56:46,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=361918.6666666667, ans=0.0 2024-09-15 23:57:32,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=362003.6666666667, ans=0.125 2024-09-15 23:57:37,898 INFO [train.py:1198] (1/2) Epoch 20, batch 6350, loss[loss=0.2962, ctc_loss=0.2084, cr_loss=0.4393, over 18144.00 frames. ], tot_loss[loss=0.2465, ctc_loss=0.1693, cr_loss=0.386, over 3826089.63 frames. ], batch size: 108, lr: 4.12e-03, grad_scale: 8.0 2024-09-15 23:57:45,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=362032.0, ans=0.1 2024-09-15 23:58:02,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=362060.3333333333, ans=0.125 2024-09-15 23:58:12,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=362088.6666666667, ans=10.0 2024-09-15 23:59:24,879 INFO [train.py:1198] (1/2) Epoch 21, batch 0, loss[loss=0.2047, ctc_loss=0.1379, cr_loss=0.3341, over 21069.00 frames. ], tot_loss[loss=0.2047, ctc_loss=0.1379, cr_loss=0.3341, over 21069.00 frames. ], batch size: 53, lr: 4.02e-03, grad_scale: 16.0 2024-09-15 23:59:24,879 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-15 23:59:46,393 INFO [train.py:1230] (1/2) Epoch 21, validation: loss=0.04406, ctc_loss=0.04406, cr_loss=1.047e-14, over 944034.00 frames. 
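A note on the loss fields in these entries: the logged totals are consistent, up to rounding, with a combined objective of the form loss = ctc_loss + 0.2 * cr_loss (e.g. at epoch 20, batch 4300 above: 0.1598 + 0.2 * 0.379 = 0.2356, exactly the tot_loss shown), and on the validation passes cr_loss collapses to floating-point noise (~1e-14), presumably because the consistency-regularization term compares two augmented views of each utterance and the two views coincide once augmentation is off. A minimal Python sketch of that combination, assuming this reading of the log (combine_losses is an invented name, not a function from train.py):

    # Hypothetical sketch, not the icefall implementation: the logged totals
    # are consistent with loss = ctc_loss + cr_loss_scale * cr_loss,
    # with cr_loss_scale = 0.2 for this run.
    def combine_losses(ctc_loss: float, cr_loss: float,
                       cr_loss_scale: float = 0.2) -> float:
        return ctc_loss + cr_loss_scale * cr_loss

    # Epoch 20, batch 4300: 0.1598 + 0.2 * 0.379 = 0.2356, matching tot_loss.
    assert abs(combine_losses(0.1598, 0.379) - 0.2356) < 1e-4
    # Validation: cr_loss ~ 1e-14, so loss reduces to ctc_loss (0.044 above).
    assert abs(combine_losses(0.044, 1.069e-14) - 0.044) < 1e-6

Any of the train or validation entries above can be checked the same way, e.g. 0.1595 + 0.2 * 0.3795 = 0.2354 at epoch 20, batch 4350.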
2024-09-15 23:59:46,393 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-16 00:00:05,658 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.879e+02 2.361e+02 2.567e+02 2.706e+02 3.216e+02, threshold=5.134e+02, percent-clipped=0.0 2024-09-16 00:00:16,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=362204.8333333333, ans=0.025 2024-09-16 00:00:31,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=362233.1666666667, ans=0.0 2024-09-16 00:00:56,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=362261.5, ans=10.0 2024-09-16 00:01:02,043 INFO [train.py:1198] (1/2) Epoch 21, batch 50, loss[loss=0.2208, ctc_loss=0.147, cr_loss=0.3691, over 20785.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1593, cr_loss=0.3806, over 929377.08 frames. ], batch size: 53, lr: 4.02e-03, grad_scale: 16.0 2024-09-16 00:01:08,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=362289.8333333333, ans=0.0 2024-09-16 00:01:11,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=362289.8333333333, ans=0.125 2024-09-16 00:01:23,666 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.72 vs. limit=22.5 2024-09-16 00:01:43,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=362346.5, ans=0.125 2024-09-16 00:01:50,861 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.90 vs. limit=10.0 2024-09-16 00:02:17,014 INFO [train.py:1198] (1/2) Epoch 21, batch 100, loss[loss=0.2152, ctc_loss=0.1449, cr_loss=0.3511, over 20830.00 frames. ], tot_loss[loss=0.2396, ctc_loss=0.1628, cr_loss=0.3843, over 1615504.90 frames. ], batch size: 59, lr: 4.02e-03, grad_scale: 16.0 2024-09-16 00:02:36,904 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.720e+02 2.076e+02 2.185e+02 2.385e+02 3.441e+02, threshold=4.370e+02, percent-clipped=0.0 2024-09-16 00:03:07,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=362516.5, ans=0.125 2024-09-16 00:03:12,727 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.69 vs. limit=22.5 2024-09-16 00:03:33,164 INFO [train.py:1198] (1/2) Epoch 21, batch 150, loss[loss=0.2632, ctc_loss=0.1817, cr_loss=0.4075, over 20655.00 frames. ], tot_loss[loss=0.2398, ctc_loss=0.1631, cr_loss=0.3838, over 2150330.57 frames. 
], batch size: 68, lr: 4.02e-03, grad_scale: 16.0 2024-09-16 00:04:03,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=362629.8333333333, ans=0.04949747468305833 2024-09-16 00:04:16,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=362629.8333333333, ans=0.2 2024-09-16 00:04:32,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=362658.1666666667, ans=0.125 2024-09-16 00:04:32,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=362658.1666666667, ans=0.1 2024-09-16 00:04:34,981 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.77 vs. limit=22.5 2024-09-16 00:04:45,160 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.09 vs. limit=15.0 2024-09-16 00:04:50,586 INFO [train.py:1198] (1/2) Epoch 21, batch 200, loss[loss=0.2466, ctc_loss=0.1661, cr_loss=0.4024, over 20975.00 frames. ], tot_loss[loss=0.2394, ctc_loss=0.1626, cr_loss=0.3841, over 2591106.80 frames. ], batch size: 64, lr: 4.02e-03, grad_scale: 16.0 2024-09-16 00:04:54,127 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.60 vs. limit=10.0 2024-09-16 00:05:10,044 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.755e+02 2.091e+02 2.185e+02 2.311e+02 3.861e+02, threshold=4.370e+02, percent-clipped=0.0 2024-09-16 00:05:40,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=362771.5, ans=0.04949747468305833 2024-09-16 00:06:09,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=362828.1666666667, ans=0.0 2024-09-16 00:06:10,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=362828.1666666667, ans=0.125 2024-09-16 00:06:13,739 INFO [train.py:1198] (1/2) Epoch 21, batch 250, loss[loss=0.251, ctc_loss=0.1725, cr_loss=0.3924, over 21067.00 frames. ], tot_loss[loss=0.2386, ctc_loss=0.162, cr_loss=0.3829, over 2932819.35 frames. ], batch size: 62, lr: 4.02e-03, grad_scale: 16.0 2024-09-16 00:06:34,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=362884.8333333333, ans=22.5 2024-09-16 00:06:45,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=362913.1666666667, ans=0.125 2024-09-16 00:07:29,593 INFO [train.py:1198] (1/2) Epoch 21, batch 300, loss[loss=0.2392, ctc_loss=0.1594, cr_loss=0.399, over 20876.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.1617, cr_loss=0.3832, over 3183001.94 frames. 
], batch size: 57, lr: 4.01e-03, grad_scale: 16.0 2024-09-16 00:07:49,188 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.819e+02 2.066e+02 2.181e+02 2.306e+02 4.450e+02, threshold=4.363e+02, percent-clipped=1.0 2024-09-16 00:07:55,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=363026.5, ans=0.09899494936611666 2024-09-16 00:07:56,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=363026.5, ans=0.025 2024-09-16 00:08:07,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=363054.8333333333, ans=0.0 2024-09-16 00:08:35,391 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.96 vs. limit=22.5 2024-09-16 00:08:39,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=363111.5, ans=0.1 2024-09-16 00:08:39,746 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.63 vs. limit=15.0 2024-09-16 00:08:45,069 INFO [train.py:1198] (1/2) Epoch 21, batch 350, loss[loss=0.2394, ctc_loss=0.1617, cr_loss=0.3885, over 21087.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1603, cr_loss=0.3805, over 3371347.38 frames. ], batch size: 59, lr: 4.01e-03, grad_scale: 16.0 2024-09-16 00:09:19,030 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.46 vs. limit=15.0 2024-09-16 00:09:40,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=363224.8333333333, ans=0.2 2024-09-16 00:09:47,842 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.27 vs. limit=22.5 2024-09-16 00:09:54,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=363253.1666666667, ans=0.125 2024-09-16 00:10:00,793 INFO [train.py:1198] (1/2) Epoch 21, batch 400, loss[loss=0.3006, ctc_loss=0.2131, cr_loss=0.4375, over 14877.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1611, cr_loss=0.3811, over 3504092.43 frames. ], batch size: 149, lr: 4.01e-03, grad_scale: 32.0 2024-09-16 00:10:20,458 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.784e+02 2.081e+02 2.215e+02 2.452e+02 3.087e+02, threshold=4.431e+02, percent-clipped=0.0 2024-09-16 00:10:29,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=363338.1666666667, ans=0.0 2024-09-16 00:10:30,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=363338.1666666667, ans=0.125 2024-09-16 00:10:43,725 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 00:11:13,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=363394.8333333333, ans=0.0 2024-09-16 00:11:14,244 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.72 vs. 
limit=12.0 2024-09-16 00:11:22,590 INFO [train.py:1198] (1/2) Epoch 21, batch 450, loss[loss=0.284, ctc_loss=0.1965, cr_loss=0.4373, over 20657.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1598, cr_loss=0.3799, over 3643976.10 frames. ], batch size: 71, lr: 4.01e-03, grad_scale: 32.0 2024-09-16 00:11:24,941 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0 2024-09-16 00:11:45,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=363451.5, ans=0.2 2024-09-16 00:11:45,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=363451.5, ans=0.125 2024-09-16 00:12:06,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=363508.1666666667, ans=0.0 2024-09-16 00:12:21,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=363536.5, ans=0.125 2024-09-16 00:12:35,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=363536.5, ans=0.0 2024-09-16 00:12:37,912 INFO [train.py:1198] (1/2) Epoch 21, batch 500, loss[loss=0.2348, ctc_loss=0.1591, cr_loss=0.3788, over 20952.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1599, cr_loss=0.3799, over 3737946.61 frames. ], batch size: 48, lr: 4.01e-03, grad_scale: 32.0 2024-09-16 00:12:57,408 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.793e+02 2.112e+02 2.252e+02 2.412e+02 5.083e+02, threshold=4.504e+02, percent-clipped=2.0 2024-09-16 00:13:05,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=363593.1666666667, ans=0.125 2024-09-16 00:13:53,132 INFO [train.py:1198] (1/2) Epoch 21, batch 550, loss[loss=0.2761, ctc_loss=0.1916, cr_loss=0.4227, over 20677.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1595, cr_loss=0.3794, over 3811889.50 frames. ], batch size: 71, lr: 4.01e-03, grad_scale: 32.0 2024-09-16 00:14:13,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=363734.8333333333, ans=0.1 2024-09-16 00:14:18,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=363734.8333333333, ans=0.1 2024-09-16 00:14:26,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=363763.1666666667, ans=0.125 2024-09-16 00:14:53,742 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.11 vs. 
limit=15.0 2024-09-16 00:14:55,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=363819.8333333333, ans=0.1 2024-09-16 00:14:58,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=363819.8333333333, ans=0.0 2024-09-16 00:15:06,487 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=363848.1666666667, ans=0.125 2024-09-16 00:15:07,652 INFO [train.py:1198] (1/2) Epoch 21, batch 600, loss[loss=0.2579, ctc_loss=0.1792, cr_loss=0.3938, over 20626.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1604, cr_loss=0.381, over 3870029.97 frames. ], batch size: 71, lr: 4.01e-03, grad_scale: 32.0 2024-09-16 00:15:27,016 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.803e+02 2.055e+02 2.279e+02 2.491e+02 5.267e+02, threshold=4.558e+02, percent-clipped=1.0 2024-09-16 00:15:33,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=363876.5, ans=0.1 2024-09-16 00:16:23,290 INFO [train.py:1198] (1/2) Epoch 21, batch 650, loss[loss=0.2084, ctc_loss=0.1369, cr_loss=0.3578, over 20963.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1597, cr_loss=0.3799, over 3921445.72 frames. ], batch size: 51, lr: 4.01e-03, grad_scale: 32.0 2024-09-16 00:16:31,588 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.04 vs. limit=15.0 2024-09-16 00:16:35,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=363989.8333333333, ans=0.025 2024-09-16 00:16:35,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=363989.8333333333, ans=0.2 2024-09-16 00:16:53,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=364018.1666666667, ans=0.1 2024-09-16 00:17:30,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=364103.1666666667, ans=0.125 2024-09-16 00:17:45,039 INFO [train.py:1198] (1/2) Epoch 21, batch 700, loss[loss=0.2372, ctc_loss=0.1611, cr_loss=0.3803, over 21075.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1588, cr_loss=0.3789, over 3970046.83 frames. ], batch size: 59, lr: 4.01e-03, grad_scale: 32.0 2024-09-16 00:17:52,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=364131.5, ans=0.0 2024-09-16 00:18:04,452 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.816e+02 2.115e+02 2.240e+02 2.424e+02 3.755e+02, threshold=4.480e+02, percent-clipped=0.0 2024-09-16 00:18:06,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=364159.8333333333, ans=0.2 2024-09-16 00:19:00,226 INFO [train.py:1198] (1/2) Epoch 21, batch 750, loss[loss=0.3075, ctc_loss=0.2204, cr_loss=0.4355, over 14643.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1597, cr_loss=0.3801, over 3993112.00 frames. 
], batch size: 149, lr: 4.01e-03, grad_scale: 32.0 2024-09-16 00:19:18,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=364301.5, ans=0.0 2024-09-16 00:19:37,297 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.95 vs. limit=12.0 2024-09-16 00:20:01,255 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.01 vs. limit=15.0 2024-09-16 00:20:01,609 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.38 vs. limit=22.5 2024-09-16 00:20:15,733 INFO [train.py:1198] (1/2) Epoch 21, batch 800, loss[loss=0.2145, ctc_loss=0.1444, cr_loss=0.3504, over 20981.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1601, cr_loss=0.3809, over 4019852.80 frames. ], batch size: 51, lr: 4.01e-03, grad_scale: 32.0 2024-09-16 00:20:17,996 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.18 vs. limit=15.0 2024-09-16 00:20:35,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=364443.1666666667, ans=0.0 2024-09-16 00:20:36,647 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.729e+02 2.112e+02 2.255e+02 2.439e+02 3.218e+02, threshold=4.510e+02, percent-clipped=0.0 2024-09-16 00:21:25,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=364528.1666666667, ans=0.1 2024-09-16 00:21:30,945 INFO [train.py:1198] (1/2) Epoch 21, batch 850, loss[loss=0.2401, ctc_loss=0.1632, cr_loss=0.3842, over 19500.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1608, cr_loss=0.3819, over 4024018.74 frames. ], batch size: 90, lr: 4.01e-03, grad_scale: 32.0 2024-09-16 00:21:34,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=364556.5, ans=0.2 2024-09-16 00:21:40,391 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=364556.5, ans=0.125 2024-09-16 00:21:47,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=364584.8333333333, ans=0.125 2024-09-16 00:22:25,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=364641.5, ans=0.0 2024-09-16 00:22:52,335 INFO [train.py:1198] (1/2) Epoch 21, batch 900, loss[loss=0.2279, ctc_loss=0.1582, cr_loss=0.3485, over 20577.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1603, cr_loss=0.381, over 4039750.93 frames. 
2024-09-16 00:22:52,335 INFO [train.py:1198] (1/2) Epoch 21, batch 900, loss[loss=0.2279, ctc_loss=0.1582, cr_loss=0.3485, over 20577.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1603, cr_loss=0.381, over 4039750.93 frames. ], batch size: 75, lr: 4.01e-03, grad_scale: 32.0
2024-09-16 00:22:54,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=364698.1666666667, ans=0.125
2024-09-16 00:23:02,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=364698.1666666667, ans=0.125
2024-09-16 00:23:13,331 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.908e+02 2.096e+02 2.211e+02 2.363e+02 4.258e+02, threshold=4.422e+02, percent-clipped=0.0
2024-09-16 00:23:30,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=364754.8333333333, ans=0.0
2024-09-16 00:23:51,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=364811.5, ans=0.125
2024-09-16 00:23:56,490 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.40 vs. limit=15.0
2024-09-16 00:24:00,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=364811.5, ans=0.0
2024-09-16 00:24:07,610 INFO [train.py:1198] (1/2) Epoch 21, batch 950, loss[loss=0.2044, ctc_loss=0.1361, cr_loss=0.3413, over 20958.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1594, cr_loss=0.3796, over 4057712.06 frames. ], batch size: 49, lr: 4.00e-03, grad_scale: 32.0
2024-09-16 00:24:38,576 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.23 vs. limit=22.5
2024-09-16 00:24:48,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=364896.5, ans=0.125
2024-09-16 00:24:48,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=364896.5, ans=0.0
2024-09-16 00:24:57,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=364924.8333333333, ans=0.0
2024-09-16 00:25:08,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=364953.1666666667, ans=0.1
2024-09-16 00:25:23,199 INFO [train.py:1198] (1/2) Epoch 21, batch 1000, loss[loss=0.2425, ctc_loss=0.167, cr_loss=0.3778, over 20850.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1596, cr_loss=0.3797, over 4076941.36 frames. ], batch size: 65, lr: 4.00e-03, grad_scale: 32.0
2024-09-16 00:25:29,683 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.00 vs. limit=15.0
2024-09-16 00:25:44,008 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.762e+02 2.058e+02 2.174e+02 2.304e+02 5.292e+02, threshold=4.348e+02, percent-clipped=1.0
2024-09-16 00:25:58,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=365038.1666666667, ans=0.0
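[Editor's note, not part of the original log: the scaling.py:214 lines print ScheduledFloat parameters, i.e. float hyper-parameters whose value (ans) depends on batch_count. A hedged sketch of such a piecewise-linear schedule; the breakpoints below are made up for illustration, since the log only shows current (batch_count, ans) pairs.]

def scheduled_float(batch_count, points=((0.0, 0.3), (20000.0, 0.1))):
    # Piecewise-linear interpolation of a value over batch_count; constant
    # outside the breakpoints. The breakpoints here are illustrative only.
    pts = sorted(points)
    if batch_count <= pts[0][0]:
        return pts[0][1]
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if batch_count <= x1:
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
    return pts[-1][1]

print(scheduled_float(363876.5))  # far past the last breakpoint -> 0.1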
2024-09-16 00:26:39,013 INFO [train.py:1198] (1/2) Epoch 21, batch 1050, loss[loss=0.2462, ctc_loss=0.1715, cr_loss=0.3731, over 20655.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1593, cr_loss=0.3794, over 4082947.33 frames. ], batch size: 66, lr: 4.00e-03, grad_scale: 32.0
2024-09-16 00:27:28,091 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 00:27:33,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=365208.1666666667, ans=0.125
2024-09-16 00:27:49,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=365236.5, ans=0.125
2024-09-16 00:27:54,909 INFO [train.py:1198] (1/2) Epoch 21, batch 1100, loss[loss=0.2286, ctc_loss=0.1512, cr_loss=0.3871, over 20918.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.16, cr_loss=0.3807, over 4094555.80 frames. ], batch size: 60, lr: 4.00e-03, grad_scale: 32.0
2024-09-16 00:28:10,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=365293.1666666667, ans=0.1
2024-09-16 00:28:18,890 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.756e+02 2.044e+02 2.214e+02 2.380e+02 3.122e+02, threshold=4.428e+02, percent-clipped=0.0
2024-09-16 00:28:42,133 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.43 vs. limit=15.0
2024-09-16 00:28:45,365 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.67 vs. limit=12.0
2024-09-16 00:29:16,241 INFO [train.py:1198] (1/2) Epoch 21, batch 1150, loss[loss=0.2248, ctc_loss=0.153, cr_loss=0.3593, over 21070.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.1599, cr_loss=0.3806, over 4097044.50 frames. ], batch size: 56, lr: 4.00e-03, grad_scale: 16.0
2024-09-16 00:29:32,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=365434.8333333333, ans=0.95
2024-09-16 00:29:39,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=365434.8333333333, ans=0.0
2024-09-16 00:29:39,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=365434.8333333333, ans=0.125
2024-09-16 00:29:40,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=365434.8333333333, ans=0.025
2024-09-16 00:29:50,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=365463.1666666667, ans=0.125
2024-09-16 00:30:18,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=365519.8333333333, ans=0.125
2024-09-16 00:30:21,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=365519.8333333333, ans=0.125
2024-09-16 00:30:24,647 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.96 vs. limit=22.5
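[Editor's note, not part of the original log: the scaling.py:1024 Whitening lines fire when a module's whitening metric approaches its configured limit. A plausible hedged sketch of such a metric is the eigenvalue-dispersion ratio mean(eig^2)/mean(eig)^2 of the per-group feature covariance, which is 1.0 for perfectly white features and grows as the covariance becomes ill-conditioned; this formula is an assumption for illustration, not read from scaling.py.]

import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
    # x: (num_frames, num_channels); channels are split into num_groups groups.
    n, c = x.shape
    d = c // num_groups
    xg = x.reshape(n, num_groups, d).transpose(0, 1)  # (groups, frames, d)
    cov = xg.transpose(1, 2) @ xg / n                 # per-group covariance
    tr = cov.diagonal(dim1=1, dim2=2).sum(-1)         # trace(C) = sum of eigenvalues
    tr_sq = (cov * cov).sum(dim=(1, 2))               # trace(C @ C) for symmetric C
    return (d * tr_sq / tr.pow(2)).mean()             # mean(eig^2) / mean(eig)^2

print(whitening_metric(torch.randn(10000, 256), num_groups=1))  # near 1 for white input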
2024-09-16 00:30:31,604 INFO [train.py:1198] (1/2) Epoch 21, batch 1200, loss[loss=0.2221, ctc_loss=0.1483, cr_loss=0.3692, over 20790.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1602, cr_loss=0.3808, over 4080241.80 frames. ], batch size: 53, lr: 4.00e-03, grad_scale: 32.0
2024-09-16 00:30:31,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=365548.1666666667, ans=0.125
2024-09-16 00:30:32,126 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 00:30:54,646 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.854e+02 2.073e+02 2.184e+02 2.405e+02 3.363e+02, threshold=4.367e+02, percent-clipped=0.0
2024-09-16 00:31:07,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=365604.8333333333, ans=0.1
2024-09-16 00:31:13,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=365604.8333333333, ans=0.0
2024-09-16 00:31:28,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=365633.1666666667, ans=0.125
2024-09-16 00:31:47,861 INFO [train.py:1198] (1/2) Epoch 21, batch 1250, loss[loss=0.2459, ctc_loss=0.1661, cr_loss=0.3991, over 20934.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.1604, cr_loss=0.3812, over 4086553.72 frames. ], batch size: 60, lr: 4.00e-03, grad_scale: 32.0
2024-09-16 00:32:03,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=365718.1666666667, ans=0.2
2024-09-16 00:32:23,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=365746.5, ans=0.125
2024-09-16 00:32:40,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=365774.8333333333, ans=0.125
2024-09-16 00:32:50,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=365803.1666666667, ans=0.0
2024-09-16 00:32:55,576 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.11 vs. limit=15.0
2024-09-16 00:33:03,731 INFO [train.py:1198] (1/2) Epoch 21, batch 1300, loss[loss=0.2543, ctc_loss=0.1717, cr_loss=0.4131, over 20323.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.161, cr_loss=0.3821, over 4100637.04 frames. ], batch size: 74, lr: 4.00e-03, grad_scale: 32.0
2024-09-16 00:33:08,840 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.76 vs. limit=12.0
2024-09-16 00:33:25,833 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.806e+02 2.095e+02 2.206e+02 2.422e+02 5.284e+02, threshold=4.411e+02, percent-clipped=2.0
2024-09-16 00:34:24,993 INFO [train.py:1198] (1/2) Epoch 21, batch 1350, loss[loss=0.2655, ctc_loss=0.1823, cr_loss=0.416, over 20994.00 frames. ], tot_loss[loss=0.2368, ctc_loss=0.1606, cr_loss=0.3811, over 4103503.93 frames. ], batch size: 63, lr: 4.00e-03, grad_scale: 32.0
2024-09-16 00:34:28,523 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 00:34:35,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=365973.1666666667, ans=0.125
2024-09-16 00:35:24,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=366086.5, ans=0.1
2024-09-16 00:35:33,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=366086.5, ans=0.0
2024-09-16 00:35:40,582 INFO [train.py:1198] (1/2) Epoch 21, batch 1400, loss[loss=0.2566, ctc_loss=0.176, cr_loss=0.4034, over 20047.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1602, cr_loss=0.381, over 4110351.39 frames. ], batch size: 80, lr: 4.00e-03, grad_scale: 32.0
2024-09-16 00:36:03,536 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.800e+02 2.049e+02 2.185e+02 2.335e+02 3.471e+02, threshold=4.370e+02, percent-clipped=0.0
2024-09-16 00:36:19,198 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.45 vs. limit=10.0
2024-09-16 00:36:45,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=366228.1666666667, ans=0.025
2024-09-16 00:36:56,298 INFO [train.py:1198] (1/2) Epoch 21, batch 1450, loss[loss=0.2291, ctc_loss=0.1548, cr_loss=0.372, over 20780.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.1614, cr_loss=0.3823, over 4078726.92 frames. ], batch size: 53, lr: 4.00e-03, grad_scale: 32.0
2024-09-16 00:37:10,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=366284.8333333333, ans=0.0
2024-09-16 00:37:29,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=366313.1666666667, ans=0.0
2024-09-16 00:37:34,507 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.67 vs. limit=15.0
2024-09-16 00:38:12,281 INFO [train.py:1198] (1/2) Epoch 21, batch 1500, loss[loss=0.2265, ctc_loss=0.1521, cr_loss=0.3722, over 20962.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1608, cr_loss=0.3818, over 4073670.30 frames. ], batch size: 64, lr: 4.00e-03, grad_scale: 32.0
2024-09-16 00:38:20,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=366398.1666666667, ans=0.125
2024-09-16 00:38:29,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=366426.5, ans=0.125
2024-09-16 00:38:35,096 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.811e+02 2.054e+02 2.218e+02 2.422e+02 4.010e+02, threshold=4.437e+02, percent-clipped=0.0
2024-09-16 00:39:17,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=366511.5, ans=0.0
2024-09-16 00:39:27,734 INFO [train.py:1198] (1/2) Epoch 21, batch 1550, loss[loss=0.2374, ctc_loss=0.1608, cr_loss=0.3833, over 20666.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.1612, cr_loss=0.3819, over 4077767.90 frames. ], batch size: 68, lr: 4.00e-03, grad_scale: 32.0
2024-09-16 00:39:35,789 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0
2024-09-16 00:39:39,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=366539.8333333333, ans=0.2
2024-09-16 00:39:42,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=366539.8333333333, ans=0.0
2024-09-16 00:39:59,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=366568.1666666667, ans=0.125
2024-09-16 00:39:59,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=366568.1666666667, ans=0.2
2024-09-16 00:40:08,767 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.84 vs. limit=15.0
2024-09-16 00:40:48,711 INFO [train.py:1198] (1/2) Epoch 21, batch 1600, loss[loss=0.2709, ctc_loss=0.1927, cr_loss=0.3908, over 14466.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1609, cr_loss=0.3817, over 4086258.68 frames. ], batch size: 149, lr: 3.99e-03, grad_scale: 32.0
2024-09-16 00:40:55,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=366681.5, ans=0.125
2024-09-16 00:41:11,252 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.821e+02 2.123e+02 2.260e+02 2.442e+02 6.542e+02, threshold=4.520e+02, percent-clipped=2.0
2024-09-16 00:41:14,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=366709.8333333333, ans=0.1
2024-09-16 00:41:47,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=366794.8333333333, ans=0.0
2024-09-16 00:42:04,357 INFO [train.py:1198] (1/2) Epoch 21, batch 1650, loss[loss=0.2424, ctc_loss=0.1615, cr_loss=0.4046, over 20951.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1602, cr_loss=0.3805, over 4100499.20 frames. ], batch size: 60, lr: 3.99e-03, grad_scale: 16.0
2024-09-16 00:43:05,384 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.63 vs. limit=12.0
2024-09-16 00:43:17,624 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.35 vs. limit=10.0
2024-09-16 00:43:20,017 INFO [train.py:1198] (1/2) Epoch 21, batch 1700, loss[loss=0.2466, ctc_loss=0.1662, cr_loss=0.4016, over 21038.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1598, cr_loss=0.3796, over 4097864.83 frames. ], batch size: 62, lr: 3.99e-03, grad_scale: 16.0
2024-09-16 00:43:25,093 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 00:43:44,388 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.787e+02 2.129e+02 2.284e+02 2.469e+02 7.459e+02, threshold=4.568e+02, percent-clipped=1.0
2024-09-16 00:43:44,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=366993.1666666667, ans=0.125
2024-09-16 00:44:07,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=367049.8333333333, ans=0.2
2024-09-16 00:44:11,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=367049.8333333333, ans=0.0
2024-09-16 00:44:35,304 INFO [train.py:1198] (1/2) Epoch 21, batch 1750, loss[loss=0.2192, ctc_loss=0.1464, cr_loss=0.3638, over 20960.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1608, cr_loss=0.3808, over 4081709.60 frames. ], batch size: 51, lr: 3.99e-03, grad_scale: 16.0
2024-09-16 00:44:35,941 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.16 vs. limit=12.0
2024-09-16 00:44:43,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=367106.5, ans=0.125
2024-09-16 00:45:00,312 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 00:45:04,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=367163.1666666667, ans=0.025
2024-09-16 00:45:27,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=367191.5, ans=0.0
2024-09-16 00:45:39,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=367191.5, ans=0.2
2024-09-16 00:45:44,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=367219.8333333333, ans=0.125
2024-09-16 00:45:56,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=367248.1666666667, ans=0.125
2024-09-16 00:45:57,276 INFO [train.py:1198] (1/2) Epoch 21, batch 1800, loss[loss=0.193, ctc_loss=0.1288, cr_loss=0.3208, over 21064.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.1603, cr_loss=0.3792, over 4080587.69 frames. ], batch size: 53, lr: 3.99e-03, grad_scale: 16.0
2024-09-16 00:46:15,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=367276.5, ans=0.07
2024-09-16 00:46:21,704 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.845e+02 2.108e+02 2.242e+02 2.421e+02 3.650e+02, threshold=4.484e+02, percent-clipped=0.0
2024-09-16 00:46:22,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=367276.5, ans=0.02
2024-09-16 00:46:40,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=367304.8333333333, ans=0.2
2024-09-16 00:46:47,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=367333.1666666667, ans=0.0
2024-09-16 00:47:01,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=367361.5, ans=0.2
2024-09-16 00:47:02,074 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.79 vs. limit=22.5
2024-09-16 00:47:03,576 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.45 vs. limit=12.0
2024-09-16 00:47:13,500 INFO [train.py:1198] (1/2) Epoch 21, batch 1850, loss[loss=0.2318, ctc_loss=0.1529, cr_loss=0.3945, over 21051.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.1606, cr_loss=0.3807, over 4084767.36 frames. ], batch size: 56, lr: 3.99e-03, grad_scale: 16.0
2024-09-16 00:47:24,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=367389.8333333333, ans=0.2
2024-09-16 00:47:32,608 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.38 vs. limit=12.0
2024-09-16 00:47:47,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=367446.5, ans=0.0
2024-09-16 00:47:50,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=367446.5, ans=0.0
2024-09-16 00:47:56,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=367446.5, ans=0.125
2024-09-16 00:48:01,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=367474.8333333333, ans=0.125
2024-09-16 00:48:07,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=367474.8333333333, ans=0.0
2024-09-16 00:48:28,320 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.93 vs. limit=15.0
2024-09-16 00:48:29,187 INFO [train.py:1198] (1/2) Epoch 21, batch 1900, loss[loss=0.2431, ctc_loss=0.1623, cr_loss=0.4042, over 21021.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1599, cr_loss=0.38, over 4096872.93 frames. ], batch size: 63, lr: 3.99e-03, grad_scale: 16.0
2024-09-16 00:48:38,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=367531.5, ans=0.125
2024-09-16 00:48:46,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=367559.8333333333, ans=0.0
2024-09-16 00:48:53,588 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.833e+02 2.057e+02 2.225e+02 2.414e+02 3.049e+02, threshold=4.449e+02, percent-clipped=0.0
2024-09-16 00:48:59,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=367588.1666666667, ans=0.125
2024-09-16 00:49:07,695 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.21 vs. limit=10.0
2024-09-16 00:49:32,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=367644.8333333333, ans=0.04949747468305833
2024-09-16 00:49:44,790 INFO [train.py:1198] (1/2) Epoch 21, batch 1950, loss[loss=0.2279, ctc_loss=0.1526, cr_loss=0.3769, over 20785.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1596, cr_loss=0.3793, over 4087040.79 frames. ], batch size: 53, lr: 3.99e-03, grad_scale: 16.0
2024-09-16 00:49:56,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=367673.1666666667, ans=0.125
2024-09-16 00:50:07,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=367701.5, ans=0.1
2024-09-16 00:50:24,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=367729.8333333333, ans=0.125
2024-09-16 00:51:02,804 INFO [train.py:1198] (1/2) Epoch 21, batch 2000, loss[loss=0.2346, ctc_loss=0.1592, cr_loss=0.3767, over 20949.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1601, cr_loss=0.3801, over 4086524.51 frames. ], batch size: 60, lr: 3.99e-03, grad_scale: 32.0
2024-09-16 00:51:20,073 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.36 vs. limit=22.5
2024-09-16 00:51:30,169 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.852e+02 2.083e+02 2.267e+02 2.405e+02 5.239e+02, threshold=4.535e+02, percent-clipped=1.0
2024-09-16 00:52:21,475 INFO [train.py:1198] (1/2) Epoch 21, batch 2050, loss[loss=0.2564, ctc_loss=0.1775, cr_loss=0.3943, over 20359.00 frames. ], tot_loss[loss=0.2368, ctc_loss=0.1607, cr_loss=0.3805, over 4076109.08 frames. ], batch size: 74, lr: 3.99e-03, grad_scale: 32.0
2024-09-16 00:52:36,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=367984.8333333333, ans=0.125
2024-09-16 00:53:00,961 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 00:53:31,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=368069.8333333333, ans=0.1
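[Editor's note, not part of the original log: grad_scale in the batch summaries is the mixed-precision loss scale; it halves when gradients overflow (e.g. 32.0 -> 16.0 between batches 1600 and 1650) and is grown back after a stretch of stable steps (16.0 -> 32.0 at batch 2000), which is ordinary dynamic loss scaling. A hedged sketch using PyTorch's standard GradScaler; model, optimizer and loss_fn are placeholders, and the init_scale is taken from the log, not from train.py.]

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0)

def training_step(model, optimizer, loss_fn, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = loss_fn(model(batch))
    scaler.scale(loss).backward()  # scale up so fp16 gradients stay representable
    scaler.step(optimizer)         # unscales; skips the step if grads overflowed
    scaler.update()                # halves the scale on overflow, grows it otherwise
    return loss.detach()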
2024-09-16 00:53:36,507 INFO [train.py:1198] (1/2) Epoch 21, batch 2100, loss[loss=0.208, ctc_loss=0.1372, cr_loss=0.3539, over 20971.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.1613, cr_loss=0.3812, over 4077233.46 frames. ], batch size: 52, lr: 3.99e-03, grad_scale: 32.0
2024-09-16 00:54:00,452 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.935e+02 2.150e+02 2.286e+02 2.511e+02 3.500e+02, threshold=4.572e+02, percent-clipped=0.0
2024-09-16 00:54:06,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=368154.8333333333, ans=0.125
2024-09-16 00:54:51,880 INFO [train.py:1198] (1/2) Epoch 21, batch 2150, loss[loss=0.2425, ctc_loss=0.1618, cr_loss=0.4031, over 20783.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1609, cr_loss=0.3805, over 4074495.15 frames. ], batch size: 56, lr: 3.99e-03, grad_scale: 32.0
2024-09-16 00:54:53,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=368239.8333333333, ans=0.1
2024-09-16 00:55:00,423 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.06 vs. limit=15.0
2024-09-16 00:55:14,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=368268.1666666667, ans=0.125
2024-09-16 00:55:29,302 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.86 vs. limit=15.0
2024-09-16 00:55:38,350 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.61 vs. limit=15.0
2024-09-16 00:55:57,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=368353.1666666667, ans=0.2
2024-09-16 00:56:07,507 INFO [train.py:1198] (1/2) Epoch 21, batch 2200, loss[loss=0.1974, ctc_loss=0.1344, cr_loss=0.315, over 20886.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.1603, cr_loss=0.3791, over 4081215.24 frames. ], batch size: 54, lr: 3.99e-03, grad_scale: 32.0
2024-09-16 00:56:34,942 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.709e+02 2.062e+02 2.210e+02 2.347e+02 4.367e+02, threshold=4.420e+02, percent-clipped=0.0
2024-09-16 00:57:08,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=368466.5, ans=0.0
2024-09-16 00:57:17,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=368494.8333333333, ans=0.05
2024-09-16 00:57:29,035 INFO [train.py:1198] (1/2) Epoch 21, batch 2250, loss[loss=0.2379, ctc_loss=0.1603, cr_loss=0.3876, over 21021.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.1602, cr_loss=0.3795, over 4087357.52 frames. ], batch size: 62, lr: 3.98e-03, grad_scale: 32.0
2024-09-16 00:57:33,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.whiten.whitening_limit, batch_count=368523.1666666667, ans=15.0
2024-09-16 00:58:05,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=368579.8333333333, ans=0.0
2024-09-16 00:58:08,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=368579.8333333333, ans=0.025
2024-09-16 00:58:44,585 INFO [train.py:1198] (1/2) Epoch 21, batch 2300, loss[loss=0.1962, ctc_loss=0.1287, cr_loss=0.3379, over 20972.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.1602, cr_loss=0.3793, over 4078658.08 frames. ], batch size: 52, lr: 3.98e-03, grad_scale: 32.0
2024-09-16 00:58:59,996 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 00:59:08,512 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.804e+02 2.066e+02 2.193e+02 2.339e+02 3.928e+02, threshold=4.386e+02, percent-clipped=0.0
2024-09-16 00:59:17,131 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.07 vs. limit=15.0
2024-09-16 00:59:26,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=368721.5, ans=0.125
2024-09-16 00:59:31,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=368749.8333333333, ans=0.125
2024-09-16 00:59:50,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=368778.1666666667, ans=0.0
2024-09-16 00:59:53,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=368778.1666666667, ans=0.125
2024-09-16 00:59:59,379 INFO [train.py:1198] (1/2) Epoch 21, batch 2350, loss[loss=0.2451, ctc_loss=0.1689, cr_loss=0.3813, over 21017.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1604, cr_loss=0.38, over 4083701.46 frames. ], batch size: 61, lr: 3.98e-03, grad_scale: 32.0
2024-09-16 01:00:05,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=368806.5, ans=0.125
2024-09-16 01:00:10,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=368806.5, ans=0.125
2024-09-16 01:00:13,211 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 01:00:19,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=368834.8333333333, ans=0.125
2024-09-16 01:01:01,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=368919.8333333333, ans=0.0
2024-09-16 01:01:13,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=368948.1666666667, ans=0.0
2024-09-16 01:01:14,722 INFO [train.py:1198] (1/2) Epoch 21, batch 2400, loss[loss=0.2175, ctc_loss=0.1441, cr_loss=0.367, over 20992.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1605, cr_loss=0.38, over 4080344.30 frames. ], batch size: 52, lr: 3.98e-03, grad_scale: 32.0
2024-09-16 01:01:33,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=368976.5, ans=0.2
2024-09-16 01:01:39,028 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.745e+02 2.007e+02 2.140e+02 2.274e+02 3.323e+02, threshold=4.280e+02, percent-clipped=0.0
2024-09-16 01:01:50,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=369004.8333333333, ans=0.0
2024-09-16 01:02:23,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=369061.5, ans=0.2
2024-09-16 01:02:32,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=369089.8333333333, ans=0.125
2024-09-16 01:02:37,014 INFO [train.py:1198] (1/2) Epoch 21, batch 2450, loss[loss=0.2256, ctc_loss=0.1521, cr_loss=0.3678, over 20016.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1595, cr_loss=0.3785, over 4073208.84 frames. ], batch size: 44, lr: 3.98e-03, grad_scale: 32.0
2024-09-16 01:02:40,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=369089.8333333333, ans=0.125
2024-09-16 01:03:16,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=369146.5, ans=0.2
2024-09-16 01:03:25,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=369174.8333333333, ans=0.0
2024-09-16 01:03:28,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=369174.8333333333, ans=0.125
2024-09-16 01:03:31,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=369174.8333333333, ans=0.125
2024-09-16 01:03:40,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=369203.1666666667, ans=0.125
2024-09-16 01:03:52,633 INFO [train.py:1198] (1/2) Epoch 21, batch 2500, loss[loss=0.2633, ctc_loss=0.1776, cr_loss=0.4283, over 21011.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1605, cr_loss=0.3799, over 4070583.13 frames. ], batch size: 61, lr: 3.98e-03, grad_scale: 32.0
2024-09-16 01:04:16,509 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.841e+02 2.033e+02 2.204e+02 2.375e+02 3.703e+02, threshold=4.408e+02, percent-clipped=0.0
2024-09-16 01:04:24,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=369288.1666666667, ans=0.125
2024-09-16 01:04:38,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=369316.5, ans=0.1
2024-09-16 01:04:50,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=369316.5, ans=0.1
2024-09-16 01:04:55,716 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.12 vs. limit=15.0
2024-09-16 01:05:08,603 INFO [train.py:1198] (1/2) Epoch 21, batch 2550, loss[loss=0.2205, ctc_loss=0.1489, cr_loss=0.3583, over 20961.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1605, cr_loss=0.3803, over 4079729.14 frames. ], batch size: 50, lr: 3.98e-03, grad_scale: 32.0
2024-09-16 01:05:10,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=369373.1666666667, ans=0.025
2024-09-16 01:05:15,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=369373.1666666667, ans=0.125
2024-09-16 01:05:42,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=369429.8333333333, ans=0.125
2024-09-16 01:05:52,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=369458.1666666667, ans=0.025
2024-09-16 01:06:24,353 INFO [train.py:1198] (1/2) Epoch 21, batch 2600, loss[loss=0.2456, ctc_loss=0.1676, cr_loss=0.3899, over 20678.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.1606, cr_loss=0.3802, over 4077442.51 frames. ], batch size: 68, lr: 3.98e-03, grad_scale: 32.0
2024-09-16 01:06:38,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=369543.1666666667, ans=0.125
2024-09-16 01:06:47,246 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.10 vs. limit=15.0
2024-09-16 01:06:47,903 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.773e+02 2.089e+02 2.212e+02 2.406e+02 3.912e+02, threshold=4.424e+02, percent-clipped=0.0
2024-09-16 01:07:06,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=369571.5, ans=0.0
2024-09-16 01:07:36,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=369628.1666666667, ans=0.1
2024-09-16 01:07:39,551 INFO [train.py:1198] (1/2) Epoch 21, batch 2650, loss[loss=0.2793, ctc_loss=0.185, cr_loss=0.4714, over 20841.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.1615, cr_loss=0.382, over 4079126.69 frames. ], batch size: 65, lr: 3.98e-03, grad_scale: 32.0
2024-09-16 01:07:47,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=369656.5, ans=0.04949747468305833
2024-09-16 01:08:04,140 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.91 vs. limit=12.0
2024-09-16 01:08:23,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=369713.1666666667, ans=0.125
2024-09-16 01:08:32,757 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=12.35 vs. limit=12.0
2024-09-16 01:09:00,890 INFO [train.py:1198] (1/2) Epoch 21, batch 2700, loss[loss=0.2379, ctc_loss=0.1625, cr_loss=0.377, over 20809.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.16, cr_loss=0.3802, over 4078500.53 frames. ], batch size: 53, lr: 3.98e-03, grad_scale: 32.0
2024-09-16 01:09:04,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=369798.1666666667, ans=0.125
2024-09-16 01:09:07,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=369798.1666666667, ans=0.125
2024-09-16 01:09:11,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=369798.1666666667, ans=0.05
2024-09-16 01:09:22,674 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=23.85 vs. limit=22.5
2024-09-16 01:09:24,794 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.778e+02 2.104e+02 2.249e+02 2.407e+02 3.415e+02, threshold=4.497e+02, percent-clipped=0.0
2024-09-16 01:09:28,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=369826.5, ans=0.025
2024-09-16 01:09:37,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=369854.8333333333, ans=0.2
2024-09-16 01:09:43,039 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 01:10:16,036 INFO [train.py:1198] (1/2) Epoch 21, batch 2750, loss[loss=0.2356, ctc_loss=0.1626, cr_loss=0.3649, over 21001.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1598, cr_loss=0.3798, over 4089273.95 frames. ], batch size: 63, lr: 3.98e-03, grad_scale: 32.0
2024-09-16 01:10:17,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=369939.8333333333, ans=0.125
2024-09-16 01:10:17,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=369939.8333333333, ans=0.05
2024-09-16 01:10:23,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=369939.8333333333, ans=0.1
2024-09-16 01:11:06,873 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.53 vs. limit=15.0
2024-09-16 01:11:22,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=370053.1666666667, ans=0.0
2024-09-16 01:11:22,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=370053.1666666667, ans=0.025
2024-09-16 01:11:31,798 INFO [train.py:1198] (1/2) Epoch 21, batch 2800, loss[loss=0.2646, ctc_loss=0.1824, cr_loss=0.411, over 20974.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1602, cr_loss=0.3803, over 4071897.92 frames. ], batch size: 64, lr: 3.98e-03, grad_scale: 32.0
2024-09-16 01:11:37,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=370081.5, ans=0.2
2024-09-16 01:11:37,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=370081.5, ans=0.125
2024-09-16 01:11:39,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=370081.5, ans=0.125
2024-09-16 01:11:44,197 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 01:11:56,071 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.701e+02 2.021e+02 2.135e+02 2.361e+02 3.966e+02, threshold=4.271e+02, percent-clipped=0.0
2024-09-16 01:12:04,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=370138.1666666667, ans=0.1
2024-09-16 01:12:04,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=370138.1666666667, ans=0.125
2024-09-16 01:12:27,478 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=370166.5, ans=0.1
2024-09-16 01:12:37,153 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.56 vs. limit=15.0
2024-09-16 01:12:46,729 INFO [train.py:1198] (1/2) Epoch 21, batch 2850, loss[loss=0.2407, ctc_loss=0.1647, cr_loss=0.3802, over 20640.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1607, cr_loss=0.3812, over 4077781.07 frames. ], batch size: 71, lr: 3.98e-03, grad_scale: 32.0
2024-09-16 01:13:42,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=370308.1666666667, ans=0.2
2024-09-16 01:13:49,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=370336.5, ans=22.5
2024-09-16 01:14:08,451 INFO [train.py:1198] (1/2) Epoch 21, batch 2900, loss[loss=0.2378, ctc_loss=0.1644, cr_loss=0.3667, over 19408.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1593, cr_loss=0.3784, over 4085434.96 frames. ], batch size: 90, lr: 3.97e-03, grad_scale: 16.0
2024-09-16 01:14:17,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=370364.8333333333, ans=0.0
2024-09-16 01:14:33,688 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.834e+02 2.111e+02 2.267e+02 2.480e+02 4.240e+02, threshold=4.534e+02, percent-clipped=0.0
2024-09-16 01:14:40,572 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.83 vs. limit=15.0
2024-09-16 01:14:41,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=370421.5, ans=0.125
2024-09-16 01:14:57,049 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.87 vs. limit=10.0
2024-09-16 01:14:59,988 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.17 vs. limit=15.0
2024-09-16 01:15:23,489 INFO [train.py:1198] (1/2) Epoch 21, batch 2950, loss[loss=0.2313, ctc_loss=0.1545, cr_loss=0.3837, over 20824.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1595, cr_loss=0.3795, over 4078520.92 frames. ], batch size: 59, lr: 3.97e-03, grad_scale: 16.0
2024-09-16 01:15:23,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=370506.5, ans=0.125
2024-09-16 01:16:39,247 INFO [train.py:1198] (1/2) Epoch 21, batch 3000, loss[loss=0.2637, ctc_loss=0.1796, cr_loss=0.4202, over 20671.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1598, cr_loss=0.3795, over 4064893.82 frames. ], batch size: 66, lr: 3.97e-03, grad_scale: 16.0
2024-09-16 01:16:39,248 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-16 01:17:03,256 INFO [train.py:1230] (1/2) Epoch 21, validation: loss=0.04362, ctc_loss=0.04362, cr_loss=1.096e-14, over 944034.00 frames.
2024-09-16 01:17:03,256 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB
2024-09-16 01:17:14,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=370648.1666666667, ans=0.04949747468305833
2024-09-16 01:17:20,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=370676.5, ans=0.0
2024-09-16 01:17:29,253 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.850e+02 2.079e+02 2.219e+02 2.383e+02 3.230e+02, threshold=4.438e+02, percent-clipped=0.0
2024-09-16 01:17:38,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=370704.8333333333, ans=0.05
2024-09-16 01:18:19,934 INFO [train.py:1198] (1/2) Epoch 21, batch 3050, loss[loss=0.2319, ctc_loss=0.1565, cr_loss=0.3771, over 20881.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1595, cr_loss=0.3791, over 4069495.60 frames. ], batch size: 57, lr: 3.97e-03, grad_scale: 16.0
2024-09-16 01:18:35,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=370818.1666666667, ans=15.0
2024-09-16 01:18:50,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=370846.5, ans=0.0
2024-09-16 01:18:53,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=370846.5, ans=0.125
2024-09-16 01:19:13,685 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.01 vs. limit=10.0
2024-09-16 01:19:26,848 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0
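[Editor's note, not part of the original log: the batch-3000 validation pass above reports cr_loss=1.096e-14, i.e. numerically zero. That is what one would expect if the consistency-regularization term compares predictions from two differently-augmented copies of each utterance, since without training-time masking the two branches coincide; this reading is an inference from the log, and the symmetric-KL form below is an assumption for illustration.]

import torch
import torch.nn.functional as F

def cr_loss(logits_a: torch.Tensor, logits_b: torch.Tensor) -> torch.Tensor:
    # Symmetric KL between the two branches' output distributions.
    la, lb = F.log_softmax(logits_a, dim=-1), F.log_softmax(logits_b, dim=-1)
    return 0.5 * (F.kl_div(la, lb, log_target=True, reduction="batchmean")
                  + F.kl_div(lb, la, log_target=True, reduction="batchmean"))

x = torch.randn(4, 10, 500)
print(cr_loss(x, x))  # identical branches -> 0, like the validation cr_loss above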
2024-09-16 01:19:41,098 INFO [train.py:1198] (1/2) Epoch 21, batch 3100, loss[loss=0.2335, ctc_loss=0.1613, cr_loss=0.361, over 21069.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.16, cr_loss=0.3803, over 4087073.62 frames. ], batch size: 53, lr: 3.97e-03, grad_scale: 16.0
2024-09-16 01:19:52,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=370931.5, ans=0.125
2024-09-16 01:20:01,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=370959.8333333333, ans=0.0
2024-09-16 01:20:07,155 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.873e+02 2.122e+02 2.256e+02 2.443e+02 3.813e+02, threshold=4.513e+02, percent-clipped=0.0
2024-09-16 01:20:17,126 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.20 vs. limit=15.0
2024-09-16 01:20:19,810 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.45 vs. limit=22.5
2024-09-16 01:20:24,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=370988.1666666667, ans=0.0
2024-09-16 01:20:28,478 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=371016.5, ans=0.125
2024-09-16 01:20:56,713 INFO [train.py:1198] (1/2) Epoch 21, batch 3150, loss[loss=0.2487, ctc_loss=0.1681, cr_loss=0.4032, over 20980.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1609, cr_loss=0.3815, over 4089085.60 frames. ], batch size: 55, lr: 3.97e-03, grad_scale: 16.0
2024-09-16 01:21:17,367 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.41 vs. limit=22.5
2024-09-16 01:21:58,710 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.72 vs. limit=15.0
2024-09-16 01:22:12,961 INFO [train.py:1198] (1/2) Epoch 21, batch 3200, loss[loss=0.2769, ctc_loss=0.1913, cr_loss=0.4276, over 18233.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1597, cr_loss=0.3796, over 4090383.41 frames. ], batch size: 108, lr: 3.97e-03, grad_scale: 32.0
2024-09-16 01:22:13,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=371214.8333333333, ans=0.05
2024-09-16 01:22:19,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=371214.8333333333, ans=0.025
2024-09-16 01:22:32,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=371243.1666666667, ans=0.125
2024-09-16 01:22:38,303 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.816e+02 2.057e+02 2.187e+02 2.378e+02 3.203e+02, threshold=4.375e+02, percent-clipped=0.0
2024-09-16 01:22:49,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=371271.5, ans=0.1
2024-09-16 01:22:50,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=371271.5, ans=0.09899494936611666
2024-09-16 01:23:14,842 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 01:23:28,173 INFO [train.py:1198] (1/2) Epoch 21, batch 3250, loss[loss=0.2524, ctc_loss=0.1732, cr_loss=0.3961, over 19528.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.161, cr_loss=0.3817, over 4100585.88 frames. ], batch size: 90, lr: 3.97e-03, grad_scale: 32.0
2024-09-16 01:23:28,826 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0
2024-09-16 01:23:43,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=371384.8333333333, ans=0.125
2024-09-16 01:24:41,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=371469.8333333333, ans=0.1
2024-09-16 01:24:43,661 INFO [train.py:1198] (1/2) Epoch 21, batch 3300, loss[loss=0.2571, ctc_loss=0.1737, cr_loss=0.4172, over 20977.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1614, cr_loss=0.3819, over 4094535.75 frames. ], batch size: 64, lr: 3.97e-03, grad_scale: 32.0
2024-09-16 01:25:08,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=371526.5, ans=0.125
2024-09-16 01:25:11,661 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.790e+02 2.100e+02 2.271e+02 2.405e+02 4.346e+02, threshold=4.543e+02, percent-clipped=0.0
2024-09-16 01:25:13,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=371526.5, ans=0.125
2024-09-16 01:25:31,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=371554.8333333333, ans=0.1
2024-09-16 01:25:43,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=371583.1666666667, ans=0.0
2024-09-16 01:26:04,541 INFO [train.py:1198] (1/2) Epoch 21, batch 3350, loss[loss=0.222, ctc_loss=0.1496, cr_loss=0.3621, over 21049.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1604, cr_loss=0.3805, over 4099222.81 frames. ], batch size: 56, lr: 3.97e-03, grad_scale: 16.0
2024-09-16 01:27:03,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=371753.1666666667, ans=0.0
2024-09-16 01:27:17,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=371753.1666666667, ans=0.1
2024-09-16 01:27:19,802 INFO [train.py:1198] (1/2) Epoch 21, batch 3400, loss[loss=0.2672, ctc_loss=0.1805, cr_loss=0.4339, over 20833.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1603, cr_loss=0.3813, over 4099496.92 frames. ], batch size: 65, lr: 3.97e-03, grad_scale: 16.0
2024-09-16 01:27:42,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=371809.8333333333, ans=0.125
2024-09-16 01:27:46,726 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.651e+02 2.026e+02 2.152e+02 2.295e+02 2.892e+02, threshold=4.304e+02, percent-clipped=0.0
2024-09-16 01:28:10,335 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.85 vs. limit=15.0
2024-09-16 01:28:15,043 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.62 vs. limit=12.0
2024-09-16 01:28:35,262 INFO [train.py:1198] (1/2) Epoch 21, batch 3450, loss[loss=0.2376, ctc_loss=0.1584, cr_loss=0.3959, over 20958.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1602, cr_loss=0.3814, over 4099611.60 frames. ], batch size: 58, lr: 3.97e-03, grad_scale: 16.0
2024-09-16 01:28:40,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=371923.1666666667, ans=0.125
2024-09-16 01:29:35,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=372036.5, ans=0.125
2024-09-16 01:29:41,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=372036.5, ans=0.125
2024-09-16 01:29:42,116 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.43 vs. limit=12.0
2024-09-16 01:29:48,096 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.65 vs. limit=15.0
2024-09-16 01:29:50,187 INFO [train.py:1198] (1/2) Epoch 21, batch 3500, loss[loss=0.2571, ctc_loss=0.1771, cr_loss=0.4001, over 20368.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1607, cr_loss=0.3824, over 4099858.35 frames. ], batch size: 74, lr: 3.97e-03, grad_scale: 16.0
2024-09-16 01:29:52,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=372064.8333333333, ans=0.2
2024-09-16 01:30:10,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=372093.1666666667, ans=0.2
2024-09-16 01:30:13,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=372093.1666666667, ans=0.2
2024-09-16 01:30:17,322 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.860e+02 2.129e+02 2.358e+02 2.544e+02 4.972e+02, threshold=4.715e+02, percent-clipped=1.0
2024-09-16 01:30:23,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=372121.5, ans=0.0
2024-09-16 01:30:25,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=372121.5, ans=0.0
2024-09-16 01:30:27,188 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.01 vs. limit=15.0
2024-09-16 01:31:09,500 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.29 vs. limit=15.0
2024-09-16 01:31:10,831 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
], batch size: 63, lr: 3.96e-03, grad_scale: 16.0 2024-09-16 01:31:57,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=372291.5, ans=0.1 2024-09-16 01:31:58,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=372291.5, ans=10.0 2024-09-16 01:31:59,515 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.21 vs. limit=15.0 2024-09-16 01:32:16,208 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.87 vs. limit=15.0 2024-09-16 01:32:18,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=372319.8333333333, ans=0.0 2024-09-16 01:32:27,593 INFO [train.py:1198] (1/2) Epoch 21, batch 3600, loss[loss=0.2285, ctc_loss=0.1533, cr_loss=0.3758, over 21016.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1597, cr_loss=0.3803, over 4087425.81 frames. ], batch size: 61, lr: 3.96e-03, grad_scale: 32.0 2024-09-16 01:32:54,260 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.763e+02 2.070e+02 2.186e+02 2.346e+02 3.810e+02, threshold=4.372e+02, percent-clipped=0.0 2024-09-16 01:33:37,415 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.63 vs. limit=12.0 2024-09-16 01:33:42,612 INFO [train.py:1198] (1/2) Epoch 21, batch 3650, loss[loss=0.2549, ctc_loss=0.1739, cr_loss=0.405, over 20644.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1603, cr_loss=0.3811, over 4091575.76 frames. ], batch size: 68, lr: 3.96e-03, grad_scale: 32.0 2024-09-16 01:33:59,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=372518.1666666667, ans=0.0 2024-09-16 01:34:02,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=372518.1666666667, ans=0.0 2024-09-16 01:34:04,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=372518.1666666667, ans=0.0 2024-09-16 01:34:07,279 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 01:34:10,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=372518.1666666667, ans=0.125 2024-09-16 01:34:17,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=372546.5, ans=0.025 2024-09-16 01:34:38,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=372574.8333333333, ans=0.0 2024-09-16 01:34:58,123 INFO [train.py:1198] (1/2) Epoch 21, batch 3700, loss[loss=0.2141, ctc_loss=0.144, cr_loss=0.3505, over 20896.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1612, cr_loss=0.3828, over 4088625.93 frames. 
], batch size: 54, lr: 3.96e-03, grad_scale: 32.0 2024-09-16 01:34:59,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=372631.5, ans=0.0 2024-09-16 01:35:25,493 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.815e+02 2.065e+02 2.188e+02 2.433e+02 2.909e+02, threshold=4.375e+02, percent-clipped=0.0 2024-09-16 01:36:16,456 INFO [train.py:1198] (1/2) Epoch 21, batch 3750, loss[loss=0.2228, ctc_loss=0.148, cr_loss=0.3744, over 20879.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.1615, cr_loss=0.3831, over 4082322.53 frames. ], batch size: 54, lr: 3.96e-03, grad_scale: 32.0 2024-09-16 01:36:18,712 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.81 vs. limit=15.0 2024-09-16 01:36:39,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=372801.5, ans=0.07 2024-09-16 01:36:58,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=372829.8333333333, ans=0.0 2024-09-16 01:37:05,264 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.59 vs. limit=15.0 2024-09-16 01:37:33,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=372914.8333333333, ans=0.125 2024-09-16 01:37:34,528 INFO [train.py:1198] (1/2) Epoch 21, batch 3800, loss[loss=0.2044, ctc_loss=0.1355, cr_loss=0.3444, over 20997.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1607, cr_loss=0.382, over 4077036.29 frames. ], batch size: 52, lr: 3.96e-03, grad_scale: 32.0 2024-09-16 01:37:36,854 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.02 vs. 
limit=12.0 2024-09-16 01:37:52,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=372943.1666666667, ans=0.2 2024-09-16 01:37:58,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=372943.1666666667, ans=0.2 2024-09-16 01:37:58,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=372943.1666666667, ans=0.125 2024-09-16 01:38:01,573 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.745e+02 2.082e+02 2.245e+02 2.424e+02 3.263e+02, threshold=4.489e+02, percent-clipped=0.0 2024-09-16 01:38:04,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=372971.5, ans=0.2 2024-09-16 01:38:09,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=372971.5, ans=0.125 2024-09-16 01:38:12,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=372971.5, ans=0.125 2024-09-16 01:38:16,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=372971.5, ans=0.025 2024-09-16 01:38:19,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=372999.8333333333, ans=0.125 2024-09-16 01:38:36,152 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.48 vs. limit=15.0 2024-09-16 01:38:49,239 INFO [train.py:1198] (1/2) Epoch 21, batch 3850, loss[loss=0.2124, ctc_loss=0.1426, cr_loss=0.3491, over 20879.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1595, cr_loss=0.3807, over 4079894.01 frames. ], batch size: 54, lr: 3.96e-03, grad_scale: 32.0 2024-09-16 01:38:57,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=373056.5, ans=0.0 2024-09-16 01:39:18,725 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.74 vs. limit=15.0 2024-09-16 01:39:20,806 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.46 vs. limit=22.5 2024-09-16 01:40:04,851 INFO [train.py:1198] (1/2) Epoch 21, batch 3900, loss[loss=0.2694, ctc_loss=0.1838, cr_loss=0.4282, over 20987.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1594, cr_loss=0.3804, over 4096057.29 frames. ], batch size: 67, lr: 3.96e-03, grad_scale: 32.0 2024-09-16 01:40:08,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=373198.1666666667, ans=0.0 2024-09-16 01:40:26,684 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.62 vs. limit=15.0 2024-09-16 01:40:32,002 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.773e+02 2.094e+02 2.177e+02 2.401e+02 3.013e+02, threshold=4.355e+02, percent-clipped=0.0 2024-09-16 01:41:20,309 INFO [train.py:1198] (1/2) Epoch 21, batch 3950, loss[loss=0.2305, ctc_loss=0.1548, cr_loss=0.3786, over 21063.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1597, cr_loss=0.381, over 4100235.87 frames. 
], batch size: 59, lr: 3.96e-03, grad_scale: 16.0 2024-09-16 01:41:22,966 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.00 vs. limit=10.0 2024-09-16 01:41:41,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=373368.1666666667, ans=0.0 2024-09-16 01:41:49,683 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.91 vs. limit=22.5 2024-09-16 01:41:52,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=373396.5, ans=0.1 2024-09-16 01:42:25,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=373453.1666666667, ans=0.1 2024-09-16 01:42:41,950 INFO [train.py:1198] (1/2) Epoch 21, batch 4000, loss[loss=0.2459, ctc_loss=0.1684, cr_loss=0.3877, over 20778.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1606, cr_loss=0.3829, over 4109038.80 frames. ], batch size: 53, lr: 3.96e-03, grad_scale: 32.0 2024-09-16 01:42:42,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=373481.5, ans=0.125 2024-09-16 01:42:56,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=373509.8333333333, ans=0.125 2024-09-16 01:43:02,521 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.45 vs. limit=22.5 2024-09-16 01:43:10,956 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.789e+02 2.173e+02 2.321e+02 2.558e+02 5.399e+02, threshold=4.642e+02, percent-clipped=1.0 2024-09-16 01:43:32,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=373566.5, ans=0.125 2024-09-16 01:43:43,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=373594.8333333333, ans=0.0 2024-09-16 01:43:57,950 INFO [train.py:1198] (1/2) Epoch 21, batch 4050, loss[loss=0.2847, ctc_loss=0.1945, cr_loss=0.4513, over 19945.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1615, cr_loss=0.3842, over 4098551.73 frames. ], batch size: 80, lr: 3.96e-03, grad_scale: 32.0 2024-09-16 01:44:01,689 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.25 vs. limit=12.0 2024-09-16 01:44:10,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=373623.1666666667, ans=0.0 2024-09-16 01:44:34,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=373679.8333333333, ans=0.0 2024-09-16 01:44:34,952 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.44 vs. limit=15.0 2024-09-16 01:44:36,459 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.08 vs. 
limit=15.0 2024-09-16 01:44:50,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=373708.1666666667, ans=0.125 2024-09-16 01:45:09,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=373736.5, ans=0.0 2024-09-16 01:45:13,531 INFO [train.py:1198] (1/2) Epoch 21, batch 4100, loss[loss=0.1993, ctc_loss=0.1343, cr_loss=0.3253, over 19886.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.1613, cr_loss=0.3842, over 4098950.26 frames. ], batch size: 44, lr: 3.96e-03, grad_scale: 32.0 2024-09-16 01:45:25,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=373764.8333333333, ans=0.0 2024-09-16 01:45:42,205 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.791e+02 2.049e+02 2.219e+02 2.381e+02 3.056e+02, threshold=4.438e+02, percent-clipped=0.0 2024-09-16 01:46:04,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=373849.8333333333, ans=0.125 2024-09-16 01:46:29,362 INFO [train.py:1198] (1/2) Epoch 21, batch 4150, loss[loss=0.1881, ctc_loss=0.1241, cr_loss=0.3202, over 20959.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1607, cr_loss=0.3833, over 4103069.15 frames. ], batch size: 48, lr: 3.96e-03, grad_scale: 32.0 2024-09-16 01:46:46,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=373934.8333333333, ans=0.1 2024-09-16 01:47:34,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=374019.8333333333, ans=0.0 2024-09-16 01:47:49,057 INFO [train.py:1198] (1/2) Epoch 21, batch 4200, loss[loss=0.2339, ctc_loss=0.1584, cr_loss=0.3777, over 21061.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.1613, cr_loss=0.383, over 4091821.33 frames. ], batch size: 62, lr: 3.96e-03, grad_scale: 32.0 2024-09-16 01:48:20,537 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.798e+02 2.113e+02 2.269e+02 2.405e+02 6.091e+02, threshold=4.539e+02, percent-clipped=1.0 2024-09-16 01:48:30,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=374104.8333333333, ans=0.0 2024-09-16 01:48:30,505 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.42 vs. limit=15.0 2024-09-16 01:48:34,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=374104.8333333333, ans=0.1 2024-09-16 01:48:57,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=374161.5, ans=0.125 2024-09-16 01:49:07,669 INFO [train.py:1198] (1/2) Epoch 21, batch 4250, loss[loss=0.2326, ctc_loss=0.1574, cr_loss=0.3763, over 20864.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.16, cr_loss=0.3817, over 4105083.43 frames. 
], batch size: 65, lr: 3.95e-03, grad_scale: 32.0 2024-09-16 01:49:34,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=374218.1666666667, ans=0.0 2024-09-16 01:49:51,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=374274.8333333333, ans=0.05 2024-09-16 01:50:22,974 INFO [train.py:1198] (1/2) Epoch 21, batch 4300, loss[loss=0.2176, ctc_loss=0.1448, cr_loss=0.3643, over 21052.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1602, cr_loss=0.3818, over 4092178.35 frames. ], batch size: 56, lr: 3.95e-03, grad_scale: 32.0 2024-09-16 01:50:51,346 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.904e+02 2.136e+02 2.245e+02 2.437e+02 3.095e+02, threshold=4.490e+02, percent-clipped=0.0 2024-09-16 01:51:10,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=374416.5, ans=0.125 2024-09-16 01:51:35,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=374444.8333333333, ans=0.125 2024-09-16 01:51:38,712 INFO [train.py:1198] (1/2) Epoch 21, batch 4350, loss[loss=0.2223, ctc_loss=0.1492, cr_loss=0.3655, over 20693.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1595, cr_loss=0.3805, over 4084111.23 frames. ], batch size: 71, lr: 3.95e-03, grad_scale: 32.0 2024-09-16 01:51:42,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=374473.1666666667, ans=0.125 2024-09-16 01:51:51,547 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.13 vs. limit=15.0 2024-09-16 01:52:28,156 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0 2024-09-16 01:52:51,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=374586.5, ans=0.125 2024-09-16 01:52:54,630 INFO [train.py:1198] (1/2) Epoch 21, batch 4400, loss[loss=0.2583, ctc_loss=0.1804, cr_loss=0.3896, over 20558.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.1597, cr_loss=0.3813, over 4098285.46 frames. ], batch size: 75, lr: 3.95e-03, grad_scale: 32.0 2024-09-16 01:53:01,581 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.78 vs. limit=15.0 2024-09-16 01:53:11,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=374643.1666666667, ans=0.125 2024-09-16 01:53:19,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=374643.1666666667, ans=0.2 2024-09-16 01:53:19,615 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.42 vs. 
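limit=22.5

The optim.py WARNING entries report a five-number summary (min, quartiles, max) of recently observed gradient norms together with a clipping threshold, and in every entry here the threshold equals Clipping_scale times the logged median (for the warning immediately below, 2.0 * 2.162e+02 = 4.324e+02 ≈ 4.325e+02). The sketch below reproduces that relationship; the window size and the exact bookkeeping are assumptions, not the actual optim.py implementation:

from collections import deque

import torch

class GradNormStats:
    # Hypothetical tracker behind the "grad-norm quartiles ... threshold=..."
    # warnings: keep recent gradient norms, expose their quartiles, and derive
    # the clipping threshold as clipping_scale * median.
    def __init__(self, clipping_scale: float = 2.0, window: int = 1000):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)

    def observe(self, grad_norm: float) -> float:
        # Record one step's gradient norm; return the current threshold.
        self.norms.append(grad_norm)
        return self.threshold()

    def threshold(self) -> float:
        median = torch.quantile(torch.tensor(list(self.norms)), 0.5).item()
        return self.clipping_scale * median

    def quartiles(self) -> str:
        # Formatted like the log: min q1 median q3 max.
        t = torch.tensor(list(self.norms))
        qs = torch.quantile(t, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        return " ".join(f"{q.item():.3e}" for q in qs)

Under this reading, percent-clipped reports the share of recent steps whose norm exceeded the threshold, which is why most entries show 0.0 while entries whose max clears the threshold (e.g. a max of 6.295e+02 against a threshold of 4.424e+02 further below) show a nonzero value.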
2024-09-16 01:53:24,959 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.803e+02 2.036e+02 2.162e+02 2.306e+02 4.854e+02, threshold=4.325e+02, percent-clipped=1.0 2024-09-16 01:53:27,437 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.24 vs. limit=22.5 2024-09-16 01:53:40,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=374671.5, ans=0.2 2024-09-16 01:54:16,274 INFO [train.py:1198] (1/2) Epoch 21, batch 4450, loss[loss=0.1913, ctc_loss=0.1284, cr_loss=0.3142, over 20996.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1605, cr_loss=0.382, over 4095749.21 frames. ], batch size: 52, lr: 3.95e-03, grad_scale: 32.0 2024-09-16 01:54:28,691 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=374756.5, ans=0.1 2024-09-16 01:54:31,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=374784.8333333333, ans=0.025 2024-09-16 01:54:42,587 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0 2024-09-16 01:55:07,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=374841.5, ans=0.1 2024-09-16 01:55:07,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=374841.5, ans=0.125 2024-09-16 01:55:24,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=374869.8333333333, ans=0.125 2024-09-16 01:55:31,278 INFO [train.py:1198] (1/2) Epoch 21, batch 4500, loss[loss=0.294, ctc_loss=0.204, cr_loss=0.4501, over 18252.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1596, cr_loss=0.3811, over 4104177.55 frames. ], batch size: 108, lr: 3.95e-03, grad_scale: 32.0 2024-09-16 01:55:35,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=374898.1666666667, ans=0.125 2024-09-16 01:55:54,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=374926.5, ans=0.2 2024-09-16 01:56:00,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=374954.8333333333, ans=0.0 2024-09-16 01:56:01,456 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.829e+02 2.082e+02 2.260e+02 2.396e+02 3.171e+02, threshold=4.519e+02, percent-clipped=0.0 2024-09-16 01:56:25,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=374983.1666666667, ans=0.125 2024-09-16 01:56:32,370 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.95 vs.
limit=10.0 2024-09-16 01:56:40,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=375011.5, ans=0.125 2024-09-16 01:56:42,291 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 01:56:46,426 INFO [train.py:1198] (1/2) Epoch 21, batch 4550, loss[loss=0.2994, ctc_loss=0.2186, cr_loss=0.4043, over 14330.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1588, cr_loss=0.3791, over 4103052.84 frames. ], batch size: 149, lr: 3.95e-03, grad_scale: 32.0 2024-09-16 01:56:46,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=375039.8333333333, ans=0.125 2024-09-16 01:56:55,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=375039.8333333333, ans=10.0 2024-09-16 01:57:02,407 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.90 vs. limit=15.0 2024-09-16 01:57:15,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=375096.5, ans=0.125 2024-09-16 01:57:29,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=375096.5, ans=0.125 2024-09-16 01:57:39,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=375124.8333333333, ans=0.2 2024-09-16 01:57:59,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=375153.1666666667, ans=0.0 2024-09-16 01:58:02,358 INFO [train.py:1198] (1/2) Epoch 21, batch 4600, loss[loss=0.2578, ctc_loss=0.1745, cr_loss=0.4166, over 21023.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1593, cr_loss=0.3795, over 4094900.62 frames. ], batch size: 63, lr: 3.95e-03, grad_scale: 32.0 2024-09-16 01:58:14,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=375181.5, ans=0.125 2024-09-16 01:58:25,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=375209.8333333333, ans=0.0 2024-09-16 01:58:32,427 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.858e+02 2.086e+02 2.249e+02 2.488e+02 4.137e+02, threshold=4.498e+02, percent-clipped=0.0 2024-09-16 01:58:50,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=375266.5, ans=0.125 2024-09-16 01:59:06,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=375294.8333333333, ans=0.035 2024-09-16 01:59:10,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=375294.8333333333, ans=0.0 2024-09-16 01:59:19,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=375323.1666666667, ans=0.125 2024-09-16 01:59:20,561 INFO [train.py:1198] (1/2) Epoch 21, batch 4650, loss[loss=0.2622, ctc_loss=0.181, cr_loss=0.4062, over 18096.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1601, cr_loss=0.3805, over 4089395.68 frames. 
], batch size: 108, lr: 3.95e-03, grad_scale: 32.0 2024-09-16 01:59:24,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=375323.1666666667, ans=0.07 2024-09-16 02:00:15,319 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.21 vs. limit=15.0 2024-09-16 02:00:17,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=375408.1666666667, ans=0.2 2024-09-16 02:00:27,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=375436.5, ans=0.1 2024-09-16 02:00:38,536 INFO [train.py:1198] (1/2) Epoch 21, batch 4700, loss[loss=0.2434, ctc_loss=0.167, cr_loss=0.3817, over 20856.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.16, cr_loss=0.3799, over 4095809.40 frames. ], batch size: 57, lr: 3.95e-03, grad_scale: 32.0 2024-09-16 02:00:48,561 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.84 vs. limit=15.0 2024-09-16 02:01:08,884 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.794e+02 2.048e+02 2.187e+02 2.365e+02 3.837e+02, threshold=4.375e+02, percent-clipped=0.0 2024-09-16 02:01:10,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=375521.5, ans=0.125 2024-09-16 02:01:40,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=375578.1666666667, ans=0.125 2024-09-16 02:01:44,429 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.22 vs. limit=15.0 2024-09-16 02:01:54,201 INFO [train.py:1198] (1/2) Epoch 21, batch 4750, loss[loss=0.2453, ctc_loss=0.1677, cr_loss=0.3881, over 21039.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1594, cr_loss=0.3791, over 4094781.34 frames. ], batch size: 63, lr: 3.95e-03, grad_scale: 32.0 2024-09-16 02:02:00,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=375606.5, ans=0.125 2024-09-16 02:02:00,887 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.55 vs. limit=15.0 2024-09-16 02:02:06,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=375606.5, ans=0.0 2024-09-16 02:02:16,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=375634.8333333333, ans=0.0 2024-09-16 02:02:23,535 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.62 vs. 
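limit=22.5

The scaling.py Whitening entries, such as the self_attn1.whiten line completed just above (metric=8.62 vs. limit=22.5), track a per-module whitening metric against a scheduled limit. One plausible definition of such a metric, sketched here as an assumption rather than a copy of scaling.py: measure how far the channel covariance of the activations is from a multiple of the identity, so that perfectly "white" features score 1.0 and correlated or unevenly scaled channels score higher:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # Hypothetical version of the "metric" in the Whitening entries: the ratio
    # of the mean squared eigenvalue of the per-group channel covariance to
    # the squared mean eigenvalue. Equals 1.0 for a covariance proportional
    # to the identity; grows as channels become correlated or unbalanced.
    num_frames, num_channels = x.shape
    assert num_channels % num_groups == 0
    x = x.reshape(num_frames, num_groups, -1).transpose(0, 1)  # (groups, frames, dim)
    x = x - x.mean(dim=1, keepdim=True)
    cov = x.transpose(1, 2) @ x / num_frames  # (groups, dim, dim)
    dim = cov.shape[-1]
    mean_sq_eig = (cov * cov).sum(dim=(1, 2)) / dim                     # trace(C @ C) / dim
    sq_mean_eig = (torch.diagonal(cov, dim1=1, dim2=2).sum(dim=1) / dim) ** 2
    return (mean_sq_eig / sq_mean_eig).mean().item()

A module computing something of this shape would only apply a corrective penalty when the metric exceeds its scheduled limit (22.5 in the entry above), nudging activations toward a whiter covariance.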
2024-09-16 02:02:45,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=375691.5, ans=0.125 2024-09-16 02:03:05,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=375719.8333333333, ans=0.125 2024-09-16 02:03:09,666 INFO [train.py:1198] (1/2) Epoch 21, batch 4800, loss[loss=0.2052, ctc_loss=0.1364, cr_loss=0.3441, over 20996.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.1591, cr_loss=0.3788, over 4091120.39 frames. ], batch size: 52, lr: 3.95e-03, grad_scale: 32.0 2024-09-16 02:03:39,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=375804.8333333333, ans=0.0 2024-09-16 02:03:40,599 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.760e+02 2.054e+02 2.212e+02 2.381e+02 6.295e+02, threshold=4.424e+02, percent-clipped=1.0 2024-09-16 02:03:40,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=375804.8333333333, ans=0.125 2024-09-16 02:04:24,117 INFO [train.py:1198] (1/2) Epoch 21, batch 4850, loss[loss=0.2607, ctc_loss=0.1801, cr_loss=0.4032, over 20998.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.16, cr_loss=0.3804, over 4096391.89 frames. ], batch size: 61, lr: 3.95e-03, grad_scale: 32.0 2024-09-16 02:04:26,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=375889.8333333333, ans=0.125 2024-09-16 02:04:30,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=375889.8333333333, ans=0.1 2024-09-16 02:04:32,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=375889.8333333333, ans=0.125 2024-09-16 02:05:10,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=375974.8333333333, ans=0.07 2024-09-16 02:05:15,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=375974.8333333333, ans=0.125 2024-09-16 02:05:34,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=376003.1666666667, ans=0.125 2024-09-16 02:05:41,386 INFO [train.py:1198] (1/2) Epoch 21, batch 4900, loss[loss=0.2679, ctc_loss=0.1844, cr_loss=0.4174, over 18446.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1602, cr_loss=0.3804, over 4093059.26 frames. ], batch size: 108, lr: 3.94e-03, grad_scale: 32.0 2024-09-16 02:06:07,351 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=376059.8333333333, ans=0.125 2024-09-16 02:06:15,979 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.745e+02 2.137e+02 2.284e+02 2.465e+02 3.194e+02, threshold=4.568e+02, percent-clipped=0.0 2024-09-16 02:06:20,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=376088.1666666667, ans=0.1 2024-09-16 02:06:38,771 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.32 vs.
limit=10.0 2024-09-16 02:06:59,019 INFO [train.py:1198] (1/2) Epoch 21, batch 4950, loss[loss=0.236, ctc_loss=0.1612, cr_loss=0.3742, over 20981.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1601, cr_loss=0.3806, over 4092783.79 frames. ], batch size: 58, lr: 3.94e-03, grad_scale: 32.0 2024-09-16 02:07:25,144 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.58 vs. limit=12.0 2024-09-16 02:07:28,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=376229.8333333333, ans=0.125 2024-09-16 02:07:33,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=376229.8333333333, ans=0.1 2024-09-16 02:08:13,239 INFO [train.py:1198] (1/2) Epoch 21, batch 5000, loss[loss=0.1899, ctc_loss=0.1267, cr_loss=0.3159, over 20334.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1597, cr_loss=0.3789, over 4078037.02 frames. ], batch size: 45, lr: 3.94e-03, grad_scale: 32.0 2024-09-16 02:08:34,921 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=7.74 vs. limit=22.5 2024-09-16 02:08:44,844 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.870e+02 2.126e+02 2.282e+02 2.513e+02 6.959e+02, threshold=4.564e+02, percent-clipped=1.0 2024-09-16 02:08:51,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=376371.5, ans=0.125 2024-09-16 02:09:11,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=376428.1666666667, ans=0.125 2024-09-16 02:09:27,733 INFO [train.py:1198] (1/2) Epoch 21, batch 5050, loss[loss=0.2221, ctc_loss=0.1517, cr_loss=0.3522, over 20990.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.16, cr_loss=0.3801, over 4083644.90 frames. ], batch size: 55, lr: 3.94e-03, grad_scale: 32.0 2024-09-16 02:09:40,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=376456.5, ans=0.125 2024-09-16 02:09:49,383 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.10 vs. limit=15.0 2024-09-16 02:10:11,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=376541.5, ans=0.2 2024-09-16 02:10:38,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=376569.8333333333, ans=0.125 2024-09-16 02:10:42,609 INFO [train.py:1198] (1/2) Epoch 21, batch 5100, loss[loss=0.2382, ctc_loss=0.1609, cr_loss=0.3866, over 21017.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1589, cr_loss=0.3786, over 4094186.38 frames. 
], batch size: 61, lr: 3.94e-03, grad_scale: 32.0 2024-09-16 02:11:02,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=376626.5, ans=0.125 2024-09-16 02:11:14,067 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.810e+02 2.063e+02 2.284e+02 2.488e+02 3.450e+02, threshold=4.568e+02, percent-clipped=0.0 2024-09-16 02:11:33,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=376683.1666666667, ans=0.125 2024-09-16 02:11:40,135 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.90 vs. limit=15.0 2024-09-16 02:11:45,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=376711.5, ans=0.2 2024-09-16 02:11:57,224 INFO [train.py:1198] (1/2) Epoch 21, batch 5150, loss[loss=0.2009, ctc_loss=0.1347, cr_loss=0.331, over 20985.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1595, cr_loss=0.3797, over 4084284.72 frames. ], batch size: 48, lr: 3.94e-03, grad_scale: 32.0 2024-09-16 02:11:59,608 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.27 vs. limit=15.0 2024-09-16 02:12:12,583 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 02:13:11,488 INFO [train.py:1198] (1/2) Epoch 21, batch 5200, loss[loss=0.2187, ctc_loss=0.1492, cr_loss=0.3476, over 21064.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1589, cr_loss=0.3796, over 4099175.83 frames. ], batch size: 53, lr: 3.94e-03, grad_scale: 32.0 2024-09-16 02:13:28,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=376909.8333333333, ans=0.0 2024-09-16 02:13:32,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=376909.8333333333, ans=0.2 2024-09-16 02:13:42,535 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.816e+02 2.055e+02 2.176e+02 2.401e+02 3.523e+02, threshold=4.352e+02, percent-clipped=0.0 2024-09-16 02:14:00,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=376966.5, ans=0.1 2024-09-16 02:14:25,608 INFO [train.py:1198] (1/2) Epoch 21, batch 5250, loss[loss=0.2346, ctc_loss=0.156, cr_loss=0.3931, over 20882.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1599, cr_loss=0.3819, over 4096399.45 frames. ], batch size: 54, lr: 3.94e-03, grad_scale: 32.0 2024-09-16 02:14:42,686 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.70 vs. limit=12.0 2024-09-16 02:14:44,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=377051.5, ans=0.125 2024-09-16 02:15:42,015 INFO [train.py:1198] (1/2) Epoch 21, batch 5300, loss[loss=0.2317, ctc_loss=0.1591, cr_loss=0.3632, over 20971.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.16, cr_loss=0.3818, over 4104334.41 frames. ], batch size: 55, lr: 3.94e-03, grad_scale: 32.0 2024-09-16 02:16:00,358 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.22 vs. 
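limit=12.0

Most of the remaining volume in this log comes from scaling.py ScheduledFloat entries: named hyperparameters (dropout probabilities, skip rates, balancer probabilities, bypass scale bounds) whose current value "ans" is a function of batch_count, which is why the same name reappears with different values as training advances. A minimal sketch of batch-count scheduling in that spirit follows; the piecewise-linear rule and the breakpoints are illustrative assumptions, not the scaling.py definition:

class ScheduledFloat:
    # Minimal sketch: a float that interpolates piecewise-linearly between
    # (batch_count, value) breakpoints and clamps to the nearest endpoint
    # outside their range. The breakpoints below are invented for illustration.
    def __init__(self, *points):
        self.points = sorted(points)  # ((batch_count, value), ...)

    def value_at(self, batch_count):
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                frac = (batch_count - x0) / (x1 - x0)
                return y0 + frac * (y1 - y0)

# A skip rate decaying from 0.5 to 0.0 over the first 20000 batches would
# read ans=0.0 at the batch counts seen in this part of the log:
skip_rate = ScheduledFloat((0.0, 0.5), (20000.0, 0.0))
print(skip_rate.value_at(377193.0))  # 0.0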
2024-09-16 02:16:04,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=377193.1666666667, ans=0.125 2024-09-16 02:16:14,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=377221.5, ans=0.125 2024-09-16 02:16:15,297 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.918e+02 2.107e+02 2.202e+02 2.389e+02 4.329e+02, threshold=4.404e+02, percent-clipped=0.0 2024-09-16 02:16:58,956 INFO [train.py:1198] (1/2) Epoch 21, batch 5350, loss[loss=0.2296, ctc_loss=0.1566, cr_loss=0.3648, over 21068.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1608, cr_loss=0.3825, over 4099540.84 frames. ], batch size: 53, lr: 3.94e-03, grad_scale: 32.0 2024-09-16 02:17:09,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=377306.5, ans=0.125 2024-09-16 02:18:13,440 INFO [train.py:1198] (1/2) Epoch 21, batch 5400, loss[loss=0.2429, ctc_loss=0.1644, cr_loss=0.3924, over 20789.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.1605, cr_loss=0.3808, over 4093922.90 frames. ], batch size: 56, lr: 3.94e-03, grad_scale: 32.0 2024-09-16 02:18:27,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=377476.5, ans=0.125 2024-09-16 02:18:29,314 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.28 vs. limit=22.5 2024-09-16 02:18:44,454 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.741e+02 2.066e+02 2.231e+02 2.420e+02 4.440e+02, threshold=4.462e+02, percent-clipped=1.0 2024-09-16 02:18:58,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=377533.1666666667, ans=0.125 2024-09-16 02:18:59,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=377533.1666666667, ans=0.0 2024-09-16 02:19:23,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=377561.5, ans=0.2 2024-09-16 02:19:28,005 INFO [train.py:1198] (1/2) Epoch 21, batch 5450, loss[loss=0.2225, ctc_loss=0.1473, cr_loss=0.376, over 20959.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1613, cr_loss=0.3827, over 4079958.14 frames. ], batch size: 49, lr: 3.94e-03, grad_scale: 32.0 2024-09-16 02:20:06,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=377646.5, ans=0.125 2024-09-16 02:20:12,500 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.83 vs. limit=22.5 2024-09-16 02:20:28,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=377703.1666666667, ans=0.125 2024-09-16 02:20:36,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=377703.1666666667, ans=0.0 2024-09-16 02:20:43,683 INFO [train.py:1198] (1/2) Epoch 21, batch 5500, loss[loss=0.2404, ctc_loss=0.163, cr_loss=0.3868, over 20658.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.1614, cr_loss=0.3825, over 4077055.92 frames.
], batch size: 68, lr: 3.94e-03, grad_scale: 32.0 2024-09-16 02:21:14,823 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.847e+02 2.084e+02 2.255e+02 2.419e+02 3.239e+02, threshold=4.509e+02, percent-clipped=0.0 2024-09-16 02:21:33,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=377816.5, ans=0.125 2024-09-16 02:21:52,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=377844.8333333333, ans=0.2 2024-09-16 02:21:57,884 INFO [train.py:1198] (1/2) Epoch 21, batch 5550, loss[loss=0.1957, ctc_loss=0.1315, cr_loss=0.3209, over 20980.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.161, cr_loss=0.3826, over 4088171.90 frames. ], batch size: 48, lr: 3.94e-03, grad_scale: 32.0 2024-09-16 02:22:25,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=377901.5, ans=0.0 2024-09-16 02:22:58,478 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0 2024-09-16 02:23:13,116 INFO [train.py:1198] (1/2) Epoch 21, batch 5600, loss[loss=0.1908, ctc_loss=0.1256, cr_loss=0.3258, over 20964.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1602, cr_loss=0.3811, over 4096174.86 frames. ], batch size: 50, lr: 3.93e-03, grad_scale: 32.0 2024-09-16 02:23:40,815 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.30 vs. limit=22.5 2024-09-16 02:23:44,361 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.867e+02 2.060e+02 2.251e+02 2.461e+02 3.109e+02, threshold=4.503e+02, percent-clipped=0.0 2024-09-16 02:23:49,534 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.56 vs. limit=22.5 2024-09-16 02:24:08,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=378099.8333333333, ans=0.125 2024-09-16 02:24:20,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=378128.1666666667, ans=0.125 2024-09-16 02:24:27,498 INFO [train.py:1198] (1/2) Epoch 21, batch 5650, loss[loss=0.1927, ctc_loss=0.1261, cr_loss=0.3332, over 20240.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1593, cr_loss=0.3795, over 4080875.66 frames. ], batch size: 45, lr: 3.93e-03, grad_scale: 32.0 2024-09-16 02:25:17,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=378241.5, ans=0.0 2024-09-16 02:25:32,547 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.95 vs. limit=6.0 2024-09-16 02:25:46,500 INFO [train.py:1198] (1/2) Epoch 21, batch 5700, loss[loss=0.304, ctc_loss=0.2196, cr_loss=0.4222, over 14190.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1591, cr_loss=0.3793, over 4081991.31 frames. ], batch size: 149, lr: 3.93e-03, grad_scale: 16.0 2024-09-16 02:25:53,003 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.22 vs. 
limit=15.0 2024-09-16 02:26:19,189 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.791e+02 2.067e+02 2.175e+02 2.427e+02 4.126e+02, threshold=4.349e+02, percent-clipped=0.0 2024-09-16 02:26:22,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=378354.8333333333, ans=0.125 2024-09-16 02:26:35,999 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.52 vs. limit=22.5 2024-09-16 02:27:00,813 INFO [train.py:1198] (1/2) Epoch 21, batch 5750, loss[loss=0.2218, ctc_loss=0.15, cr_loss=0.359, over 19958.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1593, cr_loss=0.3794, over 4084465.91 frames. ], batch size: 44, lr: 3.93e-03, grad_scale: 16.0 2024-09-16 02:27:01,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=378439.8333333333, ans=0.125 2024-09-16 02:28:10,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=378553.1666666667, ans=0.1 2024-09-16 02:28:14,709 INFO [train.py:1198] (1/2) Epoch 21, batch 5800, loss[loss=0.2834, ctc_loss=0.1959, cr_loss=0.4374, over 20036.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1602, cr_loss=0.3812, over 4084347.58 frames. ], batch size: 80, lr: 3.93e-03, grad_scale: 16.0 2024-09-16 02:28:16,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=378581.5, ans=0.125 2024-09-16 02:28:47,523 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.835e+02 2.185e+02 2.427e+02 2.708e+02 9.486e+02, threshold=4.854e+02, percent-clipped=1.0 2024-09-16 02:28:57,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=378666.5, ans=0.0 2024-09-16 02:29:11,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=378666.5, ans=0.2 2024-09-16 02:29:28,549 INFO [train.py:1198] (1/2) Epoch 21, batch 5850, loss[loss=0.2532, ctc_loss=0.1744, cr_loss=0.3939, over 20123.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.1605, cr_loss=0.3809, over 4081166.04 frames. ], batch size: 80, lr: 3.93e-03, grad_scale: 16.0 2024-09-16 02:29:39,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=378723.1666666667, ans=0.125 2024-09-16 02:30:23,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=378808.1666666667, ans=0.125 2024-09-16 02:30:23,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=378808.1666666667, ans=0.125 2024-09-16 02:30:42,831 INFO [train.py:1198] (1/2) Epoch 21, batch 5900, loss[loss=0.2373, ctc_loss=0.1648, cr_loss=0.3628, over 20176.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1602, cr_loss=0.3811, over 4089901.74 frames. 
], batch size: 80, lr: 3.93e-03, grad_scale: 16.0 2024-09-16 02:31:00,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=378893.1666666667, ans=0.125 2024-09-16 02:31:15,607 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.817e+02 2.082e+02 2.243e+02 2.520e+02 3.993e+02, threshold=4.486e+02, percent-clipped=0.0 2024-09-16 02:31:39,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=378949.8333333333, ans=0.125 2024-09-16 02:31:50,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=378978.1666666667, ans=0.125 2024-09-16 02:31:57,351 INFO [train.py:1198] (1/2) Epoch 21, batch 5950, loss[loss=0.2537, ctc_loss=0.1727, cr_loss=0.4048, over 21078.00 frames. ], tot_loss[loss=0.2368, ctc_loss=0.1605, cr_loss=0.3813, over 4086724.57 frames. ], batch size: 59, lr: 3.93e-03, grad_scale: 16.0 2024-09-16 02:31:57,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=379006.5, ans=0.125 2024-09-16 02:31:59,861 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.88 vs. limit=15.0 2024-09-16 02:32:30,496 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 02:32:43,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=379091.5, ans=0.0 2024-09-16 02:33:06,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=379119.8333333333, ans=0.1 2024-09-16 02:33:11,646 INFO [train.py:1198] (1/2) Epoch 21, batch 6000, loss[loss=0.2243, ctc_loss=0.1491, cr_loss=0.3763, over 20774.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1608, cr_loss=0.3813, over 4079172.00 frames. ], batch size: 56, lr: 3.93e-03, grad_scale: 32.0 2024-09-16 02:33:11,646 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-16 02:33:35,846 INFO [train.py:1230] (1/2) Epoch 21, validation: loss=0.04328, ctc_loss=0.04328, cr_loss=1.081e-14, over 944034.00 frames. 2024-09-16 02:33:35,846 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-16 02:34:09,205 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.786e+02 2.070e+02 2.217e+02 2.364e+02 3.560e+02, threshold=4.434e+02, percent-clipped=0.0 2024-09-16 02:34:35,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=379261.5, ans=0.05 2024-09-16 02:34:36,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=379261.5, ans=0.2 2024-09-16 02:34:37,859 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 02:34:50,900 INFO [train.py:1198] (1/2) Epoch 21, batch 6050, loss[loss=0.2353, ctc_loss=0.1623, cr_loss=0.3653, over 21033.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1609, cr_loss=0.3817, over 4074622.56 frames. 
], batch size: 62, lr: 3.93e-03, grad_scale: 32.0 2024-09-16 02:35:03,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=379289.8333333333, ans=0.2 2024-09-16 02:35:04,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=379318.1666666667, ans=0.125 2024-09-16 02:35:38,064 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.68 vs. limit=22.5 2024-09-16 02:35:42,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=379374.8333333333, ans=0.2 2024-09-16 02:36:05,688 INFO [train.py:1198] (1/2) Epoch 21, batch 6100, loss[loss=0.2299, ctc_loss=0.1558, cr_loss=0.3706, over 20982.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1607, cr_loss=0.382, over 4081263.41 frames. ], batch size: 55, lr: 3.93e-03, grad_scale: 32.0 2024-09-16 02:36:05,921 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 02:36:17,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=379431.5, ans=0.04949747468305833 2024-09-16 02:36:29,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=379459.8333333333, ans=0.125 2024-09-16 02:36:37,793 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.779e+02 2.052e+02 2.178e+02 2.395e+02 3.200e+02, threshold=4.355e+02, percent-clipped=0.0 2024-09-16 02:36:44,826 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. limit=6.0 2024-09-16 02:36:46,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=379488.1666666667, ans=0.1 2024-09-16 02:36:50,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=379516.5, ans=0.125 2024-09-16 02:36:54,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=379516.5, ans=0.125 2024-09-16 02:37:13,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=379544.8333333333, ans=0.05 2024-09-16 02:37:19,232 INFO [train.py:1198] (1/2) Epoch 21, batch 6150, loss[loss=0.2605, ctc_loss=0.181, cr_loss=0.3975, over 18418.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.1612, cr_loss=0.3835, over 4073998.92 frames. ], batch size: 108, lr: 3.93e-03, grad_scale: 32.0 2024-09-16 02:37:21,570 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.79 vs. limit=22.5 2024-09-16 02:37:28,410 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=8.101e-03 2024-09-16 02:37:31,872 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.33 vs. limit=6.0 2024-09-16 02:38:22,462 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.86 vs. 
limit=6.0 2024-09-16 02:38:29,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=379686.5, ans=0.0 2024-09-16 02:38:33,673 INFO [train.py:1198] (1/2) Epoch 21, batch 6200, loss[loss=0.2268, ctc_loss=0.1543, cr_loss=0.3624, over 21075.00 frames. ], tot_loss[loss=0.2382, ctc_loss=0.1615, cr_loss=0.3837, over 4060191.45 frames. ], batch size: 59, lr: 3.93e-03, grad_scale: 32.0 2024-09-16 02:38:52,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=379743.1666666667, ans=0.125 2024-09-16 02:39:06,760 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.812e+02 2.119e+02 2.225e+02 2.492e+02 3.167e+02, threshold=4.450e+02, percent-clipped=0.0 2024-09-16 02:39:33,473 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.22 vs. limit=10.0 2024-09-16 02:39:40,909 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.99 vs. limit=15.0 2024-09-16 02:39:48,926 INFO [train.py:1198] (1/2) Epoch 21, batch 6250, loss[loss=0.256, ctc_loss=0.1738, cr_loss=0.4108, over 20678.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1615, cr_loss=0.3836, over 4054709.24 frames. ], batch size: 71, lr: 3.92e-03, grad_scale: 32.0 2024-09-16 02:40:07,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=379884.8333333333, ans=0.125 2024-09-16 02:40:33,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=379941.5, ans=0.125 2024-09-16 02:40:41,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=379941.5, ans=0.0 2024-09-16 02:41:03,087 INFO [train.py:1198] (1/2) Epoch 21, batch 6300, loss[loss=0.3112, ctc_loss=0.2281, cr_loss=0.4153, over 14662.00 frames. ], tot_loss[loss=0.2398, ctc_loss=0.1629, cr_loss=0.3845, over 4006173.35 frames. ], batch size: 149, lr: 3.92e-03, grad_scale: 32.0 2024-09-16 02:41:14,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=379998.1666666667, ans=22.5 2024-09-16 02:41:18,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=380026.5, ans=0.0 2024-09-16 02:41:36,041 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.950e+02 2.198e+02 2.332e+02 2.541e+02 4.122e+02, threshold=4.664e+02, percent-clipped=0.0 2024-09-16 02:42:01,906 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.16 vs. limit=15.0 2024-09-16 02:42:17,516 INFO [train.py:1198] (1/2) Epoch 21, batch 6350, loss[loss=0.2379, ctc_loss=0.1577, cr_loss=0.4013, over 21032.00 frames. ], tot_loss[loss=0.2428, ctc_loss=0.1655, cr_loss=0.3865, over 3936033.90 frames. 
2024-09-16 02:42:32,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=380168.1666666667, ans=0.0
2024-09-16 02:42:39,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=380168.1666666667, ans=0.04949747468305833
2024-09-16 02:44:01,694 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.75 vs. limit=15.0
2024-09-16 02:44:03,431 INFO [train.py:1198] (1/2) Epoch 22, batch 0, loss[loss=0.2317, ctc_loss=0.1577, cr_loss=0.3698, over 20934.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1577, cr_loss=0.3698, over 20934.00 frames. ], batch size: 60, lr: 3.83e-03, grad_scale: 32.0
2024-09-16 02:44:03,432 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-16 02:44:21,816 INFO [train.py:1230] (1/2) Epoch 22, validation: loss=0.04351, ctc_loss=0.04351, cr_loss=1.173e-14, over 944034.00 frames.
2024-09-16 02:44:21,816 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB
2024-09-16 02:44:29,203 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=380256.0, ans=0.125
2024-09-16 02:44:33,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=380256.0, ans=0.125
2024-09-16 02:44:38,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=380284.3333333333, ans=0.0
2024-09-16 02:45:11,272 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.770e+02 2.163e+02 2.356e+02 2.596e+02 6.775e+02, threshold=4.712e+02, percent-clipped=1.0
2024-09-16 02:45:34,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=380369.3333333333, ans=0.1
2024-09-16 02:45:42,590 INFO [train.py:1198] (1/2) Epoch 22, batch 50, loss[loss=0.2582, ctc_loss=0.1756, cr_loss=0.4132, over 20811.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1585, cr_loss=0.3777, over 927918.75 frames. ], batch size: 59, lr: 3.83e-03, grad_scale: 32.0
2024-09-16 02:45:58,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=380426.0, ans=0.2
2024-09-16 02:46:23,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=380454.3333333333, ans=0.125
2024-09-16 02:46:32,579 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.85 vs. limit=12.0
2024-09-16 02:46:36,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=380482.6666666667, ans=0.125
2024-09-16 02:46:50,997 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.02 vs. limit=15.0
2024-09-16 02:46:57,749 INFO [train.py:1198] (1/2) Epoch 22, batch 100, loss[loss=0.2171, ctc_loss=0.1456, cr_loss=0.3572, over 20981.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1589, cr_loss=0.3802, over 1626595.35 frames. ], batch size: 55, lr: 3.83e-03, grad_scale: 32.0
2024-09-16 02:47:04,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=380539.3333333333, ans=0.0
2024-09-16 02:47:19,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=380567.6666666667, ans=0.0
2024-09-16 02:47:23,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=380567.6666666667, ans=0.125
2024-09-16 02:47:43,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=380624.3333333333, ans=0.1
2024-09-16 02:47:44,613 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.775e+02 2.058e+02 2.210e+02 2.381e+02 2.814e+02, threshold=4.420e+02, percent-clipped=0.0
2024-09-16 02:47:51,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=380624.3333333333, ans=0.125
2024-09-16 02:48:12,766 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.70 vs. limit=10.0
2024-09-16 02:48:13,640 INFO [train.py:1198] (1/2) Epoch 22, batch 150, loss[loss=0.217, ctc_loss=0.1456, cr_loss=0.3573, over 20945.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1592, cr_loss=0.3813, over 2180185.71 frames. ], batch size: 49, lr: 3.83e-03, grad_scale: 32.0
2024-09-16 02:49:01,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=380766.0, ans=0.125
2024-09-16 02:49:18,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=380794.3333333333, ans=0.0
2024-09-16 02:49:29,960 INFO [train.py:1198] (1/2) Epoch 22, batch 200, loss[loss=0.1939, ctc_loss=0.1289, cr_loss=0.3248, over 21076.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1595, cr_loss=0.3819, over 2599422.07 frames. ], batch size: 53, lr: 3.83e-03, grad_scale: 32.0
2024-09-16 02:49:59,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=380879.3333333333, ans=0.125
2024-09-16 02:50:06,886 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.15 vs. limit=15.0
2024-09-16 02:50:20,301 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.779e+02 2.068e+02 2.188e+02 2.417e+02 6.616e+02, threshold=4.376e+02, percent-clipped=1.0
2024-09-16 02:50:25,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=380907.6666666667, ans=0.1
2024-09-16 02:50:25,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=380907.6666666667, ans=0.0
2024-09-16 02:50:49,265 INFO [train.py:1198] (1/2) Epoch 22, batch 250, loss[loss=0.242, ctc_loss=0.1641, cr_loss=0.3896, over 21024.00 frames. ], tot_loss[loss=0.2368, ctc_loss=0.1602, cr_loss=0.383, over 2944493.64 frames. ], batch size: 61, lr: 3.83e-03, grad_scale: 32.0
2024-09-16 02:50:53,298 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.34 vs. limit=15.0
2024-09-16 02:50:57,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=380964.3333333333, ans=0.0
2024-09-16 02:52:07,671 INFO [train.py:1198] (1/2) Epoch 22, batch 300, loss[loss=0.2471, ctc_loss=0.1718, cr_loss=0.3765, over 20314.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.1601, cr_loss=0.3832, over 3199817.64 frames. ], batch size: 74, lr: 3.83e-03, grad_scale: 32.0
2024-09-16 02:52:25,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=381134.3333333333, ans=0.125
2024-09-16 02:52:37,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=381162.6666666667, ans=0.025
2024-09-16 02:52:48,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=381162.6666666667, ans=0.0
2024-09-16 02:52:53,917 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.793e+02 2.113e+02 2.187e+02 2.404e+02 3.032e+02, threshold=4.374e+02, percent-clipped=0.0
2024-09-16 02:53:22,399 INFO [train.py:1198] (1/2) Epoch 22, batch 350, loss[loss=0.2362, ctc_loss=0.1569, cr_loss=0.3965, over 21054.00 frames. ], tot_loss[loss=0.2368, ctc_loss=0.16, cr_loss=0.3837, over 3394271.46 frames. ], batch size: 56, lr: 3.83e-03, grad_scale: 32.0
2024-09-16 02:53:22,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=381247.6666666667, ans=0.125
2024-09-16 02:53:24,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=381247.6666666667, ans=0.1
2024-09-16 02:53:48,697 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.52 vs. limit=12.0
2024-09-16 02:53:48,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=381276.0, ans=15.0
2024-09-16 02:53:49,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=381276.0, ans=0.2
2024-09-16 02:54:16,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=381332.6666666667, ans=0.125
2024-09-16 02:54:37,862 INFO [train.py:1198] (1/2) Epoch 22, batch 400, loss[loss=0.2481, ctc_loss=0.1688, cr_loss=0.3964, over 21006.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.159, cr_loss=0.3818, over 3557170.80 frames. ], batch size: 61, lr: 3.83e-03, grad_scale: 32.0
2024-09-16 02:54:40,075 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.60 vs. limit=15.0
2024-09-16 02:55:02,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=381417.6666666667, ans=0.1
2024-09-16 02:55:26,284 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.881e+02 2.102e+02 2.216e+02 2.517e+02 3.356e+02, threshold=4.432e+02, percent-clipped=0.0
2024-09-16 02:55:29,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=381474.3333333333, ans=0.125
2024-09-16 02:55:38,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=381502.6666666667, ans=0.125
2024-09-16 02:55:54,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.whiten.whitening_limit, batch_count=381502.6666666667, ans=12.0
2024-09-16 02:55:56,497 INFO [train.py:1198] (1/2) Epoch 22, batch 450, loss[loss=0.2479, ctc_loss=0.1676, cr_loss=0.4015, over 20840.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1589, cr_loss=0.3815, over 3668221.54 frames. ], batch size: 65, lr: 3.82e-03, grad_scale: 16.0
2024-09-16 02:56:27,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=381587.6666666667, ans=0.1
2024-09-16 02:57:15,132 INFO [train.py:1198] (1/2) Epoch 22, batch 500, loss[loss=0.2338, ctc_loss=0.1578, cr_loss=0.3799, over 20760.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1595, cr_loss=0.3823, over 3751962.66 frames. ], batch size: 56, lr: 3.82e-03, grad_scale: 16.0
2024-09-16 02:57:16,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=381672.6666666667, ans=0.125
2024-09-16 02:57:30,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=381701.0, ans=0.125
2024-09-16 02:57:32,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=381701.0, ans=0.125
2024-09-16 02:57:33,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=381701.0, ans=0.5
2024-09-16 02:57:33,826 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.89 vs. limit=15.0
2024-09-16 02:57:56,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=381729.3333333333, ans=0.0
2024-09-16 02:58:01,744 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.03 vs. limit=15.0
2024-09-16 02:58:03,050 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.89 vs. limit=15.0
2024-09-16 02:58:03,590 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.821e+02 2.101e+02 2.234e+02 2.387e+02 4.953e+02, threshold=4.468e+02, percent-clipped=1.0
2024-09-16 02:58:20,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=381786.0, ans=0.125
2024-09-16 02:58:21,348 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.91 vs. limit=15.0
2024-09-16 02:58:31,141 INFO [train.py:1198] (1/2) Epoch 22, batch 550, loss[loss=0.2389, ctc_loss=0.1618, cr_loss=0.3855, over 21077.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1594, cr_loss=0.3822, over 3837945.64 frames. ], batch size: 59, lr: 3.82e-03, grad_scale: 16.0
2024-09-16 02:58:40,670 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.05 vs. limit=15.0
2024-09-16 02:59:04,569 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 02:59:10,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=381871.0, ans=0.0
2024-09-16 02:59:46,296 INFO [train.py:1198] (1/2) Epoch 22, batch 600, loss[loss=0.2252, ctc_loss=0.1506, cr_loss=0.3731, over 20963.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1579, cr_loss=0.3794, over 3894812.55 frames. ], batch size: 55, lr: 3.82e-03, grad_scale: 16.0
2024-09-16 03:00:26,166 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.21 vs. limit=15.0
2024-09-16 03:00:34,524 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.855e+02 2.066e+02 2.193e+02 2.282e+02 3.663e+02, threshold=4.387e+02, percent-clipped=0.0
2024-09-16 03:01:02,090 INFO [train.py:1198] (1/2) Epoch 22, batch 650, loss[loss=0.2094, ctc_loss=0.1424, cr_loss=0.3354, over 20783.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1583, cr_loss=0.3793, over 3922877.51 frames. ], batch size: 53, lr: 3.82e-03, grad_scale: 16.0
2024-09-16 03:01:04,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=382097.6666666667, ans=0.035
2024-09-16 03:01:16,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=382126.0, ans=0.0
2024-09-16 03:01:23,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=382126.0, ans=0.125
2024-09-16 03:02:21,183 INFO [train.py:1198] (1/2) Epoch 22, batch 700, loss[loss=0.2493, ctc_loss=0.1674, cr_loss=0.4098, over 20981.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1578, cr_loss=0.3787, over 3966137.77 frames. ], batch size: 55, lr: 3.82e-03, grad_scale: 16.0
2024-09-16 03:02:57,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=382296.0, ans=0.125
2024-09-16 03:03:12,009 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.877e+02 2.074e+02 2.249e+02 2.464e+02 3.829e+02, threshold=4.499e+02, percent-clipped=0.0
2024-09-16 03:03:15,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=382324.3333333333, ans=0.125
2024-09-16 03:03:39,062 INFO [train.py:1198] (1/2) Epoch 22, batch 750, loss[loss=0.2018, ctc_loss=0.1355, cr_loss=0.3316, over 20983.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1575, cr_loss=0.3787, over 3997954.71 frames. ], batch size: 52, lr: 3.82e-03, grad_scale: 16.0
2024-09-16 03:03:44,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=382381.0, ans=0.0
2024-09-16 03:03:51,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=382381.0, ans=0.0
2024-09-16 03:03:54,943 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.20 vs. limit=15.0
2024-09-16 03:03:56,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=382409.3333333333, ans=0.125
2024-09-16 03:04:00,831 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.68 vs. limit=22.5
2024-09-16 03:04:26,185 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0
2024-09-16 03:04:39,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=382494.3333333333, ans=0.125
2024-09-16 03:04:40,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=382494.3333333333, ans=0.0
2024-09-16 03:04:54,149 INFO [train.py:1198] (1/2) Epoch 22, batch 800, loss[loss=0.2194, ctc_loss=0.1453, cr_loss=0.3708, over 20767.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1572, cr_loss=0.3787, over 4031993.65 frames. ], batch size: 56, lr: 3.82e-03, grad_scale: 32.0
2024-09-16 03:04:57,646 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 03:05:11,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=382551.0, ans=0.0
2024-09-16 03:05:20,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=382551.0, ans=0.0
2024-09-16 03:05:21,867 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 03:05:29,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=382579.3333333333, ans=0.04949747468305833
2024-09-16 03:05:42,507 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.811e+02 2.044e+02 2.212e+02 2.409e+02 3.838e+02, threshold=4.424e+02, percent-clipped=0.0
2024-09-16 03:05:42,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=382607.6666666667, ans=0.125
2024-09-16 03:05:43,298 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.95 vs. limit=12.0
2024-09-16 03:05:50,371 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 03:05:50,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=382607.6666666667, ans=15.0
2024-09-16 03:05:51,161 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.42 vs. limit=22.5
2024-09-16 03:06:02,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=382636.0, ans=0.125
2024-09-16 03:06:09,905 INFO [train.py:1198] (1/2) Epoch 22, batch 850, loss[loss=0.2078, ctc_loss=0.1404, cr_loss=0.337, over 21037.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1569, cr_loss=0.3777, over 4058620.29 frames. ], batch size: 62, lr: 3.82e-03, grad_scale: 32.0
2024-09-16 03:06:11,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=382664.3333333333, ans=0.0
2024-09-16 03:06:19,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=382664.3333333333, ans=0.125
2024-09-16 03:06:58,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=382749.3333333333, ans=0.0
2024-09-16 03:07:25,291 INFO [train.py:1198] (1/2) Epoch 22, batch 900, loss[loss=0.2607, ctc_loss=0.176, cr_loss=0.4235, over 20631.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1563, cr_loss=0.3763, over 4075060.20 frames. ], batch size: 66, lr: 3.82e-03, grad_scale: 32.0
2024-09-16 03:08:05,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=382862.6666666667, ans=0.2
2024-09-16 03:08:12,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=382891.0, ans=0.0
2024-09-16 03:08:17,168 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.770e+02 2.051e+02 2.198e+02 2.389e+02 6.083e+02, threshold=4.396e+02, percent-clipped=1.0
2024-09-16 03:08:20,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=382891.0, ans=0.125
2024-09-16 03:08:44,626 INFO [train.py:1198] (1/2) Epoch 22, batch 950, loss[loss=0.2365, ctc_loss=0.1605, cr_loss=0.3802, over 20977.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1568, cr_loss=0.3769, over 4076649.51 frames. ], batch size: 58, lr: 3.82e-03, grad_scale: 32.0
2024-09-16 03:08:53,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=382947.6666666667, ans=0.125
2024-09-16 03:09:14,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=382976.0, ans=0.0
2024-09-16 03:09:16,595 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.95 vs. limit=12.0
2024-09-16 03:09:34,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=383032.6666666667, ans=0.125
2024-09-16 03:10:02,885 INFO [train.py:1198] (1/2) Epoch 22, batch 1000, loss[loss=0.2865, ctc_loss=0.199, cr_loss=0.4376, over 18246.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1578, cr_loss=0.3787, over 4082802.21 frames. ], batch size: 108, lr: 3.82e-03, grad_scale: 32.0
2024-09-16 03:10:13,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=383089.3333333333, ans=0.125
2024-09-16 03:10:33,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=383146.0, ans=0.125
2024-09-16 03:10:51,295 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.838e+02 2.084e+02 2.204e+02 2.351e+02 3.756e+02, threshold=4.408e+02, percent-clipped=0.0
2024-09-16 03:10:54,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=383174.3333333333, ans=0.125
2024-09-16 03:10:56,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=383174.3333333333, ans=0.0
2024-09-16 03:11:18,593 INFO [train.py:1198] (1/2) Epoch 22, batch 1050, loss[loss=0.2021, ctc_loss=0.133, cr_loss=0.3453, over 20992.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1576, cr_loss=0.3777, over 4082756.58 frames. ], batch size: 52, lr: 3.82e-03, grad_scale: 32.0
2024-09-16 03:11:19,033 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 03:11:21,077 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.59 vs. limit=15.0
2024-09-16 03:11:26,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=383231.0, ans=0.1
2024-09-16 03:11:35,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=383259.3333333333, ans=0.125
2024-09-16 03:11:44,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=383259.3333333333, ans=0.0
2024-09-16 03:12:05,415 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.29 vs. limit=22.5
2024-09-16 03:12:08,379 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.90 vs. limit=22.5
2024-09-16 03:12:14,417 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.42 vs. limit=22.5
2024-09-16 03:12:18,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=383344.3333333333, ans=0.125
2024-09-16 03:12:33,214 INFO [train.py:1198] (1/2) Epoch 22, batch 1100, loss[loss=0.241, ctc_loss=0.1641, cr_loss=0.3847, over 21023.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1584, cr_loss=0.3791, over 4089074.85 frames. ], batch size: 61, lr: 3.82e-03, grad_scale: 32.0
2024-09-16 03:13:09,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=383429.3333333333, ans=0.1
2024-09-16 03:13:24,400 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.769e+02 2.108e+02 2.238e+02 2.414e+02 3.006e+02, threshold=4.477e+02, percent-clipped=0.0
2024-09-16 03:13:45,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=383486.0, ans=0.0
2024-09-16 03:13:51,150 INFO [train.py:1198] (1/2) Epoch 22, batch 1150, loss[loss=0.2385, ctc_loss=0.164, cr_loss=0.3724, over 20989.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.1598, cr_loss=0.381, over 4086457.95 frames. ], batch size: 55, lr: 3.81e-03, grad_scale: 32.0
2024-09-16 03:14:37,394 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.81 vs. limit=15.0
2024-09-16 03:14:54,616 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 03:15:11,172 INFO [train.py:1198] (1/2) Epoch 22, batch 1200, loss[loss=0.2191, ctc_loss=0.148, cr_loss=0.3556, over 21005.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1597, cr_loss=0.3808, over 4094132.39 frames. ], batch size: 52, lr: 3.81e-03, grad_scale: 32.0
2024-09-16 03:15:47,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=383712.6666666667, ans=0.1
2024-09-16 03:15:54,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=383741.0, ans=0.2
2024-09-16 03:15:58,879 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.821e+02 2.103e+02 2.212e+02 2.374e+02 2.854e+02, threshold=4.425e+02, percent-clipped=0.0
2024-09-16 03:15:59,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=383741.0, ans=0.125
2024-09-16 03:16:25,704 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.16 vs. limit=10.0
2024-09-16 03:16:26,110 INFO [train.py:1198] (1/2) Epoch 22, batch 1250, loss[loss=0.2611, ctc_loss=0.181, cr_loss=0.4007, over 19432.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.1603, cr_loss=0.3818, over 4092526.10 frames. ], batch size: 90, lr: 3.81e-03, grad_scale: 32.0
2024-09-16 03:16:33,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=383797.6666666667, ans=0.125
2024-09-16 03:16:54,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=383854.3333333333, ans=0.125
2024-09-16 03:17:41,421 INFO [train.py:1198] (1/2) Epoch 22, batch 1300, loss[loss=0.2108, ctc_loss=0.1396, cr_loss=0.3562, over 20952.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1597, cr_loss=0.3803, over 4086220.14 frames. ], batch size: 49, lr: 3.81e-03, grad_scale: 16.0
2024-09-16 03:18:13,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=383996.0, ans=0.125
2024-09-16 03:18:31,503 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.811e+02 2.059e+02 2.183e+02 2.355e+02 2.900e+02, threshold=4.366e+02, percent-clipped=0.0
2024-09-16 03:18:33,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=384024.3333333333, ans=0.125
2024-09-16 03:18:38,436 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.20 vs. limit=10.0
2024-09-16 03:19:00,535 INFO [train.py:1198] (1/2) Epoch 22, batch 1350, loss[loss=0.2403, ctc_loss=0.1603, cr_loss=0.4001, over 21028.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1589, cr_loss=0.3797, over 4087289.87 frames. ], batch size: 62, lr: 3.81e-03, grad_scale: 16.0
2024-09-16 03:19:02,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=384081.0, ans=0.125
2024-09-16 03:19:20,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=384109.3333333333, ans=0.125
2024-09-16 03:19:26,317 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 03:20:09,302 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.48 vs. limit=22.5
2024-09-16 03:20:16,103 INFO [train.py:1198] (1/2) Epoch 22, batch 1400, loss[loss=0.2125, ctc_loss=0.1416, cr_loss=0.3544, over 20980.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1588, cr_loss=0.3794, over 4092674.37 frames. ], batch size: 48, lr: 3.81e-03, grad_scale: 16.0
2024-09-16 03:20:28,732 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.81 vs. limit=10.0
2024-09-16 03:20:33,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=384251.0, ans=0.0
2024-09-16 03:20:43,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=384251.0, ans=0.125
2024-09-16 03:20:50,086 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.43 vs. limit=22.5
2024-09-16 03:20:51,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=384279.3333333333, ans=0.0
2024-09-16 03:21:09,198 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.701e+02 2.042e+02 2.214e+02 2.345e+02 3.120e+02, threshold=4.427e+02, percent-clipped=0.0
2024-09-16 03:21:23,654 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.00 vs. limit=15.0
2024-09-16 03:21:34,902 INFO [train.py:1198] (1/2) Epoch 22, batch 1450, loss[loss=0.2317, ctc_loss=0.1574, cr_loss=0.3716, over 21045.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1586, cr_loss=0.3794, over 4089668.83 frames. ], batch size: 62, lr: 3.81e-03, grad_scale: 16.0
2024-09-16 03:22:11,855 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.65 vs. limit=10.0
2024-09-16 03:22:25,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=384449.3333333333, ans=0.0
2024-09-16 03:22:50,645 INFO [train.py:1198] (1/2) Epoch 22, batch 1500, loss[loss=0.2393, ctc_loss=0.162, cr_loss=0.3869, over 20978.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.158, cr_loss=0.3793, over 4102685.20 frames. ], batch size: 55, lr: 3.81e-03, grad_scale: 16.0
2024-09-16 03:23:05,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=384534.3333333333, ans=0.025
2024-09-16 03:23:19,672 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.31 vs. limit=10.0
2024-09-16 03:23:22,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=384562.6666666667, ans=0.125
2024-09-16 03:23:28,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=384562.6666666667, ans=0.0
2024-09-16 03:23:39,876 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.729e+02 2.187e+02 2.319e+02 2.511e+02 4.105e+02, threshold=4.639e+02, percent-clipped=0.0
2024-09-16 03:23:40,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=384591.0, ans=0.125
2024-09-16 03:23:49,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=384619.3333333333, ans=0.2
2024-09-16 03:24:05,657 INFO [train.py:1198] (1/2) Epoch 22, batch 1550, loss[loss=0.2238, ctc_loss=0.1517, cr_loss=0.3607, over 21054.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1575, cr_loss=0.3781, over 4101012.91 frames. ], batch size: 56, lr: 3.81e-03, grad_scale: 16.0
2024-09-16 03:24:10,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=384647.6666666667, ans=0.2
2024-09-16 03:24:24,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=384676.0, ans=0.125
2024-09-16 03:24:39,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=384704.3333333333, ans=0.125
2024-09-16 03:24:41,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=384704.3333333333, ans=0.125
2024-09-16 03:25:18,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=384761.0, ans=0.125
2024-09-16 03:25:25,144 INFO [train.py:1198] (1/2) Epoch 22, batch 1600, loss[loss=0.2293, ctc_loss=0.1551, cr_loss=0.3712, over 20955.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1575, cr_loss=0.3788, over 4098689.32 frames. ], batch size: 50, lr: 3.81e-03, grad_scale: 32.0
2024-09-16 03:26:07,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=384846.0, ans=0.025
2024-09-16 03:26:14,498 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.838e+02 2.090e+02 2.207e+02 2.363e+02 3.056e+02, threshold=4.414e+02, percent-clipped=0.0
2024-09-16 03:26:18,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=384874.3333333333, ans=0.1
2024-09-16 03:26:28,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=384902.6666666667, ans=0.125
2024-09-16 03:26:43,496 INFO [train.py:1198] (1/2) Epoch 22, batch 1650, loss[loss=0.2307, ctc_loss=0.1528, cr_loss=0.3899, over 20830.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1581, cr_loss=0.3797, over 4110125.60 frames. ], batch size: 59, lr: 3.81e-03, grad_scale: 32.0
2024-09-16 03:26:43,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=384931.0, ans=0.125
2024-09-16 03:26:54,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=384931.0, ans=0.1
2024-09-16 03:27:08,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=384959.3333333333, ans=0.2
2024-09-16 03:27:16,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=384987.6666666667, ans=0.5
2024-09-16 03:27:59,590 INFO [train.py:1198] (1/2) Epoch 22, batch 1700, loss[loss=0.2367, ctc_loss=0.1606, cr_loss=0.3802, over 21059.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.158, cr_loss=0.3794, over 4109884.98 frames. ], batch size: 53, lr: 3.81e-03, grad_scale: 32.0
2024-09-16 03:28:15,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=385101.0, ans=0.0
2024-09-16 03:28:36,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=385129.3333333333, ans=0.1
2024-09-16 03:28:49,256 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.818e+02 2.062e+02 2.242e+02 2.396e+02 5.065e+02, threshold=4.484e+02, percent-clipped=1.0
2024-09-16 03:28:54,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=385157.6666666667, ans=0.0
2024-09-16 03:29:10,933 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.94 vs. limit=15.0
2024-09-16 03:29:14,555 INFO [train.py:1198] (1/2) Epoch 22, batch 1750, loss[loss=0.2294, ctc_loss=0.154, cr_loss=0.3771, over 21071.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1581, cr_loss=0.3798, over 4114415.49 frames. ], batch size: 56, lr: 3.81e-03, grad_scale: 32.0
2024-09-16 03:29:17,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=385214.3333333333, ans=0.1
2024-09-16 03:29:24,462 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.28 vs. limit=22.5
2024-09-16 03:29:42,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=385242.6666666667, ans=0.125
2024-09-16 03:29:58,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=385299.3333333333, ans=0.0
2024-09-16 03:30:31,086 INFO [train.py:1198] (1/2) Epoch 22, batch 1800, loss[loss=0.2233, ctc_loss=0.1487, cr_loss=0.3727, over 20776.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1586, cr_loss=0.3805, over 4099952.33 frames. ], batch size: 56, lr: 3.81e-03, grad_scale: 32.0
2024-09-16 03:30:37,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=385356.0, ans=0.125
2024-09-16 03:31:10,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=385412.6666666667, ans=0.0
2024-09-16 03:31:25,429 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.880e+02 2.081e+02 2.211e+02 2.365e+02 3.519e+02, threshold=4.421e+02, percent-clipped=0.0
2024-09-16 03:31:30,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=385441.0, ans=0.125
2024-09-16 03:31:49,732 INFO [train.py:1198] (1/2) Epoch 22, batch 1850, loss[loss=0.2815, ctc_loss=0.191, cr_loss=0.4527, over 20961.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.1597, cr_loss=0.3823, over 4097515.97 frames. ], batch size: 64, lr: 3.80e-03, grad_scale: 16.0
2024-09-16 03:33:08,519 INFO [train.py:1198] (1/2) Epoch 22, batch 1900, loss[loss=0.2896, ctc_loss=0.2072, cr_loss=0.412, over 14097.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1595, cr_loss=0.3814, over 4096308.84 frames. ], batch size: 151, lr: 3.80e-03, grad_scale: 16.0
2024-09-16 03:33:14,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=385639.3333333333, ans=0.125
2024-09-16 03:33:14,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=385639.3333333333, ans=0.125
2024-09-16 03:33:26,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=385667.6666666667, ans=0.025
2024-09-16 03:33:38,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=385696.0, ans=0.0
2024-09-16 03:33:50,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=385696.0, ans=10.0
2024-09-16 03:33:52,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=385724.3333333333, ans=0.2
2024-09-16 03:33:59,623 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.804e+02 2.069e+02 2.212e+02 2.397e+02 5.471e+02, threshold=4.425e+02, percent-clipped=1.0
2024-09-16 03:34:13,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=385752.6666666667, ans=0.0
2024-09-16 03:34:23,639 INFO [train.py:1198] (1/2) Epoch 22, batch 1950, loss[loss=0.2393, ctc_loss=0.1615, cr_loss=0.389, over 21004.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1589, cr_loss=0.3812, over 4101407.79 frames. ], batch size: 63, lr: 3.80e-03, grad_scale: 16.0
2024-09-16 03:34:35,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=385781.0, ans=0.125
2024-09-16 03:34:54,372 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.30 vs. limit=15.0
2024-09-16 03:35:21,878 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.77 vs. limit=10.0
2024-09-16 03:35:26,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=385894.3333333333, ans=0.2
2024-09-16 03:35:29,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=385894.3333333333, ans=0.125
2024-09-16 03:35:39,606 INFO [train.py:1198] (1/2) Epoch 22, batch 2000, loss[loss=0.2538, ctc_loss=0.1772, cr_loss=0.3831, over 19494.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1583, cr_loss=0.3801, over 4110412.70 frames. ], batch size: 90, lr: 3.80e-03, grad_scale: 32.0
2024-09-16 03:35:56,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=385951.0, ans=0.0
2024-09-16 03:36:16,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=385979.3333333333, ans=0.1
2024-09-16 03:36:33,997 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.732e+02 2.060e+02 2.177e+02 2.279e+02 3.176e+02, threshold=4.353e+02, percent-clipped=0.0
2024-09-16 03:36:58,266 INFO [train.py:1198] (1/2) Epoch 22, batch 2050, loss[loss=0.2107, ctc_loss=0.1405, cr_loss=0.3506, over 20980.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1569, cr_loss=0.3778, over 4112393.29 frames. ], batch size: 50, lr: 3.80e-03, grad_scale: 32.0
2024-09-16 03:37:01,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=386064.3333333333, ans=0.0
2024-09-16 03:37:12,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=386092.6666666667, ans=0.0
2024-09-16 03:37:13,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=386092.6666666667, ans=0.125
2024-09-16 03:37:15,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=386092.6666666667, ans=0.025
2024-09-16 03:37:18,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=386092.6666666667, ans=0.0
2024-09-16 03:37:49,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=386149.3333333333, ans=0.2
2024-09-16 03:38:16,602 INFO [train.py:1198] (1/2) Epoch 22, batch 2100, loss[loss=0.1943, ctc_loss=0.1283, cr_loss=0.3303, over 21045.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1573, cr_loss=0.3789, over 4111343.15 frames. ], batch size: 53, lr: 3.80e-03, grad_scale: 32.0
2024-09-16 03:38:21,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=386206.0, ans=0.1
2024-09-16 03:38:52,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=386262.6666666667, ans=0.1
2024-09-16 03:39:08,047 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.781e+02 2.117e+02 2.300e+02 2.525e+02 5.578e+02, threshold=4.599e+02, percent-clipped=2.0
2024-09-16 03:39:29,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=386319.3333333333, ans=0.1
2024-09-16 03:39:32,022 INFO [train.py:1198] (1/2) Epoch 22, batch 2150, loss[loss=0.2803, ctc_loss=0.1916, cr_loss=0.4435, over 20658.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.1572, cr_loss=0.3786, over 4108639.86 frames. ], batch size: 68, lr: 3.80e-03, grad_scale: 32.0
2024-09-16 03:39:35,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=386347.6666666667, ans=0.2
2024-09-16 03:40:16,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=386432.6666666667, ans=0.0
2024-09-16 03:40:47,735 INFO [train.py:1198] (1/2) Epoch 22, batch 2200, loss[loss=0.2318, ctc_loss=0.1563, cr_loss=0.3776, over 20873.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.157, cr_loss=0.3782, over 4110490.67 frames. ], batch size: 57, lr: 3.80e-03, grad_scale: 32.0
2024-09-16 03:40:48,512 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.71 vs. limit=15.0
2024-09-16 03:41:07,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=386517.6666666667, ans=0.025
2024-09-16 03:41:13,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=386517.6666666667, ans=0.125
2024-09-16 03:41:15,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=386517.6666666667, ans=0.125
2024-09-16 03:41:32,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=386574.3333333333, ans=0.125
2024-09-16 03:41:39,667 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.834e+02 2.125e+02 2.244e+02 2.457e+02 3.951e+02, threshold=4.488e+02, percent-clipped=0.0
2024-09-16 03:42:04,012 INFO [train.py:1198] (1/2) Epoch 22, batch 2250, loss[loss=0.2205, ctc_loss=0.1473, cr_loss=0.3661, over 20847.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1575, cr_loss=0.3792, over 4106594.52 frames. ], batch size: 59, lr: 3.80e-03, grad_scale: 32.0
2024-09-16 03:42:05,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=386631.0, ans=0.125
2024-09-16 03:42:07,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=386631.0, ans=0.1
2024-09-16 03:42:17,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=386631.0, ans=0.0
2024-09-16 03:43:15,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=386744.3333333333, ans=0.0
2024-09-16 03:43:22,209 INFO [train.py:1198] (1/2) Epoch 22, batch 2300, loss[loss=0.2587, ctc_loss=0.1775, cr_loss=0.4059, over 21056.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1576, cr_loss=0.3797, over 4108324.41 frames. ], batch size: 59, lr: 3.80e-03, grad_scale: 32.0
2024-09-16 03:43:25,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=386772.6666666667, ans=0.125
2024-09-16 03:43:55,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=386829.3333333333, ans=0.125
2024-09-16 03:44:02,267 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.28 vs. limit=15.0
2024-09-16 03:44:16,754 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.866e+02 2.076e+02 2.209e+02 2.397e+02 3.724e+02, threshold=4.418e+02, percent-clipped=0.0
2024-09-16 03:44:17,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=386857.6666666667, ans=0.125
2024-09-16 03:44:41,198 INFO [train.py:1198] (1/2) Epoch 22, batch 2350, loss[loss=0.2595, ctc_loss=0.1752, cr_loss=0.4215, over 20928.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.158, cr_loss=0.3799, over 4107573.94 frames. ], batch size: 60, lr: 3.80e-03, grad_scale: 32.0
2024-09-16 03:44:45,116 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.69 vs. limit=15.0
2024-09-16 03:45:16,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=386971.0, ans=0.125
2024-09-16 03:45:20,038 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.69 vs. limit=15.0
2024-09-16 03:45:32,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=386999.3333333333, ans=0.2
2024-09-16 03:45:38,089 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.68 vs. limit=15.0
2024-09-16 03:45:49,946 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.72 vs. limit=22.5
2024-09-16 03:45:56,643 INFO [train.py:1198] (1/2) Epoch 22, batch 2400, loss[loss=0.2571, ctc_loss=0.1746, cr_loss=0.4125, over 20984.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1582, cr_loss=0.3792, over 4102595.05 frames. ], batch size: 64, lr: 3.80e-03, grad_scale: 32.0
2024-09-16 03:46:16,888 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.15 vs. limit=22.5
2024-09-16 03:46:47,255 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.747e+02 2.146e+02 2.286e+02 2.479e+02 3.833e+02, threshold=4.572e+02, percent-clipped=0.0
2024-09-16 03:46:53,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=387141.0, ans=0.125
2024-09-16 03:47:01,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=387169.3333333333, ans=0.125
2024-09-16 03:47:11,657 INFO [train.py:1198] (1/2) Epoch 22, batch 2450, loss[loss=0.1964, ctc_loss=0.1295, cr_loss=0.3348, over 20943.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.1588, cr_loss=0.3801, over 4099096.39 frames. ], batch size: 50, lr: 3.80e-03, grad_scale: 32.0
2024-09-16 03:47:13,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=387197.6666666667, ans=0.125
2024-09-16 03:47:42,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=387254.3333333333, ans=0.125
2024-09-16 03:47:50,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=387254.3333333333, ans=22.5
2024-09-16 03:47:55,075 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.33 vs. limit=10.0
2024-09-16 03:48:17,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=387311.0, ans=0.1
2024-09-16 03:48:29,991 INFO [train.py:1198] (1/2) Epoch 22, batch 2500, loss[loss=0.2371, ctc_loss=0.1636, cr_loss=0.3676, over 20818.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1595, cr_loss=0.381, over 4084813.25 frames. ], batch size: 59, lr: 3.80e-03, grad_scale: 32.0
2024-09-16 03:49:19,255 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.64 vs. limit=10.0
2024-09-16 03:49:20,384 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 03:49:24,634 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.809e+02 2.111e+02 2.264e+02 2.466e+02 3.770e+02, threshold=4.528e+02, percent-clipped=0.0
2024-09-16 03:49:37,387 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.96 vs. limit=15.0
2024-09-16 03:49:38,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=387452.6666666667, ans=0.125
2024-09-16 03:49:48,867 INFO [train.py:1198] (1/2) Epoch 22, batch 2550, loss[loss=0.2088, ctc_loss=0.1375, cr_loss=0.3564, over 20975.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.159, cr_loss=0.3806, over 4088577.09 frames. ], batch size: 52, lr: 3.80e-03, grad_scale: 32.0
2024-09-16 03:49:52,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=387481.0, ans=0.0
2024-09-16 03:50:13,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=387509.3333333333, ans=0.025
2024-09-16 03:51:04,677 INFO [train.py:1198] (1/2) Epoch 22, batch 2600, loss[loss=0.2191, ctc_loss=0.1452, cr_loss=0.3693, over 20969.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1575, cr_loss=0.378, over 4092759.75 frames. ], batch size: 55, lr: 3.79e-03, grad_scale: 32.0
2024-09-16 03:51:29,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=387651.0, ans=0.0
2024-09-16 03:51:55,745 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.806e+02 2.061e+02 2.189e+02 2.377e+02 4.956e+02, threshold=4.378e+02, percent-clipped=1.0
2024-09-16 03:52:15,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=387736.0, ans=0.125
2024-09-16 03:52:20,064 INFO [train.py:1198] (1/2) Epoch 22, batch 2650, loss[loss=0.2266, ctc_loss=0.1521, cr_loss=0.3725, over 20871.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1569, cr_loss=0.3764, over 4101828.87 frames. ], batch size: 57, lr: 3.79e-03, grad_scale: 32.0
2024-09-16 03:52:20,736 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.95 vs. limit=10.0
2024-09-16 03:52:29,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=387764.3333333333, ans=0.0
2024-09-16 03:52:41,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=387792.6666666667, ans=0.1
2024-09-16 03:53:08,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=387849.3333333333, ans=0.125
2024-09-16 03:53:33,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=387906.0, ans=0.125
2024-09-16 03:53:35,156 INFO [train.py:1198] (1/2) Epoch 22, batch 2700, loss[loss=0.1913, ctc_loss=0.1271, cr_loss=0.3214, over 20973.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1581, cr_loss=0.3794, over 4106085.66 frames. ], batch size: 51, lr: 3.79e-03, grad_scale: 32.0
2024-09-16 03:53:45,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=387906.0, ans=0.1
2024-09-16 03:54:13,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=387962.6666666667, ans=0.0
2024-09-16 03:54:25,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=387991.0, ans=0.125
2024-09-16 03:54:26,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=387991.0, ans=0.05
2024-09-16 03:54:26,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=387991.0, ans=0.0
2024-09-16 03:54:29,575 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.798e+02 2.062e+02 2.198e+02 2.349e+02 3.495e+02, threshold=4.395e+02, percent-clipped=0.0
2024-09-16 03:54:48,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=388019.3333333333, ans=0.125
2024-09-16 03:54:51,856 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.69 vs. limit=10.0
2024-09-16 03:54:53,748 INFO [train.py:1198] (1/2) Epoch 22, batch 2750, loss[loss=0.201, ctc_loss=0.1341, cr_loss=0.3344, over 20997.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1583, cr_loss=0.3794, over 4097270.05 frames. ], batch size: 52, lr: 3.79e-03, grad_scale: 32.0
], tot_loss[loss=0.2342, ctc_loss=0.1583, cr_loss=0.3794, over 4097270.05 frames. ], batch size: 52, lr: 3.79e-03, grad_scale: 32.0 2024-09-16 03:54:55,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=388047.6666666667, ans=0.0 2024-09-16 03:55:08,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=388047.6666666667, ans=0.125 2024-09-16 03:55:52,450 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.75 vs. limit=12.0 2024-09-16 03:56:05,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=388161.0, ans=0.125 2024-09-16 03:56:06,411 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.79 vs. limit=15.0 2024-09-16 03:56:11,440 INFO [train.py:1198] (1/2) Epoch 22, batch 2800, loss[loss=0.2558, ctc_loss=0.1764, cr_loss=0.3967, over 18438.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1584, cr_loss=0.3791, over 4088595.63 frames. ], batch size: 108, lr: 3.79e-03, grad_scale: 32.0 2024-09-16 03:57:02,304 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.786e+02 2.043e+02 2.192e+02 2.324e+02 3.361e+02, threshold=4.383e+02, percent-clipped=0.0 2024-09-16 03:57:14,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=388302.6666666667, ans=0.125 2024-09-16 03:57:17,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=388302.6666666667, ans=0.125 2024-09-16 03:57:26,415 INFO [train.py:1198] (1/2) Epoch 22, batch 2850, loss[loss=0.2396, ctc_loss=0.1627, cr_loss=0.3846, over 20837.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1587, cr_loss=0.3798, over 4092237.49 frames. ], batch size: 65, lr: 3.79e-03, grad_scale: 32.0 2024-09-16 03:57:34,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=388331.0, ans=0.0 2024-09-16 03:57:49,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=388359.3333333333, ans=0.125 2024-09-16 03:58:01,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=388387.6666666667, ans=0.1 2024-09-16 03:58:12,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=388416.0, ans=0.125 2024-09-16 03:58:12,982 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.02 vs. limit=15.0 2024-09-16 03:58:19,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=388416.0, ans=0.0 2024-09-16 03:58:42,378 INFO [train.py:1198] (1/2) Epoch 22, batch 2900, loss[loss=0.2087, ctc_loss=0.1385, cr_loss=0.3512, over 19499.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.159, cr_loss=0.3806, over 4086541.37 frames. 
], batch size: 43, lr: 3.79e-03, grad_scale: 32.0 2024-09-16 03:58:47,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=388472.6666666667, ans=0.04949747468305833 2024-09-16 03:58:59,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=388501.0, ans=0.0 2024-09-16 03:59:07,417 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.75 vs. limit=12.0 2024-09-16 03:59:36,869 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.780e+02 2.053e+02 2.203e+02 2.348e+02 3.053e+02, threshold=4.406e+02, percent-clipped=0.0 2024-09-16 03:59:49,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=388586.0, ans=0.1 2024-09-16 04:00:01,270 INFO [train.py:1198] (1/2) Epoch 22, batch 2950, loss[loss=0.2598, ctc_loss=0.1757, cr_loss=0.4204, over 20960.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1581, cr_loss=0.3794, over 4104844.17 frames. ], batch size: 64, lr: 3.79e-03, grad_scale: 32.0 2024-09-16 04:00:18,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=388642.6666666667, ans=0.1 2024-09-16 04:01:03,811 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.48 vs. limit=15.0 2024-09-16 04:01:09,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=388727.6666666667, ans=0.0 2024-09-16 04:01:19,948 INFO [train.py:1198] (1/2) Epoch 22, batch 3000, loss[loss=0.1887, ctc_loss=0.1277, cr_loss=0.3048, over 20958.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1591, cr_loss=0.3808, over 4090075.74 frames. ], batch size: 50, lr: 3.79e-03, grad_scale: 32.0 2024-09-16 04:01:19,948 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-16 04:01:38,097 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.1361, 3.2279, 2.5304, 3.0425], device='cuda:1') 2024-09-16 04:01:45,012 INFO [train.py:1230] (1/2) Epoch 22, validation: loss=0.04351, ctc_loss=0.04351, cr_loss=1.114e-14, over 944034.00 frames. 
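The train and validation records above are internally consistent with a fixed CR-CTC loss weighting: in every record in this excerpt, loss equals ctc_loss + 0.2 * cr_loss to within print rounding, and the validation cr_loss is numerically zero (1.114e-14), consistent with the consistency-regularization term only contributing on augmented training batches. A minimal post-hoc sketch of that check (a hypothetical helper, not part of train.py; the 0.2 weight is inferred from the logged arithmetic, not read from any config shown here):

    # Hypothetical sanity check: the logged losses above satisfy
    #   loss == ctc_loss + CR_LOSS_SCALE * cr_loss   (up to print rounding)
    records = [
        # (loss, ctc_loss, cr_loss) copied from the Epoch 22 records above
        (0.2357, 0.1595, 0.3810),       # batch 2500, tot_loss
        (0.2352, 0.1591, 0.3808),       # batch 3000, tot_loss
        (0.04351, 0.04351, 1.114e-14),  # batch 3000, validation loss
    ]
    CR_LOSS_SCALE = 0.2  # assumed weight, inferred from the arithmetic
    for loss, ctc, cr in records:
        recon = ctc + CR_LOSS_SCALE * cr
        assert abs(recon - loss) < 5e-4, (loss, recon)

The same relation holds for every per-batch loss[...] field in this excerpt, so the combined loss carries no information beyond the two components and the weight.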
2024-09-16 04:01:45,012 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-16 04:02:00,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=388784.3333333333, ans=0.2 2024-09-16 04:02:03,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=388784.3333333333, ans=0.2 2024-09-16 04:02:36,671 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.782e+02 2.099e+02 2.248e+02 2.396e+02 4.773e+02, threshold=4.495e+02, percent-clipped=1.0 2024-09-16 04:02:40,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=388841.0, ans=0.0 2024-09-16 04:02:52,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=388869.3333333333, ans=0.0 2024-09-16 04:02:58,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=388869.3333333333, ans=0.1 2024-09-16 04:03:01,136 INFO [train.py:1198] (1/2) Epoch 22, batch 3050, loss[loss=0.2382, ctc_loss=0.162, cr_loss=0.3812, over 20337.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.159, cr_loss=0.3804, over 4090291.76 frames. ], batch size: 74, lr: 3.79e-03, grad_scale: 32.0 2024-09-16 04:03:16,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=388926.0, ans=0.125 2024-09-16 04:04:17,353 INFO [train.py:1198] (1/2) Epoch 22, batch 3100, loss[loss=0.1902, ctc_loss=0.1267, cr_loss=0.3176, over 20963.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1598, cr_loss=0.3809, over 4081010.10 frames. ], batch size: 51, lr: 3.79e-03, grad_scale: 32.0 2024-09-16 04:04:31,611 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.94 vs. limit=10.0 2024-09-16 04:04:54,569 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.29 vs. limit=22.5 2024-09-16 04:04:57,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=389096.0, ans=0.125 2024-09-16 04:05:10,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=389124.3333333333, ans=10.0 2024-09-16 04:05:11,804 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.883e+02 2.102e+02 2.307e+02 2.423e+02 3.397e+02, threshold=4.613e+02, percent-clipped=0.0 2024-09-16 04:05:22,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=389152.6666666667, ans=0.1 2024-09-16 04:05:36,204 INFO [train.py:1198] (1/2) Epoch 22, batch 3150, loss[loss=0.2385, ctc_loss=0.1629, cr_loss=0.3778, over 20964.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.1598, cr_loss=0.3808, over 4085568.63 frames. ], batch size: 58, lr: 3.79e-03, grad_scale: 32.0 2024-09-16 04:05:44,976 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.51 vs. 
limit=15.0 2024-09-16 04:06:00,756 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 04:06:19,363 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.63 vs. limit=15.0 2024-09-16 04:06:32,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=389266.0, ans=0.125 2024-09-16 04:06:35,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=389266.0, ans=0.125 2024-09-16 04:06:54,390 INFO [train.py:1198] (1/2) Epoch 22, batch 3200, loss[loss=0.2745, ctc_loss=0.1926, cr_loss=0.4097, over 14665.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.1598, cr_loss=0.3815, over 4085643.08 frames. ], batch size: 149, lr: 3.79e-03, grad_scale: 32.0 2024-09-16 04:06:54,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=389322.6666666667, ans=0.1 2024-09-16 04:07:18,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=389351.0, ans=0.125 2024-09-16 04:07:18,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=389351.0, ans=0.5 2024-09-16 04:07:45,573 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.808e+02 2.107e+02 2.270e+02 2.473e+02 4.706e+02, threshold=4.539e+02, percent-clipped=1.0 2024-09-16 04:08:09,611 INFO [train.py:1198] (1/2) Epoch 22, batch 3250, loss[loss=0.2195, ctc_loss=0.1457, cr_loss=0.3689, over 20783.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1591, cr_loss=0.3805, over 4074905.92 frames. ], batch size: 53, lr: 3.79e-03, grad_scale: 32.0 2024-09-16 04:08:15,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=389464.3333333333, ans=0.1 2024-09-16 04:08:40,708 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.45 vs. limit=15.0 2024-09-16 04:08:59,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=389549.3333333333, ans=0.125 2024-09-16 04:09:11,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=389577.6666666667, ans=0.1 2024-09-16 04:09:25,389 INFO [train.py:1198] (1/2) Epoch 22, batch 3300, loss[loss=0.2153, ctc_loss=0.1477, cr_loss=0.3381, over 20978.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1588, cr_loss=0.3795, over 4082938.11 frames. ], batch size: 50, lr: 3.78e-03, grad_scale: 32.0 2024-09-16 04:09:58,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=389662.6666666667, ans=0.0 2024-09-16 04:10:17,122 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.776e+02 2.038e+02 2.206e+02 2.374e+02 3.484e+02, threshold=4.412e+02, percent-clipped=0.0 2024-09-16 04:10:41,575 INFO [train.py:1198] (1/2) Epoch 22, batch 3350, loss[loss=0.2615, ctc_loss=0.1817, cr_loss=0.3993, over 20976.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1584, cr_loss=0.3787, over 4086586.21 frames. 
], batch size: 64, lr: 3.78e-03, grad_scale: 32.0 2024-09-16 04:10:55,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=389747.6666666667, ans=0.1 2024-09-16 04:11:41,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=389832.6666666667, ans=0.125 2024-09-16 04:11:59,836 INFO [train.py:1198] (1/2) Epoch 22, batch 3400, loss[loss=0.2393, ctc_loss=0.162, cr_loss=0.3865, over 21054.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1589, cr_loss=0.379, over 4080065.79 frames. ], batch size: 56, lr: 3.78e-03, grad_scale: 32.0 2024-09-16 04:12:54,194 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.878e+02 2.077e+02 2.179e+02 2.335e+02 3.315e+02, threshold=4.357e+02, percent-clipped=0.0 2024-09-16 04:13:12,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=390002.6666666667, ans=0.1 2024-09-16 04:13:15,871 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.72 vs. limit=12.0 2024-09-16 04:13:17,973 INFO [train.py:1198] (1/2) Epoch 22, batch 3450, loss[loss=0.2435, ctc_loss=0.1629, cr_loss=0.4026, over 20887.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1595, cr_loss=0.3802, over 4062259.36 frames. ], batch size: 57, lr: 3.78e-03, grad_scale: 32.0 2024-09-16 04:13:18,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=390031.0, ans=0.125 2024-09-16 04:13:22,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=390031.0, ans=0.125 2024-09-16 04:13:38,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.whiten.whitening_limit, batch_count=390059.3333333333, ans=12.0 2024-09-16 04:13:42,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=390059.3333333333, ans=0.125 2024-09-16 04:14:07,402 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.38 vs. limit=15.0 2024-09-16 04:14:33,744 INFO [train.py:1198] (1/2) Epoch 22, batch 3500, loss[loss=0.2361, ctc_loss=0.161, cr_loss=0.3753, over 20344.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.1599, cr_loss=0.3809, over 4065579.45 frames. ], batch size: 74, lr: 3.78e-03, grad_scale: 32.0 2024-09-16 04:14:46,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=390172.6666666667, ans=0.125 2024-09-16 04:15:01,762 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.73 vs. 
limit=15.0 2024-09-16 04:15:04,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=390229.3333333333, ans=0.0 2024-09-16 04:15:17,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=390257.6666666667, ans=0.2 2024-09-16 04:15:24,898 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.895e+02 2.119e+02 2.223e+02 2.406e+02 4.097e+02, threshold=4.445e+02, percent-clipped=0.0 2024-09-16 04:15:29,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=390257.6666666667, ans=0.0 2024-09-16 04:15:49,155 INFO [train.py:1198] (1/2) Epoch 22, batch 3550, loss[loss=0.2308, ctc_loss=0.1543, cr_loss=0.3826, over 21045.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1601, cr_loss=0.3815, over 4064225.89 frames. ], batch size: 62, lr: 3.78e-03, grad_scale: 32.0 2024-09-16 04:15:57,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=390314.3333333333, ans=0.0 2024-09-16 04:17:08,501 INFO [train.py:1198] (1/2) Epoch 22, batch 3600, loss[loss=0.1973, ctc_loss=0.1287, cr_loss=0.3427, over 19564.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1595, cr_loss=0.3818, over 4064589.71 frames. ], batch size: 43, lr: 3.78e-03, grad_scale: 32.0 2024-09-16 04:17:25,150 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.06 vs. limit=15.0 2024-09-16 04:17:41,351 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.96 vs. limit=15.0 2024-09-16 04:17:45,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=390512.6666666667, ans=0.0 2024-09-16 04:17:57,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=390541.0, ans=0.125 2024-09-16 04:18:03,034 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.804e+02 2.081e+02 2.244e+02 2.465e+02 3.596e+02, threshold=4.487e+02, percent-clipped=0.0 2024-09-16 04:18:27,093 INFO [train.py:1198] (1/2) Epoch 22, batch 3650, loss[loss=0.2348, ctc_loss=0.1587, cr_loss=0.3804, over 20913.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.16, cr_loss=0.3826, over 4076523.37 frames. ], batch size: 54, lr: 3.78e-03, grad_scale: 32.0 2024-09-16 04:18:57,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=390654.3333333333, ans=0.2 2024-09-16 04:19:05,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=390654.3333333333, ans=0.125 2024-09-16 04:19:15,319 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.88 vs. limit=6.0 2024-09-16 04:19:39,196 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.71 vs. limit=6.0 2024-09-16 04:19:42,996 INFO [train.py:1198] (1/2) Epoch 22, batch 3700, loss[loss=0.1894, ctc_loss=0.1244, cr_loss=0.3246, over 20956.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1601, cr_loss=0.3827, over 4067046.90 frames. 
], batch size: 50, lr: 3.78e-03, grad_scale: 32.0 2024-09-16 04:20:06,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=390767.6666666667, ans=0.1 2024-09-16 04:20:07,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=390767.6666666667, ans=0.125 2024-09-16 04:20:15,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=390796.0, ans=0.125 2024-09-16 04:20:27,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=390824.3333333333, ans=0.125 2024-09-16 04:20:34,609 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.742e+02 2.089e+02 2.219e+02 2.413e+02 3.077e+02, threshold=4.439e+02, percent-clipped=0.0 2024-09-16 04:20:59,014 INFO [train.py:1198] (1/2) Epoch 22, batch 3750, loss[loss=0.2072, ctc_loss=0.1389, cr_loss=0.3415, over 19899.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.16, cr_loss=0.3833, over 4079693.84 frames. ], batch size: 44, lr: 3.78e-03, grad_scale: 32.0 2024-09-16 04:20:59,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=390881.0, ans=0.0 2024-09-16 04:21:00,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=390881.0, ans=0.2 2024-09-16 04:21:09,963 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 04:21:17,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=390909.3333333333, ans=0.0 2024-09-16 04:21:30,286 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.66 vs. limit=15.0 2024-09-16 04:21:43,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=390966.0, ans=0.0 2024-09-16 04:21:44,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=390966.0, ans=0.0 2024-09-16 04:21:59,100 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.44 vs. limit=15.0 2024-09-16 04:22:07,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=390994.3333333333, ans=0.035 2024-09-16 04:22:15,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=390994.3333333333, ans=0.1 2024-09-16 04:22:18,445 INFO [train.py:1198] (1/2) Epoch 22, batch 3800, loss[loss=0.2066, ctc_loss=0.1382, cr_loss=0.3423, over 20961.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.1601, cr_loss=0.3834, over 4084691.84 frames. 
], batch size: 50, lr: 3.78e-03, grad_scale: 32.0 2024-09-16 04:22:29,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=391022.6666666667, ans=0.1 2024-09-16 04:22:44,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=391051.0, ans=0.025 2024-09-16 04:23:01,823 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.24 vs. limit=15.0 2024-09-16 04:23:11,204 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.856e+02 2.063e+02 2.175e+02 2.345e+02 2.892e+02, threshold=4.350e+02, percent-clipped=0.0 2024-09-16 04:23:31,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=391136.0, ans=0.0 2024-09-16 04:23:34,011 INFO [train.py:1198] (1/2) Epoch 22, batch 3850, loss[loss=0.2141, ctc_loss=0.142, cr_loss=0.3607, over 21071.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.1596, cr_loss=0.3823, over 4083129.16 frames. ], batch size: 56, lr: 3.78e-03, grad_scale: 32.0 2024-09-16 04:23:43,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=391164.3333333333, ans=0.125 2024-09-16 04:23:55,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=391192.6666666667, ans=0.125 2024-09-16 04:24:10,823 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.95 vs. limit=15.0 2024-09-16 04:24:13,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=391221.0, ans=0.125 2024-09-16 04:24:31,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=391249.3333333333, ans=0.125 2024-09-16 04:24:52,226 INFO [train.py:1198] (1/2) Epoch 22, batch 3900, loss[loss=0.2542, ctc_loss=0.1702, cr_loss=0.4201, over 21034.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1592, cr_loss=0.3813, over 4078260.94 frames. ], batch size: 63, lr: 3.78e-03, grad_scale: 32.0 2024-09-16 04:25:45,027 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.831e+02 2.101e+02 2.235e+02 2.416e+02 4.416e+02, threshold=4.469e+02, percent-clipped=0.0 2024-09-16 04:26:07,839 INFO [train.py:1198] (1/2) Epoch 22, batch 3950, loss[loss=0.2398, ctc_loss=0.1621, cr_loss=0.3886, over 21026.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1604, cr_loss=0.3825, over 4070741.10 frames. ], batch size: 62, lr: 3.78e-03, grad_scale: 32.0 2024-09-16 04:26:40,112 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 04:27:11,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=391561.0, ans=0.0 2024-09-16 04:27:17,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=391561.0, ans=0.2 2024-09-16 04:27:23,321 INFO [train.py:1198] (1/2) Epoch 22, batch 4000, loss[loss=0.2696, ctc_loss=0.1856, cr_loss=0.4203, over 20644.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1608, cr_loss=0.3828, over 4053548.50 frames. 
], batch size: 68, lr: 3.78e-03, grad_scale: 32.0 2024-09-16 04:27:44,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=391617.6666666667, ans=0.125 2024-09-16 04:28:19,219 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.836e+02 2.124e+02 2.337e+02 2.533e+02 3.708e+02, threshold=4.673e+02, percent-clipped=0.0 2024-09-16 04:28:33,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=391702.6666666667, ans=0.2 2024-09-16 04:28:40,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=391731.0, ans=0.125 2024-09-16 04:28:42,042 INFO [train.py:1198] (1/2) Epoch 22, batch 4050, loss[loss=0.2531, ctc_loss=0.177, cr_loss=0.3802, over 19311.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1601, cr_loss=0.3824, over 4063422.67 frames. ], batch size: 90, lr: 3.77e-03, grad_scale: 32.0 2024-09-16 04:28:42,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=391731.0, ans=0.125 2024-09-16 04:28:48,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=391731.0, ans=0.125 2024-09-16 04:28:57,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=391759.3333333333, ans=0.0 2024-09-16 04:29:07,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=391759.3333333333, ans=0.125 2024-09-16 04:29:23,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=391787.6666666667, ans=0.125 2024-09-16 04:29:29,037 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=391816.0, ans=0.125 2024-09-16 04:29:36,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=391816.0, ans=0.125 2024-09-16 04:29:44,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=391844.3333333333, ans=0.1 2024-09-16 04:30:00,564 INFO [train.py:1198] (1/2) Epoch 22, batch 4100, loss[loss=0.2661, ctc_loss=0.1809, cr_loss=0.426, over 18474.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1606, cr_loss=0.3838, over 4071125.43 frames. ], batch size: 108, lr: 3.77e-03, grad_scale: 32.0 2024-09-16 04:30:39,153 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.26 vs. 
limit=15.0 2024-09-16 04:30:53,212 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.853e+02 2.155e+02 2.278e+02 2.484e+02 4.525e+02, threshold=4.556e+02, percent-clipped=0.0 2024-09-16 04:30:55,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=391957.6666666667, ans=0.125 2024-09-16 04:31:08,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=391986.0, ans=0.0 2024-09-16 04:31:10,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=391986.0, ans=0.025 2024-09-16 04:31:15,710 INFO [train.py:1198] (1/2) Epoch 22, batch 4150, loss[loss=0.2341, ctc_loss=0.1538, cr_loss=0.4012, over 20938.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1599, cr_loss=0.3825, over 4079143.20 frames. ], batch size: 60, lr: 3.77e-03, grad_scale: 32.0 2024-09-16 04:31:20,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=392014.3333333333, ans=0.1 2024-09-16 04:31:38,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=392042.6666666667, ans=0.125 2024-09-16 04:31:41,823 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.81 vs. limit=12.0 2024-09-16 04:31:46,285 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.70 vs. limit=22.5 2024-09-16 04:32:31,128 INFO [train.py:1198] (1/2) Epoch 22, batch 4200, loss[loss=0.213, ctc_loss=0.1409, cr_loss=0.3602, over 20959.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.1596, cr_loss=0.3821, over 4076441.69 frames. ], batch size: 50, lr: 3.77e-03, grad_scale: 32.0 2024-09-16 04:32:50,134 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.58 vs. limit=10.0 2024-09-16 04:33:02,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=392212.6666666667, ans=0.035 2024-09-16 04:33:02,515 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.13 vs. 
limit=22.5 2024-09-16 04:33:20,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=392241.0, ans=0.125 2024-09-16 04:33:27,439 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.841e+02 2.081e+02 2.204e+02 2.322e+02 3.465e+02, threshold=4.407e+02, percent-clipped=0.0 2024-09-16 04:33:30,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=392241.0, ans=0.0 2024-09-16 04:33:39,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=392269.3333333333, ans=0.04949747468305833 2024-09-16 04:33:41,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=392269.3333333333, ans=0.0 2024-09-16 04:33:47,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=392269.3333333333, ans=0.0 2024-09-16 04:33:50,094 INFO [train.py:1198] (1/2) Epoch 22, batch 4250, loss[loss=0.2601, ctc_loss=0.1774, cr_loss=0.4134, over 20968.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.159, cr_loss=0.3807, over 4084067.28 frames. ], batch size: 64, lr: 3.77e-03, grad_scale: 32.0 2024-09-16 04:34:02,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=392297.6666666667, ans=0.025 2024-09-16 04:34:18,574 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.81 vs. limit=15.0 2024-09-16 04:34:20,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=392354.3333333333, ans=0.0 2024-09-16 04:34:20,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=392354.3333333333, ans=0.0 2024-09-16 04:34:22,728 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.19 vs. limit=10.0 2024-09-16 04:34:35,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=392382.6666666667, ans=0.125 2024-09-16 04:34:43,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=392382.6666666667, ans=0.125 2024-09-16 04:34:49,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=392411.0, ans=0.125 2024-09-16 04:35:01,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=392411.0, ans=0.125 2024-09-16 04:35:09,029 INFO [train.py:1198] (1/2) Epoch 22, batch 4300, loss[loss=0.3354, ctc_loss=0.2432, cr_loss=0.4612, over 13979.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1594, cr_loss=0.3803, over 4068507.74 frames. 
], batch size: 149, lr: 3.77e-03, grad_scale: 32.0 2024-09-16 04:35:28,756 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 04:35:31,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=392467.6666666667, ans=0.0 2024-09-16 04:35:40,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=392496.0, ans=0.125 2024-09-16 04:36:01,779 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.804e+02 2.147e+02 2.313e+02 2.532e+02 3.312e+02, threshold=4.626e+02, percent-clipped=0.0 2024-09-16 04:36:06,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=392524.3333333333, ans=0.0 2024-09-16 04:36:12,676 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 04:36:24,254 INFO [train.py:1198] (1/2) Epoch 22, batch 4350, loss[loss=0.27, ctc_loss=0.1866, cr_loss=0.4169, over 20997.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1593, cr_loss=0.3806, over 4071646.69 frames. ], batch size: 63, lr: 3.77e-03, grad_scale: 32.0 2024-09-16 04:36:53,438 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.54 vs. limit=15.0 2024-09-16 04:37:27,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=392694.3333333333, ans=0.125 2024-09-16 04:37:39,389 INFO [train.py:1198] (1/2) Epoch 22, batch 4400, loss[loss=0.2223, ctc_loss=0.1494, cr_loss=0.3642, over 20883.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1595, cr_loss=0.3803, over 4085379.61 frames. ], batch size: 57, lr: 3.77e-03, grad_scale: 32.0 2024-09-16 04:37:59,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=392751.0, ans=0.0 2024-09-16 04:38:08,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=392779.3333333333, ans=0.0 2024-09-16 04:38:09,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=392779.3333333333, ans=0.2 2024-09-16 04:38:16,141 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.15 vs. limit=15.0 2024-09-16 04:38:30,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=392807.6666666667, ans=0.035 2024-09-16 04:38:33,254 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.781e+02 2.079e+02 2.214e+02 2.396e+02 2.802e+02, threshold=4.427e+02, percent-clipped=0.0 2024-09-16 04:38:33,970 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.56 vs. limit=15.0 2024-09-16 04:38:49,933 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.93 vs. limit=6.0 2024-09-16 04:38:56,686 INFO [train.py:1198] (1/2) Epoch 22, batch 4450, loss[loss=0.2412, ctc_loss=0.1608, cr_loss=0.402, over 20994.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1605, cr_loss=0.3821, over 4077818.34 frames. 
], batch size: 52, lr: 3.77e-03, grad_scale: 32.0 2024-09-16 04:39:06,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=392864.3333333333, ans=0.0 2024-09-16 04:39:24,527 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.42 vs. limit=15.0 2024-09-16 04:39:33,203 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=392921.0, ans=0.125 2024-09-16 04:39:34,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=392921.0, ans=0.2 2024-09-16 04:39:54,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=392949.3333333333, ans=0.1 2024-09-16 04:40:12,768 INFO [train.py:1198] (1/2) Epoch 22, batch 4500, loss[loss=0.2395, ctc_loss=0.1615, cr_loss=0.3903, over 21036.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1594, cr_loss=0.3803, over 4088591.12 frames. ], batch size: 62, lr: 3.77e-03, grad_scale: 32.0 2024-09-16 04:40:28,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=393034.3333333333, ans=0.2 2024-09-16 04:40:50,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=393062.6666666667, ans=0.0 2024-09-16 04:41:09,810 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.776e+02 2.066e+02 2.202e+02 2.310e+02 7.366e+02, threshold=4.404e+02, percent-clipped=1.0 2024-09-16 04:41:18,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=393119.3333333333, ans=0.04949747468305833 2024-09-16 04:41:31,055 INFO [train.py:1198] (1/2) Epoch 22, batch 4550, loss[loss=0.2577, ctc_loss=0.1794, cr_loss=0.3916, over 20658.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1601, cr_loss=0.3815, over 4088407.76 frames. ], batch size: 66, lr: 3.77e-03, grad_scale: 32.0 2024-09-16 04:41:53,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=393176.0, ans=0.0 2024-09-16 04:41:55,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=393176.0, ans=0.1 2024-09-16 04:42:33,764 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.33 vs. limit=15.0 2024-09-16 04:42:46,291 INFO [train.py:1198] (1/2) Epoch 22, batch 4600, loss[loss=0.2508, ctc_loss=0.1658, cr_loss=0.425, over 21035.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1603, cr_loss=0.3828, over 4091279.69 frames. 
], batch size: 62, lr: 3.77e-03, grad_scale: 32.0 2024-09-16 04:42:57,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=393289.3333333333, ans=0.0 2024-09-16 04:43:12,478 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=393317.6666666667, ans=0.1 2024-09-16 04:43:40,394 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.844e+02 2.069e+02 2.229e+02 2.420e+02 3.166e+02, threshold=4.458e+02, percent-clipped=0.0 2024-09-16 04:44:00,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=393431.0, ans=0.0 2024-09-16 04:44:01,858 INFO [train.py:1198] (1/2) Epoch 22, batch 4650, loss[loss=0.2012, ctc_loss=0.1282, cr_loss=0.3648, over 20988.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1591, cr_loss=0.3815, over 4098125.35 frames. ], batch size: 51, lr: 3.77e-03, grad_scale: 32.0 2024-09-16 04:44:16,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=393459.3333333333, ans=0.0 2024-09-16 04:44:41,401 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.25 vs. limit=15.0 2024-09-16 04:45:20,468 INFO [train.py:1198] (1/2) Epoch 22, batch 4700, loss[loss=0.2028, ctc_loss=0.1325, cr_loss=0.3514, over 20957.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.159, cr_loss=0.3806, over 4094388.15 frames. ], batch size: 50, lr: 3.77e-03, grad_scale: 32.0 2024-09-16 04:45:34,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=393601.0, ans=0.0 2024-09-16 04:45:40,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=393601.0, ans=0.125 2024-09-16 04:46:10,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=393657.6666666667, ans=0.125 2024-09-16 04:46:14,726 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.669e+02 2.113e+02 2.231e+02 2.424e+02 3.101e+02, threshold=4.462e+02, percent-clipped=0.0 2024-09-16 04:46:38,925 INFO [train.py:1198] (1/2) Epoch 22, batch 4750, loss[loss=0.2297, ctc_loss=0.1557, cr_loss=0.3697, over 21029.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1595, cr_loss=0.3812, over 4083785.94 frames. 
], batch size: 63, lr: 3.77e-03, grad_scale: 32.0 2024-09-16 04:46:51,351 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=393714.3333333333, ans=0.125 2024-09-16 04:47:08,399 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=393771.0, ans=0.0 2024-09-16 04:47:18,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=393771.0, ans=0.125 2024-09-16 04:47:38,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=393827.6666666667, ans=0.125 2024-09-16 04:47:46,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=393827.6666666667, ans=0.1 2024-09-16 04:47:54,976 INFO [train.py:1198] (1/2) Epoch 22, batch 4800, loss[loss=0.2124, ctc_loss=0.1402, cr_loss=0.3611, over 19864.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.159, cr_loss=0.3811, over 4097353.71 frames. ], batch size: 44, lr: 3.76e-03, grad_scale: 32.0 2024-09-16 04:47:57,127 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.89 vs. limit=15.0 2024-09-16 04:48:16,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=393884.3333333333, ans=0.125 2024-09-16 04:48:49,133 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.813e+02 2.072e+02 2.229e+02 2.410e+02 4.609e+02, threshold=4.458e+02, percent-clipped=1.0 2024-09-16 04:49:10,475 INFO [train.py:1198] (1/2) Epoch 22, batch 4850, loss[loss=0.1989, ctc_loss=0.1306, cr_loss=0.3414, over 21006.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1591, cr_loss=0.3811, over 4090866.81 frames. ], batch size: 48, lr: 3.76e-03, grad_scale: 32.0 2024-09-16 04:49:31,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=394026.0, ans=0.5 2024-09-16 04:49:54,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=394082.6666666667, ans=0.2 2024-09-16 04:50:06,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=394082.6666666667, ans=0.025 2024-09-16 04:50:19,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=394111.0, ans=0.1 2024-09-16 04:50:28,136 INFO [train.py:1198] (1/2) Epoch 22, batch 4900, loss[loss=0.2476, ctc_loss=0.166, cr_loss=0.4079, over 20965.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1604, cr_loss=0.3831, over 4083218.01 frames. ], batch size: 64, lr: 3.76e-03, grad_scale: 16.0 2024-09-16 04:50:36,693 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.21 vs. 
limit=10.0 2024-09-16 04:50:52,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=394167.6666666667, ans=0.0 2024-09-16 04:51:09,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=394196.0, ans=0.0 2024-09-16 04:51:18,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=394224.3333333333, ans=0.2 2024-09-16 04:51:20,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=394224.3333333333, ans=0.5 2024-09-16 04:51:23,779 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.845e+02 2.117e+02 2.269e+02 2.499e+02 3.241e+02, threshold=4.537e+02, percent-clipped=0.0 2024-09-16 04:51:43,334 INFO [train.py:1198] (1/2) Epoch 22, batch 4950, loss[loss=0.2742, ctc_loss=0.1892, cr_loss=0.4253, over 18051.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.1597, cr_loss=0.3822, over 4097365.71 frames. ], batch size: 108, lr: 3.76e-03, grad_scale: 16.0 2024-09-16 04:51:46,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=394281.0, ans=0.07 2024-09-16 04:51:48,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=394281.0, ans=0.2 2024-09-16 04:52:00,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=394309.3333333333, ans=0.125 2024-09-16 04:52:04,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=394309.3333333333, ans=0.125 2024-09-16 04:52:18,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=394337.6666666667, ans=0.125 2024-09-16 04:52:31,622 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.32 vs. limit=22.5 2024-09-16 04:52:57,355 INFO [train.py:1198] (1/2) Epoch 22, batch 5000, loss[loss=0.2362, ctc_loss=0.1604, cr_loss=0.3787, over 21028.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1594, cr_loss=0.3817, over 4096124.74 frames. ], batch size: 62, lr: 3.76e-03, grad_scale: 16.0 2024-09-16 04:53:06,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=394422.6666666667, ans=0.0 2024-09-16 04:53:15,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=394451.0, ans=0.125 2024-09-16 04:53:18,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=394451.0, ans=0.125 2024-09-16 04:53:20,631 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.42 vs. 
limit=15.0 2024-09-16 04:53:24,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=394451.0, ans=0.0 2024-09-16 04:53:25,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=394479.3333333333, ans=0.125 2024-09-16 04:53:49,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=394507.6666666667, ans=0.125 2024-09-16 04:53:55,277 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.814e+02 2.090e+02 2.194e+02 2.309e+02 3.699e+02, threshold=4.388e+02, percent-clipped=0.0 2024-09-16 04:54:09,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=394536.0, ans=0.0 2024-09-16 04:54:14,746 INFO [train.py:1198] (1/2) Epoch 22, batch 5050, loss[loss=0.2852, ctc_loss=0.194, cr_loss=0.4559, over 18359.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1603, cr_loss=0.3835, over 4090222.58 frames. ], batch size: 108, lr: 3.76e-03, grad_scale: 16.0 2024-09-16 04:54:27,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=394564.3333333333, ans=0.125 2024-09-16 04:54:28,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=394592.6666666667, ans=0.0 2024-09-16 04:54:29,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=394592.6666666667, ans=0.0 2024-09-16 04:54:46,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=394621.0, ans=0.125 2024-09-16 04:54:59,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=394649.3333333333, ans=0.125 2024-09-16 04:55:05,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=394649.3333333333, ans=0.0 2024-09-16 04:55:29,224 INFO [train.py:1198] (1/2) Epoch 22, batch 5100, loss[loss=0.2226, ctc_loss=0.1468, cr_loss=0.3794, over 20799.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.16, cr_loss=0.3824, over 4075626.83 frames. ], batch size: 53, lr: 3.76e-03, grad_scale: 16.0 2024-09-16 04:56:24,313 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.810e+02 2.134e+02 2.275e+02 2.438e+02 4.042e+02, threshold=4.549e+02, percent-clipped=0.0 2024-09-16 04:56:35,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=394819.3333333333, ans=0.125 2024-09-16 04:56:43,732 INFO [train.py:1198] (1/2) Epoch 22, batch 5150, loss[loss=0.2478, ctc_loss=0.166, cr_loss=0.4092, over 21022.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1603, cr_loss=0.3831, over 4071337.11 frames. ], batch size: 63, lr: 3.76e-03, grad_scale: 16.0 2024-09-16 04:56:54,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=394847.6666666667, ans=0.2 2024-09-16 04:56:57,693 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.25 vs. 
limit=15.0 2024-09-16 04:57:11,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=394904.3333333333, ans=0.0 2024-09-16 04:57:16,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=394904.3333333333, ans=0.1 2024-09-16 04:57:41,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=394961.0, ans=0.125 2024-09-16 04:57:57,831 INFO [train.py:1198] (1/2) Epoch 22, batch 5200, loss[loss=0.237, ctc_loss=0.1565, cr_loss=0.4026, over 20796.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1601, cr_loss=0.3827, over 4069771.16 frames. ], batch size: 53, lr: 3.76e-03, grad_scale: 32.0 2024-09-16 04:58:19,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=395017.6666666667, ans=0.0 2024-09-16 04:58:39,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.out_whiten.whitening_limit, batch_count=395046.0, ans=8.0 2024-09-16 04:58:53,077 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.813e+02 2.099e+02 2.191e+02 2.359e+02 3.179e+02, threshold=4.381e+02, percent-clipped=0.0 2024-09-16 04:58:53,797 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.06 vs. limit=12.0 2024-09-16 04:59:12,571 INFO [train.py:1198] (1/2) Epoch 22, batch 5250, loss[loss=0.2278, ctc_loss=0.1518, cr_loss=0.3796, over 21039.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1591, cr_loss=0.3812, over 4073267.48 frames. ], batch size: 62, lr: 3.76e-03, grad_scale: 32.0 2024-09-16 04:59:20,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=395131.0, ans=0.2 2024-09-16 04:59:25,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=395131.0, ans=0.2 2024-09-16 04:59:42,082 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.42 vs. limit=15.0 2024-09-16 05:00:29,387 INFO [train.py:1198] (1/2) Epoch 22, batch 5300, loss[loss=0.2418, ctc_loss=0.163, cr_loss=0.3943, over 20654.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1585, cr_loss=0.3798, over 4081175.81 frames. ], batch size: 71, lr: 3.76e-03, grad_scale: 32.0 2024-09-16 05:00:29,996 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.12 vs. 
limit=15.0 2024-09-16 05:00:46,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=395301.0, ans=0.125 2024-09-16 05:00:57,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=395329.3333333333, ans=0.2 2024-09-16 05:01:15,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=395357.6666666667, ans=0.1 2024-09-16 05:01:21,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=395357.6666666667, ans=0.125 2024-09-16 05:01:23,194 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.70 vs. limit=22.5 2024-09-16 05:01:23,928 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.844e+02 2.048e+02 2.194e+02 2.320e+02 3.593e+02, threshold=4.387e+02, percent-clipped=0.0 2024-09-16 05:01:30,615 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.75 vs. limit=12.0 2024-09-16 05:01:43,187 INFO [train.py:1198] (1/2) Epoch 22, batch 5350, loss[loss=0.2731, ctc_loss=0.1952, cr_loss=0.3896, over 14295.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.159, cr_loss=0.3805, over 4083240.53 frames. ], batch size: 149, lr: 3.76e-03, grad_scale: 32.0 2024-09-16 05:02:08,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=395442.6666666667, ans=0.0 2024-09-16 05:02:39,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=395499.3333333333, ans=0.1 2024-09-16 05:02:57,269 INFO [train.py:1198] (1/2) Epoch 22, batch 5400, loss[loss=0.2207, ctc_loss=0.1466, cr_loss=0.3704, over 21070.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1594, cr_loss=0.3812, over 4074954.76 frames. 
], batch size: 59, lr: 3.76e-03, grad_scale: 32.0 2024-09-16 05:03:06,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=395556.0, ans=0.0 2024-09-16 05:03:15,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=395584.3333333333, ans=0.125 2024-09-16 05:03:16,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=395584.3333333333, ans=0.0 2024-09-16 05:03:18,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=395584.3333333333, ans=0.2 2024-09-16 05:03:26,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=395584.3333333333, ans=0.025 2024-09-16 05:03:47,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=395641.0, ans=0.125 2024-09-16 05:03:54,461 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.803e+02 2.069e+02 2.240e+02 2.415e+02 3.523e+02, threshold=4.480e+02, percent-clipped=0.0 2024-09-16 05:03:56,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=395641.0, ans=0.125 2024-09-16 05:04:06,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=395669.3333333333, ans=0.125 2024-09-16 05:04:13,815 INFO [train.py:1198] (1/2) Epoch 22, batch 5450, loss[loss=0.2478, ctc_loss=0.1705, cr_loss=0.3864, over 21081.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1591, cr_loss=0.3802, over 4078665.94 frames. ], batch size: 59, lr: 3.76e-03, grad_scale: 32.0 2024-09-16 05:04:21,902 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.72 vs. limit=22.5 2024-09-16 05:04:35,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=395726.0, ans=0.1 2024-09-16 05:04:57,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=395782.6666666667, ans=0.5 2024-09-16 05:05:19,908 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.96 vs. limit=15.0 2024-09-16 05:05:28,120 INFO [train.py:1198] (1/2) Epoch 22, batch 5500, loss[loss=0.2569, ctc_loss=0.1751, cr_loss=0.4093, over 20631.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1595, cr_loss=0.3813, over 4090124.67 frames. 
], batch size: 71, lr: 3.76e-03, grad_scale: 32.0 2024-09-16 05:05:37,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=395839.3333333333, ans=0.125 2024-09-16 05:06:13,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=395924.3333333333, ans=0.0 2024-09-16 05:06:23,296 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.812e+02 2.061e+02 2.214e+02 2.357e+02 4.725e+02, threshold=4.428e+02, percent-clipped=1.0 2024-09-16 05:06:42,624 INFO [train.py:1198] (1/2) Epoch 22, batch 5550, loss[loss=0.246, ctc_loss=0.1655, cr_loss=0.4026, over 21040.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1579, cr_loss=0.3793, over 4100443.80 frames. ], batch size: 62, lr: 3.75e-03, grad_scale: 32.0 2024-09-16 05:06:50,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=395981.0, ans=0.1 2024-09-16 05:07:15,963 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.70 vs. limit=12.0 2024-09-16 05:07:25,946 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.20 vs. limit=15.0 2024-09-16 05:07:31,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=396066.0, ans=0.125 2024-09-16 05:07:36,320 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.36 vs. limit=15.0 2024-09-16 05:07:56,463 INFO [train.py:1198] (1/2) Epoch 22, batch 5600, loss[loss=0.2317, ctc_loss=0.1546, cr_loss=0.3855, over 21073.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.157, cr_loss=0.3785, over 4108394.86 frames. ], batch size: 59, lr: 3.75e-03, grad_scale: 32.0 2024-09-16 05:08:03,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=396122.6666666667, ans=0.0 2024-09-16 05:08:48,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=396207.6666666667, ans=0.2 2024-09-16 05:08:51,038 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.802e+02 2.075e+02 2.188e+02 2.369e+02 3.808e+02, threshold=4.376e+02, percent-clipped=0.0 2024-09-16 05:09:03,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=396236.0, ans=0.07 2024-09-16 05:09:10,485 INFO [train.py:1198] (1/2) Epoch 22, batch 5650, loss[loss=0.2267, ctc_loss=0.1512, cr_loss=0.3775, over 20804.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1566, cr_loss=0.3783, over 4110779.25 frames. ], batch size: 53, lr: 3.75e-03, grad_scale: 32.0 2024-09-16 05:10:15,508 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.88 vs. 
limit=6.0 2024-09-16 05:10:19,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=396377.6666666667, ans=0.125 2024-09-16 05:10:21,166 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 05:10:24,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=396377.6666666667, ans=0.0 2024-09-16 05:10:26,696 INFO [train.py:1198] (1/2) Epoch 22, batch 5700, loss[loss=0.2042, ctc_loss=0.1405, cr_loss=0.3187, over 20972.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.157, cr_loss=0.3781, over 4105111.12 frames. ], batch size: 51, lr: 3.75e-03, grad_scale: 32.0 2024-09-16 05:10:38,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=396406.0, ans=0.0 2024-09-16 05:10:45,187 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.21 vs. limit=22.5 2024-09-16 05:11:11,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=396491.0, ans=0.5 2024-09-16 05:11:21,706 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.804e+02 2.065e+02 2.212e+02 2.371e+02 4.969e+02, threshold=4.425e+02, percent-clipped=0.0 2024-09-16 05:11:35,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=396519.3333333333, ans=0.0 2024-09-16 05:11:40,868 INFO [train.py:1198] (1/2) Epoch 22, batch 5750, loss[loss=0.1859, ctc_loss=0.1226, cr_loss=0.3163, over 20997.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1575, cr_loss=0.3787, over 4089955.48 frames. ], batch size: 48, lr: 3.75e-03, grad_scale: 32.0 2024-09-16 05:11:51,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=396547.6666666667, ans=0.125 2024-09-16 05:12:16,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=396604.3333333333, ans=0.0 2024-09-16 05:12:28,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=396632.6666666667, ans=0.125 2024-09-16 05:12:42,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=396661.0, ans=0.0 2024-09-16 05:12:58,165 INFO [train.py:1198] (1/2) Epoch 22, batch 5800, loss[loss=0.2691, ctc_loss=0.1871, cr_loss=0.4098, over 19448.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1585, cr_loss=0.3801, over 4088931.04 frames. ], batch size: 90, lr: 3.75e-03, grad_scale: 32.0 2024-09-16 05:13:05,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=396689.3333333333, ans=0.125 2024-09-16 05:13:53,842 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.801e+02 2.119e+02 2.223e+02 2.388e+02 4.323e+02, threshold=4.445e+02, percent-clipped=1.0 2024-09-16 05:13:57,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=396802.6666666667, ans=0.125 2024-09-16 05:14:13,396 INFO [train.py:1198] (1/2) Epoch 22, batch 5850, loss[loss=0.2123, ctc_loss=0.1391, cr_loss=0.366, over 20784.00 frames. 
], tot_loss[loss=0.2352, ctc_loss=0.1591, cr_loss=0.3808, over 4088568.39 frames. ], batch size: 53, lr: 3.75e-03, grad_scale: 32.0 2024-09-16 05:14:27,845 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.15 vs. limit=22.5 2024-09-16 05:14:37,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=396859.3333333333, ans=0.0 2024-09-16 05:15:03,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=396916.0, ans=0.09899494936611666 2024-09-16 05:15:11,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=396944.3333333333, ans=0.0 2024-09-16 05:15:13,937 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.59 vs. limit=22.5 2024-09-16 05:15:25,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=396944.3333333333, ans=0.1 2024-09-16 05:15:28,113 INFO [train.py:1198] (1/2) Epoch 22, batch 5900, loss[loss=0.2211, ctc_loss=0.1504, cr_loss=0.3538, over 21051.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.1586, cr_loss=0.3807, over 4092165.33 frames. ], batch size: 62, lr: 3.75e-03, grad_scale: 32.0 2024-09-16 05:15:46,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=397001.0, ans=0.0 2024-09-16 05:16:23,261 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.853e+02 2.101e+02 2.202e+02 2.308e+02 3.334e+02, threshold=4.403e+02, percent-clipped=0.0 2024-09-16 05:16:25,636 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.40 vs. limit=6.0 2024-09-16 05:16:32,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=397086.0, ans=0.125 2024-09-16 05:16:42,600 INFO [train.py:1198] (1/2) Epoch 22, batch 5950, loss[loss=0.2644, ctc_loss=0.1828, cr_loss=0.4083, over 18350.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1585, cr_loss=0.38, over 4091119.03 frames. ], batch size: 108, lr: 3.75e-03, grad_scale: 32.0 2024-09-16 05:16:44,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=397114.3333333333, ans=0.125 2024-09-16 05:16:50,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=397114.3333333333, ans=0.0 2024-09-16 05:16:59,746 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.08 vs. limit=12.0 2024-09-16 05:17:20,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=397171.0, ans=0.125 2024-09-16 05:17:23,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=397171.0, ans=0.0 2024-09-16 05:17:57,325 INFO [train.py:1198] (1/2) Epoch 22, batch 6000, loss[loss=0.2245, ctc_loss=0.1507, cr_loss=0.3692, over 20976.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1581, cr_loss=0.3794, over 4090633.68 frames. 
], batch size: 58, lr: 3.75e-03, grad_scale: 32.0 2024-09-16 05:17:57,325 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-16 05:18:07,089 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.9467, 5.6389, 5.3862, 5.0978], device='cuda:1') 2024-09-16 05:18:14,184 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1446, 4.1382, 4.0456, 3.6084], device='cuda:1') 2024-09-16 05:18:16,023 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6143, 4.3834, 4.1734, 3.9308], device='cuda:1') 2024-09-16 05:18:19,331 INFO [train.py:1230] (1/2) Epoch 22, validation: loss=0.04273, ctc_loss=0.04273, cr_loss=1.154e-14, over 944034.00 frames. 2024-09-16 05:18:19,332 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-16 05:18:21,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=397256.0, ans=0.0 2024-09-16 05:18:37,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=397284.3333333333, ans=0.1 2024-09-16 05:18:56,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=397312.6666666667, ans=0.2 2024-09-16 05:18:59,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=397312.6666666667, ans=0.1 2024-09-16 05:19:11,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=397341.0, ans=0.125 2024-09-16 05:19:11,682 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.42 vs. limit=6.0 2024-09-16 05:19:13,520 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.726e+02 2.118e+02 2.278e+02 2.424e+02 4.575e+02, threshold=4.557e+02, percent-clipped=1.0 2024-09-16 05:19:29,552 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.34 vs. limit=15.0 2024-09-16 05:19:33,235 INFO [train.py:1198] (1/2) Epoch 22, batch 6050, loss[loss=0.2529, ctc_loss=0.1739, cr_loss=0.3947, over 20669.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1588, cr_loss=0.3805, over 4090343.36 frames. ], batch size: 68, lr: 3.75e-03, grad_scale: 32.0 2024-09-16 05:19:53,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=397426.0, ans=0.125 2024-09-16 05:20:28,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=397482.6666666667, ans=0.1 2024-09-16 05:20:46,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=397511.0, ans=0.2 2024-09-16 05:20:46,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=397511.0, ans=0.0 2024-09-16 05:20:49,198 INFO [train.py:1198] (1/2) Epoch 22, batch 6100, loss[loss=0.2246, ctc_loss=0.1497, cr_loss=0.3744, over 21056.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1594, cr_loss=0.3816, over 4095409.11 frames. 
], batch size: 59, lr: 3.75e-03, grad_scale: 32.0 2024-09-16 05:21:44,524 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.733e+02 2.115e+02 2.288e+02 2.485e+02 5.440e+02, threshold=4.576e+02, percent-clipped=1.0 2024-09-16 05:21:50,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=397652.6666666667, ans=0.125 2024-09-16 05:22:00,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=397652.6666666667, ans=0.125 2024-09-16 05:22:03,537 INFO [train.py:1198] (1/2) Epoch 22, batch 6150, loss[loss=0.2144, ctc_loss=0.1435, cr_loss=0.3546, over 19867.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.159, cr_loss=0.3797, over 4072979.70 frames. ], batch size: 44, lr: 3.75e-03, grad_scale: 32.0 2024-09-16 05:22:10,132 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.98 vs. limit=15.0 2024-09-16 05:22:11,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=397681.0, ans=0.05 2024-09-16 05:22:34,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=397737.6666666667, ans=0.125 2024-09-16 05:22:55,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=397766.0, ans=0.0 2024-09-16 05:23:09,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=397794.3333333333, ans=0.0 2024-09-16 05:23:15,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=397794.3333333333, ans=0.0 2024-09-16 05:23:17,978 INFO [train.py:1198] (1/2) Epoch 22, batch 6200, loss[loss=0.2321, ctc_loss=0.1583, cr_loss=0.369, over 20104.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1587, cr_loss=0.3783, over 4051634.20 frames. ], batch size: 80, lr: 3.75e-03, grad_scale: 32.0 2024-09-16 05:23:22,855 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 05:24:12,404 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.931e+02 2.126e+02 2.251e+02 2.431e+02 4.639e+02, threshold=4.502e+02, percent-clipped=1.0 2024-09-16 05:24:31,326 INFO [train.py:1198] (1/2) Epoch 22, batch 6250, loss[loss=0.2195, ctc_loss=0.1463, cr_loss=0.3661, over 20816.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1604, cr_loss=0.381, over 4031734.43 frames. 
], batch size: 53, lr: 3.75e-03, grad_scale: 32.0 2024-09-16 05:24:49,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=397992.6666666667, ans=0.1 2024-09-16 05:25:11,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=398021.0, ans=0.125 2024-09-16 05:25:30,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=398077.6666666667, ans=0.0 2024-09-16 05:25:41,030 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 05:25:41,341 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.17 vs. limit=10.0 2024-09-16 05:25:44,876 INFO [train.py:1198] (1/2) Epoch 22, batch 6300, loss[loss=0.2571, ctc_loss=0.1751, cr_loss=0.4102, over 20020.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1628, cr_loss=0.3838, over 3995637.00 frames. ], batch size: 80, lr: 3.74e-03, grad_scale: 32.0 2024-09-16 05:26:03,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=398134.3333333333, ans=0.025 2024-09-16 05:26:13,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=398162.6666666667, ans=0.125 2024-09-16 05:26:15,448 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.29 vs. limit=10.0 2024-09-16 05:26:23,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=398162.6666666667, ans=0.125 2024-09-16 05:26:39,201 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.805e+02 2.118e+02 2.336e+02 2.606e+02 6.355e+02, threshold=4.673e+02, percent-clipped=1.0 2024-09-16 05:26:48,611 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.85 vs. limit=15.0 2024-09-16 05:26:49,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=398219.3333333333, ans=0.1 2024-09-16 05:26:58,204 INFO [train.py:1198] (1/2) Epoch 22, batch 6350, loss[loss=0.2579, ctc_loss=0.1852, cr_loss=0.3637, over 14964.00 frames. ], tot_loss[loss=0.2446, ctc_loss=0.1671, cr_loss=0.3872, over 3881796.53 frames. ], batch size: 149, lr: 3.74e-03, grad_scale: 32.0 2024-09-16 05:27:05,910 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=8.35 vs. limit=12.0 2024-09-16 05:27:18,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=398276.0, ans=0.04949747468305833 2024-09-16 05:27:54,734 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.01 vs. limit=10.0 2024-09-16 05:27:54,817 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.56 vs. 
limit=15.0 2024-09-16 05:28:45,753 INFO [train.py:1198] (1/2) Epoch 23, batch 0, loss[loss=0.2745, ctc_loss=0.1878, cr_loss=0.4334, over 20218.00 frames. ], tot_loss[loss=0.2745, ctc_loss=0.1878, cr_loss=0.4334, over 20218.00 frames. ], batch size: 74, lr: 3.66e-03, grad_scale: 32.0 2024-09-16 05:28:45,754 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-16 05:29:06,945 INFO [train.py:1230] (1/2) Epoch 23, validation: loss=0.04295, ctc_loss=0.04295, cr_loss=1.117e-14, over 944034.00 frames. 2024-09-16 05:29:06,946 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-16 05:29:31,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=398392.1666666667, ans=0.125 2024-09-16 05:30:06,480 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 05:30:16,554 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.772e+02 2.227e+02 2.595e+02 2.855e+02 4.695e+02, threshold=5.191e+02, percent-clipped=1.0 2024-09-16 05:30:22,598 INFO [train.py:1198] (1/2) Epoch 23, batch 50, loss[loss=0.2287, ctc_loss=0.1575, cr_loss=0.3559, over 20882.00 frames. ], tot_loss[loss=0.2368, ctc_loss=0.1599, cr_loss=0.3844, over 921247.11 frames. ], batch size: 54, lr: 3.66e-03, grad_scale: 32.0 2024-09-16 05:30:26,111 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.52 vs. limit=22.5 2024-09-16 05:30:36,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=398533.8333333333, ans=0.125 2024-09-16 05:30:44,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=398533.8333333333, ans=0.125 2024-09-16 05:30:47,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=398533.8333333333, ans=0.025 2024-09-16 05:30:54,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=398562.1666666667, ans=0.125 2024-09-16 05:31:25,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=398618.8333333333, ans=0.04949747468305833 2024-09-16 05:31:38,501 INFO [train.py:1198] (1/2) Epoch 23, batch 100, loss[loss=0.2731, ctc_loss=0.1904, cr_loss=0.4135, over 19417.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1598, cr_loss=0.3842, over 1627995.96 frames. ], batch size: 90, lr: 3.66e-03, grad_scale: 32.0 2024-09-16 05:31:40,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=398647.1666666667, ans=0.125 2024-09-16 05:31:50,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=398647.1666666667, ans=0.05 2024-09-16 05:32:47,653 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.934e+02 2.069e+02 2.165e+02 2.353e+02 4.020e+02, threshold=4.329e+02, percent-clipped=0.0 2024-09-16 05:32:53,778 INFO [train.py:1198] (1/2) Epoch 23, batch 150, loss[loss=0.2093, ctc_loss=0.1413, cr_loss=0.34, over 20967.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.159, cr_loss=0.3808, over 2157668.26 frames. 
], batch size: 55, lr: 3.66e-03, grad_scale: 32.0 2024-09-16 05:33:22,520 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 05:33:48,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=398873.8333333333, ans=0.125 2024-09-16 05:34:11,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=398902.1666666667, ans=0.0 2024-09-16 05:34:13,689 INFO [train.py:1198] (1/2) Epoch 23, batch 200, loss[loss=0.2131, ctc_loss=0.141, cr_loss=0.3606, over 20963.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.1587, cr_loss=0.3805, over 2593131.95 frames. ], batch size: 51, lr: 3.66e-03, grad_scale: 32.0 2024-09-16 05:34:50,979 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.04 vs. limit=15.0 2024-09-16 05:35:06,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=399015.5, ans=0.125 2024-09-16 05:35:17,535 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=399043.8333333333, ans=0.1 2024-09-16 05:35:26,222 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.826e+02 2.121e+02 2.238e+02 2.415e+02 3.835e+02, threshold=4.476e+02, percent-clipped=0.0 2024-09-16 05:35:29,686 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.820e-02 2024-09-16 05:35:32,221 INFO [train.py:1198] (1/2) Epoch 23, batch 250, loss[loss=0.2602, ctc_loss=0.1762, cr_loss=0.4201, over 19640.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1583, cr_loss=0.3802, over 2919954.12 frames. ], batch size: 90, lr: 3.66e-03, grad_scale: 32.0 2024-09-16 05:35:33,080 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.77 vs. limit=22.5 2024-09-16 05:35:52,709 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.67 vs. limit=10.0 2024-09-16 05:36:18,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=399157.1666666667, ans=0.2 2024-09-16 05:36:48,154 INFO [train.py:1198] (1/2) Epoch 23, batch 300, loss[loss=0.2246, ctc_loss=0.1542, cr_loss=0.3519, over 20961.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.158, cr_loss=0.3803, over 3187924.89 frames. ], batch size: 58, lr: 3.66e-03, grad_scale: 32.0 2024-09-16 05:36:48,944 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.64 vs. 
limit=15.0 2024-09-16 05:37:21,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=399270.5, ans=0.125 2024-09-16 05:37:36,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=399298.8333333333, ans=10.0 2024-09-16 05:37:54,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=399327.1666666667, ans=0.0 2024-09-16 05:37:54,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=399327.1666666667, ans=0.125 2024-09-16 05:37:58,389 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.871e+02 2.116e+02 2.237e+02 2.381e+02 3.373e+02, threshold=4.474e+02, percent-clipped=0.0 2024-09-16 05:38:04,329 INFO [train.py:1198] (1/2) Epoch 23, batch 350, loss[loss=0.1902, ctc_loss=0.1241, cr_loss=0.3304, over 20928.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1576, cr_loss=0.3792, over 3385996.33 frames. ], batch size: 49, lr: 3.65e-03, grad_scale: 32.0 2024-09-16 05:38:04,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=399355.5, ans=0.1 2024-09-16 05:38:37,111 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.60 vs. limit=15.0 2024-09-16 05:38:55,062 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.62 vs. limit=15.0 2024-09-16 05:38:57,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=399440.5, ans=0.125 2024-09-16 05:39:07,644 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0 2024-09-16 05:39:23,301 INFO [train.py:1198] (1/2) Epoch 23, batch 400, loss[loss=0.2076, ctc_loss=0.1386, cr_loss=0.345, over 20317.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1578, cr_loss=0.3794, over 3546937.23 frames. ], batch size: 45, lr: 3.65e-03, grad_scale: 32.0 2024-09-16 05:39:29,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=399497.1666666667, ans=0.125 2024-09-16 05:39:35,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=399497.1666666667, ans=0.125 2024-09-16 05:39:40,523 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.49 vs. limit=22.5 2024-09-16 05:39:55,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=399553.8333333333, ans=0.0 2024-09-16 05:40:31,964 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.858e+02 2.137e+02 2.279e+02 2.469e+02 3.568e+02, threshold=4.559e+02, percent-clipped=0.0 2024-09-16 05:40:40,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=399638.8333333333, ans=0.125 2024-09-16 05:40:41,307 INFO [train.py:1198] (1/2) Epoch 23, batch 450, loss[loss=0.2383, ctc_loss=0.1621, cr_loss=0.3811, over 21029.00 frames. 
], tot_loss[loss=0.2344, ctc_loss=0.1584, cr_loss=0.3802, over 3669604.97 frames. ], batch size: 63, lr: 3.65e-03, grad_scale: 32.0 2024-09-16 05:40:46,538 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.19 vs. limit=22.5 2024-09-16 05:41:38,000 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 05:41:51,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=399752.1666666667, ans=0.0 2024-09-16 05:41:57,297 INFO [train.py:1198] (1/2) Epoch 23, batch 500, loss[loss=0.2616, ctc_loss=0.1759, cr_loss=0.4284, over 20838.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1576, cr_loss=0.3792, over 3774292.90 frames. ], batch size: 65, lr: 3.65e-03, grad_scale: 64.0 2024-09-16 05:42:24,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=399808.8333333333, ans=0.1 2024-09-16 05:42:50,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=399865.5, ans=0.025 2024-09-16 05:43:06,088 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.772e+02 2.053e+02 2.177e+02 2.332e+02 4.710e+02, threshold=4.355e+02, percent-clipped=1.0 2024-09-16 05:43:06,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=399893.8333333333, ans=0.0 2024-09-16 05:43:11,609 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.20 vs. limit=15.0 2024-09-16 05:43:12,209 INFO [train.py:1198] (1/2) Epoch 23, batch 550, loss[loss=0.2637, ctc_loss=0.1811, cr_loss=0.4128, over 19682.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.158, cr_loss=0.3798, over 3846034.02 frames. ], batch size: 90, lr: 3.65e-03, grad_scale: 64.0 2024-09-16 05:43:26,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=399950.5, ans=0.125 2024-09-16 05:43:40,157 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.77 vs. limit=15.0 2024-09-16 05:43:45,443 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=399978.8333333333, ans=0.0 2024-09-16 05:44:05,597 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.16 vs. limit=12.0 2024-09-16 05:44:08,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=400007.1666666667, ans=0.125 2024-09-16 05:44:11,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=400035.5, ans=0.2 2024-09-16 05:44:27,136 INFO [train.py:1198] (1/2) Epoch 23, batch 600, loss[loss=0.2133, ctc_loss=0.1437, cr_loss=0.3482, over 20778.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1583, cr_loss=0.3807, over 3906833.15 frames. 
], batch size: 56, lr: 3.65e-03, grad_scale: 64.0 2024-09-16 05:44:33,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=400063.8333333333, ans=0.125 2024-09-16 05:44:49,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=400092.1666666667, ans=0.125 2024-09-16 05:45:05,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=400120.5, ans=0.125 2024-09-16 05:45:40,769 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.815e+02 2.087e+02 2.248e+02 2.403e+02 3.401e+02, threshold=4.496e+02, percent-clipped=0.0 2024-09-16 05:45:45,424 INFO [train.py:1198] (1/2) Epoch 23, batch 650, loss[loss=0.1674, ctc_loss=0.1112, cr_loss=0.281, over 20949.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1573, cr_loss=0.3801, over 3961859.11 frames. ], batch size: 49, lr: 3.65e-03, grad_scale: 32.0 2024-09-16 05:46:51,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=400318.8333333333, ans=0.1 2024-09-16 05:47:04,633 INFO [train.py:1198] (1/2) Epoch 23, batch 700, loss[loss=0.2544, ctc_loss=0.1731, cr_loss=0.4065, over 20970.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1575, cr_loss=0.38, over 3990266.69 frames. ], batch size: 55, lr: 3.65e-03, grad_scale: 32.0 2024-09-16 05:47:08,092 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 05:47:17,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=400347.1666666667, ans=0.1 2024-09-16 05:47:21,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=400375.5, ans=0.2 2024-09-16 05:47:32,693 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.68 vs. limit=12.0 2024-09-16 05:47:54,113 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.56 vs. limit=15.0 2024-09-16 05:48:03,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=400460.5, ans=0.1 2024-09-16 05:48:15,936 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.856e+02 2.106e+02 2.327e+02 2.598e+02 4.216e+02, threshold=4.654e+02, percent-clipped=0.0 2024-09-16 05:48:20,512 INFO [train.py:1198] (1/2) Epoch 23, batch 750, loss[loss=0.2113, ctc_loss=0.1422, cr_loss=0.3455, over 21006.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.157, cr_loss=0.3782, over 4009062.81 frames. ], batch size: 52, lr: 3.65e-03, grad_scale: 32.0 2024-09-16 05:48:29,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=400488.8333333333, ans=0.07 2024-09-16 05:48:58,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=400545.5, ans=0.0 2024-09-16 05:49:00,233 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.29 vs. 
limit=15.0 2024-09-16 05:49:00,434 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.12 vs. limit=10.0 2024-09-16 05:49:04,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=400573.8333333333, ans=0.025 2024-09-16 05:49:36,008 INFO [train.py:1198] (1/2) Epoch 23, batch 800, loss[loss=0.1972, ctc_loss=0.1334, cr_loss=0.3189, over 20963.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1576, cr_loss=0.3788, over 4015147.75 frames. ], batch size: 51, lr: 3.65e-03, grad_scale: 32.0 2024-09-16 05:49:36,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=400630.5, ans=0.125 2024-09-16 05:49:39,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=400630.5, ans=0.125 2024-09-16 05:50:00,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=400658.8333333333, ans=0.125 2024-09-16 05:50:50,141 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.716e+02 2.111e+02 2.250e+02 2.471e+02 5.388e+02, threshold=4.500e+02, percent-clipped=1.0 2024-09-16 05:50:54,915 INFO [train.py:1198] (1/2) Epoch 23, batch 850, loss[loss=0.2654, ctc_loss=0.1758, cr_loss=0.4482, over 21024.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1583, cr_loss=0.3807, over 4034058.38 frames. ], batch size: 62, lr: 3.65e-03, grad_scale: 32.0 2024-09-16 05:50:58,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=400772.1666666667, ans=0.1 2024-09-16 05:51:05,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=400772.1666666667, ans=0.125 2024-09-16 05:51:13,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=400800.5, ans=0.125 2024-09-16 05:51:23,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=400828.8333333333, ans=0.0 2024-09-16 05:51:39,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=400857.1666666667, ans=0.125 2024-09-16 05:52:01,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=400885.5, ans=0.2 2024-09-16 05:52:13,018 INFO [train.py:1198] (1/2) Epoch 23, batch 900, loss[loss=0.218, ctc_loss=0.1458, cr_loss=0.3611, over 20877.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1595, cr_loss=0.382, over 4035599.74 frames. 
], batch size: 57, lr: 3.65e-03, grad_scale: 32.0 2024-09-16 05:52:14,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=400913.8333333333, ans=0.125 2024-09-16 05:52:32,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=400942.1666666667, ans=0.2 2024-09-16 05:52:52,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=400970.5, ans=0.0 2024-09-16 05:52:53,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=400970.5, ans=0.1 2024-09-16 05:53:22,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=401027.1666666667, ans=0.125 2024-09-16 05:53:23,696 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.835e+02 2.095e+02 2.213e+02 2.436e+02 3.425e+02, threshold=4.425e+02, percent-clipped=0.0 2024-09-16 05:53:28,447 INFO [train.py:1198] (1/2) Epoch 23, batch 950, loss[loss=0.212, ctc_loss=0.1408, cr_loss=0.3563, over 20960.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1581, cr_loss=0.3802, over 4058542.67 frames. ], batch size: 52, lr: 3.65e-03, grad_scale: 32.0 2024-09-16 05:53:28,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=401055.5, ans=0.125 2024-09-16 05:53:52,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=401083.8333333333, ans=0.125 2024-09-16 05:54:43,177 INFO [train.py:1198] (1/2) Epoch 23, batch 1000, loss[loss=0.2408, ctc_loss=0.1638, cr_loss=0.3852, over 20888.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1584, cr_loss=0.3808, over 4068675.67 frames. ], batch size: 57, lr: 3.65e-03, grad_scale: 32.0 2024-09-16 05:54:46,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=401197.1666666667, ans=0.05 2024-09-16 05:55:09,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=401225.5, ans=0.125 2024-09-16 05:55:30,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=401282.1666666667, ans=0.0 2024-09-16 05:55:49,955 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 05:55:49,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=401310.5, ans=0.2 2024-09-16 05:55:53,879 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.879e+02 2.067e+02 2.185e+02 2.324e+02 4.884e+02, threshold=4.370e+02, percent-clipped=2.0 2024-09-16 05:55:55,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=401310.5, ans=0.125 2024-09-16 05:55:58,434 INFO [train.py:1198] (1/2) Epoch 23, batch 1050, loss[loss=0.2193, ctc_loss=0.1471, cr_loss=0.3614, over 20860.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1584, cr_loss=0.3806, over 4076723.81 frames. 
], batch size: 54, lr: 3.65e-03, grad_scale: 32.0 2024-09-16 05:55:58,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=401338.8333333333, ans=0.1 2024-09-16 05:56:12,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=401367.1666666667, ans=0.0 2024-09-16 05:56:34,207 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.18 vs. limit=22.5 2024-09-16 05:56:37,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=401395.5, ans=0.125 2024-09-16 05:56:51,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=401423.8333333333, ans=0.025 2024-09-16 05:56:53,767 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.48 vs. limit=15.0 2024-09-16 05:57:11,749 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.13 vs. limit=12.0 2024-09-16 05:57:16,136 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.42 vs. limit=12.0 2024-09-16 05:57:17,089 INFO [train.py:1198] (1/2) Epoch 23, batch 1100, loss[loss=0.2227, ctc_loss=0.15, cr_loss=0.3635, over 20997.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.158, cr_loss=0.3791, over 4069150.83 frames. ], batch size: 52, lr: 3.65e-03, grad_scale: 16.0 2024-09-16 05:57:48,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=401508.8333333333, ans=0.0 2024-09-16 05:57:51,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=401537.1666666667, ans=0.125 2024-09-16 05:57:51,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=401537.1666666667, ans=0.125 2024-09-16 05:58:00,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=401537.1666666667, ans=0.125 2024-09-16 05:58:14,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=401565.5, ans=0.125 2024-09-16 05:58:19,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=401593.8333333333, ans=0.1 2024-09-16 05:58:32,943 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.829e+02 2.065e+02 2.227e+02 2.358e+02 2.746e+02, threshold=4.454e+02, percent-clipped=0.0 2024-09-16 05:58:34,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=401622.1666666667, ans=0.125 2024-09-16 05:58:35,992 INFO [train.py:1198] (1/2) Epoch 23, batch 1150, loss[loss=0.224, ctc_loss=0.1508, cr_loss=0.366, over 21040.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1579, cr_loss=0.3784, over 4059092.59 frames. 
], batch size: 63, lr: 3.64e-03, grad_scale: 16.0 2024-09-16 05:58:36,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=401622.1666666667, ans=0.0 2024-09-16 05:59:02,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=401650.5, ans=0.2 2024-09-16 05:59:02,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=401650.5, ans=0.125 2024-09-16 05:59:23,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=401707.1666666667, ans=0.0 2024-09-16 05:59:34,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=401735.5, ans=0.125 2024-09-16 05:59:51,433 INFO [train.py:1198] (1/2) Epoch 23, batch 1200, loss[loss=0.2072, ctc_loss=0.1381, cr_loss=0.3454, over 21086.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1576, cr_loss=0.3782, over 4070449.12 frames. ], batch size: 53, lr: 3.64e-03, grad_scale: 32.0 2024-09-16 06:00:06,409 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 06:00:06,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=401792.1666666667, ans=0.0 2024-09-16 06:00:09,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=401792.1666666667, ans=0.0 2024-09-16 06:00:18,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=401792.1666666667, ans=0.125 2024-09-16 06:00:36,472 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. limit=6.0 2024-09-16 06:01:03,032 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.789e+02 2.076e+02 2.193e+02 2.323e+02 8.340e+02, threshold=4.385e+02, percent-clipped=1.0 2024-09-16 06:01:06,095 INFO [train.py:1198] (1/2) Epoch 23, batch 1250, loss[loss=0.2097, ctc_loss=0.1412, cr_loss=0.3425, over 20980.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1575, cr_loss=0.3785, over 4084238.10 frames. ], batch size: 55, lr: 3.64e-03, grad_scale: 32.0 2024-09-16 06:01:24,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=401933.8333333333, ans=0.0 2024-09-16 06:01:28,022 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.33 vs. limit=15.0 2024-09-16 06:02:15,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=402018.8333333333, ans=0.125 2024-09-16 06:02:23,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=402047.1666666667, ans=0.2 2024-09-16 06:02:24,195 INFO [train.py:1198] (1/2) Epoch 23, batch 1300, loss[loss=0.231, ctc_loss=0.1554, cr_loss=0.3778, over 20950.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1574, cr_loss=0.3777, over 4089093.81 frames. 
], batch size: 60, lr: 3.64e-03, grad_scale: 32.0 2024-09-16 06:02:53,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=402103.8333333333, ans=0.125 2024-09-16 06:02:58,528 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.15 vs. limit=15.0 2024-09-16 06:03:09,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=402132.1666666667, ans=0.0 2024-09-16 06:03:40,194 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.859e+02 2.101e+02 2.238e+02 2.376e+02 3.312e+02, threshold=4.475e+02, percent-clipped=0.0 2024-09-16 06:03:43,360 INFO [train.py:1198] (1/2) Epoch 23, batch 1350, loss[loss=0.2607, ctc_loss=0.1762, cr_loss=0.423, over 20852.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1573, cr_loss=0.3778, over 4090807.53 frames. ], batch size: 65, lr: 3.64e-03, grad_scale: 32.0 2024-09-16 06:04:04,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=402217.1666666667, ans=0.125 2024-09-16 06:04:11,569 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.56 vs. limit=15.0 2024-09-16 06:04:58,914 INFO [train.py:1198] (1/2) Epoch 23, batch 1400, loss[loss=0.219, ctc_loss=0.1452, cr_loss=0.3689, over 20911.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1575, cr_loss=0.3787, over 4088535.01 frames. ], batch size: 54, lr: 3.64e-03, grad_scale: 32.0 2024-09-16 06:05:03,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=402330.5, ans=0.0 2024-09-16 06:05:26,641 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.03 vs. limit=15.0 2024-09-16 06:05:35,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=402387.1666666667, ans=0.025 2024-09-16 06:05:47,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=402415.5, ans=0.125 2024-09-16 06:05:49,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=402415.5, ans=0.2 2024-09-16 06:06:11,133 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.730e+02 2.121e+02 2.285e+02 2.456e+02 3.388e+02, threshold=4.571e+02, percent-clipped=0.0 2024-09-16 06:06:11,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=402443.8333333333, ans=0.125 2024-09-16 06:06:14,093 INFO [train.py:1198] (1/2) Epoch 23, batch 1450, loss[loss=0.2284, ctc_loss=0.1516, cr_loss=0.384, over 20941.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1575, cr_loss=0.3789, over 4093666.64 frames. 
], batch size: 60, lr: 3.64e-03, grad_scale: 32.0 2024-09-16 06:06:20,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=402472.1666666667, ans=0.125 2024-09-16 06:06:54,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=402528.8333333333, ans=0.04949747468305833 2024-09-16 06:06:56,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=402528.8333333333, ans=0.125 2024-09-16 06:07:01,059 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 06:07:08,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=402557.1666666667, ans=0.125 2024-09-16 06:07:08,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=402557.1666666667, ans=0.125 2024-09-16 06:07:18,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=402585.5, ans=0.05 2024-09-16 06:07:29,952 INFO [train.py:1198] (1/2) Epoch 23, batch 1500, loss[loss=0.2564, ctc_loss=0.1731, cr_loss=0.4163, over 20971.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1577, cr_loss=0.3796, over 4090488.60 frames. ], batch size: 64, lr: 3.64e-03, grad_scale: 32.0 2024-09-16 06:07:46,416 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.86 vs. limit=10.0 2024-09-16 06:08:02,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=402670.5, ans=0.02 2024-09-16 06:08:04,723 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.23 vs. limit=15.0 2024-09-16 06:08:07,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=402670.5, ans=0.125 2024-09-16 06:08:08,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=402670.5, ans=0.0 2024-09-16 06:08:16,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=402670.5, ans=0.2 2024-09-16 06:08:32,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=402698.8333333333, ans=0.125 2024-09-16 06:08:47,099 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.915e+02 2.099e+02 2.250e+02 2.440e+02 4.560e+02, threshold=4.499e+02, percent-clipped=0.0 2024-09-16 06:08:47,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=402727.1666666667, ans=0.125 2024-09-16 06:08:48,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=402755.5, ans=0.125 2024-09-16 06:08:50,187 INFO [train.py:1198] (1/2) Epoch 23, batch 1550, loss[loss=0.1913, ctc_loss=0.1273, cr_loss=0.32, over 21073.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.1572, cr_loss=0.3788, over 4095843.30 frames. 
], batch size: 53, lr: 3.64e-03, grad_scale: 32.0 2024-09-16 06:09:25,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=402812.1666666667, ans=0.0 2024-09-16 06:09:38,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=402840.5, ans=0.2 2024-09-16 06:09:41,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=402840.5, ans=0.0 2024-09-16 06:10:09,572 INFO [train.py:1198] (1/2) Epoch 23, batch 1600, loss[loss=0.2218, ctc_loss=0.149, cr_loss=0.3638, over 20856.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1568, cr_loss=0.3781, over 4095273.86 frames. ], batch size: 57, lr: 3.64e-03, grad_scale: 32.0 2024-09-16 06:10:34,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=402925.5, ans=0.125 2024-09-16 06:10:49,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=402953.8333333333, ans=0.125 2024-09-16 06:11:21,936 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.899e+02 2.113e+02 2.293e+02 2.561e+02 4.068e+02, threshold=4.585e+02, percent-clipped=0.0 2024-09-16 06:11:25,060 INFO [train.py:1198] (1/2) Epoch 23, batch 1650, loss[loss=0.228, ctc_loss=0.1533, cr_loss=0.3732, over 20821.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1578, cr_loss=0.3793, over 4094261.64 frames. ], batch size: 59, lr: 3.64e-03, grad_scale: 32.0 2024-09-16 06:11:30,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=403038.8333333333, ans=0.125 2024-09-16 06:11:37,818 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.89 vs. limit=15.0 2024-09-16 06:11:52,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=403067.1666666667, ans=0.0 2024-09-16 06:12:24,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=403152.1666666667, ans=0.1 2024-09-16 06:12:40,660 INFO [train.py:1198] (1/2) Epoch 23, batch 1700, loss[loss=0.2551, ctc_loss=0.1719, cr_loss=0.4164, over 20676.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.1572, cr_loss=0.379, over 4109754.69 frames. 
], batch size: 68, lr: 3.64e-03, grad_scale: 32.0 2024-09-16 06:13:03,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=403208.8333333333, ans=0.125 2024-09-16 06:13:29,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=403265.5, ans=0.1 2024-09-16 06:13:35,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=403265.5, ans=0.125 2024-09-16 06:13:39,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=403265.5, ans=0.0 2024-09-16 06:13:52,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=403293.8333333333, ans=0.0 2024-09-16 06:13:56,317 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.760e+02 2.040e+02 2.240e+02 2.452e+02 4.086e+02, threshold=4.480e+02, percent-clipped=0.0 2024-09-16 06:13:59,470 INFO [train.py:1198] (1/2) Epoch 23, batch 1750, loss[loss=0.2578, ctc_loss=0.1743, cr_loss=0.4176, over 20852.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1573, cr_loss=0.3798, over 4110656.28 frames. ], batch size: 65, lr: 3.64e-03, grad_scale: 32.0 2024-09-16 06:14:01,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=403322.1666666667, ans=0.125 2024-09-16 06:14:27,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=403350.5, ans=0.0 2024-09-16 06:14:31,931 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.59 vs. limit=15.0 2024-09-16 06:15:18,047 INFO [train.py:1198] (1/2) Epoch 23, batch 1800, loss[loss=0.2075, ctc_loss=0.1374, cr_loss=0.3505, over 20968.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1577, cr_loss=0.3804, over 4113927.32 frames. ], batch size: 52, lr: 3.64e-03, grad_scale: 32.0 2024-09-16 06:15:32,630 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.97 vs. limit=15.0 2024-09-16 06:15:48,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=403520.5, ans=0.1 2024-09-16 06:16:01,394 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.19 vs. limit=15.0 2024-09-16 06:16:30,785 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.804e+02 2.070e+02 2.202e+02 2.361e+02 2.764e+02, threshold=4.403e+02, percent-clipped=0.0 2024-09-16 06:16:33,884 INFO [train.py:1198] (1/2) Epoch 23, batch 1850, loss[loss=0.1942, ctc_loss=0.1296, cr_loss=0.3231, over 20929.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1583, cr_loss=0.381, over 4122000.43 frames. 
], batch size: 49, lr: 3.64e-03, grad_scale: 32.0 2024-09-16 06:16:38,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=403605.5, ans=0.0 2024-09-16 06:16:55,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=403633.8333333333, ans=0.1 2024-09-16 06:16:57,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=403633.8333333333, ans=0.025 2024-09-16 06:17:03,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=403662.1666666667, ans=0.2 2024-09-16 06:17:12,024 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.94 vs. limit=15.0 2024-09-16 06:17:14,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=403662.1666666667, ans=0.025 2024-09-16 06:17:20,443 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=403690.5, ans=0.0 2024-09-16 06:17:48,770 INFO [train.py:1198] (1/2) Epoch 23, batch 1900, loss[loss=0.2055, ctc_loss=0.1349, cr_loss=0.3532, over 21055.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.158, cr_loss=0.3806, over 4123984.83 frames. ], batch size: 53, lr: 3.63e-03, grad_scale: 32.0 2024-09-16 06:18:13,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=403775.5, ans=0.125 2024-09-16 06:18:17,957 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.58 vs. limit=10.0 2024-09-16 06:18:52,972 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.36 vs. limit=22.5 2024-09-16 06:18:54,723 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.66 vs. limit=12.0 2024-09-16 06:19:01,249 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.840e+02 2.067e+02 2.183e+02 2.353e+02 2.811e+02, threshold=4.366e+02, percent-clipped=0.0 2024-09-16 06:19:04,392 INFO [train.py:1198] (1/2) Epoch 23, batch 1950, loss[loss=0.2053, ctc_loss=0.1391, cr_loss=0.3311, over 20992.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1581, cr_loss=0.3801, over 4122754.00 frames. 
], batch size: 52, lr: 3.63e-03, grad_scale: 32.0 2024-09-16 06:19:12,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=403888.8333333333, ans=0.0 2024-09-16 06:19:21,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=403917.1666666667, ans=0.125 2024-09-16 06:19:25,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=403917.1666666667, ans=0.125 2024-09-16 06:19:31,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=403917.1666666667, ans=0.025 2024-09-16 06:20:17,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=404002.1666666667, ans=0.125 2024-09-16 06:20:22,970 INFO [train.py:1198] (1/2) Epoch 23, batch 2000, loss[loss=0.2469, ctc_loss=0.1663, cr_loss=0.4029, over 20974.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1588, cr_loss=0.3808, over 4116524.19 frames. ], batch size: 58, lr: 3.63e-03, grad_scale: 32.0 2024-09-16 06:20:35,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=404030.5, ans=0.0 2024-09-16 06:20:53,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=404058.8333333333, ans=0.0 2024-09-16 06:20:53,353 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.07 vs. limit=15.0 2024-09-16 06:20:59,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=404087.1666666667, ans=0.125 2024-09-16 06:21:03,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=404087.1666666667, ans=0.2 2024-09-16 06:21:08,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=404087.1666666667, ans=0.125 2024-09-16 06:21:37,732 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.894e+02 2.132e+02 2.284e+02 2.501e+02 4.801e+02, threshold=4.568e+02, percent-clipped=2.0 2024-09-16 06:21:40,779 INFO [train.py:1198] (1/2) Epoch 23, batch 2050, loss[loss=0.2341, ctc_loss=0.1608, cr_loss=0.3663, over 20755.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.159, cr_loss=0.3812, over 4104591.26 frames. ], batch size: 56, lr: 3.63e-03, grad_scale: 32.0 2024-09-16 06:22:21,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=404228.8333333333, ans=0.0 2024-09-16 06:22:39,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=404285.5, ans=0.125 2024-09-16 06:22:52,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=404285.5, ans=0.125 2024-09-16 06:22:56,183 INFO [train.py:1198] (1/2) Epoch 23, batch 2100, loss[loss=0.2274, ctc_loss=0.1529, cr_loss=0.3724, over 21028.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1599, cr_loss=0.3834, over 4096805.79 frames. 
], batch size: 63, lr: 3.63e-03, grad_scale: 32.0 2024-09-16 06:23:07,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=404313.8333333333, ans=0.0 2024-09-16 06:23:18,131 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.42 vs. limit=12.0 2024-09-16 06:24:08,578 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.757e+02 2.052e+02 2.203e+02 2.372e+02 2.715e+02, threshold=4.406e+02, percent-clipped=0.0 2024-09-16 06:24:11,603 INFO [train.py:1198] (1/2) Epoch 23, batch 2150, loss[loss=0.2145, ctc_loss=0.1426, cr_loss=0.3595, over 21066.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1594, cr_loss=0.3823, over 4103008.33 frames. ], batch size: 56, lr: 3.63e-03, grad_scale: 32.0 2024-09-16 06:24:52,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=404512.1666666667, ans=0.0 2024-09-16 06:24:58,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=404540.5, ans=0.125 2024-09-16 06:25:09,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=404540.5, ans=10.0 2024-09-16 06:25:30,410 INFO [train.py:1198] (1/2) Epoch 23, batch 2200, loss[loss=0.2614, ctc_loss=0.1785, cr_loss=0.4146, over 20818.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1591, cr_loss=0.3817, over 4108430.36 frames. ], batch size: 59, lr: 3.63e-03, grad_scale: 32.0 2024-09-16 06:25:39,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=404597.1666666667, ans=0.025 2024-09-16 06:26:45,831 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.741e+02 2.071e+02 2.210e+02 2.331e+02 3.694e+02, threshold=4.420e+02, percent-clipped=0.0 2024-09-16 06:26:48,918 INFO [train.py:1198] (1/2) Epoch 23, batch 2250, loss[loss=0.2544, ctc_loss=0.1724, cr_loss=0.4099, over 20985.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1588, cr_loss=0.381, over 4107691.32 frames. ], batch size: 64, lr: 3.63e-03, grad_scale: 32.0 2024-09-16 06:26:53,889 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.45 vs. limit=22.5 2024-09-16 06:27:19,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=404795.5, ans=0.125 2024-09-16 06:27:43,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=404823.8333333333, ans=0.0 2024-09-16 06:28:01,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=404852.1666666667, ans=0.04949747468305833 2024-09-16 06:28:04,467 INFO [train.py:1198] (1/2) Epoch 23, batch 2300, loss[loss=0.2198, ctc_loss=0.1474, cr_loss=0.3617, over 21049.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1584, cr_loss=0.3805, over 4102653.80 frames. 
], batch size: 62, lr: 3.63e-03, grad_scale: 32.0 2024-09-16 06:28:38,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=404937.1666666667, ans=0.125 2024-09-16 06:28:38,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=404937.1666666667, ans=0.2 2024-09-16 06:28:55,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=404965.5, ans=0.125 2024-09-16 06:28:56,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=404965.5, ans=0.125 2024-09-16 06:29:17,623 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.813e+02 2.057e+02 2.172e+02 2.331e+02 3.388e+02, threshold=4.345e+02, percent-clipped=0.0 2024-09-16 06:29:20,662 INFO [train.py:1198] (1/2) Epoch 23, batch 2350, loss[loss=0.2577, ctc_loss=0.1752, cr_loss=0.412, over 21037.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1573, cr_loss=0.3788, over 4110883.84 frames. ], batch size: 61, lr: 3.63e-03, grad_scale: 32.0 2024-09-16 06:29:22,777 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.06 vs. limit=15.0 2024-09-16 06:30:05,915 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 06:30:25,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=405135.5, ans=0.1 2024-09-16 06:30:35,923 INFO [train.py:1198] (1/2) Epoch 23, batch 2400, loss[loss=0.2595, ctc_loss=0.1786, cr_loss=0.4045, over 19988.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.157, cr_loss=0.378, over 4107805.72 frames. ], batch size: 80, lr: 3.63e-03, grad_scale: 32.0 2024-09-16 06:31:19,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=405220.5, ans=0.125 2024-09-16 06:31:47,115 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.62 vs. limit=10.0 2024-09-16 06:31:51,075 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.819e+02 2.111e+02 2.261e+02 2.435e+02 4.919e+02, threshold=4.522e+02, percent-clipped=1.0 2024-09-16 06:31:53,412 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.61 vs. limit=22.5 2024-09-16 06:31:54,232 INFO [train.py:1198] (1/2) Epoch 23, batch 2450, loss[loss=0.2086, ctc_loss=0.1409, cr_loss=0.3387, over 20777.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.157, cr_loss=0.3779, over 4098104.16 frames. ], batch size: 53, lr: 3.63e-03, grad_scale: 32.0 2024-09-16 06:32:37,283 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.11 vs. limit=15.0 2024-09-16 06:32:53,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=405390.5, ans=0.0 2024-09-16 06:33:12,691 INFO [train.py:1198] (1/2) Epoch 23, batch 2500, loss[loss=0.1904, ctc_loss=0.1271, cr_loss=0.3166, over 21078.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1567, cr_loss=0.3775, over 4098687.78 frames. 
], batch size: 53, lr: 3.63e-03, grad_scale: 32.0 2024-09-16 06:33:17,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=405447.1666666667, ans=0.125 2024-09-16 06:33:22,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=405447.1666666667, ans=0.125 2024-09-16 06:33:35,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=405475.5, ans=0.035 2024-09-16 06:33:35,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=405475.5, ans=0.1 2024-09-16 06:34:09,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=405532.1666666667, ans=0.125 2024-09-16 06:34:09,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=405532.1666666667, ans=0.1 2024-09-16 06:34:17,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=405560.5, ans=0.125 2024-09-16 06:34:25,900 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.863e+02 2.075e+02 2.182e+02 2.343e+02 3.194e+02, threshold=4.364e+02, percent-clipped=0.0 2024-09-16 06:34:28,844 INFO [train.py:1198] (1/2) Epoch 23, batch 2550, loss[loss=0.226, ctc_loss=0.1488, cr_loss=0.386, over 20779.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1568, cr_loss=0.3768, over 4094078.96 frames. ], batch size: 56, lr: 3.63e-03, grad_scale: 32.0 2024-09-16 06:34:29,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=405588.8333333333, ans=0.0 2024-09-16 06:34:44,948 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 06:35:45,334 INFO [train.py:1198] (1/2) Epoch 23, batch 2600, loss[loss=0.2756, ctc_loss=0.1942, cr_loss=0.4069, over 18383.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1568, cr_loss=0.3772, over 4094769.08 frames. ], batch size: 108, lr: 3.63e-03, grad_scale: 32.0 2024-09-16 06:36:00,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=405758.8333333333, ans=0.125 2024-09-16 06:36:08,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=405758.8333333333, ans=0.025 2024-09-16 06:36:09,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=405758.8333333333, ans=0.125 2024-09-16 06:37:00,994 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.815e+02 2.069e+02 2.208e+02 2.353e+02 6.566e+02, threshold=4.416e+02, percent-clipped=1.0 2024-09-16 06:37:03,887 INFO [train.py:1198] (1/2) Epoch 23, batch 2650, loss[loss=0.2973, ctc_loss=0.2121, cr_loss=0.4263, over 14083.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1568, cr_loss=0.3775, over 4096771.97 frames. ], batch size: 150, lr: 3.63e-03, grad_scale: 32.0 2024-09-16 06:38:22,591 INFO [train.py:1198] (1/2) Epoch 23, batch 2700, loss[loss=0.2437, ctc_loss=0.1658, cr_loss=0.3895, over 20931.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1568, cr_loss=0.3774, over 4099569.89 frames. 
], batch size: 60, lr: 3.62e-03, grad_scale: 32.0 2024-09-16 06:38:24,799 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.29 vs. limit=15.0 2024-09-16 06:38:44,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=406042.1666666667, ans=0.1 2024-09-16 06:38:51,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=406070.5, ans=0.125 2024-09-16 06:39:25,899 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.46 vs. limit=15.0 2024-09-16 06:39:33,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=406127.1666666667, ans=15.0 2024-09-16 06:39:37,160 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.847e+02 2.121e+02 2.268e+02 2.452e+02 3.905e+02, threshold=4.536e+02, percent-clipped=0.0 2024-09-16 06:39:38,762 INFO [train.py:1198] (1/2) Epoch 23, batch 2750, loss[loss=0.2416, ctc_loss=0.16, cr_loss=0.4078, over 21059.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1575, cr_loss=0.3784, over 4093842.10 frames. ], batch size: 56, lr: 3.62e-03, grad_scale: 16.0 2024-09-16 06:39:45,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=406155.5, ans=0.0 2024-09-16 06:40:50,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=406268.8333333333, ans=0.025 2024-09-16 06:40:53,761 INFO [train.py:1198] (1/2) Epoch 23, batch 2800, loss[loss=0.1855, ctc_loss=0.1241, cr_loss=0.307, over 20982.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.1572, cr_loss=0.379, over 4102895.16 frames. ], batch size: 49, lr: 3.62e-03, grad_scale: 32.0 2024-09-16 06:41:37,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=406382.1666666667, ans=0.125 2024-09-16 06:42:07,862 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.879e+02 2.113e+02 2.299e+02 2.488e+02 7.893e+02, threshold=4.598e+02, percent-clipped=1.0 2024-09-16 06:42:09,409 INFO [train.py:1198] (1/2) Epoch 23, batch 2850, loss[loss=0.2353, ctc_loss=0.1558, cr_loss=0.3974, over 20872.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1576, cr_loss=0.3801, over 4109312.58 frames. ], batch size: 57, lr: 3.62e-03, grad_scale: 32.0 2024-09-16 06:42:21,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=406438.8333333333, ans=0.015 2024-09-16 06:42:38,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=406467.1666666667, ans=0.025 2024-09-16 06:42:41,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=406495.5, ans=0.125 2024-09-16 06:42:47,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=406495.5, ans=0.125 2024-09-16 06:43:13,284 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.65 vs. 
limit=22.5 2024-09-16 06:43:19,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=406552.1666666667, ans=0.125 2024-09-16 06:43:27,983 INFO [train.py:1198] (1/2) Epoch 23, batch 2900, loss[loss=0.1899, ctc_loss=0.1244, cr_loss=0.3273, over 19972.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1567, cr_loss=0.3782, over 4109273.92 frames. ], batch size: 44, lr: 3.62e-03, grad_scale: 32.0 2024-09-16 06:43:35,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=406580.5, ans=0.1 2024-09-16 06:43:49,766 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0 2024-09-16 06:44:11,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=406637.1666666667, ans=0.125 2024-09-16 06:44:22,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=406665.5, ans=0.0 2024-09-16 06:44:35,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=406693.8333333333, ans=0.0 2024-09-16 06:44:43,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=406693.8333333333, ans=0.125 2024-09-16 06:44:44,377 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.793e+02 2.062e+02 2.219e+02 2.383e+02 3.445e+02, threshold=4.438e+02, percent-clipped=0.0 2024-09-16 06:44:44,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=406722.1666666667, ans=0.125 2024-09-16 06:44:45,958 INFO [train.py:1198] (1/2) Epoch 23, batch 2950, loss[loss=0.2266, ctc_loss=0.1526, cr_loss=0.3697, over 20958.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1568, cr_loss=0.3783, over 4102590.09 frames. ], batch size: 55, lr: 3.62e-03, grad_scale: 32.0 2024-09-16 06:45:13,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=406750.5, ans=0.125 2024-09-16 06:45:22,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=406778.8333333333, ans=0.0 2024-09-16 06:45:24,538 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.37 vs. limit=10.0 2024-09-16 06:45:33,638 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.01 vs. limit=15.0 2024-09-16 06:45:57,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=406835.5, ans=0.0 2024-09-16 06:46:01,574 INFO [train.py:1198] (1/2) Epoch 23, batch 3000, loss[loss=0.2293, ctc_loss=0.1538, cr_loss=0.3778, over 20862.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1576, cr_loss=0.3794, over 4092200.40 frames. ], batch size: 57, lr: 3.62e-03, grad_scale: 32.0 2024-09-16 06:46:01,575 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-16 06:46:24,121 INFO [train.py:1230] (1/2) Epoch 23, validation: loss=0.0429, ctc_loss=0.0429, cr_loss=1.141e-14, over 944034.00 frames. 
2024-09-16 06:46:24,122 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-16 06:46:39,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=406892.1666666667, ans=0.125 2024-09-16 06:46:56,033 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.71 vs. limit=12.0 2024-09-16 06:47:29,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=406977.1666666667, ans=0.04949747468305833 2024-09-16 06:47:36,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=406977.1666666667, ans=0.015 2024-09-16 06:47:39,362 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.725e+02 2.070e+02 2.200e+02 2.343e+02 3.689e+02, threshold=4.399e+02, percent-clipped=0.0 2024-09-16 06:47:40,879 INFO [train.py:1198] (1/2) Epoch 23, batch 3050, loss[loss=0.2293, ctc_loss=0.1546, cr_loss=0.373, over 20864.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1582, cr_loss=0.3804, over 4083728.76 frames. ], batch size: 54, lr: 3.62e-03, grad_scale: 32.0 2024-09-16 06:47:54,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=407033.8333333333, ans=0.0 2024-09-16 06:48:02,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=407033.8333333333, ans=0.125 2024-09-16 06:48:59,744 INFO [train.py:1198] (1/2) Epoch 23, batch 3100, loss[loss=0.2779, ctc_loss=0.1992, cr_loss=0.3935, over 13983.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1578, cr_loss=0.3801, over 4089462.26 frames. ], batch size: 150, lr: 3.62e-03, grad_scale: 32.0 2024-09-16 06:49:14,082 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.56 vs. limit=15.0 2024-09-16 06:49:59,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=407232.1666666667, ans=0.1 2024-09-16 06:50:17,062 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.810e+02 2.128e+02 2.254e+02 2.369e+02 3.498e+02, threshold=4.508e+02, percent-clipped=0.0 2024-09-16 06:50:18,680 INFO [train.py:1198] (1/2) Epoch 23, batch 3150, loss[loss=0.2565, ctc_loss=0.1744, cr_loss=0.4102, over 21015.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.158, cr_loss=0.3806, over 4095436.95 frames. ], batch size: 61, lr: 3.62e-03, grad_scale: 32.0 2024-09-16 06:50:26,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=407288.8333333333, ans=0.125 2024-09-16 06:50:57,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=407345.5, ans=0.125 2024-09-16 06:51:08,232 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.17 vs. limit=22.5 2024-09-16 06:51:21,817 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.54 vs. 
limit=15.0 2024-09-16 06:51:28,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=407402.1666666667, ans=0.125 2024-09-16 06:51:34,645 INFO [train.py:1198] (1/2) Epoch 23, batch 3200, loss[loss=0.1857, ctc_loss=0.1225, cr_loss=0.316, over 20964.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1559, cr_loss=0.3772, over 4100061.39 frames. ], batch size: 50, lr: 3.62e-03, grad_scale: 32.0 2024-09-16 06:52:05,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=407487.1666666667, ans=0.0 2024-09-16 06:52:15,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=407487.1666666667, ans=0.2 2024-09-16 06:52:17,701 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.52 vs. limit=15.0 2024-09-16 06:52:39,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=407543.8333333333, ans=0.125 2024-09-16 06:52:41,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=407543.8333333333, ans=0.0 2024-09-16 06:52:48,501 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.923e+02 2.167e+02 2.292e+02 2.503e+02 3.773e+02, threshold=4.584e+02, percent-clipped=0.0 2024-09-16 06:52:50,102 INFO [train.py:1198] (1/2) Epoch 23, batch 3250, loss[loss=0.2396, ctc_loss=0.1593, cr_loss=0.4016, over 20983.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1563, cr_loss=0.3781, over 4096196.47 frames. ], batch size: 55, lr: 3.62e-03, grad_scale: 32.0 2024-09-16 06:53:06,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=407600.5, ans=0.125 2024-09-16 06:53:11,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=407600.5, ans=0.125 2024-09-16 06:53:11,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=407600.5, ans=0.125 2024-09-16 06:53:25,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=407628.8333333333, ans=0.1 2024-09-16 06:53:35,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=407657.1666666667, ans=0.125 2024-09-16 06:53:50,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=407685.5, ans=0.0 2024-09-16 06:54:00,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=407685.5, ans=0.125 2024-09-16 06:54:08,820 INFO [train.py:1198] (1/2) Epoch 23, batch 3300, loss[loss=0.2468, ctc_loss=0.1679, cr_loss=0.3948, over 20339.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1575, cr_loss=0.3798, over 4098055.53 frames. ], batch size: 74, lr: 3.62e-03, grad_scale: 32.0 2024-09-16 06:54:18,718 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.17 vs. 
limit=10.0 2024-09-16 06:54:26,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=407742.1666666667, ans=0.125 2024-09-16 06:54:44,340 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.57 vs. limit=22.5 2024-09-16 06:54:56,581 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.68 vs. limit=15.0 2024-09-16 06:55:26,794 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.813e+02 2.045e+02 2.180e+02 2.349e+02 7.274e+02, threshold=4.361e+02, percent-clipped=1.0 2024-09-16 06:55:28,281 INFO [train.py:1198] (1/2) Epoch 23, batch 3350, loss[loss=0.2323, ctc_loss=0.1537, cr_loss=0.3933, over 20938.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1564, cr_loss=0.3783, over 4099852.30 frames. ], batch size: 60, lr: 3.62e-03, grad_scale: 32.0 2024-09-16 06:55:36,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=407855.5, ans=0.05 2024-09-16 06:56:17,350 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.96 vs. limit=15.0 2024-09-16 06:56:33,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=407968.8333333333, ans=0.5 2024-09-16 06:56:36,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=407968.8333333333, ans=0.0 2024-09-16 06:56:44,016 INFO [train.py:1198] (1/2) Epoch 23, batch 3400, loss[loss=0.2409, ctc_loss=0.1625, cr_loss=0.3919, over 20834.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1573, cr_loss=0.3792, over 4091263.86 frames. ], batch size: 59, lr: 3.62e-03, grad_scale: 32.0 2024-09-16 06:56:57,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=407997.1666666667, ans=0.07 2024-09-16 06:57:36,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=408082.1666666667, ans=0.0 2024-09-16 06:57:43,263 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.65 vs. limit=22.5 2024-09-16 06:57:58,999 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.857e+02 2.130e+02 2.286e+02 2.441e+02 4.492e+02, threshold=4.572e+02, percent-clipped=1.0 2024-09-16 06:58:00,499 INFO [train.py:1198] (1/2) Epoch 23, batch 3450, loss[loss=0.283, ctc_loss=0.1938, cr_loss=0.4457, over 18180.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.158, cr_loss=0.3804, over 4096270.88 frames. 
], batch size: 108, lr: 3.62e-03, grad_scale: 32.0 2024-09-16 06:58:28,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=408167.1666666667, ans=0.125 2024-09-16 06:58:32,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=408195.5, ans=0.125 2024-09-16 06:58:52,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=408223.8333333333, ans=0.025 2024-09-16 06:59:03,673 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.22 vs. limit=6.0 2024-09-16 06:59:15,610 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.26 vs. limit=10.0 2024-09-16 06:59:16,267 INFO [train.py:1198] (1/2) Epoch 23, batch 3500, loss[loss=0.2047, ctc_loss=0.1366, cr_loss=0.3402, over 20902.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1573, cr_loss=0.3795, over 4105803.31 frames. ], batch size: 54, lr: 3.61e-03, grad_scale: 32.0 2024-09-16 06:59:22,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=408280.5, ans=0.0 2024-09-16 06:59:26,475 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.47 vs. limit=6.0 2024-09-16 06:59:37,293 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.05 vs. limit=15.0 2024-09-16 06:59:56,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=408337.1666666667, ans=0.05 2024-09-16 07:00:05,262 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.19 vs. limit=15.0 2024-09-16 07:00:06,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=408365.5, ans=10.0 2024-09-16 07:00:17,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=408365.5, ans=0.09899494936611666 2024-09-16 07:00:33,382 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.820e+02 2.097e+02 2.202e+02 2.396e+02 4.312e+02, threshold=4.404e+02, percent-clipped=0.0 2024-09-16 07:00:34,888 INFO [train.py:1198] (1/2) Epoch 23, batch 3550, loss[loss=0.2324, ctc_loss=0.1595, cr_loss=0.3642, over 20951.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.158, cr_loss=0.3812, over 4110946.76 frames. ], batch size: 60, lr: 3.61e-03, grad_scale: 32.0 2024-09-16 07:01:26,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=408507.1666666667, ans=0.1 2024-09-16 07:01:37,353 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.15 vs. limit=15.0 2024-09-16 07:01:53,294 INFO [train.py:1198] (1/2) Epoch 23, batch 3600, loss[loss=0.242, ctc_loss=0.1624, cr_loss=0.398, over 21063.00 frames. 
], tot_loss[loss=0.2351, ctc_loss=0.1587, cr_loss=0.3816, over 4098260.68 frames. ], batch size: 56, lr: 3.61e-03, grad_scale: 32.0 2024-09-16 07:03:07,731 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.746e+02 2.045e+02 2.183e+02 2.379e+02 3.734e+02, threshold=4.365e+02, percent-clipped=0.0 2024-09-16 07:03:09,146 INFO [train.py:1198] (1/2) Epoch 23, batch 3650, loss[loss=0.2652, ctc_loss=0.1819, cr_loss=0.4165, over 19938.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.158, cr_loss=0.3795, over 4091016.99 frames. ], batch size: 80, lr: 3.61e-03, grad_scale: 32.0 2024-09-16 07:03:09,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=408705.5, ans=0.2 2024-09-16 07:03:47,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=408762.1666666667, ans=0.0 2024-09-16 07:03:48,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=408762.1666666667, ans=0.0 2024-09-16 07:04:09,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=408818.8333333333, ans=0.0 2024-09-16 07:04:10,845 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.43 vs. limit=5.0 2024-09-16 07:04:24,712 INFO [train.py:1198] (1/2) Epoch 23, batch 3700, loss[loss=0.2365, ctc_loss=0.16, cr_loss=0.3825, over 20837.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1592, cr_loss=0.3814, over 4095221.13 frames. ], batch size: 59, lr: 3.61e-03, grad_scale: 32.0 2024-09-16 07:04:27,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=408847.1666666667, ans=0.2 2024-09-16 07:04:59,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=408903.8333333333, ans=0.015 2024-09-16 07:05:17,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=408932.1666666667, ans=0.0 2024-09-16 07:05:19,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=408932.1666666667, ans=0.1 2024-09-16 07:05:31,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=408960.5, ans=0.0 2024-09-16 07:05:41,003 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.857e+02 2.125e+02 2.301e+02 2.549e+02 3.917e+02, threshold=4.602e+02, percent-clipped=0.0 2024-09-16 07:05:42,627 INFO [train.py:1198] (1/2) Epoch 23, batch 3750, loss[loss=0.2807, ctc_loss=0.2015, cr_loss=0.3958, over 14896.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1588, cr_loss=0.3805, over 4088321.25 frames. 
], batch size: 151, lr: 3.61e-03, grad_scale: 32.0 2024-09-16 07:05:42,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=408988.8333333333, ans=0.125 2024-09-16 07:05:44,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=408988.8333333333, ans=0.125 2024-09-16 07:05:44,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=408988.8333333333, ans=0.1 2024-09-16 07:05:50,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=408988.8333333333, ans=0.5 2024-09-16 07:06:05,719 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=409017.1666666667, ans=0.1 2024-09-16 07:06:28,754 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.05 vs. limit=15.0 2024-09-16 07:07:00,649 INFO [train.py:1198] (1/2) Epoch 23, batch 3800, loss[loss=0.2431, ctc_loss=0.1616, cr_loss=0.4072, over 21064.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1592, cr_loss=0.3814, over 4074054.47 frames. ], batch size: 56, lr: 3.61e-03, grad_scale: 32.0 2024-09-16 07:07:41,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=409187.1666666667, ans=0.125 2024-09-16 07:08:14,942 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.780e+02 2.136e+02 2.252e+02 2.439e+02 4.200e+02, threshold=4.504e+02, percent-clipped=0.0 2024-09-16 07:08:16,604 INFO [train.py:1198] (1/2) Epoch 23, batch 3850, loss[loss=0.2018, ctc_loss=0.1361, cr_loss=0.3282, over 20236.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1579, cr_loss=0.3796, over 4081948.78 frames. ], batch size: 45, lr: 3.61e-03, grad_scale: 32.0 2024-09-16 07:08:24,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=409272.1666666667, ans=0.125 2024-09-16 07:08:24,267 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 07:08:34,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=409300.5, ans=0.125 2024-09-16 07:09:30,983 INFO [train.py:1198] (1/2) Epoch 23, batch 3900, loss[loss=0.2469, ctc_loss=0.1694, cr_loss=0.3877, over 20131.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1576, cr_loss=0.3786, over 4083441.16 frames. ], batch size: 80, lr: 3.61e-03, grad_scale: 32.0 2024-09-16 07:09:43,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=409413.8333333333, ans=0.1 2024-09-16 07:09:51,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=409442.1666666667, ans=0.04949747468305833 2024-09-16 07:09:54,538 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.29 vs. limit=15.0 2024-09-16 07:10:06,790 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=3.97 vs. 
limit=15.0 2024-09-16 07:10:07,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=409470.5, ans=0.125 2024-09-16 07:10:23,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=409498.8333333333, ans=0.125 2024-09-16 07:10:28,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=409498.8333333333, ans=0.04949747468305833 2024-09-16 07:10:45,073 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.781e+02 2.057e+02 2.198e+02 2.318e+02 3.832e+02, threshold=4.396e+02, percent-clipped=0.0 2024-09-16 07:10:46,692 INFO [train.py:1198] (1/2) Epoch 23, batch 3950, loss[loss=0.2176, ctc_loss=0.1449, cr_loss=0.3633, over 20768.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1577, cr_loss=0.3793, over 4092284.44 frames. ], batch size: 53, lr: 3.61e-03, grad_scale: 32.0 2024-09-16 07:10:59,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=409555.5, ans=0.0 2024-09-16 07:11:09,719 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=409583.8333333333, ans=0.2 2024-09-16 07:11:12,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=409583.8333333333, ans=0.125 2024-09-16 07:11:12,885 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.42 vs. limit=12.0 2024-09-16 07:11:39,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=409640.5, ans=0.2 2024-09-16 07:12:05,384 INFO [train.py:1198] (1/2) Epoch 23, batch 4000, loss[loss=0.2331, ctc_loss=0.1545, cr_loss=0.3928, over 21028.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1575, cr_loss=0.3794, over 4095800.42 frames. ], batch size: 61, lr: 3.61e-03, grad_scale: 32.0 2024-09-16 07:12:29,549 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.94 vs. limit=15.0 2024-09-16 07:12:36,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=409725.5, ans=0.025 2024-09-16 07:12:59,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=409782.1666666667, ans=0.125 2024-09-16 07:13:22,728 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.731e+02 2.055e+02 2.223e+02 2.385e+02 3.119e+02, threshold=4.446e+02, percent-clipped=0.0 2024-09-16 07:13:23,338 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.14 vs. limit=15.0 2024-09-16 07:13:24,259 INFO [train.py:1198] (1/2) Epoch 23, batch 4050, loss[loss=0.2279, ctc_loss=0.1515, cr_loss=0.3819, over 20982.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.157, cr_loss=0.3784, over 4095656.93 frames. 
], batch size: 58, lr: 3.61e-03, grad_scale: 32.0 2024-09-16 07:13:30,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=409838.8333333333, ans=0.1 2024-09-16 07:13:43,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=409867.1666666667, ans=0.1 2024-09-16 07:13:46,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=409867.1666666667, ans=0.2 2024-09-16 07:13:59,164 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.76 vs. limit=15.0 2024-09-16 07:14:13,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=409923.8333333333, ans=0.0 2024-09-16 07:14:19,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=409923.8333333333, ans=0.2 2024-09-16 07:14:40,068 INFO [train.py:1198] (1/2) Epoch 23, batch 4100, loss[loss=0.2074, ctc_loss=0.1358, cr_loss=0.3581, over 21053.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1566, cr_loss=0.3782, over 4088039.22 frames. ], batch size: 53, lr: 3.61e-03, grad_scale: 32.0 2024-09-16 07:14:47,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=409980.5, ans=0.0 2024-09-16 07:14:59,113 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.41 vs. limit=15.0 2024-09-16 07:15:05,224 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.87 vs. limit=6.0 2024-09-16 07:15:54,460 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.775e+02 2.086e+02 2.178e+02 2.319e+02 4.043e+02, threshold=4.355e+02, percent-clipped=0.0 2024-09-16 07:15:55,976 INFO [train.py:1198] (1/2) Epoch 23, batch 4150, loss[loss=0.2513, ctc_loss=0.1692, cr_loss=0.4102, over 20977.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.1571, cr_loss=0.3794, over 4079107.62 frames. ], batch size: 64, lr: 3.61e-03, grad_scale: 32.0 2024-09-16 07:16:26,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=410178.8333333333, ans=0.1 2024-09-16 07:16:39,332 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.59 vs. limit=15.0 2024-09-16 07:16:51,284 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.35 vs. limit=15.0 2024-09-16 07:17:14,153 INFO [train.py:1198] (1/2) Epoch 23, batch 4200, loss[loss=0.1901, ctc_loss=0.1239, cr_loss=0.3313, over 21005.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.1571, cr_loss=0.3794, over 4076843.83 frames. ], batch size: 52, lr: 3.61e-03, grad_scale: 32.0 2024-09-16 07:17:15,291 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.12 vs. 
limit=15.0 2024-09-16 07:17:22,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=410263.8333333333, ans=0.0 2024-09-16 07:17:51,050 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.54 vs. limit=15.0 2024-09-16 07:17:59,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=410320.5, ans=0.5 2024-09-16 07:18:16,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=410377.1666666667, ans=0.2 2024-09-16 07:18:26,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=410377.1666666667, ans=0.0 2024-09-16 07:18:31,179 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.870e+02 2.123e+02 2.246e+02 2.486e+02 3.752e+02, threshold=4.491e+02, percent-clipped=0.0 2024-09-16 07:18:32,839 INFO [train.py:1198] (1/2) Epoch 23, batch 4250, loss[loss=0.2223, ctc_loss=0.1505, cr_loss=0.3593, over 21065.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1579, cr_loss=0.3811, over 4075897.98 frames. ], batch size: 53, lr: 3.61e-03, grad_scale: 32.0 2024-09-16 07:18:37,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=410405.5, ans=0.0 2024-09-16 07:18:49,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=410433.8333333333, ans=0.125 2024-09-16 07:19:24,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=410490.5, ans=0.1 2024-09-16 07:19:42,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=410518.8333333333, ans=10.0 2024-09-16 07:19:48,434 INFO [train.py:1198] (1/2) Epoch 23, batch 4300, loss[loss=0.2402, ctc_loss=0.1664, cr_loss=0.3691, over 20975.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1574, cr_loss=0.3798, over 4081207.01 frames. ], batch size: 58, lr: 3.60e-03, grad_scale: 32.0 2024-09-16 07:20:08,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=410575.5, ans=0.0 2024-09-16 07:20:09,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=410575.5, ans=0.07 2024-09-16 07:20:12,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=410575.5, ans=0.2 2024-09-16 07:20:23,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=410603.8333333333, ans=0.1 2024-09-16 07:21:02,462 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.694e+02 2.085e+02 2.236e+02 2.414e+02 4.322e+02, threshold=4.472e+02, percent-clipped=0.0 2024-09-16 07:21:03,902 INFO [train.py:1198] (1/2) Epoch 23, batch 4350, loss[loss=0.2335, ctc_loss=0.1565, cr_loss=0.3847, over 20669.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1573, cr_loss=0.3798, over 4069647.90 frames. 
], batch size: 68, lr: 3.60e-03, grad_scale: 32.0 2024-09-16 07:22:07,640 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.28 vs. limit=22.5 2024-09-16 07:22:19,195 INFO [train.py:1198] (1/2) Epoch 23, batch 4400, loss[loss=0.237, ctc_loss=0.158, cr_loss=0.3953, over 21021.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1583, cr_loss=0.3803, over 4051124.94 frames. ], batch size: 61, lr: 3.60e-03, grad_scale: 32.0 2024-09-16 07:22:45,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=410858.8333333333, ans=0.1 2024-09-16 07:23:19,251 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.02 vs. limit=10.0 2024-09-16 07:23:20,391 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0 2024-09-16 07:23:39,279 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.912e+02 2.181e+02 2.290e+02 2.501e+02 4.284e+02, threshold=4.579e+02, percent-clipped=0.0 2024-09-16 07:23:40,841 INFO [train.py:1198] (1/2) Epoch 23, batch 4450, loss[loss=0.2257, ctc_loss=0.1532, cr_loss=0.3627, over 20982.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1579, cr_loss=0.3799, over 4051649.94 frames. ], batch size: 55, lr: 3.60e-03, grad_scale: 32.0 2024-09-16 07:23:49,063 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.58 vs. limit=10.0 2024-09-16 07:23:54,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=411000.5, ans=0.125 2024-09-16 07:23:59,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=411000.5, ans=0.1 2024-09-16 07:24:19,175 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.54 vs. limit=15.0 2024-09-16 07:24:19,643 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.64 vs. limit=15.0 2024-09-16 07:24:21,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=411028.8333333333, ans=0.5 2024-09-16 07:24:56,360 INFO [train.py:1198] (1/2) Epoch 23, batch 4500, loss[loss=0.2595, ctc_loss=0.1821, cr_loss=0.3871, over 19453.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1574, cr_loss=0.3789, over 4069743.03 frames. 
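The lr column decays very slowly within the epoch (3.61e-03 above, 3.60e-03 in the surrounding records, 3.59e-03 a few hundred batches later), which is consistent with icefall's Eden schedule given this run's base_lr=0.04, lr_batches=7500, lr_epochs=3.5. A sketch of that formula; the cumulative batch index below is a rough guess, not taken from the log:

```python
def eden_lr(base_lr: float, batch: float, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Eden decays the lr jointly in the batch and epoch dimensions.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# Around epoch 23 of this run this lands near the logged 3.6e-03:
print(eden_lr(0.04, batch=140_000.0, epoch=23.0))  # ~3.59e-03
```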
], batch size: 90, lr: 3.60e-03, grad_scale: 32.0 2024-09-16 07:24:56,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=411113.8333333333, ans=0.1 2024-09-16 07:25:12,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=411142.1666666667, ans=0.0 2024-09-16 07:25:28,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=411170.5, ans=0.1 2024-09-16 07:25:45,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=411198.8333333333, ans=0.2 2024-09-16 07:25:47,630 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.86 vs. limit=12.0 2024-09-16 07:25:48,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=411198.8333333333, ans=0.125 2024-09-16 07:25:49,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=411198.8333333333, ans=0.025 2024-09-16 07:25:58,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=411227.1666666667, ans=0.1 2024-09-16 07:26:10,747 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.836e+02 2.116e+02 2.247e+02 2.381e+02 3.714e+02, threshold=4.495e+02, percent-clipped=0.0 2024-09-16 07:26:12,124 INFO [train.py:1198] (1/2) Epoch 23, batch 4550, loss[loss=0.295, ctc_loss=0.2038, cr_loss=0.4558, over 18189.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1578, cr_loss=0.379, over 4067674.48 frames. ], batch size: 108, lr: 3.60e-03, grad_scale: 32.0 2024-09-16 07:27:12,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=411368.8333333333, ans=0.025 2024-09-16 07:27:20,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=411368.8333333333, ans=0.125 2024-09-16 07:27:28,199 INFO [train.py:1198] (1/2) Epoch 23, batch 4600, loss[loss=0.2414, ctc_loss=0.163, cr_loss=0.3921, over 19349.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1577, cr_loss=0.3793, over 4073814.61 frames. 
], batch size: 90, lr: 3.60e-03, grad_scale: 32.0 2024-09-16 07:27:34,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=411397.1666666667, ans=0.0 2024-09-16 07:27:37,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=411397.1666666667, ans=0.025 2024-09-16 07:27:51,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=411425.5, ans=0.95 2024-09-16 07:28:24,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=411482.1666666667, ans=0.2 2024-09-16 07:28:25,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=411482.1666666667, ans=0.5 2024-09-16 07:28:45,158 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.805e+02 2.068e+02 2.187e+02 2.459e+02 3.117e+02, threshold=4.373e+02, percent-clipped=0.0 2024-09-16 07:28:46,687 INFO [train.py:1198] (1/2) Epoch 23, batch 4650, loss[loss=0.2034, ctc_loss=0.1352, cr_loss=0.3411, over 20880.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.1573, cr_loss=0.3785, over 4082075.58 frames. ], batch size: 54, lr: 3.60e-03, grad_scale: 32.0 2024-09-16 07:29:19,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=411595.5, ans=0.125 2024-09-16 07:29:46,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=411623.8333333333, ans=0.125 2024-09-16 07:29:50,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=411652.1666666667, ans=0.125 2024-09-16 07:29:56,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=411652.1666666667, ans=0.2 2024-09-16 07:30:04,585 INFO [train.py:1198] (1/2) Epoch 23, batch 4700, loss[loss=0.2122, ctc_loss=0.1397, cr_loss=0.3627, over 21062.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1573, cr_loss=0.3792, over 4095174.00 frames. ], batch size: 53, lr: 3.60e-03, grad_scale: 16.0 2024-09-16 07:30:22,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=411708.8333333333, ans=0.1 2024-09-16 07:30:38,207 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.36 vs. limit=22.5 2024-09-16 07:31:09,930 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.47 vs. limit=22.5 2024-09-16 07:31:20,003 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.853e+02 2.105e+02 2.220e+02 2.381e+02 4.274e+02, threshold=4.440e+02, percent-clipped=0.0 2024-09-16 07:31:20,023 INFO [train.py:1198] (1/2) Epoch 23, batch 4750, loss[loss=0.2171, ctc_loss=0.1466, cr_loss=0.3529, over 20975.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1576, cr_loss=0.3793, over 4091369.31 frames. 
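In each "Clipping_scale=2.0, grad-norm quartiles ..." warning the five numbers read as [min, 25%, median, 75%, max] of recently observed gradient norms, and the printed threshold equals Clipping_scale times the median (for the warning just above, 2.0 * 2.187e+02 = 4.374e+02, matching the logged 4.373e+02 up to display rounding). A toy tracker under that reading; the recipe's optimizer does this per parameter group with more machinery:

```python
from collections import deque

import torch

class GradNormClipper:
    def __init__(self, clipping_scale: float = 2.0, history: int = 1000):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=history)  # recent global grad norms

    def clip_(self, parameters) -> float:
        grads = [p.grad for p in parameters if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
        self.norms.append(norm)
        q = torch.quantile(torch.tensor(list(self.norms)),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()  # scale * median
        if norm > threshold:  # scale all grads down to the threshold
            for g in grads:
                g.mul_(threshold / norm)
        return threshold
```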
], batch size: 51, lr: 3.60e-03, grad_scale: 16.0 2024-09-16 07:31:23,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=411822.1666666667, ans=0.1 2024-09-16 07:31:38,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=411850.5, ans=0.1 2024-09-16 07:31:41,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=411850.5, ans=0.0 2024-09-16 07:31:43,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=411850.5, ans=0.2 2024-09-16 07:31:50,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=411878.8333333333, ans=0.0 2024-09-16 07:32:24,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=411935.5, ans=0.0 2024-09-16 07:32:35,948 INFO [train.py:1198] (1/2) Epoch 23, batch 4800, loss[loss=0.2741, ctc_loss=0.1861, cr_loss=0.4398, over 20154.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1571, cr_loss=0.3788, over 4102915.45 frames. ], batch size: 80, lr: 3.60e-03, grad_scale: 32.0 2024-09-16 07:33:09,582 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=412020.5, ans=0.0 2024-09-16 07:33:33,829 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 07:33:36,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=412077.1666666667, ans=0.0 2024-09-16 07:33:49,263 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.29 vs. limit=22.5 2024-09-16 07:33:51,435 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.910e+02 2.107e+02 2.248e+02 2.391e+02 3.143e+02, threshold=4.495e+02, percent-clipped=0.0 2024-09-16 07:33:51,453 INFO [train.py:1198] (1/2) Epoch 23, batch 4850, loss[loss=0.2438, ctc_loss=0.1622, cr_loss=0.4081, over 20959.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1578, cr_loss=0.3796, over 4095616.04 frames. ], batch size: 58, lr: 3.60e-03, grad_scale: 32.0 2024-09-16 07:34:25,422 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.56 vs. 
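The grad_scale column (32.0, dipping to 16.0 around batch 4700 and back to 32.0 by batch 4800) behaves like standard fp16 dynamic loss scaling: halve on an overflowing step, double again after a run of clean steps. A minimal loop in the usual torch.cuda.amp style; the recipe's train.py wraps this with its own bookkeeping:

```python
import torch

model = torch.nn.Linear(8, 4)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
use_cuda = torch.cuda.is_available()
device = "cuda" if use_cuda else "cpu"
model = model.to(device)

scaler = torch.cuda.amp.GradScaler(
    init_scale=2.0,      # small initial scale for the sketch
    growth_factor=2.0,   # double after enough overflow-free steps
    backoff_factor=0.5,  # halve when scaled grads hit inf/nan
    enabled=use_cuda,
)

x = torch.randn(16, 8, device=device)
with torch.autocast(device_type=device, dtype=torch.float16, enabled=use_cuda):
    loss = model(x).pow(2).mean()
scaler.scale(loss).backward()  # backward through the scaled loss
scaler.step(opt)               # unscales; skips the step on overflow
scaler.update()                # grow or back off the scale
print("grad_scale:", scaler.get_scale())
```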
limit=15.0 2024-09-16 07:34:32,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=412162.1666666667, ans=0.125 2024-09-16 07:34:46,252 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 07:35:00,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=412218.8333333333, ans=0.2 2024-09-16 07:35:06,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=412218.8333333333, ans=0.2 2024-09-16 07:35:09,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=412218.8333333333, ans=0.125 2024-09-16 07:35:12,402 INFO [train.py:1198] (1/2) Epoch 23, batch 4900, loss[loss=0.2661, ctc_loss=0.1795, cr_loss=0.4331, over 20830.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1582, cr_loss=0.3807, over 4095390.11 frames. ], batch size: 65, lr: 3.60e-03, grad_scale: 32.0 2024-09-16 07:35:20,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=412247.1666666667, ans=0.1 2024-09-16 07:36:25,957 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.908e+02 2.150e+02 2.264e+02 2.435e+02 4.831e+02, threshold=4.527e+02, percent-clipped=1.0 2024-09-16 07:36:25,976 INFO [train.py:1198] (1/2) Epoch 23, batch 4950, loss[loss=0.2182, ctc_loss=0.146, cr_loss=0.3614, over 20959.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1592, cr_loss=0.3817, over 4084262.95 frames. ], batch size: 52, lr: 3.60e-03, grad_scale: 32.0 2024-09-16 07:36:26,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=412388.8333333333, ans=0.025 2024-09-16 07:36:27,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=412388.8333333333, ans=0.04949747468305833 2024-09-16 07:36:31,008 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.89 vs. limit=15.0 2024-09-16 07:36:33,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=412388.8333333333, ans=0.05 2024-09-16 07:36:51,804 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.58 vs. limit=15.0 2024-09-16 07:37:40,541 INFO [train.py:1198] (1/2) Epoch 23, batch 5000, loss[loss=0.209, ctc_loss=0.1369, cr_loss=0.3608, over 20958.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1588, cr_loss=0.3811, over 4083087.31 frames. ], batch size: 52, lr: 3.60e-03, grad_scale: 32.0 2024-09-16 07:37:45,967 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.34 vs. limit=15.0 2024-09-16 07:38:15,951 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.01 vs. 
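The Whitening lines print a spread statistic of the activation covariance ("metric") against a limit; the whitening penalty only activates once the metric exceeds the limit (e.g. metric=2.89 vs. limit=15.0 above is comfortably inside). One plausible form of such a metric is the mean of the squared covariance eigenvalues over the squared mean eigenvalue, which is 1.0 for perfectly white features; this is an illustration, not necessarily scaling.py's exact formula:

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations for one whitening group.
    x = x - x.mean(dim=0)
    cov = (x.t() @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return ((eigs ** 2).mean() / eigs.mean() ** 2).item()

white = torch.randn(10_000, 256)
print(whitening_metric(white))                    # close to 1.0
print(whitening_metric(white * torch.rand(256)))  # uneven scales -> larger
```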
limit=15.0 2024-09-16 07:38:30,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=412615.5, ans=0.0 2024-09-16 07:38:54,940 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.888e+02 2.025e+02 2.166e+02 2.351e+02 2.819e+02, threshold=4.332e+02, percent-clipped=0.0 2024-09-16 07:38:54,957 INFO [train.py:1198] (1/2) Epoch 23, batch 5050, loss[loss=0.2432, ctc_loss=0.1681, cr_loss=0.3757, over 21041.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1582, cr_loss=0.3803, over 4088408.48 frames. ], batch size: 62, lr: 3.60e-03, grad_scale: 32.0 2024-09-16 07:38:56,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=412672.1666666667, ans=0.1 2024-09-16 07:39:24,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=412728.8333333333, ans=0.1 2024-09-16 07:39:36,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=412728.8333333333, ans=0.1 2024-09-16 07:39:48,322 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.46 vs. limit=15.0 2024-09-16 07:39:56,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=412785.5, ans=0.125 2024-09-16 07:40:09,165 INFO [train.py:1198] (1/2) Epoch 23, batch 5100, loss[loss=0.2492, ctc_loss=0.1731, cr_loss=0.3804, over 20638.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1578, cr_loss=0.3795, over 4090975.41 frames. ], batch size: 68, lr: 3.59e-03, grad_scale: 32.0 2024-09-16 07:40:37,746 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.92 vs. limit=22.5 2024-09-16 07:41:05,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=412898.8333333333, ans=0.125 2024-09-16 07:41:09,123 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.13 vs. limit=15.0 2024-09-16 07:41:13,136 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.67 vs. limit=15.0 2024-09-16 07:41:22,138 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.87 vs. limit=15.0 2024-09-16 07:41:22,866 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.808e+02 2.048e+02 2.221e+02 2.428e+02 4.386e+02, threshold=4.442e+02, percent-clipped=1.0 2024-09-16 07:41:22,885 INFO [train.py:1198] (1/2) Epoch 23, batch 5150, loss[loss=0.224, ctc_loss=0.1497, cr_loss=0.3717, over 20780.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1579, cr_loss=0.3799, over 4098513.55 frames. 
], batch size: 56, lr: 3.59e-03, grad_scale: 32.0 2024-09-16 07:41:32,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=412955.5, ans=0.125 2024-09-16 07:41:36,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=412983.8333333333, ans=0.125 2024-09-16 07:41:45,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=412983.8333333333, ans=0.0 2024-09-16 07:41:48,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=412983.8333333333, ans=0.1 2024-09-16 07:41:51,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=413012.1666666667, ans=0.125 2024-09-16 07:42:12,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=413040.5, ans=0.125 2024-09-16 07:42:25,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=413068.8333333333, ans=0.2 2024-09-16 07:42:37,248 INFO [train.py:1198] (1/2) Epoch 23, batch 5200, loss[loss=0.2447, ctc_loss=0.1663, cr_loss=0.3922, over 20936.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.157, cr_loss=0.3781, over 4103306.03 frames. ], batch size: 64, lr: 3.59e-03, grad_scale: 32.0 2024-09-16 07:42:43,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=413097.1666666667, ans=0.0 2024-09-16 07:43:01,994 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 07:43:39,254 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.84 vs. limit=22.5 2024-09-16 07:43:54,692 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.882e+02 2.127e+02 2.264e+02 2.474e+02 4.390e+02, threshold=4.527e+02, percent-clipped=0.0 2024-09-16 07:43:54,711 INFO [train.py:1198] (1/2) Epoch 23, batch 5250, loss[loss=0.2185, ctc_loss=0.1483, cr_loss=0.3512, over 19826.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1574, cr_loss=0.3788, over 4094136.43 frames. ], batch size: 44, lr: 3.59e-03, grad_scale: 32.0 2024-09-16 07:44:01,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=413238.8333333333, ans=0.1 2024-09-16 07:44:20,489 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.60 vs. limit=15.0 2024-09-16 07:44:24,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=413295.5, ans=0.125 2024-09-16 07:44:29,302 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.61 vs. limit=15.0 2024-09-16 07:44:31,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=413295.5, ans=0.0 2024-09-16 07:44:40,102 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.13 vs. 
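In these records loss[...] is the current batch while tot_loss[...] is an aggregate of recent batches weighted by their frame counts (the "over N frames" field, here around 4.1M frames). A minimal frame-weighted accumulator capturing that reading; the recipe additionally decays the aggregate periodically:

```python
class FrameWeightedAverage:
    """Accumulate per-batch losses weighted by frame count."""

    def __init__(self) -> None:
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, loss: float, num_frames: float) -> None:
        self.loss_sum += loss * num_frames
        self.frames += num_frames

    @property
    def avg(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

tracker = FrameWeightedAverage()
tracker.update(0.2447, 20936.0)  # batch 5200's per-batch loss from above
print(f"tot_loss[loss={tracker.avg:.4f}, over {tracker.frames:.2f} frames]")
```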
limit=12.0 2024-09-16 07:44:48,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=413323.8333333333, ans=0.125 2024-09-16 07:44:59,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=413352.1666666667, ans=0.0 2024-09-16 07:45:12,161 INFO [train.py:1198] (1/2) Epoch 23, batch 5300, loss[loss=0.1963, ctc_loss=0.1296, cr_loss=0.3334, over 20995.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1569, cr_loss=0.3779, over 4092773.92 frames. ], batch size: 52, lr: 3.59e-03, grad_scale: 32.0 2024-09-16 07:45:29,123 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.87 vs. limit=22.5 2024-09-16 07:46:26,671 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.821e+02 2.109e+02 2.290e+02 2.512e+02 6.419e+02, threshold=4.580e+02, percent-clipped=1.0 2024-09-16 07:46:26,690 INFO [train.py:1198] (1/2) Epoch 23, batch 5350, loss[loss=0.2328, ctc_loss=0.1579, cr_loss=0.3746, over 20830.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1565, cr_loss=0.3771, over 4088993.48 frames. ], batch size: 59, lr: 3.59e-03, grad_scale: 32.0 2024-09-16 07:46:28,757 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.26 vs. limit=15.0 2024-09-16 07:46:29,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=413522.1666666667, ans=0.025 2024-09-16 07:46:45,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=413550.5, ans=6.0 2024-09-16 07:46:52,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=413550.5, ans=0.125 2024-09-16 07:46:56,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=413578.8333333333, ans=0.04949747468305833 2024-09-16 07:47:05,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=413578.8333333333, ans=0.0 2024-09-16 07:47:17,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=413607.1666666667, ans=0.125 2024-09-16 07:47:41,091 INFO [train.py:1198] (1/2) Epoch 23, batch 5400, loss[loss=0.2444, ctc_loss=0.1658, cr_loss=0.3928, over 20264.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1561, cr_loss=0.3771, over 4084506.49 frames. 
], batch size: 74, lr: 3.59e-03, grad_scale: 32.0 2024-09-16 07:47:56,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=413692.1666666667, ans=0.1 2024-09-16 07:47:59,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=413692.1666666667, ans=0.0 2024-09-16 07:48:34,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=413748.8333333333, ans=0.1 2024-09-16 07:48:40,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=413777.1666666667, ans=0.0 2024-09-16 07:48:47,373 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.84 vs. limit=22.5 2024-09-16 07:48:55,219 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.850e+02 2.134e+02 2.266e+02 2.500e+02 4.304e+02, threshold=4.533e+02, percent-clipped=0.0 2024-09-16 07:48:55,237 INFO [train.py:1198] (1/2) Epoch 23, batch 5450, loss[loss=0.2069, ctc_loss=0.1349, cr_loss=0.3602, over 20960.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1564, cr_loss=0.3772, over 4097938.32 frames. ], batch size: 50, lr: 3.59e-03, grad_scale: 32.0 2024-09-16 07:49:27,101 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 07:49:27,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=413862.1666666667, ans=0.1 2024-09-16 07:49:43,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=413890.5, ans=0.125 2024-09-16 07:50:09,825 INFO [train.py:1198] (1/2) Epoch 23, batch 5500, loss[loss=0.2227, ctc_loss=0.1489, cr_loss=0.3691, over 20758.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1571, cr_loss=0.379, over 4100906.20 frames. ], batch size: 53, lr: 3.59e-03, grad_scale: 32.0 2024-09-16 07:50:14,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=413947.1666666667, ans=0.025 2024-09-16 07:50:16,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=413947.1666666667, ans=0.0 2024-09-16 07:50:36,295 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.80 vs. limit=22.5 2024-09-16 07:51:09,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=414060.5, ans=0.1 2024-09-16 07:51:18,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=414060.5, ans=0.125 2024-09-16 07:51:24,023 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.853e+02 2.114e+02 2.262e+02 2.494e+02 4.113e+02, threshold=4.525e+02, percent-clipped=0.0 2024-09-16 07:51:24,042 INFO [train.py:1198] (1/2) Epoch 23, batch 5550, loss[loss=0.2378, ctc_loss=0.1635, cr_loss=0.3717, over 20827.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1571, cr_loss=0.3786, over 4093150.74 frames. 
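The cr_loss term itself comes from consistency regularization: each utterance is forwarded twice under different time-maskings and the two per-frame CTC posterior sequences are pulled toward each other. An illustrative symmetric-KL version; the recipe's exact masking, detach policy, and the cr_loss_masked_scale weighting live in train.py, not in this snippet:

```python
import torch
import torch.nn.functional as F

def consistency_loss(logits_a: torch.Tensor, logits_b: torch.Tensor) -> torch.Tensor:
    # logits_*: (batch, frames, vocab) from two differently masked views.
    logp_a = F.log_softmax(logits_a, dim=-1)
    logp_b = F.log_softmax(logits_b, dim=-1)
    kl_ab = F.kl_div(logp_a, logp_b.exp().detach(), reduction="batchmean")
    kl_ba = F.kl_div(logp_b, logp_a.exp().detach(), reduction="batchmean")
    return 0.5 * (kl_ab + kl_ba)

a, b = torch.randn(2, 100, 500), torch.randn(2, 100, 500)
print(consistency_loss(a, b))  # positive for independent views
print(consistency_loss(a, a))  # 0 when the two views agree
```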
], batch size: 59, lr: 3.59e-03, grad_scale: 32.0 2024-09-16 07:51:36,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=414088.8333333333, ans=0.125 2024-09-16 07:51:58,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=414145.5, ans=0.0 2024-09-16 07:52:09,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=414173.8333333333, ans=0.0 2024-09-16 07:52:10,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=414173.8333333333, ans=0.1 2024-09-16 07:52:31,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=414202.1666666667, ans=0.125 2024-09-16 07:52:33,310 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.69 vs. limit=15.0 2024-09-16 07:52:39,120 INFO [train.py:1198] (1/2) Epoch 23, batch 5600, loss[loss=0.2216, ctc_loss=0.1477, cr_loss=0.3696, over 21014.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1569, cr_loss=0.3783, over 4090707.80 frames. ], batch size: 61, lr: 3.59e-03, grad_scale: 32.0 2024-09-16 07:52:48,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=414230.5, ans=0.2 2024-09-16 07:53:05,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=414258.8333333333, ans=0.125 2024-09-16 07:53:31,663 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.77 vs. limit=22.5 2024-09-16 07:53:37,621 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.06 vs. limit=22.5 2024-09-16 07:53:39,057 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.02 vs. limit=10.0 2024-09-16 07:53:56,161 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.917e+02 2.115e+02 2.228e+02 2.446e+02 3.372e+02, threshold=4.456e+02, percent-clipped=0.0 2024-09-16 07:53:56,188 INFO [train.py:1198] (1/2) Epoch 23, batch 5650, loss[loss=0.201, ctc_loss=0.1331, cr_loss=0.3393, over 21062.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1582, cr_loss=0.3809, over 4090818.28 frames. ], batch size: 53, lr: 3.59e-03, grad_scale: 32.0 2024-09-16 07:54:30,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=414428.8333333333, ans=0.2 2024-09-16 07:54:57,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=414485.5, ans=0.0 2024-09-16 07:55:00,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=414485.5, ans=0.0 2024-09-16 07:55:05,731 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.29 vs. limit=15.0 2024-09-16 07:55:13,725 INFO [train.py:1198] (1/2) Epoch 23, batch 5700, loss[loss=0.2116, ctc_loss=0.14, cr_loss=0.3583, over 20957.00 frames. 
], tot_loss[loss=0.2345, ctc_loss=0.1584, cr_loss=0.3805, over 4080028.67 frames. ], batch size: 58, lr: 3.59e-03, grad_scale: 32.0 2024-09-16 07:55:45,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=414570.5, ans=0.125 2024-09-16 07:56:28,288 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.908e+02 2.107e+02 2.261e+02 2.440e+02 3.847e+02, threshold=4.522e+02, percent-clipped=0.0 2024-09-16 07:56:28,307 INFO [train.py:1198] (1/2) Epoch 23, batch 5750, loss[loss=0.2162, ctc_loss=0.145, cr_loss=0.356, over 20986.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1579, cr_loss=0.3803, over 4086598.38 frames. ], batch size: 52, lr: 3.59e-03, grad_scale: 32.0 2024-09-16 07:57:38,239 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 07:57:42,439 INFO [train.py:1198] (1/2) Epoch 23, batch 5800, loss[loss=0.206, ctc_loss=0.1369, cr_loss=0.3457, over 20887.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1566, cr_loss=0.3782, over 4093980.04 frames. ], batch size: 54, lr: 3.59e-03, grad_scale: 32.0 2024-09-16 07:57:48,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=414797.1666666667, ans=0.0 2024-09-16 07:57:58,088 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.00 vs. limit=15.0 2024-09-16 07:58:21,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=414853.8333333333, ans=0.0 2024-09-16 07:58:26,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=414882.1666666667, ans=0.125 2024-09-16 07:58:56,702 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.849e+02 2.105e+02 2.233e+02 2.415e+02 7.942e+02, threshold=4.466e+02, percent-clipped=1.0 2024-09-16 07:58:56,720 INFO [train.py:1198] (1/2) Epoch 23, batch 5850, loss[loss=0.2834, ctc_loss=0.1976, cr_loss=0.4287, over 18371.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1574, cr_loss=0.3789, over 4086442.20 frames. ], batch size: 108, lr: 3.59e-03, grad_scale: 32.0 2024-09-16 07:58:56,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=414938.8333333333, ans=0.2 2024-09-16 07:59:05,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=414938.8333333333, ans=0.125 2024-09-16 07:59:37,278 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 07:59:38,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=414995.5, ans=0.125 2024-09-16 08:00:10,867 INFO [train.py:1198] (1/2) Epoch 23, batch 5900, loss[loss=0.2432, ctc_loss=0.1614, cr_loss=0.4092, over 20824.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1579, cr_loss=0.3804, over 4096579.40 frames. 
], batch size: 59, lr: 3.59e-03, grad_scale: 16.0 2024-09-16 08:00:14,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=415080.5, ans=0.0 2024-09-16 08:00:45,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=415137.1666666667, ans=0.125 2024-09-16 08:00:57,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=415165.5, ans=0.125 2024-09-16 08:01:04,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=415165.5, ans=0.125 2024-09-16 08:01:08,273 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.14 vs. limit=8.0 2024-09-16 08:01:24,688 INFO [train.py:1198] (1/2) Epoch 23, batch 5950, loss[loss=0.2237, ctc_loss=0.1502, cr_loss=0.3677, over 20971.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1581, cr_loss=0.3803, over 4093551.56 frames. ], batch size: 55, lr: 3.58e-03, grad_scale: 16.0 2024-09-16 08:01:26,223 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.840e+02 2.148e+02 2.249e+02 2.375e+02 6.008e+02, threshold=4.497e+02, percent-clipped=1.0 2024-09-16 08:01:41,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=415250.5, ans=0.0 2024-09-16 08:01:49,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=415250.5, ans=0.125 2024-09-16 08:01:53,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=415250.5, ans=0.2 2024-09-16 08:02:06,119 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.85 vs. limit=15.0 2024-09-16 08:02:19,536 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.08 vs. limit=15.0 2024-09-16 08:02:30,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=415335.5, ans=0.1 2024-09-16 08:02:35,950 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.10 vs. limit=6.0 2024-09-16 08:02:40,923 INFO [train.py:1198] (1/2) Epoch 23, batch 6000, loss[loss=0.2514, ctc_loss=0.1685, cr_loss=0.4143, over 20871.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.1571, cr_loss=0.3795, over 4100393.10 frames. ], batch size: 57, lr: 3.58e-03, grad_scale: 32.0 2024-09-16 08:02:40,923 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-16 08:03:04,421 INFO [train.py:1230] (1/2) Epoch 23, validation: loss=0.04285, ctc_loss=0.04285, cr_loss=1.126e-14, over 944034.00 frames. 2024-09-16 08:03:04,422 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-16 08:03:41,906 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.51 vs. 
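The "Computing validation loss" records run a gradient-free pass over the dev sets; the reported cr_loss of ~1e-14 is what you would expect when validation applies none of the masking that the consistency term compares across, leaving only the CTC term. A generic version of such a pass, where compute_loss is a hypothetical stand-in for the recipe's loss function:

```python
import torch

def validate(model, dev_loader, compute_loss) -> float:
    model.eval()
    loss_sum, frames = 0.0, 0.0
    with torch.no_grad():
        for batch in dev_loader:
            loss, num_frames = compute_loss(model, batch)  # hypothetical helper
            loss_sum += loss.item() * num_frames
            frames += num_frames
    model.train()
    return loss_sum / max(frames, 1.0)
```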
limit=15.0 2024-09-16 08:03:52,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=415448.8333333333, ans=0.125 2024-09-16 08:04:04,587 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.68 vs. limit=15.0 2024-09-16 08:04:17,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=415505.5, ans=0.125 2024-09-16 08:04:19,338 INFO [train.py:1198] (1/2) Epoch 23, batch 6050, loss[loss=0.1632, ctc_loss=0.1046, cr_loss=0.293, over 19915.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1572, cr_loss=0.3788, over 4104341.60 frames. ], batch size: 44, lr: 3.58e-03, grad_scale: 32.0 2024-09-16 08:04:20,817 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.783e+02 2.070e+02 2.252e+02 2.432e+02 2.990e+02, threshold=4.503e+02, percent-clipped=0.0 2024-09-16 08:04:22,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=415505.5, ans=0.035 2024-09-16 08:04:23,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=415505.5, ans=0.125 2024-09-16 08:04:24,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=415505.5, ans=0.0 2024-09-16 08:04:43,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=415533.8333333333, ans=0.125 2024-09-16 08:05:24,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=415618.8333333333, ans=0.1 2024-09-16 08:05:33,194 INFO [train.py:1198] (1/2) Epoch 23, batch 6100, loss[loss=0.2376, ctc_loss=0.16, cr_loss=0.388, over 20628.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1571, cr_loss=0.3786, over 4092461.02 frames. ], batch size: 66, lr: 3.58e-03, grad_scale: 32.0 2024-09-16 08:06:33,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=415760.5, ans=0.035 2024-09-16 08:06:46,704 INFO [train.py:1198] (1/2) Epoch 23, batch 6150, loss[loss=0.266, ctc_loss=0.1829, cr_loss=0.4155, over 19936.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1582, cr_loss=0.3798, over 4076687.40 frames. ], batch size: 80, lr: 3.58e-03, grad_scale: 32.0 2024-09-16 08:06:48,233 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.882e+02 2.135e+02 2.285e+02 2.429e+02 3.479e+02, threshold=4.570e+02, percent-clipped=0.0 2024-09-16 08:07:07,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=415817.1666666667, ans=0.125 2024-09-16 08:07:34,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=415873.8333333333, ans=0.125 2024-09-16 08:08:00,757 INFO [train.py:1198] (1/2) Epoch 23, batch 6200, loss[loss=0.2251, ctc_loss=0.15, cr_loss=0.3755, over 21017.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1586, cr_loss=0.3802, over 4062732.90 frames. 
], batch size: 62, lr: 3.58e-03, grad_scale: 16.0 2024-09-16 08:08:11,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=415930.5, ans=0.05 2024-09-16 08:08:19,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=415958.8333333333, ans=0.125 2024-09-16 08:08:40,381 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.01 vs. limit=15.0 2024-09-16 08:09:13,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=416072.1666666667, ans=0.125 2024-09-16 08:09:14,950 INFO [train.py:1198] (1/2) Epoch 23, batch 6250, loss[loss=0.2641, ctc_loss=0.1797, cr_loss=0.4223, over 21031.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1592, cr_loss=0.3813, over 4048515.19 frames. ], batch size: 63, lr: 3.58e-03, grad_scale: 16.0 2024-09-16 08:09:17,825 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.940e+02 2.140e+02 2.232e+02 2.381e+02 4.416e+02, threshold=4.463e+02, percent-clipped=0.0 2024-09-16 08:09:21,352 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.74 vs. limit=22.5 2024-09-16 08:09:23,352 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0 2024-09-16 08:09:28,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=416100.5, ans=0.025 2024-09-16 08:09:32,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=416100.5, ans=0.125 2024-09-16 08:10:24,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=416185.5, ans=0.0 2024-09-16 08:10:29,899 INFO [train.py:1198] (1/2) Epoch 23, batch 6300, loss[loss=0.2268, ctc_loss=0.1526, cr_loss=0.3709, over 20800.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.1597, cr_loss=0.3817, over 4024381.27 frames. ], batch size: 53, lr: 3.58e-03, grad_scale: 16.0 2024-09-16 08:10:36,602 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.65 vs. limit=15.0 2024-09-16 08:11:21,939 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.48 vs. limit=10.0 2024-09-16 08:11:32,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=416327.1666666667, ans=0.125 2024-09-16 08:11:40,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=416327.1666666667, ans=0.125 2024-09-16 08:11:43,190 INFO [train.py:1198] (1/2) Epoch 23, batch 6350, loss[loss=0.2804, ctc_loss=0.1981, cr_loss=0.4118, over 14675.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.163, cr_loss=0.3837, over 3902521.06 frames. 
], batch size: 149, lr: 3.58e-03, grad_scale: 16.0 2024-09-16 08:11:45,959 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.800e+02 2.199e+02 2.375e+02 2.594e+02 3.544e+02, threshold=4.749e+02, percent-clipped=0.0 2024-09-16 08:11:53,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=416355.5, ans=0.2 2024-09-16 08:12:22,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=416412.1666666667, ans=0.0 2024-09-16 08:12:36,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=416440.5, ans=0.025 2024-09-16 08:13:32,387 INFO [train.py:1198] (1/2) Epoch 24, batch 0, loss[loss=0.2311, ctc_loss=0.1571, cr_loss=0.3699, over 20983.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1571, cr_loss=0.3699, over 20983.00 frames. ], batch size: 52, lr: 3.50e-03, grad_scale: 32.0 2024-09-16 08:13:32,388 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-16 08:13:50,635 INFO [train.py:1230] (1/2) Epoch 24, validation: loss=0.04276, ctc_loss=0.04276, cr_loss=1.112e-14, over 944034.00 frames. 2024-09-16 08:13:50,635 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-16 08:14:07,554 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 08:14:11,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=416500.0, ans=0.015 2024-09-16 08:14:12,481 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.58 vs. limit=15.0 2024-09-16 08:14:18,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=416500.0, ans=0.125 2024-09-16 08:14:44,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=416556.6666666667, ans=0.125 2024-09-16 08:14:48,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=416556.6666666667, ans=0.125 2024-09-16 08:14:53,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=416585.0, ans=0.0 2024-09-16 08:15:05,589 INFO [train.py:1198] (1/2) Epoch 24, batch 50, loss[loss=0.2507, ctc_loss=0.1731, cr_loss=0.3882, over 20656.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1624, cr_loss=0.3877, over 919755.97 frames. ], batch size: 68, lr: 3.50e-03, grad_scale: 32.0 2024-09-16 08:15:14,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=416613.3333333333, ans=0.125 2024-09-16 08:15:21,742 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.796e+02 2.109e+02 2.337e+02 2.584e+02 3.452e+02, threshold=4.674e+02, percent-clipped=0.0 2024-09-16 08:15:25,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=416641.6666666667, ans=0.125 2024-09-16 08:15:25,501 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.80 vs. 
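The batch size column swings widely (44 up to 149 in the records above) because batches are packed to a total-duration budget rather than a fixed sentence count: this run uses lhotse's DynamicBucketingSampler with max_duration=850 and num_buckets=30, so batches of long utterances contain few cuts and batches of short ones contain many. A usage sketch; the manifest path is hypothetical:

```python
from lhotse import CutSet
from lhotse.dataset import DynamicBucketingSampler

# Hypothetical manifest path; substitute the run's actual fbank cuts.
cuts = CutSet.from_file("data/fbank/librispeech_cuts_train-all-shuf.jsonl.gz")
sampler = DynamicBucketingSampler(
    cuts, max_duration=850.0, num_buckets=30, shuffle=True, drop_last=True,
)
for batch_cuts in sampler:
    print(len(batch_cuts), "cuts,", sum(c.duration for c in batch_cuts), "seconds")
    break
```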
limit=15.0 2024-09-16 08:15:52,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=416698.3333333333, ans=0.0 2024-09-16 08:15:58,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=416698.3333333333, ans=0.1 2024-09-16 08:16:20,119 INFO [train.py:1198] (1/2) Epoch 24, batch 100, loss[loss=0.1786, ctc_loss=0.1186, cr_loss=0.2999, over 20986.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.1607, cr_loss=0.3845, over 1610932.84 frames. ], batch size: 52, lr: 3.50e-03, grad_scale: 32.0 2024-09-16 08:16:41,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=416783.3333333333, ans=0.015 2024-09-16 08:17:06,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=416840.0, ans=0.95 2024-09-16 08:17:31,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=416868.3333333333, ans=0.1 2024-09-16 08:17:35,991 INFO [train.py:1198] (1/2) Epoch 24, batch 150, loss[loss=0.174, ctc_loss=0.1133, cr_loss=0.3033, over 20979.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1581, cr_loss=0.3806, over 2162761.85 frames. ], batch size: 52, lr: 3.50e-03, grad_scale: 32.0 2024-09-16 08:17:52,580 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.834e+02 2.114e+02 2.240e+02 2.430e+02 5.162e+02, threshold=4.480e+02, percent-clipped=1.0 2024-09-16 08:18:14,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=416953.3333333333, ans=0.125 2024-09-16 08:18:50,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=417010.0, ans=0.0 2024-09-16 08:18:57,630 INFO [train.py:1198] (1/2) Epoch 24, batch 200, loss[loss=0.2272, ctc_loss=0.1499, cr_loss=0.3865, over 20935.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1575, cr_loss=0.3796, over 2580480.57 frames. ], batch size: 60, lr: 3.50e-03, grad_scale: 32.0 2024-09-16 08:19:05,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=417038.3333333333, ans=0.0 2024-09-16 08:19:48,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=417123.3333333333, ans=0.0 2024-09-16 08:19:54,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=417123.3333333333, ans=0.125 2024-09-16 08:20:13,896 INFO [train.py:1198] (1/2) Epoch 24, batch 250, loss[loss=0.2444, ctc_loss=0.1675, cr_loss=0.3844, over 20762.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1569, cr_loss=0.3794, over 2921393.46 frames. ], batch size: 53, lr: 3.50e-03, grad_scale: 32.0 2024-09-16 08:20:23,712 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.01 vs. limit=15.0 2024-09-16 08:20:23,872 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.79 vs. 
limit=15.0 2024-09-16 08:20:30,548 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.856e+02 2.104e+02 2.210e+02 2.352e+02 3.316e+02, threshold=4.420e+02, percent-clipped=0.0 2024-09-16 08:20:35,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=417208.3333333333, ans=0.125 2024-09-16 08:21:29,662 INFO [train.py:1198] (1/2) Epoch 24, batch 300, loss[loss=0.2045, ctc_loss=0.1371, cr_loss=0.3372, over 21070.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1567, cr_loss=0.3788, over 3179073.90 frames. ], batch size: 53, lr: 3.50e-03, grad_scale: 32.0 2024-09-16 08:21:44,245 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.63 vs. limit=15.0 2024-09-16 08:21:49,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=417350.0, ans=0.1 2024-09-16 08:22:04,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=417378.3333333333, ans=0.1 2024-09-16 08:22:06,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=417378.3333333333, ans=0.0 2024-09-16 08:22:16,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=417406.6666666667, ans=0.0 2024-09-16 08:22:22,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=417406.6666666667, ans=0.0 2024-09-16 08:22:30,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=417435.0, ans=0.035 2024-09-16 08:22:45,289 INFO [train.py:1198] (1/2) Epoch 24, batch 350, loss[loss=0.2749, ctc_loss=0.1897, cr_loss=0.4262, over 20028.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1574, cr_loss=0.3789, over 3369321.61 frames. ], batch size: 80, lr: 3.50e-03, grad_scale: 32.0 2024-09-16 08:23:01,780 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.741e+02 2.065e+02 2.195e+02 2.372e+02 3.334e+02, threshold=4.391e+02, percent-clipped=0.0 2024-09-16 08:23:06,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=417491.6666666667, ans=0.0 2024-09-16 08:23:14,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=417520.0, ans=0.125 2024-09-16 08:23:21,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=417520.0, ans=0.125 2024-09-16 08:23:28,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=417520.0, ans=0.2 2024-09-16 08:23:46,686 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.89 vs. limit=15.0 2024-09-16 08:24:01,255 INFO [train.py:1198] (1/2) Epoch 24, batch 400, loss[loss=0.2048, ctc_loss=0.1369, cr_loss=0.3397, over 19926.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1563, cr_loss=0.3779, over 3534855.57 frames. 
2024-09-16 08:24:03,482 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.11 vs. limit=15.0
2024-09-16 08:24:23,233 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.32 vs. limit=22.5
2024-09-16 08:24:28,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=417633.3333333333, ans=0.125
2024-09-16 08:24:44,270 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.16 vs. limit=15.0
2024-09-16 08:24:49,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=417661.6666666667, ans=0.1
2024-09-16 08:24:51,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=417690.0, ans=0.2
2024-09-16 08:25:03,806 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.63 vs. limit=15.0
2024-09-16 08:25:22,740 INFO [train.py:1198] (1/2) Epoch 24, batch 450, loss[loss=0.2043, ctc_loss=0.137, cr_loss=0.3364, over 21000.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1559, cr_loss=0.3772, over 3667263.53 frames. ], batch size: 51, lr: 3.50e-03, grad_scale: 32.0
2024-09-16 08:25:35,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=417746.6666666667, ans=0.0
2024-09-16 08:25:39,172 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.838e+02 2.072e+02 2.190e+02 2.334e+02 3.687e+02, threshold=4.380e+02, percent-clipped=0.0
2024-09-16 08:26:09,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=417831.6666666667, ans=0.0
2024-09-16 08:26:14,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=417831.6666666667, ans=0.125
2024-09-16 08:26:38,515 INFO [train.py:1198] (1/2) Epoch 24, batch 500, loss[loss=0.1785, ctc_loss=0.1172, cr_loss=0.3068, over 20977.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.156, cr_loss=0.3776, over 3764025.88 frames. ], batch size: 51, lr: 3.50e-03, grad_scale: 32.0
2024-09-16 08:26:49,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=417888.3333333333, ans=0.0
2024-09-16 08:27:16,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=417945.0, ans=0.125
2024-09-16 08:27:32,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=417973.3333333333, ans=0.125
2024-09-16 08:27:36,965 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 08:27:41,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=418001.6666666667, ans=0.125
2024-09-16 08:27:53,381 INFO [train.py:1198] (1/2) Epoch 24, batch 550, loss[loss=0.2167, ctc_loss=0.1422, cr_loss=0.3724, over 20954.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.1572, cr_loss=0.3791, over 3822470.92 frames. ], batch size: 49, lr: 3.50e-03, grad_scale: 32.0
2024-09-16 08:28:09,620 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.804e+02 2.091e+02 2.222e+02 2.388e+02 4.824e+02, threshold=4.444e+02, percent-clipped=1.0
2024-09-16 08:28:19,288 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 08:28:59,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=418143.3333333333, ans=0.025
2024-09-16 08:29:08,677 INFO [train.py:1198] (1/2) Epoch 24, batch 600, loss[loss=0.2025, ctc_loss=0.1338, cr_loss=0.3433, over 19515.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1568, cr_loss=0.3789, over 3878680.50 frames. ], batch size: 43, lr: 3.50e-03, grad_scale: 32.0
2024-09-16 08:29:26,622 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. limit=6.0
2024-09-16 08:30:25,684 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.13 vs. limit=15.0
2024-09-16 08:30:30,675 INFO [train.py:1198] (1/2) Epoch 24, batch 650, loss[loss=0.2363, ctc_loss=0.1589, cr_loss=0.387, over 20871.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1574, cr_loss=0.3793, over 3924300.87 frames. ], batch size: 57, lr: 3.49e-03, grad_scale: 32.0
2024-09-16 08:30:40,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=418313.3333333333, ans=0.0
2024-09-16 08:30:41,455 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 08:30:47,079 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.783e+02 2.070e+02 2.181e+02 2.393e+02 3.062e+02, threshold=4.363e+02, percent-clipped=0.0
2024-09-16 08:31:04,292 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.24 vs. limit=15.0
2024-09-16 08:31:07,183 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.05 vs. limit=15.0
2024-09-16 08:31:14,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=418398.3333333333, ans=0.04949747468305833
2024-09-16 08:31:23,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=418398.3333333333, ans=0.125
2024-09-16 08:31:31,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=418426.6666666667, ans=0.0
2024-09-16 08:31:46,670 INFO [train.py:1198] (1/2) Epoch 24, batch 700, loss[loss=0.2407, ctc_loss=0.162, cr_loss=0.3935, over 20944.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1579, cr_loss=0.3797, over 3963790.47 frames. ], batch size: 55, lr: 3.49e-03, grad_scale: 32.0
2024-09-16 08:32:07,348 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.54 vs. limit=15.0
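The optim.py WARNING lines print five grad-norm summary statistics (min, 25%, median, 75%, max) plus a clipping threshold. Throughout this log the threshold equals Clipping_scale times the median: in the batch-550 warning above, 2.0 x 2.222e+02 = 4.444e+02, and the 4.824e+02 maximum exceeding it explains percent-clipped=1.0 for that interval. A sketch of that relationship, assuming this is how the threshold is derived; the actual optim.py bookkeeping may differ:

```python
# Sketch: relate the logged clipping threshold to the logged quartiles,
# assuming threshold == Clipping_scale * median grad-norm.
import statistics

def clip_threshold(grad_norms, clipping_scale=2.0):
    return clipping_scale * statistics.median(grad_norms)

# Summary stats from the batch-550 warning (min, q1, median, q3, max):
quartiles = [180.4, 209.1, 222.2, 238.8, 482.4]
assert abs(clip_threshold(quartiles) - 444.4) < 1e-9  # matches threshold=4.444e+02
# The 482.4 outlier exceeds the threshold, hence percent-clipped=1.0.
```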
2024-09-16 08:32:15,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=418511.6666666667, ans=0.0
2024-09-16 08:32:31,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=418540.0, ans=0.125
2024-09-16 08:32:41,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=418540.0, ans=0.0
2024-09-16 08:32:43,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=418540.0, ans=0.2
2024-09-16 08:33:02,885 INFO [train.py:1198] (1/2) Epoch 24, batch 750, loss[loss=0.2209, ctc_loss=0.1465, cr_loss=0.3724, over 21068.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1582, cr_loss=0.3802, over 3984054.69 frames. ], batch size: 56, lr: 3.49e-03, grad_scale: 32.0
2024-09-16 08:33:03,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=418596.6666666667, ans=0.0
2024-09-16 08:33:19,683 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.722e+02 2.137e+02 2.258e+02 2.452e+02 4.942e+02, threshold=4.516e+02, percent-clipped=2.0
2024-09-16 08:33:21,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=418625.0, ans=0.125
2024-09-16 08:33:30,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=418625.0, ans=0.125
2024-09-16 08:33:36,981 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.68 vs. limit=15.0
2024-09-16 08:33:37,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=418653.3333333333, ans=0.0
2024-09-16 08:33:42,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=418653.3333333333, ans=0.125
2024-09-16 08:33:48,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=418681.6666666667, ans=0.0
2024-09-16 08:33:57,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=418681.6666666667, ans=0.0
2024-09-16 08:34:18,183 INFO [train.py:1198] (1/2) Epoch 24, batch 800, loss[loss=0.2247, ctc_loss=0.152, cr_loss=0.3638, over 21013.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1578, cr_loss=0.3794, over 4008098.01 frames. ], batch size: 61, lr: 3.49e-03, grad_scale: 32.0
2024-09-16 08:34:28,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=418738.3333333333, ans=0.025
2024-09-16 08:34:54,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=418795.0, ans=0.1
2024-09-16 08:35:18,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=418851.6666666667, ans=0.2
2024-09-16 08:35:21,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=418851.6666666667, ans=0.125
2024-09-16 08:35:26,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=418851.6666666667, ans=0.0
2024-09-16 08:35:33,530 INFO [train.py:1198] (1/2) Epoch 24, batch 850, loss[loss=0.2644, ctc_loss=0.1824, cr_loss=0.4097, over 20862.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1584, cr_loss=0.3802, over 4019102.36 frames. ], batch size: 65, lr: 3.49e-03, grad_scale: 32.0
2024-09-16 08:35:45,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=418880.0, ans=0.125
2024-09-16 08:35:52,922 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.828e+02 2.132e+02 2.261e+02 2.449e+02 6.815e+02, threshold=4.523e+02, percent-clipped=1.0
2024-09-16 08:36:14,383 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.82 vs. limit=22.5
2024-09-16 08:36:17,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=418936.6666666667, ans=0.0
2024-09-16 08:36:39,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=418993.3333333333, ans=0.125
2024-09-16 08:36:54,412 INFO [train.py:1198] (1/2) Epoch 24, batch 900, loss[loss=0.2546, ctc_loss=0.1734, cr_loss=0.4061, over 20867.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1579, cr_loss=0.3796, over 4032892.41 frames. ], batch size: 65, lr: 3.49e-03, grad_scale: 32.0
2024-09-16 08:37:03,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=419021.6666666667, ans=0.125
2024-09-16 08:37:58,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=419135.0, ans=0.125
2024-09-16 08:38:10,543 INFO [train.py:1198] (1/2) Epoch 24, batch 950, loss[loss=0.2485, ctc_loss=0.1666, cr_loss=0.4097, over 20970.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1577, cr_loss=0.3796, over 4042185.48 frames. ], batch size: 55, lr: 3.49e-03, grad_scale: 32.0
2024-09-16 08:38:26,957 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.935e+02 2.101e+02 2.246e+02 2.373e+02 3.641e+02, threshold=4.492e+02, percent-clipped=0.0
2024-09-16 08:38:53,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=419220.0, ans=0.0
2024-09-16 08:39:24,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=419276.6666666667, ans=0.125
2024-09-16 08:39:26,831 INFO [train.py:1198] (1/2) Epoch 24, batch 1000, loss[loss=0.2072, ctc_loss=0.14, cr_loss=0.3359, over 20949.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1576, cr_loss=0.3791, over 4056822.06 frames. ], batch size: 50, lr: 3.49e-03, grad_scale: 32.0
2024-09-16 08:40:27,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=419418.3333333333, ans=0.95
2024-09-16 08:40:32,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=419418.3333333333, ans=0.125
2024-09-16 08:40:38,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=419418.3333333333, ans=0.1
2024-09-16 08:40:41,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=419418.3333333333, ans=0.125
2024-09-16 08:40:43,889 INFO [train.py:1198] (1/2) Epoch 24, batch 1050, loss[loss=0.2527, ctc_loss=0.171, cr_loss=0.4086, over 19999.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.158, cr_loss=0.3799, over 4061326.84 frames. ], batch size: 80, lr: 3.49e-03, grad_scale: 32.0
2024-09-16 08:40:48,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=419446.6666666667, ans=0.125
2024-09-16 08:40:50,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=419446.6666666667, ans=0.125
2024-09-16 08:40:55,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=419446.6666666667, ans=10.0
2024-09-16 08:41:00,075 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.844e+02 2.125e+02 2.248e+02 2.425e+02 4.002e+02, threshold=4.497e+02, percent-clipped=0.0
2024-09-16 08:41:09,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=419475.0, ans=0.0
2024-09-16 08:41:14,693 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.31 vs. limit=15.0
2024-09-16 08:41:21,841 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.33 vs. limit=15.0
2024-09-16 08:41:40,660 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.46 vs. limit=22.5
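The ScheduledFloat lines record per-module hyperparameters (dropout_p, skip rates, balancer probs, scale_min) whose value is a function of batch_count; by this point in training most have settled at their final values (ans=0.0, 0.1, 0.125, ...). A sketch of a piecewise-linear schedule of this general shape; the breakpoints below are made up for illustration and are not the ones in scaling.py:

```python
# Illustrative piecewise-linear schedule over batch_count. The actual
# breakpoints are set per module in scaling.py; these are invented.
def scheduled_float(points, batch_count):
    """points: list of (batch_count, value) pairs, sorted by batch_count."""
    if batch_count <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if batch_count <= x1:
            t = (batch_count - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)  # linear interpolation between breakpoints
    return points[-1][1]

# e.g. a skip-rate that starts at 0.5 and decays to 0.0 by 20k batches:
skip_rate = scheduled_float([(0.0, 0.5), (4000.0, 0.25), (20000.0, 0.0)], 417435.0)
assert skip_rate == 0.0  # long past the final breakpoint, like the log's ans=0.0
```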
2024-09-16 08:42:02,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=419560.0, ans=0.2
2024-09-16 08:42:05,355 INFO [train.py:1198] (1/2) Epoch 24, batch 1100, loss[loss=0.2748, ctc_loss=0.1868, cr_loss=0.4398, over 20222.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1566, cr_loss=0.3785, over 4074464.39 frames. ], batch size: 80, lr: 3.49e-03, grad_scale: 32.0
2024-09-16 08:42:20,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=419616.6666666667, ans=0.2
2024-09-16 08:43:20,725 INFO [train.py:1198] (1/2) Epoch 24, batch 1150, loss[loss=0.3095, ctc_loss=0.2212, cr_loss=0.4416, over 14124.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1574, cr_loss=0.3797, over 4065239.19 frames. ], batch size: 149, lr: 3.49e-03, grad_scale: 32.0
2024-09-16 08:43:37,473 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.802e+02 2.103e+02 2.223e+02 2.465e+02 3.224e+02, threshold=4.446e+02, percent-clipped=0.0
2024-09-16 08:43:57,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=419786.6666666667, ans=0.125
2024-09-16 08:44:09,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=419815.0, ans=0.1
2024-09-16 08:44:10,824 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 08:44:18,215 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=419815.0, ans=10.0
2024-09-16 08:44:36,176 INFO [train.py:1198] (1/2) Epoch 24, batch 1200, loss[loss=0.2287, ctc_loss=0.1575, cr_loss=0.356, over 19499.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1574, cr_loss=0.3803, over 4080969.20 frames. ], batch size: 90, lr: 3.49e-03, grad_scale: 32.0
2024-09-16 08:44:41,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=419871.6666666667, ans=0.2
2024-09-16 08:45:05,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=419928.3333333333, ans=0.125
2024-09-16 08:45:51,959 INFO [train.py:1198] (1/2) Epoch 24, batch 1250, loss[loss=0.2058, ctc_loss=0.1383, cr_loss=0.3376, over 20991.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1565, cr_loss=0.3789, over 4090678.95 frames. ], batch size: 51, lr: 3.49e-03, grad_scale: 32.0
2024-09-16 08:46:07,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=420041.6666666667, ans=0.125
2024-09-16 08:46:08,348 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.801e+02 2.143e+02 2.229e+02 2.408e+02 3.533e+02, threshold=4.457e+02, percent-clipped=0.0
2024-09-16 08:46:09,439 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.49 vs. limit=15.0
2024-09-16 08:46:16,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=420041.6666666667, ans=0.125
2024-09-16 08:46:21,144 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 08:47:04,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=420126.6666666667, ans=0.0
2024-09-16 08:47:10,224 INFO [train.py:1198] (1/2) Epoch 24, batch 1300, loss[loss=0.2195, ctc_loss=0.1456, cr_loss=0.3694, over 20892.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1564, cr_loss=0.3786, over 4091271.97 frames. ], batch size: 54, lr: 3.49e-03, grad_scale: 32.0
2024-09-16 08:48:11,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=420268.3333333333, ans=0.0
2024-09-16 08:48:27,536 INFO [train.py:1198] (1/2) Epoch 24, batch 1350, loss[loss=0.2391, ctc_loss=0.1606, cr_loss=0.3921, over 20792.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1571, cr_loss=0.3799, over 4101224.24 frames. ], batch size: 56, lr: 3.49e-03, grad_scale: 32.0
2024-09-16 08:48:41,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=420325.0, ans=0.0
2024-09-16 08:48:41,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=420325.0, ans=0.1
2024-09-16 08:48:44,077 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.879e+02 2.131e+02 2.263e+02 2.450e+02 7.002e+02, threshold=4.527e+02, percent-clipped=1.0
2024-09-16 08:48:47,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=420325.0, ans=0.0
2024-09-16 08:48:48,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=420325.0, ans=0.125
2024-09-16 08:49:07,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=420353.3333333333, ans=0.0
2024-09-16 08:49:17,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=420381.6666666667, ans=0.0
2024-09-16 08:49:36,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=420410.0, ans=0.125
2024-09-16 08:49:42,634 INFO [train.py:1198] (1/2) Epoch 24, batch 1400, loss[loss=0.2117, ctc_loss=0.1399, cr_loss=0.3587, over 20876.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1564, cr_loss=0.3783, over 4095500.44 frames. ], batch size: 57, lr: 3.49e-03, grad_scale: 32.0
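The Whitening lines compare a per-module metric against a whitening limit (metric=... vs. limit=...). One standard whiteness measure of this kind is the spread of the eigenvalues of the feature covariance, which equals 1.0 for perfectly white features and grows as the spectrum becomes lopsided; the sketch below computes that ratio. This is only the general idea, not necessarily the exact metric in scaling.py:

```python
# Sketch of a generic "whiteness" metric: mean squared eigenvalue of the
# feature covariance divided by the squared mean eigenvalue. Equals 1.0 for
# a perfectly white (isotropic) covariance. Not necessarily scaling.py's metric.
import torch

def whiteness_metric(feats: torch.Tensor) -> float:
    """feats: (num_frames, num_channels), assumed zero-mean."""
    cov = feats.T @ feats / feats.shape[0]   # (C, C) sample covariance
    eigs = torch.linalg.eigvalsh(cov)        # real eigenvalues of symmetric cov
    return float((eigs ** 2).mean() / eigs.mean() ** 2)

x = torch.randn(1000, 256)                   # near-white random features
assert whiteness_metric(x) < 1.5             # metric stays close to 1.0
```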
2024-09-16 08:49:43,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=420438.3333333333, ans=0.0
2024-09-16 08:49:58,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=420466.6666666667, ans=0.2
2024-09-16 08:50:35,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=420523.3333333333, ans=0.125
2024-09-16 08:50:52,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=420551.6666666667, ans=10.0
2024-09-16 08:50:57,835 INFO [train.py:1198] (1/2) Epoch 24, batch 1450, loss[loss=0.246, ctc_loss=0.1679, cr_loss=0.3906, over 20703.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1578, cr_loss=0.3808, over 4085527.17 frames. ], batch size: 71, lr: 3.49e-03, grad_scale: 32.0
2024-09-16 08:51:08,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=420580.0, ans=0.2
2024-09-16 08:51:11,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=420608.3333333333, ans=0.125
2024-09-16 08:51:14,654 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.892e+02 2.142e+02 2.300e+02 2.520e+02 4.396e+02, threshold=4.601e+02, percent-clipped=0.0
2024-09-16 08:51:17,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=420608.3333333333, ans=0.0
2024-09-16 08:51:54,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=420665.0, ans=0.125
2024-09-16 08:51:57,627 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.70 vs. limit=15.0
2024-09-16 08:52:00,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=420693.3333333333, ans=0.07
2024-09-16 08:52:11,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=420693.3333333333, ans=0.1
2024-09-16 08:52:13,902 INFO [train.py:1198] (1/2) Epoch 24, batch 1500, loss[loss=0.2129, ctc_loss=0.145, cr_loss=0.3395, over 20928.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1578, cr_loss=0.3803, over 4091887.31 frames. ], batch size: 60, lr: 3.48e-03, grad_scale: 32.0
2024-09-16 08:52:30,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=420750.0, ans=0.1
2024-09-16 08:53:05,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=420806.6666666667, ans=0.125
2024-09-16 08:53:06,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=420806.6666666667, ans=0.125
2024-09-16 08:53:15,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=420806.6666666667, ans=0.125
2024-09-16 08:53:35,360 INFO [train.py:1198] (1/2) Epoch 24, batch 1550, loss[loss=0.2387, ctc_loss=0.1626, cr_loss=0.3806, over 20712.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1574, cr_loss=0.3797, over 4087289.95 frames. ], batch size: 71, lr: 3.48e-03, grad_scale: 32.0
2024-09-16 08:53:41,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=420863.3333333333, ans=0.2
2024-09-16 08:53:41,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=420863.3333333333, ans=0.125
2024-09-16 08:53:50,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=420891.6666666667, ans=0.1
2024-09-16 08:53:51,898 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.850e+02 2.039e+02 2.175e+02 2.385e+02 3.423e+02, threshold=4.350e+02, percent-clipped=0.0
2024-09-16 08:54:19,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=420948.3333333333, ans=0.0
2024-09-16 08:54:38,492 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.69 vs. limit=15.0
2024-09-16 08:54:51,259 INFO [train.py:1198] (1/2) Epoch 24, batch 1600, loss[loss=0.1892, ctc_loss=0.1254, cr_loss=0.3189, over 20997.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1575, cr_loss=0.3796, over 4094929.35 frames. ], batch size: 49, lr: 3.48e-03, grad_scale: 32.0
2024-09-16 08:55:06,986 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.55 vs. limit=15.0
2024-09-16 08:55:15,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=421033.3333333333, ans=0.025
2024-09-16 08:55:15,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=421033.3333333333, ans=0.125
2024-09-16 08:55:29,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=421061.6666666667, ans=0.0
2024-09-16 08:56:07,507 INFO [train.py:1198] (1/2) Epoch 24, batch 1650, loss[loss=0.1948, ctc_loss=0.127, cr_loss=0.339, over 20796.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1567, cr_loss=0.3783, over 4085927.09 frames. ], batch size: 53, lr: 3.48e-03, grad_scale: 32.0
2024-09-16 08:56:24,168 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.868e+02 2.102e+02 2.248e+02 2.461e+02 4.188e+02, threshold=4.496e+02, percent-clipped=0.0
2024-09-16 08:57:12,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=421260.0, ans=0.125
2024-09-16 08:57:14,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=421260.0, ans=0.125
2024-09-16 08:57:23,077 INFO [train.py:1198] (1/2) Epoch 24, batch 1700, loss[loss=0.2587, ctc_loss=0.1719, cr_loss=0.4343, over 21029.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1571, cr_loss=0.3791, over 4086401.97 frames. ], batch size: 61, lr: 3.48e-03, grad_scale: 32.0
2024-09-16 08:57:41,883 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.54 vs. limit=15.0
2024-09-16 08:57:51,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=421345.0, ans=0.125
2024-09-16 08:57:53,940 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.78 vs. limit=15.0
2024-09-16 08:58:11,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=421373.3333333333, ans=0.125
2024-09-16 08:58:26,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=421401.6666666667, ans=0.0
2024-09-16 08:58:26,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=421401.6666666667, ans=0.2
2024-09-16 08:58:41,695 INFO [train.py:1198] (1/2) Epoch 24, batch 1750, loss[loss=0.242, ctc_loss=0.1649, cr_loss=0.3855, over 20840.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1569, cr_loss=0.379, over 4088281.99 frames. ], batch size: 59, lr: 3.48e-03, grad_scale: 32.0
2024-09-16 08:58:49,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=421430.0, ans=0.125
2024-09-16 08:58:56,173 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.25 vs. limit=15.0
2024-09-16 08:59:01,285 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.826e+02 2.112e+02 2.261e+02 2.410e+02 3.889e+02, threshold=4.522e+02, percent-clipped=0.0
2024-09-16 08:59:10,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=421458.3333333333, ans=0.0
2024-09-16 08:59:33,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=421515.0, ans=0.5
2024-09-16 08:59:35,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=421515.0, ans=0.125
2024-09-16 09:00:00,996 INFO [train.py:1198] (1/2) Epoch 24, batch 1800, loss[loss=0.2794, ctc_loss=0.194, cr_loss=0.4271, over 20084.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1569, cr_loss=0.3792, over 4095726.79 frames. ], batch size: 80, lr: 3.48e-03, grad_scale: 64.0
2024-09-16 09:00:28,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=421600.0, ans=0.125
2024-09-16 09:00:37,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=421628.3333333333, ans=0.0
2024-09-16 09:00:52,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=421656.6666666667, ans=10.0
2024-09-16 09:01:05,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=421685.0, ans=0.125
2024-09-16 09:01:16,236 INFO [train.py:1198] (1/2) Epoch 24, batch 1850, loss[loss=0.2611, ctc_loss=0.1856, cr_loss=0.3775, over 14948.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1567, cr_loss=0.3779, over 4094263.77 frames. ], batch size: 151, lr: 3.48e-03, grad_scale: 64.0
2024-09-16 09:01:32,831 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.705e+02 2.107e+02 2.233e+02 2.383e+02 3.396e+02, threshold=4.466e+02, percent-clipped=0.0
2024-09-16 09:02:20,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=421826.6666666667, ans=0.125
2024-09-16 09:02:22,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=421826.6666666667, ans=0.1
2024-09-16 09:02:32,114 INFO [train.py:1198] (1/2) Epoch 24, batch 1900, loss[loss=0.2458, ctc_loss=0.1631, cr_loss=0.4136, over 20961.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1569, cr_loss=0.3787, over 4098136.15 frames. ], batch size: 67, lr: 3.48e-03, grad_scale: 32.0
2024-09-16 09:02:37,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=421855.0, ans=0.2
2024-09-16 09:03:00,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=421883.3333333333, ans=0.05
2024-09-16 09:03:47,237 INFO [train.py:1198] (1/2) Epoch 24, batch 1950, loss[loss=0.2063, ctc_loss=0.1363, cr_loss=0.3499, over 21046.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1566, cr_loss=0.3779, over 4100601.95 frames. ], batch size: 56, lr: 3.48e-03, grad_scale: 32.0
2024-09-16 09:03:51,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=421996.6666666667, ans=0.2
2024-09-16 09:04:05,196 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.895e+02 2.091e+02 2.292e+02 2.472e+02 3.296e+02, threshold=4.585e+02, percent-clipped=0.0
2024-09-16 09:04:26,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=422053.3333333333, ans=0.125
2024-09-16 09:05:08,378 INFO [train.py:1198] (1/2) Epoch 24, batch 2000, loss[loss=0.2145, ctc_loss=0.1392, cr_loss=0.3767, over 20766.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1584, cr_loss=0.3814, over 4089512.38 frames. ], batch size: 56, lr: 3.48e-03, grad_scale: 32.0
2024-09-16 09:05:38,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=422195.0, ans=0.1
2024-09-16 09:06:00,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=422223.3333333333, ans=0.0
2024-09-16 09:06:12,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=422251.6666666667, ans=0.025
2024-09-16 09:06:21,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=422251.6666666667, ans=0.2
2024-09-16 09:06:24,081 INFO [train.py:1198] (1/2) Epoch 24, batch 2050, loss[loss=0.1982, ctc_loss=0.1331, cr_loss=0.3255, over 21027.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1579, cr_loss=0.3808, over 4100758.59 frames. ], batch size: 52, lr: 3.48e-03, grad_scale: 32.0
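The grad_scale field printed with each tot_loss is the AMP dynamic loss scale; in this stretch it moves between 32.0 and 64.0 (and started at 2.0 at the top of the log), growing when steps succeed and halving on overflow. A generic fp16 training step with PyTorch's GradScaler shows the mechanism; model, optimizer, and batch are placeholders, not the train.py objects:

```python
# Generic PyTorch AMP step illustrating where the logged grad_scale comes from.
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=2.0)  # log starts at grad_scale: 2.0

def train_step(model, optimizer, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(batch)
    scaler.scale(loss).backward()   # backprop on the scaled loss
    scaler.step(optimizer)          # unscales grads, skips the step on inf/nan
    scaler.update()                 # grows the scale on success, halves on overflow
    return scaler.get_scale()       # the value train.py logs as grad_scale
```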
2024-09-16 09:06:40,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=422308.3333333333, ans=0.07
2024-09-16 09:06:42,266 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.864e+02 2.169e+02 2.314e+02 2.452e+02 4.779e+02, threshold=4.628e+02, percent-clipped=1.0
2024-09-16 09:06:56,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=422336.6666666667, ans=0.125
2024-09-16 09:07:39,962 INFO [train.py:1198] (1/2) Epoch 24, batch 2100, loss[loss=0.2167, ctc_loss=0.1438, cr_loss=0.3646, over 20963.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1575, cr_loss=0.3797, over 4099655.31 frames. ], batch size: 55, lr: 3.48e-03, grad_scale: 32.0
2024-09-16 09:07:41,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=422421.6666666667, ans=0.2
2024-09-16 09:07:43,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=422421.6666666667, ans=0.125
2024-09-16 09:07:52,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=422421.6666666667, ans=0.1
2024-09-16 09:08:03,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=422450.0, ans=0.025
2024-09-16 09:08:55,498 INFO [train.py:1198] (1/2) Epoch 24, batch 2150, loss[loss=0.2365, ctc_loss=0.1605, cr_loss=0.3796, over 20980.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1572, cr_loss=0.3799, over 4102032.25 frames. ], batch size: 58, lr: 3.48e-03, grad_scale: 32.0
2024-09-16 09:09:13,363 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.646e+02 2.065e+02 2.226e+02 2.404e+02 5.265e+02, threshold=4.452e+02, percent-clipped=1.0
2024-09-16 09:09:36,430 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.93 vs. limit=22.5
2024-09-16 09:09:39,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=422648.3333333333, ans=0.07
2024-09-16 09:10:03,604 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.77 vs. limit=10.0
2024-09-16 09:10:13,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=422676.6666666667, ans=0.125
2024-09-16 09:10:16,236 INFO [train.py:1198] (1/2) Epoch 24, batch 2200, loss[loss=0.2266, ctc_loss=0.1517, cr_loss=0.3746, over 20802.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1578, cr_loss=0.3803, over 4102413.46 frames. ], batch size: 53, lr: 3.48e-03, grad_scale: 32.0
2024-09-16 09:10:19,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=422705.0, ans=0.125
2024-09-16 09:10:21,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=422705.0, ans=0.0
2024-09-16 09:10:49,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=422761.6666666667, ans=0.125
2024-09-16 09:11:24,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=422818.3333333333, ans=15.0
2024-09-16 09:11:31,377 INFO [train.py:1198] (1/2) Epoch 24, batch 2250, loss[loss=0.2379, ctc_loss=0.1611, cr_loss=0.3836, over 20966.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.158, cr_loss=0.3808, over 4099206.13 frames. ], batch size: 58, lr: 3.48e-03, grad_scale: 32.0
2024-09-16 09:11:40,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=422846.6666666667, ans=0.125
2024-09-16 09:11:49,079 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.835e+02 2.103e+02 2.247e+02 2.384e+02 2.915e+02, threshold=4.493e+02, percent-clipped=0.0
2024-09-16 09:12:01,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=422903.3333333333, ans=0.125
2024-09-16 09:12:01,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=422903.3333333333, ans=0.0
2024-09-16 09:12:01,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=422903.3333333333, ans=0.0
2024-09-16 09:12:47,179 INFO [train.py:1198] (1/2) Epoch 24, batch 2300, loss[loss=0.2385, ctc_loss=0.1647, cr_loss=0.369, over 18106.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1572, cr_loss=0.3797, over 4106869.12 frames. ], batch size: 108, lr: 3.48e-03, grad_scale: 32.0
2024-09-16 09:13:22,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=423045.0, ans=0.125
2024-09-16 09:13:23,219 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.74 vs. limit=22.5
2024-09-16 09:13:27,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=423045.0, ans=0.0
2024-09-16 09:14:03,027 INFO [train.py:1198] (1/2) Epoch 24, batch 2350, loss[loss=0.229, ctc_loss=0.1529, cr_loss=0.3808, over 21061.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1567, cr_loss=0.3793, over 4112644.88 frames. ], batch size: 56, lr: 3.47e-03, grad_scale: 32.0
2024-09-16 09:14:19,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=423158.3333333333, ans=0.125
2024-09-16 09:14:21,172 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.914e+02 2.098e+02 2.203e+02 2.397e+02 6.665e+02, threshold=4.406e+02, percent-clipped=1.0
2024-09-16 09:14:50,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=423215.0, ans=0.0
2024-09-16 09:15:18,446 INFO [train.py:1198] (1/2) Epoch 24, batch 2400, loss[loss=0.2247, ctc_loss=0.1514, cr_loss=0.3669, over 21036.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1575, cr_loss=0.3805, over 4106422.25 frames. ], batch size: 62, lr: 3.47e-03, grad_scale: 32.0
2024-09-16 09:15:31,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=423271.6666666667, ans=0.125
2024-09-16 09:15:56,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=423328.3333333333, ans=0.125
2024-09-16 09:16:39,695 INFO [train.py:1198] (1/2) Epoch 24, batch 2450, loss[loss=0.2403, ctc_loss=0.1617, cr_loss=0.393, over 20834.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1569, cr_loss=0.3791, over 4089474.02 frames. ], batch size: 59, lr: 3.47e-03, grad_scale: 32.0
2024-09-16 09:16:54,842 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.63 vs. limit=8.0
2024-09-16 09:16:57,958 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.803e+02 2.091e+02 2.189e+02 2.384e+02 3.010e+02, threshold=4.377e+02, percent-clipped=0.0
2024-09-16 09:17:07,649 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.12 vs. limit=22.5
2024-09-16 09:17:22,233 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 09:17:48,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=423526.6666666667, ans=0.95
2024-09-16 09:17:55,382 INFO [train.py:1198] (1/2) Epoch 24, batch 2500, loss[loss=0.2347, ctc_loss=0.1551, cr_loss=0.3982, over 20854.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1568, cr_loss=0.3787, over 4082198.15 frames. ], batch size: 65, lr: 3.47e-03, grad_scale: 32.0
2024-09-16 09:17:57,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=423555.0, ans=0.0
2024-09-16 09:18:02,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=423555.0, ans=0.2
2024-09-16 09:18:12,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=423583.3333333333, ans=0.0
2024-09-16 09:18:46,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=423640.0, ans=0.09899494936611666
2024-09-16 09:18:51,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=423640.0, ans=0.09899494936611666
2024-09-16 09:19:10,786 INFO [train.py:1198] (1/2) Epoch 24, batch 2550, loss[loss=0.2446, ctc_loss=0.1663, cr_loss=0.3916, over 20334.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1567, cr_loss=0.3789, over 4088503.72 frames. ], batch size: 74, lr: 3.47e-03, grad_scale: 32.0
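tot_loss[... over N frames] is not a plain epoch average: N plateaus near 4.1M frames once the epoch is underway, which matches a decayed, frame-weighted running average with the config's reset_interval=200 (about 200 batches x ~20k frames per batch). A sketch of that bookkeeping, assuming this decay scheme; the actual tracker in train.py may differ in detail:

```python
# Sketch: decayed, frame-weighted running average of recent batch losses.
# The accumulators shrink by (1 - 1/reset_interval) each step before the new
# batch is added, so the frame count saturates instead of growing forever.
class DecayingLoss:
    def __init__(self, reset_interval: int = 200):
        self.decay = 1.0 - 1.0 / reset_interval
        self.frames = 0.0
        self.weighted_loss = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> float:
        self.frames = self.frames * self.decay + batch_frames
        self.weighted_loss = self.weighted_loss * self.decay + batch_loss * batch_frames
        return self.weighted_loss / self.frames  # the value logged as tot_loss
```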
2024-09-16 09:19:29,056 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.834e+02 2.095e+02 2.206e+02 2.357e+02 3.619e+02, threshold=4.413e+02, percent-clipped=0.0
2024-09-16 09:19:44,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=423753.3333333333, ans=0.025
2024-09-16 09:19:47,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=423753.3333333333, ans=0.125
2024-09-16 09:19:54,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=423781.6666666667, ans=0.2
2024-09-16 09:19:58,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=423781.6666666667, ans=0.125
2024-09-16 09:20:24,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=423810.0, ans=0.02
2024-09-16 09:20:27,058 INFO [train.py:1198] (1/2) Epoch 24, batch 2600, loss[loss=0.2476, ctc_loss=0.1663, cr_loss=0.4063, over 21054.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.156, cr_loss=0.3785, over 4098051.81 frames. ], batch size: 59, lr: 3.47e-03, grad_scale: 32.0
2024-09-16 09:20:36,961 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.88 vs. limit=15.0
2024-09-16 09:20:38,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=423838.3333333333, ans=0.125
2024-09-16 09:21:13,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=423923.3333333333, ans=0.125
2024-09-16 09:21:19,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=423923.3333333333, ans=0.125
2024-09-16 09:21:29,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=423951.6666666667, ans=0.125
2024-09-16 09:21:47,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=423980.0, ans=0.1
2024-09-16 09:21:48,369 INFO [train.py:1198] (1/2) Epoch 24, batch 2650, loss[loss=0.2464, ctc_loss=0.1677, cr_loss=0.3938, over 19520.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1564, cr_loss=0.3791, over 4100487.55 frames. ], batch size: 90, lr: 3.47e-03, grad_scale: 32.0
2024-09-16 09:21:53,584 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.72 vs. limit=15.0
2024-09-16 09:22:06,657 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.818e+02 2.080e+02 2.199e+02 2.345e+02 2.937e+02, threshold=4.399e+02, percent-clipped=0.0
2024-09-16 09:22:08,685 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.44 vs. limit=15.0
2024-09-16 09:22:37,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=424065.0, ans=0.0
2024-09-16 09:22:44,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=424065.0, ans=0.125
2024-09-16 09:23:04,048 INFO [train.py:1198] (1/2) Epoch 24, batch 2700, loss[loss=0.1902, ctc_loss=0.1262, cr_loss=0.3205, over 20796.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1563, cr_loss=0.3786, over 4103437.72 frames. ], batch size: 53, lr: 3.47e-03, grad_scale: 32.0
2024-09-16 09:23:09,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=424121.6666666667, ans=0.125
2024-09-16 09:23:16,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=424121.6666666667, ans=0.09899494936611666
2024-09-16 09:24:04,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=424235.0, ans=0.2
2024-09-16 09:24:19,237 INFO [train.py:1198] (1/2) Epoch 24, batch 2750, loss[loss=0.2141, ctc_loss=0.1424, cr_loss=0.3583, over 19982.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1579, cr_loss=0.3815, over 4096458.61 frames. ], batch size: 44, lr: 3.47e-03, grad_scale: 32.0
2024-09-16 09:24:37,497 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.768e+02 2.107e+02 2.260e+02 2.391e+02 3.699e+02, threshold=4.519e+02, percent-clipped=0.0
2024-09-16 09:24:42,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=424291.6666666667, ans=0.125
2024-09-16 09:25:08,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=424348.3333333333, ans=0.07
2024-09-16 09:25:35,547 INFO [train.py:1198] (1/2) Epoch 24, batch 2800, loss[loss=0.227, ctc_loss=0.1528, cr_loss=0.3709, over 20878.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1574, cr_loss=0.3812, over 4102010.86 frames. ], batch size: 57, lr: 3.47e-03, grad_scale: 32.0
2024-09-16 09:25:44,225 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.63 vs. limit=15.0
2024-09-16 09:26:21,932 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.75 vs. limit=15.0
2024-09-16 09:26:22,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=424490.0, ans=0.125
2024-09-16 09:26:39,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=424518.3333333333, ans=0.125
2024-09-16 09:26:49,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=424546.6666666667, ans=0.1
2024-09-16 09:26:51,165 INFO [train.py:1198] (1/2) Epoch 24, batch 2850, loss[loss=0.2458, ctc_loss=0.1663, cr_loss=0.3977, over 19340.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1572, cr_loss=0.3804, over 4100842.90 frames. ], batch size: 90, lr: 3.47e-03, grad_scale: 32.0
2024-09-16 09:27:09,492 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.897e+02 2.139e+02 2.236e+02 2.347e+02 3.219e+02, threshold=4.471e+02, percent-clipped=0.0
2024-09-16 09:27:26,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=424603.3333333333, ans=0.07
2024-09-16 09:27:36,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=424603.3333333333, ans=0.125
2024-09-16 09:27:49,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=424631.6666666667, ans=0.04949747468305833
2024-09-16 09:27:54,577 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.05 vs. limit=15.0
2024-09-16 09:28:04,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=424660.0, ans=0.2
2024-09-16 09:28:13,425 INFO [train.py:1198] (1/2) Epoch 24, batch 2900, loss[loss=0.2537, ctc_loss=0.1731, cr_loss=0.4027, over 20692.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1565, cr_loss=0.3787, over 4102143.29 frames. ], batch size: 71, lr: 3.47e-03, grad_scale: 32.0
2024-09-16 09:28:13,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=424688.3333333333, ans=0.0
2024-09-16 09:28:58,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=424773.3333333333, ans=0.125
2024-09-16 09:29:19,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=424801.6666666667, ans=0.125
2024-09-16 09:29:29,724 INFO [train.py:1198] (1/2) Epoch 24, batch 2950, loss[loss=0.2147, ctc_loss=0.1446, cr_loss=0.3506, over 20820.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1556, cr_loss=0.3775, over 4094715.18 frames. ], batch size: 59, lr: 3.47e-03, grad_scale: 32.0
2024-09-16 09:29:47,775 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.825e+02 2.087e+02 2.181e+02 2.372e+02 4.689e+02, threshold=4.361e+02, percent-clipped=1.0
2024-09-16 09:30:10,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=424886.6666666667, ans=0.0
2024-09-16 09:30:13,776 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.85 vs. limit=15.0
2024-09-16 09:30:19,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=424915.0, ans=0.0
2024-09-16 09:30:34,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=424943.3333333333, ans=0.2
2024-09-16 09:30:44,797 INFO [train.py:1198] (1/2) Epoch 24, batch 3000, loss[loss=0.2327, ctc_loss=0.1562, cr_loss=0.3826, over 21003.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1558, cr_loss=0.3774, over 4094058.62 frames. ], batch size: 64, lr: 3.47e-03, grad_scale: 32.0
2024-09-16 09:30:44,798 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-16 09:31:06,285 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.2862, 2.9261, 3.0847, 2.9303], device='cuda:1')
2024-09-16 09:31:09,979 INFO [train.py:1230] (1/2) Epoch 24, validation: loss=0.04253, ctc_loss=0.04253, cr_loss=1.102e-14, over 944034.00 frames.
2024-09-16 09:31:09,980 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB
2024-09-16 09:31:25,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=425000.0, ans=0.1
2024-09-16 09:32:19,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=425085.0, ans=0.1
2024-09-16 09:32:20,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=425085.0, ans=0.025
2024-09-16 09:32:26,330 INFO [train.py:1198] (1/2) Epoch 24, batch 3050, loss[loss=0.2356, ctc_loss=0.1566, cr_loss=0.3947, over 20972.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1552, cr_loss=0.3771, over 4100970.36 frames. ], batch size: 58, lr: 3.47e-03, grad_scale: 16.0
2024-09-16 09:32:28,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=425113.3333333333, ans=0.05
2024-09-16 09:32:46,179 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.865e+02 2.090e+02 2.229e+02 2.364e+02 3.613e+02, threshold=4.458e+02, percent-clipped=0.0
2024-09-16 09:33:40,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=425226.6666666667, ans=0.125
2024-09-16 09:33:45,896 INFO [train.py:1198] (1/2) Epoch 24, batch 3100, loss[loss=0.2657, ctc_loss=0.1875, cr_loss=0.3911, over 19231.00 frames. ], tot_loss[loss=0.2302, ctc_loss=0.1549, cr_loss=0.3766, over 4109978.03 frames. ], batch size: 90, lr: 3.47e-03, grad_scale: 16.0
2024-09-16 09:34:17,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=425311.6666666667, ans=0.125
2024-09-16 09:34:33,373 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.12 vs. limit=6.0
2024-09-16 09:34:59,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=425396.6666666667, ans=0.125
2024-09-16 09:35:00,955 INFO [train.py:1198] (1/2) Epoch 24, batch 3150, loss[loss=0.2134, ctc_loss=0.1439, cr_loss=0.3475, over 21049.00 frames. ], tot_loss[loss=0.2302, ctc_loss=0.1549, cr_loss=0.3767, over 4097799.82 frames. ], batch size: 62, lr: 3.47e-03, grad_scale: 16.0
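Note the validation line above: cr_loss=1.102e-14 is numerically zero while the training cr_loss sits near 0.38. This is expected if the consistency term compares the frame-level posteriors of two differently augmented views of each utterance: without the paired augmentation at validation time, the two views coincide and the divergence vanishes. A sketch of one plausible symmetric-KL form of such a term (an assumption about the general shape, not the exact train.py code):

```python
# Sketch: symmetric KL between frame-level posteriors of two augmented views.
# With identical views (as at validation) the result is exactly zero.
import torch
import torch.nn.functional as F

def cr_consistency_loss(logits_a: torch.Tensor, logits_b: torch.Tensor) -> torch.Tensor:
    """logits_*: (num_frames, vocab_size) from two augmented forward passes."""
    log_p = F.log_softmax(logits_a, dim=-1)
    log_q = F.log_softmax(logits_b, dim=-1)
    kl_pq = F.kl_div(log_q, log_p, log_target=True, reduction="batchmean")  # KL(p || q)
    kl_qp = F.kl_div(log_p, log_q, log_target=True, reduction="batchmean")  # KL(q || p)
    return 0.5 * (kl_pq + kl_qp)

x = torch.randn(100, 500)
assert cr_consistency_loss(x, x).item() < 1e-10  # identical views -> ~0, as in the log
```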
2024-09-16 09:35:01,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=425396.6666666667, ans=0.0
2024-09-16 09:35:20,717 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.738e+02 2.089e+02 2.222e+02 2.393e+02 3.442e+02, threshold=4.445e+02, percent-clipped=0.0
2024-09-16 09:36:08,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=425510.0, ans=0.0
2024-09-16 09:36:16,784 INFO [train.py:1198] (1/2) Epoch 24, batch 3200, loss[loss=0.2784, ctc_loss=0.1888, cr_loss=0.4478, over 19933.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1553, cr_loss=0.3778, over 4107385.73 frames. ], batch size: 80, lr: 3.46e-03, grad_scale: 32.0
2024-09-16 09:36:19,138 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.04 vs. limit=6.0
2024-09-16 09:36:46,922 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.41 vs. limit=15.0
2024-09-16 09:37:23,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=425651.6666666667, ans=0.125
2024-09-16 09:37:33,116 INFO [train.py:1198] (1/2) Epoch 24, batch 3250, loss[loss=0.2572, ctc_loss=0.1732, cr_loss=0.42, over 20841.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1566, cr_loss=0.3793, over 4102470.28 frames. ], batch size: 59, lr: 3.46e-03, grad_scale: 32.0
2024-09-16 09:37:36,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=425680.0, ans=0.125
2024-09-16 09:37:41,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=425680.0, ans=0.125
2024-09-16 09:37:49,539 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.52 vs. limit=15.0
2024-09-16 09:37:53,059 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.792e+02 2.097e+02 2.289e+02 2.402e+02 4.297e+02, threshold=4.577e+02, percent-clipped=0.0
2024-09-16 09:38:20,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=425765.0, ans=0.1
2024-09-16 09:38:53,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=425821.6666666667, ans=0.125
2024-09-16 09:38:55,169 INFO [train.py:1198] (1/2) Epoch 24, batch 3300, loss[loss=0.2132, ctc_loss=0.139, cr_loss=0.3709, over 20956.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1558, cr_loss=0.3784, over 4105238.63 frames. ], batch size: 51, lr: 3.46e-03, grad_scale: 32.0
2024-09-16 09:39:00,410 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.89 vs. limit=10.0
2024-09-16 09:39:34,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=425878.3333333333, ans=0.125
2024-09-16 09:39:35,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=425878.3333333333, ans=0.125
2024-09-16 09:40:12,099 INFO [train.py:1198] (1/2) Epoch 24, batch 3350, loss[loss=0.187, ctc_loss=0.1219, cr_loss=0.3254, over 20973.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1553, cr_loss=0.3774, over 4107162.36 frames. ], batch size: 49, lr: 3.46e-03, grad_scale: 32.0
2024-09-16 09:40:12,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=425963.3333333333, ans=0.0
2024-09-16 09:40:31,547 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.782e+02 2.153e+02 2.322e+02 2.452e+02 3.830e+02, threshold=4.644e+02, percent-clipped=0.0
2024-09-16 09:40:37,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=425991.6666666667, ans=0.0
2024-09-16 09:40:41,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=426020.0, ans=0.125
2024-09-16 09:41:25,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=426076.6666666667, ans=6.0
2024-09-16 09:41:27,370 INFO [train.py:1198] (1/2) Epoch 24, batch 3400, loss[loss=0.237, ctc_loss=0.159, cr_loss=0.3902, over 20936.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1564, cr_loss=0.3797, over 4110119.82 frames. ], batch size: 60, lr: 3.46e-03, grad_scale: 32.0
2024-09-16 09:41:46,440 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.92 vs. limit=10.0
2024-09-16 09:42:43,519 INFO [train.py:1198] (1/2) Epoch 24, batch 3450, loss[loss=0.261, ctc_loss=0.1728, cr_loss=0.441, over 20693.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1565, cr_loss=0.38, over 4114790.44 frames. ], batch size: 66, lr: 3.46e-03, grad_scale: 32.0
2024-09-16 09:42:49,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=426246.6666666667, ans=0.2
2024-09-16 09:43:02,707 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.847e+02 2.099e+02 2.187e+02 2.349e+02 8.336e+02, threshold=4.373e+02, percent-clipped=1.0
2024-09-16 09:43:18,226 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.75 vs. limit=15.0
2024-09-16 09:43:58,924 INFO [train.py:1198] (1/2) Epoch 24, batch 3500, loss[loss=0.2291, ctc_loss=0.1528, cr_loss=0.3811, over 21085.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1573, cr_loss=0.3809, over 4113492.18 frames. ], batch size: 59, lr: 3.46e-03, grad_scale: 32.0
2024-09-16 09:44:18,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=426416.6666666667, ans=0.1
2024-09-16 09:45:21,119 INFO [train.py:1198] (1/2) Epoch 24, batch 3550, loss[loss=0.2357, ctc_loss=0.1597, cr_loss=0.3804, over 21041.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.1568, cr_loss=0.381, over 4120229.60 frames. ], batch size: 62, lr: 3.46e-03, grad_scale: 32.0
2024-09-16 09:45:26,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=426530.0, ans=0.0
2024-09-16 09:45:30,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=426530.0, ans=0.1
2024-09-16 09:45:41,387 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.871e+02 2.095e+02 2.213e+02 2.381e+02 4.293e+02, threshold=4.427e+02, percent-clipped=0.0
2024-09-16 09:45:47,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=426558.3333333333, ans=0.0
2024-09-16 09:45:49,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=426558.3333333333, ans=0.125
2024-09-16 09:46:15,406 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.41 vs. limit=15.0
2024-09-16 09:46:19,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=426615.0, ans=0.2
2024-09-16 09:46:37,626 INFO [train.py:1198] (1/2) Epoch 24, batch 3600, loss[loss=0.2591, ctc_loss=0.1798, cr_loss=0.3969, over 20882.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1559, cr_loss=0.3788, over 4128862.17 frames. ], batch size: 65, lr: 3.46e-03, grad_scale: 32.0
2024-09-16 09:47:19,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=426728.3333333333, ans=0.2
2024-09-16 09:47:53,516 INFO [train.py:1198] (1/2) Epoch 24, batch 3650, loss[loss=0.2543, ctc_loss=0.1724, cr_loss=0.41, over 21015.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1563, cr_loss=0.3799, over 4128314.46 frames. ], batch size: 61, lr: 3.46e-03, grad_scale: 32.0
2024-09-16 09:47:59,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=426813.3333333333, ans=0.125
2024-09-16 09:48:13,305 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.859e+02 2.077e+02 2.192e+02 2.344e+02 2.962e+02, threshold=4.383e+02, percent-clipped=0.0
2024-09-16 09:48:31,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=426870.0, ans=0.0
2024-09-16 09:48:45,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=426898.3333333333, ans=0.125
2024-09-16 09:49:09,493 INFO [train.py:1198] (1/2) Epoch 24, batch 3700, loss[loss=0.2911, ctc_loss=0.2042, cr_loss=0.4343, over 18097.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1565, cr_loss=0.3797, over 4112336.29 frames. ], batch size: 108, lr: 3.46e-03, grad_scale: 32.0
2024-09-16 09:49:23,619 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.12 vs. limit=22.5
2024-09-16 09:49:59,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=427040.0, ans=0.0
2024-09-16 09:50:06,269 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.38 vs.
limit=6.0 2024-09-16 09:50:30,924 INFO [train.py:1198] (1/2) Epoch 24, batch 3750, loss[loss=0.2361, ctc_loss=0.1593, cr_loss=0.384, over 20966.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1562, cr_loss=0.3796, over 4117427.39 frames. ], batch size: 58, lr: 3.46e-03, grad_scale: 32.0 2024-09-16 09:50:50,704 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.836e+02 2.094e+02 2.217e+02 2.403e+02 3.482e+02, threshold=4.435e+02, percent-clipped=0.0 2024-09-16 09:51:01,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=427153.3333333333, ans=0.09899494936611666 2024-09-16 09:51:16,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=427181.6666666667, ans=0.0 2024-09-16 09:51:32,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=427210.0, ans=0.125 2024-09-16 09:51:33,991 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.15 vs. limit=22.5 2024-09-16 09:51:37,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=427210.0, ans=0.125 2024-09-16 09:51:47,294 INFO [train.py:1198] (1/2) Epoch 24, batch 3800, loss[loss=0.2391, ctc_loss=0.1582, cr_loss=0.4044, over 20958.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1564, cr_loss=0.38, over 4119516.35 frames. ], batch size: 64, lr: 3.46e-03, grad_scale: 32.0 2024-09-16 09:51:55,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=427238.3333333333, ans=0.125 2024-09-16 09:52:02,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=427266.6666666667, ans=0.025 2024-09-16 09:52:24,970 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.13 vs. limit=6.0 2024-09-16 09:52:54,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=427351.6666666667, ans=0.125 2024-09-16 09:53:03,691 INFO [train.py:1198] (1/2) Epoch 24, batch 3850, loss[loss=0.1808, ctc_loss=0.1203, cr_loss=0.3024, over 19942.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1566, cr_loss=0.3797, over 4107916.70 frames. ], batch size: 44, lr: 3.46e-03, grad_scale: 32.0 2024-09-16 09:53:06,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=427380.0, ans=0.125 2024-09-16 09:53:23,458 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.807e+02 2.157e+02 2.309e+02 2.537e+02 3.875e+02, threshold=4.617e+02, percent-clipped=0.0 2024-09-16 09:53:49,391 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=427465.0, ans=0.09899494936611666 2024-09-16 09:54:18,761 INFO [train.py:1198] (1/2) Epoch 24, batch 3900, loss[loss=0.2078, ctc_loss=0.14, cr_loss=0.3387, over 20998.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1576, cr_loss=0.3806, over 4091867.24 frames. 
], batch size: 52, lr: 3.46e-03, grad_scale: 32.0 2024-09-16 09:54:49,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=427578.3333333333, ans=0.0 2024-09-16 09:55:03,229 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.86 vs. limit=15.0 2024-09-16 09:55:19,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=427635.0, ans=0.2 2024-09-16 09:55:34,287 INFO [train.py:1198] (1/2) Epoch 24, batch 3950, loss[loss=0.204, ctc_loss=0.1336, cr_loss=0.3517, over 19337.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1569, cr_loss=0.3795, over 4086928.91 frames. ], batch size: 43, lr: 3.46e-03, grad_scale: 32.0 2024-09-16 09:55:44,105 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. limit=6.0 2024-09-16 09:55:53,900 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.840e+02 2.097e+02 2.213e+02 2.383e+02 4.274e+02, threshold=4.426e+02, percent-clipped=0.0 2024-09-16 09:56:02,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=427691.6666666667, ans=0.2 2024-09-16 09:56:04,910 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.00 vs. limit=15.0 2024-09-16 09:56:44,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=427776.6666666667, ans=0.0 2024-09-16 09:56:53,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=427776.6666666667, ans=0.025 2024-09-16 09:56:55,906 INFO [train.py:1198] (1/2) Epoch 24, batch 4000, loss[loss=0.23, ctc_loss=0.1547, cr_loss=0.3764, over 20837.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.1569, cr_loss=0.3807, over 4088830.99 frames. ], batch size: 59, lr: 3.46e-03, grad_scale: 32.0 2024-09-16 09:57:43,007 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 09:58:01,513 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.26 vs. limit=15.0 2024-09-16 09:58:11,400 INFO [train.py:1198] (1/2) Epoch 24, batch 4050, loss[loss=0.2514, ctc_loss=0.1734, cr_loss=0.3899, over 20970.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1564, cr_loss=0.3793, over 4099510.37 frames. 
], batch size: 58, lr: 3.46e-03, grad_scale: 32.0 2024-09-16 09:58:13,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=427946.6666666667, ans=0.125 2024-09-16 09:58:26,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=427975.0, ans=0.125 2024-09-16 09:58:31,066 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.911e+02 2.109e+02 2.220e+02 2.395e+02 4.719e+02, threshold=4.440e+02, percent-clipped=1.0 2024-09-16 09:58:38,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=427975.0, ans=0.125 2024-09-16 09:59:12,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=428060.0, ans=0.1 2024-09-16 09:59:27,430 INFO [train.py:1198] (1/2) Epoch 24, batch 4100, loss[loss=0.2461, ctc_loss=0.1653, cr_loss=0.404, over 20672.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1572, cr_loss=0.3803, over 4093071.98 frames. ], batch size: 71, lr: 3.45e-03, grad_scale: 32.0 2024-09-16 09:59:33,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=428088.3333333333, ans=0.1 2024-09-16 09:59:38,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=428088.3333333333, ans=0.1 2024-09-16 10:00:04,574 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.04 vs. limit=22.5 2024-09-16 10:00:08,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=428145.0, ans=0.125 2024-09-16 10:00:24,305 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.28 vs. limit=22.5 2024-09-16 10:00:30,708 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.99 vs. limit=15.0 2024-09-16 10:00:35,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=428201.6666666667, ans=0.125 2024-09-16 10:00:44,046 INFO [train.py:1198] (1/2) Epoch 24, batch 4150, loss[loss=0.2204, ctc_loss=0.1446, cr_loss=0.3794, over 20883.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1558, cr_loss=0.3783, over 4110928.59 frames. ], batch size: 57, lr: 3.45e-03, grad_scale: 32.0 2024-09-16 10:00:44,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=428230.0, ans=0.125 2024-09-16 10:00:50,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=428230.0, ans=0.125 2024-09-16 10:01:03,428 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.758e+02 2.091e+02 2.200e+02 2.388e+02 3.185e+02, threshold=4.400e+02, percent-clipped=0.0 2024-09-16 10:01:44,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=428343.3333333333, ans=0.125 2024-09-16 10:02:04,847 INFO [train.py:1198] (1/2) Epoch 24, batch 4200, loss[loss=0.2088, ctc_loss=0.1385, cr_loss=0.3513, over 20971.00 frames. 
], tot_loss[loss=0.2324, ctc_loss=0.1565, cr_loss=0.3797, over 4114532.49 frames. ], batch size: 48, lr: 3.45e-03, grad_scale: 32.0 2024-09-16 10:02:12,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=428371.6666666667, ans=0.0 2024-09-16 10:03:09,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=428485.0, ans=0.125 2024-09-16 10:03:21,041 INFO [train.py:1198] (1/2) Epoch 24, batch 4250, loss[loss=0.2134, ctc_loss=0.1408, cr_loss=0.3627, over 20970.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1554, cr_loss=0.3782, over 4123257.24 frames. ], batch size: 50, lr: 3.45e-03, grad_scale: 32.0 2024-09-16 10:03:34,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=428541.6666666667, ans=0.035 2024-09-16 10:03:38,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=428541.6666666667, ans=0.0 2024-09-16 10:03:40,528 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.782e+02 2.064e+02 2.201e+02 2.328e+02 3.119e+02, threshold=4.403e+02, percent-clipped=0.0 2024-09-16 10:03:51,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=428570.0, ans=0.2 2024-09-16 10:04:08,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=428598.3333333333, ans=0.125 2024-09-16 10:04:25,233 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.58 vs. limit=10.0 2024-09-16 10:04:36,699 INFO [train.py:1198] (1/2) Epoch 24, batch 4300, loss[loss=0.204, ctc_loss=0.1347, cr_loss=0.3464, over 20948.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1566, cr_loss=0.3802, over 4109692.77 frames. ], batch size: 48, lr: 3.45e-03, grad_scale: 32.0 2024-09-16 10:05:01,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=428683.3333333333, ans=0.04949747468305833 2024-09-16 10:05:14,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=428711.6666666667, ans=0.125 2024-09-16 10:05:22,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=428740.0, ans=0.125 2024-09-16 10:05:33,384 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.22 vs. limit=10.0 2024-09-16 10:05:51,864 INFO [train.py:1198] (1/2) Epoch 24, batch 4350, loss[loss=0.1855, ctc_loss=0.1243, cr_loss=0.3058, over 20978.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1568, cr_loss=0.3802, over 4109684.50 frames. 
], batch size: 48, lr: 3.45e-03, grad_scale: 32.0 2024-09-16 10:05:52,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=428796.6666666667, ans=0.2 2024-09-16 10:06:11,953 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.889e+02 2.156e+02 2.299e+02 2.517e+02 2.946e+02, threshold=4.598e+02, percent-clipped=0.0 2024-09-16 10:06:17,596 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0 2024-09-16 10:06:36,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=428881.6666666667, ans=0.2 2024-09-16 10:07:07,812 INFO [train.py:1198] (1/2) Epoch 24, batch 4400, loss[loss=0.2592, ctc_loss=0.1763, cr_loss=0.4143, over 20930.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1569, cr_loss=0.38, over 4105709.43 frames. ], batch size: 67, lr: 3.45e-03, grad_scale: 32.0 2024-09-16 10:07:11,436 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.55 vs. limit=15.0 2024-09-16 10:08:03,492 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.82 vs. limit=10.0 2024-09-16 10:08:12,428 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.85 vs. limit=15.0 2024-09-16 10:08:19,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=429051.6666666667, ans=0.125 2024-09-16 10:08:30,149 INFO [train.py:1198] (1/2) Epoch 24, batch 4450, loss[loss=0.251, ctc_loss=0.169, cr_loss=0.41, over 19599.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1568, cr_loss=0.3796, over 4105879.94 frames. ], batch size: 90, lr: 3.45e-03, grad_scale: 32.0 2024-09-16 10:08:47,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=429108.3333333333, ans=0.125 2024-09-16 10:08:51,196 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.886e+02 2.121e+02 2.283e+02 2.420e+02 4.289e+02, threshold=4.565e+02, percent-clipped=0.0 2024-09-16 10:09:14,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=429165.0, ans=0.2 2024-09-16 10:09:23,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=429165.0, ans=0.125 2024-09-16 10:09:45,648 INFO [train.py:1198] (1/2) Epoch 24, batch 4500, loss[loss=0.2372, ctc_loss=0.1614, cr_loss=0.379, over 21028.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.157, cr_loss=0.3793, over 4102199.81 frames. 
], batch size: 63, lr: 3.45e-03, grad_scale: 32.0 2024-09-16 10:09:53,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=429221.6666666667, ans=0.125 2024-09-16 10:10:46,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=429335.0, ans=0.025 2024-09-16 10:10:51,485 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 10:11:01,355 INFO [train.py:1198] (1/2) Epoch 24, batch 4550, loss[loss=0.2473, ctc_loss=0.1662, cr_loss=0.4055, over 20775.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1573, cr_loss=0.3796, over 4098509.57 frames. ], batch size: 56, lr: 3.45e-03, grad_scale: 16.0 2024-09-16 10:11:06,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=429363.3333333333, ans=0.0 2024-09-16 10:11:15,725 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.59 vs. limit=15.0 2024-09-16 10:11:23,845 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.743e+02 2.134e+02 2.270e+02 2.418e+02 3.628e+02, threshold=4.539e+02, percent-clipped=0.0 2024-09-16 10:11:26,367 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. limit=6.0 2024-09-16 10:12:16,962 INFO [train.py:1198] (1/2) Epoch 24, batch 4600, loss[loss=0.2216, ctc_loss=0.1503, cr_loss=0.3567, over 21056.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1566, cr_loss=0.3786, over 4097769.62 frames. ], batch size: 59, lr: 3.45e-03, grad_scale: 16.0 2024-09-16 10:12:36,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=429533.3333333333, ans=0.0 2024-09-16 10:12:39,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=429533.3333333333, ans=0.2 2024-09-16 10:12:47,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=429561.6666666667, ans=0.125 2024-09-16 10:12:51,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=429561.6666666667, ans=0.125 2024-09-16 10:13:26,356 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.97 vs. limit=22.5 2024-09-16 10:13:37,449 INFO [train.py:1198] (1/2) Epoch 24, batch 4650, loss[loss=0.2213, ctc_loss=0.1502, cr_loss=0.3556, over 20873.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1582, cr_loss=0.381, over 4088849.64 frames. ], batch size: 54, lr: 3.45e-03, grad_scale: 16.0 2024-09-16 10:13:50,519 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.97 vs. 
limit=12.0 2024-09-16 10:14:00,486 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.799e+02 2.107e+02 2.214e+02 2.387e+02 3.192e+02, threshold=4.427e+02, percent-clipped=0.0 2024-09-16 10:14:03,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=429675.0, ans=0.2 2024-09-16 10:14:09,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=429703.3333333333, ans=0.2 2024-09-16 10:14:29,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=429731.6666666667, ans=0.2 2024-09-16 10:14:36,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=429760.0, ans=10.0 2024-09-16 10:14:52,738 INFO [train.py:1198] (1/2) Epoch 24, batch 4700, loss[loss=0.2311, ctc_loss=0.1562, cr_loss=0.3743, over 20836.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1563, cr_loss=0.3778, over 4097799.16 frames. ], batch size: 59, lr: 3.45e-03, grad_scale: 16.0 2024-09-16 10:14:57,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=429788.3333333333, ans=0.125 2024-09-16 10:15:05,438 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.18 vs. limit=15.0 2024-09-16 10:15:06,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=429816.6666666667, ans=0.125 2024-09-16 10:15:25,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=429845.0, ans=0.125 2024-09-16 10:15:59,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=429901.6666666667, ans=0.125 2024-09-16 10:16:04,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=429901.6666666667, ans=0.125 2024-09-16 10:16:09,156 INFO [train.py:1198] (1/2) Epoch 24, batch 4750, loss[loss=0.2248, ctc_loss=0.15, cr_loss=0.374, over 21067.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1562, cr_loss=0.3785, over 4102745.43 frames. ], batch size: 53, lr: 3.45e-03, grad_scale: 16.0 2024-09-16 10:16:31,631 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.875e+02 2.092e+02 2.233e+02 2.402e+02 6.462e+02, threshold=4.467e+02, percent-clipped=1.0 2024-09-16 10:16:34,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=429958.3333333333, ans=0.125 2024-09-16 10:16:47,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=429986.6666666667, ans=0.125 2024-09-16 10:16:59,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=430015.0, ans=0.125 2024-09-16 10:17:08,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=430043.3333333333, ans=0.05 2024-09-16 10:17:22,539 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=7.43 vs. 
limit=22.5 2024-09-16 10:17:23,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=430071.6666666667, ans=0.125 2024-09-16 10:17:24,702 INFO [train.py:1198] (1/2) Epoch 24, batch 4800, loss[loss=0.2008, ctc_loss=0.1314, cr_loss=0.3467, over 21004.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1561, cr_loss=0.3781, over 4108706.98 frames. ], batch size: 48, lr: 3.45e-03, grad_scale: 32.0 2024-09-16 10:17:59,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=430128.3333333333, ans=0.2 2024-09-16 10:18:19,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=430156.6666666667, ans=0.125 2024-09-16 10:18:40,573 INFO [train.py:1198] (1/2) Epoch 24, batch 4850, loss[loss=0.2809, ctc_loss=0.1874, cr_loss=0.4674, over 20988.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1564, cr_loss=0.3794, over 4110387.45 frames. ], batch size: 67, lr: 3.45e-03, grad_scale: 16.0 2024-09-16 10:18:55,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=430241.6666666667, ans=0.0 2024-09-16 10:19:07,556 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.823e+02 2.094e+02 2.221e+02 2.422e+02 4.398e+02, threshold=4.443e+02, percent-clipped=0.0 2024-09-16 10:19:20,462 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.58 vs. limit=22.5 2024-09-16 10:19:48,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=430326.6666666667, ans=0.1 2024-09-16 10:20:01,139 INFO [train.py:1198] (1/2) Epoch 24, batch 4900, loss[loss=0.2427, ctc_loss=0.1619, cr_loss=0.404, over 21064.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.157, cr_loss=0.3804, over 4105246.44 frames. ], batch size: 59, lr: 3.45e-03, grad_scale: 16.0 2024-09-16 10:20:04,728 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.64 vs. limit=22.5 2024-09-16 10:20:22,993 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.31 vs. limit=15.0 2024-09-16 10:20:29,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=430411.6666666667, ans=0.125 2024-09-16 10:20:44,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=430440.0, ans=0.0 2024-09-16 10:20:59,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=430468.3333333333, ans=0.015 2024-09-16 10:21:05,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=430468.3333333333, ans=0.2 2024-09-16 10:21:07,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=430468.3333333333, ans=0.125 2024-09-16 10:21:16,133 INFO [train.py:1198] (1/2) Epoch 24, batch 4950, loss[loss=0.2579, ctc_loss=0.1748, cr_loss=0.4155, over 20748.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1562, cr_loss=0.3793, over 4107483.10 frames. 
], batch size: 71, lr: 3.44e-03, grad_scale: 16.0 2024-09-16 10:21:35,981 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.29 vs. limit=22.5 2024-09-16 10:21:39,822 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.810e+02 2.125e+02 2.261e+02 2.482e+02 4.962e+02, threshold=4.523e+02, percent-clipped=1.0 2024-09-16 10:21:41,980 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.47 vs. limit=15.0 2024-09-16 10:22:31,071 INFO [train.py:1198] (1/2) Epoch 24, batch 5000, loss[loss=0.1953, ctc_loss=0.1286, cr_loss=0.3335, over 20949.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1558, cr_loss=0.3788, over 4109619.71 frames. ], batch size: 49, lr: 3.44e-03, grad_scale: 16.0 2024-09-16 10:23:33,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=430751.6666666667, ans=0.125 2024-09-16 10:23:41,224 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.70 vs. limit=15.0 2024-09-16 10:23:46,544 INFO [train.py:1198] (1/2) Epoch 24, batch 5050, loss[loss=0.2676, ctc_loss=0.189, cr_loss=0.3931, over 14518.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1562, cr_loss=0.3793, over 4094850.86 frames. ], batch size: 151, lr: 3.44e-03, grad_scale: 16.0 2024-09-16 10:24:04,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=430808.3333333333, ans=0.1 2024-09-16 10:24:04,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=430808.3333333333, ans=0.125 2024-09-16 10:24:10,396 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.958e+02 2.122e+02 2.270e+02 2.410e+02 4.397e+02, threshold=4.540e+02, percent-clipped=0.0 2024-09-16 10:24:25,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=430836.6666666667, ans=0.1 2024-09-16 10:25:00,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=430921.6666666667, ans=0.125 2024-09-16 10:25:01,554 INFO [train.py:1198] (1/2) Epoch 24, batch 5100, loss[loss=0.2393, ctc_loss=0.1616, cr_loss=0.3889, over 20878.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1558, cr_loss=0.3777, over 4080777.02 frames. ], batch size: 57, lr: 3.44e-03, grad_scale: 16.0 2024-09-16 10:25:21,444 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.38 vs. limit=15.0 2024-09-16 10:25:22,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=430950.0, ans=0.125 2024-09-16 10:25:35,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=430978.3333333333, ans=0.5 2024-09-16 10:25:43,616 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.78 vs. limit=10.0 2024-09-16 10:26:15,831 INFO [train.py:1198] (1/2) Epoch 24, batch 5150, loss[loss=0.2013, ctc_loss=0.1342, cr_loss=0.3354, over 20785.00 frames. 
], tot_loss[loss=0.2312, ctc_loss=0.1558, cr_loss=0.3771, over 4078571.53 frames. ], batch size: 53, lr: 3.44e-03, grad_scale: 16.0 2024-09-16 10:26:39,438 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.851e+02 2.087e+02 2.253e+02 2.421e+02 5.808e+02, threshold=4.505e+02, percent-clipped=1.0 2024-09-16 10:26:44,148 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 10:26:47,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=431120.0, ans=0.09899494936611666 2024-09-16 10:26:48,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=431120.0, ans=0.05 2024-09-16 10:26:54,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=431120.0, ans=0.125 2024-09-16 10:26:56,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=431120.0, ans=0.2 2024-09-16 10:27:16,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=431176.6666666667, ans=0.0 2024-09-16 10:27:29,935 INFO [train.py:1198] (1/2) Epoch 24, batch 5200, loss[loss=0.2142, ctc_loss=0.1443, cr_loss=0.3493, over 20969.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1571, cr_loss=0.3786, over 4058733.41 frames. ], batch size: 48, lr: 3.44e-03, grad_scale: 32.0 2024-09-16 10:27:51,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=431233.3333333333, ans=0.125 2024-09-16 10:28:04,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=431261.6666666667, ans=0.125 2024-09-16 10:28:06,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=431261.6666666667, ans=0.0 2024-09-16 10:28:19,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=431290.0, ans=0.0 2024-09-16 10:28:33,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=431318.3333333333, ans=0.0 2024-09-16 10:28:43,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=431318.3333333333, ans=0.125 2024-09-16 10:28:47,817 INFO [train.py:1198] (1/2) Epoch 24, batch 5250, loss[loss=0.2355, ctc_loss=0.1604, cr_loss=0.3757, over 20351.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1574, cr_loss=0.3785, over 4055658.73 frames. ], batch size: 74, lr: 3.44e-03, grad_scale: 32.0 2024-09-16 10:29:02,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=431346.6666666667, ans=0.1 2024-09-16 10:29:08,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=431375.0, ans=0.025 2024-09-16 10:29:11,897 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.09 vs. 
limit=12.0 2024-09-16 10:29:14,035 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.858e+02 2.112e+02 2.223e+02 2.484e+02 5.133e+02, threshold=4.446e+02, percent-clipped=1.0 2024-09-16 10:29:17,391 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=431375.0, ans=0.0 2024-09-16 10:30:04,247 INFO [train.py:1198] (1/2) Epoch 24, batch 5300, loss[loss=0.2006, ctc_loss=0.1323, cr_loss=0.3415, over 20780.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.1573, cr_loss=0.3787, over 4067954.75 frames. ], batch size: 56, lr: 3.44e-03, grad_scale: 32.0 2024-09-16 10:30:46,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=431545.0, ans=0.1 2024-09-16 10:31:18,440 INFO [train.py:1198] (1/2) Epoch 24, batch 5350, loss[loss=0.2361, ctc_loss=0.1595, cr_loss=0.3833, over 21009.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1568, cr_loss=0.3779, over 4076068.35 frames. ], batch size: 61, lr: 3.44e-03, grad_scale: 32.0 2024-09-16 10:31:27,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=431630.0, ans=0.2 2024-09-16 10:31:38,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=431658.3333333333, ans=0.125 2024-09-16 10:31:42,385 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.738e+02 2.060e+02 2.198e+02 2.307e+02 3.212e+02, threshold=4.396e+02, percent-clipped=0.0 2024-09-16 10:32:06,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=431715.0, ans=0.2 2024-09-16 10:32:33,236 INFO [train.py:1198] (1/2) Epoch 24, batch 5400, loss[loss=0.2405, ctc_loss=0.1652, cr_loss=0.3769, over 20963.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.157, cr_loss=0.3787, over 4080660.48 frames. ], batch size: 58, lr: 3.44e-03, grad_scale: 32.0 2024-09-16 10:32:46,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=431800.0, ans=0.1 2024-09-16 10:32:57,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=431800.0, ans=0.0 2024-09-16 10:33:15,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=431828.3333333333, ans=0.0 2024-09-16 10:33:47,701 INFO [train.py:1198] (1/2) Epoch 24, batch 5450, loss[loss=0.2444, ctc_loss=0.166, cr_loss=0.392, over 20718.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.1571, cr_loss=0.3794, over 4091372.79 frames. ], batch size: 71, lr: 3.44e-03, grad_scale: 32.0 2024-09-16 10:34:00,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=431913.3333333333, ans=0.07 2024-09-16 10:34:11,447 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.887e+02 2.145e+02 2.255e+02 2.435e+02 3.827e+02, threshold=4.511e+02, percent-clipped=0.0 2024-09-16 10:34:14,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=431941.6666666667, ans=0.125 2024-09-16 10:34:32,864 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.59 vs. 
limit=6.0 2024-09-16 10:34:33,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=431998.3333333333, ans=0.1 2024-09-16 10:34:38,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=431998.3333333333, ans=0.125 2024-09-16 10:34:50,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=432026.6666666667, ans=0.2 2024-09-16 10:35:01,824 INFO [train.py:1198] (1/2) Epoch 24, batch 5500, loss[loss=0.2082, ctc_loss=0.1405, cr_loss=0.3384, over 21054.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1567, cr_loss=0.3782, over 4084111.81 frames. ], batch size: 56, lr: 3.44e-03, grad_scale: 32.0 2024-09-16 10:35:22,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=432083.3333333333, ans=0.025 2024-09-16 10:35:36,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=432111.6666666667, ans=0.0 2024-09-16 10:35:57,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=432140.0, ans=0.125 2024-09-16 10:36:15,854 INFO [train.py:1198] (1/2) Epoch 24, batch 5550, loss[loss=0.2513, ctc_loss=0.1729, cr_loss=0.3919, over 20226.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.157, cr_loss=0.3785, over 4071349.08 frames. ], batch size: 74, lr: 3.44e-03, grad_scale: 32.0 2024-09-16 10:36:23,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=432196.6666666667, ans=0.2 2024-09-16 10:36:29,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=432225.0, ans=0.125 2024-09-16 10:36:37,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=432225.0, ans=0.025 2024-09-16 10:36:39,552 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.826e+02 2.100e+02 2.253e+02 2.429e+02 5.430e+02, threshold=4.505e+02, percent-clipped=1.0 2024-09-16 10:36:39,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=432225.0, ans=0.125 2024-09-16 10:36:51,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=432253.3333333333, ans=0.125 2024-09-16 10:36:53,736 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=7.67 vs. limit=22.5 2024-09-16 10:37:00,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=432281.6666666667, ans=0.1 2024-09-16 10:37:02,845 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.76 vs. 
limit=6.0 2024-09-16 10:37:20,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=432310.0, ans=0.125 2024-09-16 10:37:21,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=432310.0, ans=0.0 2024-09-16 10:37:30,383 INFO [train.py:1198] (1/2) Epoch 24, batch 5600, loss[loss=0.2656, ctc_loss=0.1848, cr_loss=0.4042, over 18054.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1565, cr_loss=0.3785, over 4078092.59 frames. ], batch size: 108, lr: 3.44e-03, grad_scale: 32.0 2024-09-16 10:37:43,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=432338.3333333333, ans=0.1 2024-09-16 10:38:31,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=432423.3333333333, ans=0.0 2024-09-16 10:38:46,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=432451.6666666667, ans=0.125 2024-09-16 10:38:49,703 INFO [train.py:1198] (1/2) Epoch 24, batch 5650, loss[loss=0.1995, ctc_loss=0.1317, cr_loss=0.3388, over 21066.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1565, cr_loss=0.378, over 4083831.85 frames. ], batch size: 53, lr: 3.44e-03, grad_scale: 32.0 2024-09-16 10:39:05,590 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0 2024-09-16 10:39:07,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=432508.3333333333, ans=0.0 2024-09-16 10:39:10,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=432508.3333333333, ans=0.5 2024-09-16 10:39:13,290 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.893e+02 2.092e+02 2.271e+02 2.405e+02 5.610e+02, threshold=4.543e+02, percent-clipped=1.0 2024-09-16 10:39:21,590 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.11 vs. limit=22.5 2024-09-16 10:39:25,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=432536.6666666667, ans=0.2 2024-09-16 10:39:37,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=432565.0, ans=0.0 2024-09-16 10:40:00,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=432593.3333333333, ans=0.1 2024-09-16 10:40:04,198 INFO [train.py:1198] (1/2) Epoch 24, batch 5700, loss[loss=0.2312, ctc_loss=0.1554, cr_loss=0.3792, over 20799.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.156, cr_loss=0.3777, over 4096524.94 frames. ], batch size: 53, lr: 3.44e-03, grad_scale: 32.0 2024-09-16 10:41:18,871 INFO [train.py:1198] (1/2) Epoch 24, batch 5750, loss[loss=0.2326, ctc_loss=0.1563, cr_loss=0.3814, over 21063.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1568, cr_loss=0.3787, over 4093035.42 frames. 
], batch size: 59, lr: 3.44e-03, grad_scale: 32.0 2024-09-16 10:41:24,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=432763.3333333333, ans=0.0 2024-09-16 10:41:43,276 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.790e+02 2.109e+02 2.228e+02 2.426e+02 3.110e+02, threshold=4.455e+02, percent-clipped=0.0 2024-09-16 10:42:33,127 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. limit=6.0 2024-09-16 10:42:33,528 INFO [train.py:1198] (1/2) Epoch 24, batch 5800, loss[loss=0.239, ctc_loss=0.1617, cr_loss=0.3864, over 21017.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1563, cr_loss=0.3779, over 4098320.71 frames. ], batch size: 63, lr: 3.44e-03, grad_scale: 32.0 2024-09-16 10:42:56,802 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.15 vs. limit=15.0 2024-09-16 10:43:34,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=433018.3333333333, ans=0.0 2024-09-16 10:43:40,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=433018.3333333333, ans=0.2 2024-09-16 10:43:43,950 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 10:43:48,150 INFO [train.py:1198] (1/2) Epoch 24, batch 5850, loss[loss=0.2306, ctc_loss=0.1578, cr_loss=0.364, over 20983.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1567, cr_loss=0.3783, over 4089430.38 frames. ], batch size: 55, lr: 3.43e-03, grad_scale: 32.0 2024-09-16 10:44:09,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=433075.0, ans=0.1 2024-09-16 10:44:11,924 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.772e+02 2.113e+02 2.236e+02 2.375e+02 3.033e+02, threshold=4.473e+02, percent-clipped=0.0 2024-09-16 10:44:16,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=433103.3333333333, ans=0.125 2024-09-16 10:44:22,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=433103.3333333333, ans=0.125 2024-09-16 10:44:38,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=433131.6666666667, ans=0.1 2024-09-16 10:44:56,696 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.73 vs. limit=15.0 2024-09-16 10:44:57,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=433160.0, ans=0.125 2024-09-16 10:44:59,660 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.95 vs. limit=15.0 2024-09-16 10:45:00,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=433188.3333333333, ans=0.125 2024-09-16 10:45:02,021 INFO [train.py:1198] (1/2) Epoch 24, batch 5900, loss[loss=0.2844, ctc_loss=0.2019, cr_loss=0.4127, over 14921.00 frames. 
], tot_loss[loss=0.2317, ctc_loss=0.156, cr_loss=0.3782, over 4093965.35 frames. ], batch size: 150, lr: 3.43e-03, grad_scale: 32.0 2024-09-16 10:45:24,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=433216.6666666667, ans=0.0 2024-09-16 10:45:51,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=433273.3333333333, ans=0.09899494936611666 2024-09-16 10:46:02,643 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.62 vs. limit=15.0 2024-09-16 10:46:16,511 INFO [train.py:1198] (1/2) Epoch 24, batch 5950, loss[loss=0.2331, ctc_loss=0.1578, cr_loss=0.3764, over 20782.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1559, cr_loss=0.3778, over 4099343.61 frames. ], batch size: 53, lr: 3.43e-03, grad_scale: 32.0 2024-09-16 10:46:45,288 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.848e+02 2.094e+02 2.238e+02 2.450e+02 3.969e+02, threshold=4.477e+02, percent-clipped=0.0 2024-09-16 10:47:11,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=433415.0, ans=0.1 2024-09-16 10:47:23,367 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.59 vs. limit=15.0 2024-09-16 10:47:36,181 INFO [train.py:1198] (1/2) Epoch 24, batch 6000, loss[loss=0.2497, ctc_loss=0.174, cr_loss=0.3787, over 20296.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1564, cr_loss=0.378, over 4090014.23 frames. ], batch size: 74, lr: 3.43e-03, grad_scale: 32.0 2024-09-16 10:47:36,181 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-16 10:48:00,681 INFO [train.py:1230] (1/2) Epoch 24, validation: loss=0.04257, ctc_loss=0.04257, cr_loss=1.116e-14, over 944034.00 frames. 2024-09-16 10:48:00,681 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-16 10:48:02,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=433471.6666666667, ans=0.0 2024-09-16 10:48:07,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=433471.6666666667, ans=0.125 2024-09-16 10:48:09,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=433471.6666666667, ans=0.0 2024-09-16 10:48:53,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=433556.6666666667, ans=0.125 2024-09-16 10:48:58,570 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.24 vs. limit=15.0 2024-09-16 10:49:14,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=433613.3333333333, ans=0.07 2024-09-16 10:49:15,518 INFO [train.py:1198] (1/2) Epoch 24, batch 6050, loss[loss=0.2735, ctc_loss=0.1889, cr_loss=0.4232, over 18050.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1551, cr_loss=0.3759, over 4096666.62 frames. 
], batch size: 108, lr: 3.43e-03, grad_scale: 32.0 2024-09-16 10:49:40,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=433641.6666666667, ans=0.125 2024-09-16 10:49:41,722 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.849e+02 2.070e+02 2.201e+02 2.383e+02 3.042e+02, threshold=4.402e+02, percent-clipped=0.0 2024-09-16 10:50:11,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=433698.3333333333, ans=0.125 2024-09-16 10:50:14,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=433726.6666666667, ans=0.07 2024-09-16 10:50:29,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=433755.0, ans=0.125 2024-09-16 10:50:30,415 INFO [train.py:1198] (1/2) Epoch 24, batch 6100, loss[loss=0.2009, ctc_loss=0.1313, cr_loss=0.348, over 20990.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.156, cr_loss=0.3771, over 4092885.02 frames. ], batch size: 52, lr: 3.43e-03, grad_scale: 16.0 2024-09-16 10:50:42,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=433755.0, ans=0.025 2024-09-16 10:51:19,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=433840.0, ans=0.0 2024-09-16 10:51:43,593 INFO [train.py:1198] (1/2) Epoch 24, batch 6150, loss[loss=0.2219, ctc_loss=0.1485, cr_loss=0.367, over 20967.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1569, cr_loss=0.3781, over 4063416.39 frames. ], batch size: 55, lr: 3.43e-03, grad_scale: 16.0 2024-09-16 10:51:50,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=433896.6666666667, ans=0.2 2024-09-16 10:52:00,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=433925.0, ans=0.125 2024-09-16 10:52:08,613 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.934e+02 2.136e+02 2.249e+02 2.391e+02 2.992e+02, threshold=4.498e+02, percent-clipped=0.0 2024-09-16 10:52:57,946 INFO [train.py:1198] (1/2) Epoch 24, batch 6200, loss[loss=0.2494, ctc_loss=0.166, cr_loss=0.4172, over 21044.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1584, cr_loss=0.3803, over 4045453.76 frames. ], batch size: 62, lr: 3.43e-03, grad_scale: 16.0 2024-09-16 10:53:26,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=434095.0, ans=0.125 2024-09-16 10:53:38,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=434095.0, ans=0.0 2024-09-16 10:53:48,700 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.47 vs. limit=15.0 2024-09-16 10:54:12,622 INFO [train.py:1198] (1/2) Epoch 24, batch 6250, loss[loss=0.2538, ctc_loss=0.1692, cr_loss=0.4232, over 18555.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1588, cr_loss=0.3813, over 4032897.58 frames. 
], batch size: 108, lr: 3.43e-03, grad_scale: 16.0 2024-09-16 10:54:12,988 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 10:54:38,248 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.834e+02 2.094e+02 2.247e+02 2.534e+02 5.643e+02, threshold=4.493e+02, percent-clipped=1.0 2024-09-16 10:54:51,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=434236.6666666667, ans=0.125 2024-09-16 10:54:59,456 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.72 vs. limit=12.0 2024-09-16 10:55:26,757 INFO [train.py:1198] (1/2) Epoch 24, batch 6300, loss[loss=0.263, ctc_loss=0.1756, cr_loss=0.4372, over 20848.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.16, cr_loss=0.3807, over 3958609.09 frames. ], batch size: 65, lr: 3.43e-03, grad_scale: 16.0 2024-09-16 10:55:40,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=434350.0, ans=0.0 2024-09-16 10:55:55,814 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 10:56:07,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=434378.3333333333, ans=0.0 2024-09-16 10:56:30,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=434435.0, ans=0.125 2024-09-16 10:56:39,640 INFO [train.py:1198] (1/2) Epoch 24, batch 6350, loss[loss=0.2578, ctc_loss=0.1801, cr_loss=0.3889, over 14744.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1613, cr_loss=0.3791, over 3829548.31 frames. ], batch size: 149, lr: 3.43e-03, grad_scale: 16.0 2024-09-16 10:56:53,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=434491.6666666667, ans=0.0 2024-09-16 10:56:56,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=434491.6666666667, ans=0.0 2024-09-16 10:57:04,535 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.810e+02 2.354e+02 2.611e+02 2.822e+02 4.059e+02, threshold=5.222e+02, percent-clipped=0.0 2024-09-16 10:57:14,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=434520.0, ans=0.125 2024-09-16 10:57:31,318 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.01 vs. limit=22.5 2024-09-16 10:58:28,736 INFO [train.py:1198] (1/2) Epoch 25, batch 0, loss[loss=0.2449, ctc_loss=0.1644, cr_loss=0.4026, over 21011.00 frames. ], tot_loss[loss=0.2449, ctc_loss=0.1644, cr_loss=0.4026, over 21011.00 frames. ], batch size: 61, lr: 3.36e-03, grad_scale: 32.0 2024-09-16 10:58:28,736 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-16 10:58:47,384 INFO [train.py:1230] (1/2) Epoch 25, validation: loss=0.04275, ctc_loss=0.04275, cr_loss=1.118e-14, over 944034.00 frames. 
2024-09-16 10:58:47,385 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-16 10:59:33,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=434661.6666666667, ans=0.2 2024-09-16 10:59:39,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=434661.6666666667, ans=0.2 2024-09-16 10:59:45,535 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=434661.6666666667, ans=0.125 2024-09-16 10:59:51,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=434690.0, ans=0.2 2024-09-16 11:00:02,527 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.15 vs. limit=12.0 2024-09-16 11:00:03,126 INFO [train.py:1198] (1/2) Epoch 25, batch 50, loss[loss=0.2255, ctc_loss=0.1511, cr_loss=0.372, over 20970.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1544, cr_loss=0.3779, over 932666.06 frames. ], batch size: 51, lr: 3.36e-03, grad_scale: 32.0 2024-09-16 11:00:44,083 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.834e+02 2.172e+02 2.295e+02 2.567e+02 4.286e+02, threshold=4.590e+02, percent-clipped=0.0 2024-09-16 11:00:48,040 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.43 vs. limit=15.0 2024-09-16 11:01:03,066 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.54 vs. limit=15.0 2024-09-16 11:01:18,999 INFO [train.py:1198] (1/2) Epoch 25, batch 100, loss[loss=0.264, ctc_loss=0.1875, cr_loss=0.3822, over 14254.00 frames. ], tot_loss[loss=0.229, ctc_loss=0.1538, cr_loss=0.3762, over 1632056.04 frames. ], batch size: 149, lr: 3.36e-03, grad_scale: 32.0 2024-09-16 11:01:55,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=434916.6666666667, ans=0.0 2024-09-16 11:02:40,829 INFO [train.py:1198] (1/2) Epoch 25, batch 150, loss[loss=0.2151, ctc_loss=0.1441, cr_loss=0.3547, over 20784.00 frames. ], tot_loss[loss=0.2295, ctc_loss=0.1542, cr_loss=0.3763, over 2178536.78 frames. ], batch size: 56, lr: 3.36e-03, grad_scale: 32.0 2024-09-16 11:03:10,210 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.58 vs. limit=10.0 2024-09-16 11:03:13,708 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.09 vs. limit=10.0 2024-09-16 11:03:21,866 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.893e+02 2.130e+02 2.244e+02 2.399e+02 4.281e+02, threshold=4.488e+02, percent-clipped=0.0 2024-09-16 11:03:38,937 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.08 vs. limit=22.5 2024-09-16 11:03:56,290 INFO [train.py:1198] (1/2) Epoch 25, batch 200, loss[loss=0.1889, ctc_loss=0.124, cr_loss=0.3248, over 20927.00 frames. ], tot_loss[loss=0.2283, ctc_loss=0.1534, cr_loss=0.3747, over 2599271.13 frames. 
], batch size: 48, lr: 3.36e-03, grad_scale: 32.0 2024-09-16 11:04:31,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=435200.0, ans=0.025 2024-09-16 11:04:42,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=435228.3333333333, ans=0.0 2024-09-16 11:04:49,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=435228.3333333333, ans=0.0 2024-09-16 11:04:49,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=435228.3333333333, ans=0.2 2024-09-16 11:04:54,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=435228.3333333333, ans=0.125 2024-09-16 11:05:01,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=435256.6666666667, ans=0.125 2024-09-16 11:05:11,569 INFO [train.py:1198] (1/2) Epoch 25, batch 250, loss[loss=0.2109, ctc_loss=0.1382, cr_loss=0.3638, over 21070.00 frames. ], tot_loss[loss=0.229, ctc_loss=0.1538, cr_loss=0.376, over 2940775.39 frames. ], batch size: 56, lr: 3.36e-03, grad_scale: 32.0 2024-09-16 11:05:18,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=435285.0, ans=0.5 2024-09-16 11:05:29,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=435313.3333333333, ans=0.0 2024-09-16 11:05:37,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=435313.3333333333, ans=0.025 2024-09-16 11:05:41,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=435341.6666666667, ans=0.2 2024-09-16 11:05:52,004 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.904e+02 2.099e+02 2.209e+02 2.392e+02 3.192e+02, threshold=4.418e+02, percent-clipped=0.0 2024-09-16 11:06:05,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=435370.0, ans=0.125 2024-09-16 11:06:26,750 INFO [train.py:1198] (1/2) Epoch 25, batch 300, loss[loss=0.2085, ctc_loss=0.1374, cr_loss=0.3554, over 20959.00 frames. ], tot_loss[loss=0.229, ctc_loss=0.1538, cr_loss=0.3758, over 3201487.98 frames. ], batch size: 50, lr: 3.35e-03, grad_scale: 32.0 2024-09-16 11:06:43,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=435455.0, ans=0.0 2024-09-16 11:07:19,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=435511.6666666667, ans=0.0 2024-09-16 11:07:32,318 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.80 vs. limit=15.0 2024-09-16 11:07:38,506 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.42 vs. limit=15.0 2024-09-16 11:07:45,320 INFO [train.py:1198] (1/2) Epoch 25, batch 350, loss[loss=0.2053, ctc_loss=0.1366, cr_loss=0.3433, over 20949.00 frames. 
], tot_loss[loss=0.2287, ctc_loss=0.1537, cr_loss=0.3753, over 3406772.16 frames. ], batch size: 49, lr: 3.35e-03, grad_scale: 32.0 2024-09-16 11:08:18,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=435625.0, ans=0.1 2024-09-16 11:08:20,273 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0 2024-09-16 11:08:25,653 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.900e+02 2.060e+02 2.170e+02 2.334e+02 3.522e+02, threshold=4.341e+02, percent-clipped=0.0 2024-09-16 11:08:32,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=435653.3333333333, ans=0.125 2024-09-16 11:08:53,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=435681.6666666667, ans=0.07 2024-09-16 11:08:53,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=435681.6666666667, ans=0.1 2024-09-16 11:09:03,511 INFO [train.py:1198] (1/2) Epoch 25, batch 400, loss[loss=0.2267, ctc_loss=0.1498, cr_loss=0.3848, over 20802.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1547, cr_loss=0.3782, over 3558898.91 frames. ], batch size: 53, lr: 3.35e-03, grad_scale: 32.0 2024-09-16 11:09:05,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=435710.0, ans=0.09899494936611666 2024-09-16 11:09:20,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=435738.3333333333, ans=0.125 2024-09-16 11:09:42,486 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.76 vs. limit=5.0 2024-09-16 11:09:56,937 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.17 vs. limit=6.0 2024-09-16 11:09:57,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=435795.0, ans=0.125 2024-09-16 11:10:02,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=435823.3333333333, ans=0.125 2024-09-16 11:10:05,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=435823.3333333333, ans=0.0 2024-09-16 11:10:08,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=435823.3333333333, ans=0.025 2024-09-16 11:10:18,938 INFO [train.py:1198] (1/2) Epoch 25, batch 450, loss[loss=0.2085, ctc_loss=0.1369, cr_loss=0.3581, over 21046.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1554, cr_loss=0.3779, over 3676466.87 frames. 
], batch size: 53, lr: 3.35e-03, grad_scale: 32.0 2024-09-16 11:10:58,929 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.829e+02 2.127e+02 2.223e+02 2.369e+02 3.185e+02, threshold=4.447e+02, percent-clipped=0.0 2024-09-16 11:11:17,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=435965.0, ans=0.2 2024-09-16 11:11:20,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=435965.0, ans=0.125 2024-09-16 11:11:33,120 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.94 vs. limit=22.5 2024-09-16 11:11:33,820 INFO [train.py:1198] (1/2) Epoch 25, batch 500, loss[loss=0.2008, ctc_loss=0.1304, cr_loss=0.3519, over 21005.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1561, cr_loss=0.3783, over 3752317.12 frames. ], batch size: 51, lr: 3.35e-03, grad_scale: 32.0 2024-09-16 11:11:42,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=435993.3333333333, ans=0.0 2024-09-16 11:12:10,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=436050.0, ans=0.125 2024-09-16 11:12:48,638 INFO [train.py:1198] (1/2) Epoch 25, batch 550, loss[loss=0.2073, ctc_loss=0.1404, cr_loss=0.3347, over 20958.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.157, cr_loss=0.3795, over 3831967.41 frames. ], batch size: 50, lr: 3.35e-03, grad_scale: 32.0 2024-09-16 11:12:56,863 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.23 vs. limit=10.0 2024-09-16 11:13:07,480 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.03 vs. limit=15.0 2024-09-16 11:13:31,945 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.893e+02 2.118e+02 2.233e+02 2.424e+02 4.104e+02, threshold=4.467e+02, percent-clipped=0.0 2024-09-16 11:13:53,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=436248.3333333333, ans=0.125 2024-09-16 11:14:09,441 INFO [train.py:1198] (1/2) Epoch 25, batch 600, loss[loss=0.2471, ctc_loss=0.1696, cr_loss=0.3877, over 20708.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1561, cr_loss=0.3777, over 3900408.27 frames. ], batch size: 66, lr: 3.35e-03, grad_scale: 32.0 2024-09-16 11:14:12,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=436276.6666666667, ans=0.2 2024-09-16 11:14:54,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=436361.6666666667, ans=0.015 2024-09-16 11:15:08,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=436390.0, ans=0.1 2024-09-16 11:15:25,469 INFO [train.py:1198] (1/2) Epoch 25, batch 650, loss[loss=0.2462, ctc_loss=0.1634, cr_loss=0.414, over 20783.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.156, cr_loss=0.3784, over 3945439.20 frames. 
], batch size: 53, lr: 3.35e-03, grad_scale: 32.0 2024-09-16 11:15:36,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=436418.3333333333, ans=0.1 2024-09-16 11:16:05,073 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.23 vs. limit=12.0 2024-09-16 11:16:05,892 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.892e+02 2.108e+02 2.225e+02 2.386e+02 3.938e+02, threshold=4.450e+02, percent-clipped=0.0 2024-09-16 11:16:06,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=436475.0, ans=0.025 2024-09-16 11:16:09,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=436503.3333333333, ans=0.0 2024-09-16 11:16:16,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=436503.3333333333, ans=0.0 2024-09-16 11:16:18,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=436503.3333333333, ans=0.125 2024-09-16 11:16:22,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=436503.3333333333, ans=0.1 2024-09-16 11:16:40,801 INFO [train.py:1198] (1/2) Epoch 25, batch 700, loss[loss=0.1928, ctc_loss=0.1277, cr_loss=0.3257, over 20373.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1551, cr_loss=0.3774, over 3989798.85 frames. ], batch size: 45, lr: 3.35e-03, grad_scale: 32.0 2024-09-16 11:16:41,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=436560.0, ans=0.0 2024-09-16 11:17:01,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=436588.3333333333, ans=0.0 2024-09-16 11:17:20,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=436616.6666666667, ans=0.125 2024-09-16 11:17:56,522 INFO [train.py:1198] (1/2) Epoch 25, batch 750, loss[loss=0.2553, ctc_loss=0.1782, cr_loss=0.3855, over 20822.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1557, cr_loss=0.378, over 4003859.89 frames. 
], batch size: 65, lr: 3.35e-03, grad_scale: 32.0 2024-09-16 11:18:32,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=436758.3333333333, ans=0.125 2024-09-16 11:18:35,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=436758.3333333333, ans=0.2 2024-09-16 11:18:36,770 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.828e+02 2.158e+02 2.259e+02 2.423e+02 8.420e+02, threshold=4.519e+02, percent-clipped=2.0 2024-09-16 11:18:38,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=436758.3333333333, ans=0.0 2024-09-16 11:18:50,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=436786.6666666667, ans=0.125 2024-09-16 11:18:55,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=436815.0, ans=0.0 2024-09-16 11:19:14,888 INFO [train.py:1198] (1/2) Epoch 25, batch 800, loss[loss=0.23, ctc_loss=0.1551, cr_loss=0.3743, over 21006.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1554, cr_loss=0.3781, over 4032943.83 frames. ], batch size: 63, lr: 3.35e-03, grad_scale: 32.0 2024-09-16 11:19:21,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=436843.3333333333, ans=0.125 2024-09-16 11:19:22,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=436843.3333333333, ans=0.1 2024-09-16 11:19:28,923 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.22 vs. limit=15.0 2024-09-16 11:20:24,930 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 11:20:33,589 INFO [train.py:1198] (1/2) Epoch 25, batch 850, loss[loss=0.2467, ctc_loss=0.1685, cr_loss=0.3909, over 20653.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1555, cr_loss=0.3784, over 4054731.64 frames. ], batch size: 68, lr: 3.35e-03, grad_scale: 32.0 2024-09-16 11:21:10,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=437041.6666666667, ans=0.125 2024-09-16 11:21:14,557 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.752e+02 2.155e+02 2.302e+02 2.474e+02 2.779e+02, threshold=4.604e+02, percent-clipped=0.0 2024-09-16 11:21:16,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=437041.6666666667, ans=0.125 2024-09-16 11:21:33,643 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.46 vs. 
limit=15.0 2024-09-16 11:21:34,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=437098.3333333333, ans=0.2 2024-09-16 11:21:40,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=437098.3333333333, ans=0.0 2024-09-16 11:21:46,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=437098.3333333333, ans=0.0 2024-09-16 11:21:48,529 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=437126.6666666667, ans=0.0 2024-09-16 11:21:49,819 INFO [train.py:1198] (1/2) Epoch 25, batch 900, loss[loss=0.1833, ctc_loss=0.1206, cr_loss=0.3135, over 20959.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1557, cr_loss=0.3788, over 4064492.63 frames. ], batch size: 51, lr: 3.35e-03, grad_scale: 32.0 2024-09-16 11:22:15,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=437155.0, ans=0.05 2024-09-16 11:23:04,988 INFO [train.py:1198] (1/2) Epoch 25, batch 950, loss[loss=0.1997, ctc_loss=0.1319, cr_loss=0.3388, over 20950.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1556, cr_loss=0.3785, over 4078368.64 frames. ], batch size: 50, lr: 3.35e-03, grad_scale: 32.0 2024-09-16 11:23:26,637 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=437296.6666666667, ans=0.125 2024-09-16 11:23:45,580 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.861e+02 2.128e+02 2.232e+02 2.411e+02 3.684e+02, threshold=4.463e+02, percent-clipped=0.0 2024-09-16 11:23:45,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=437325.0, ans=0.1 2024-09-16 11:23:50,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=437353.3333333333, ans=0.125 2024-09-16 11:23:54,987 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.75 vs. limit=15.0 2024-09-16 11:24:02,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=437353.3333333333, ans=0.1 2024-09-16 11:24:14,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=437381.6666666667, ans=0.125 2024-09-16 11:24:19,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=437410.0, ans=0.035 2024-09-16 11:24:20,264 INFO [train.py:1198] (1/2) Epoch 25, batch 1000, loss[loss=0.2215, ctc_loss=0.151, cr_loss=0.3526, over 20830.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1545, cr_loss=0.3771, over 4083956.67 frames. ], batch size: 59, lr: 3.35e-03, grad_scale: 32.0 2024-09-16 11:24:31,816 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.19 vs. 
limit=15.0 2024-09-16 11:24:47,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=437438.3333333333, ans=0.0 2024-09-16 11:25:01,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=437466.6666666667, ans=0.1 2024-09-16 11:25:20,640 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.15 vs. limit=22.5 2024-09-16 11:25:38,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=437523.3333333333, ans=0.125 2024-09-16 11:25:42,848 INFO [train.py:1198] (1/2) Epoch 25, batch 1050, loss[loss=0.2107, ctc_loss=0.1416, cr_loss=0.3452, over 20835.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1546, cr_loss=0.3769, over 4084574.09 frames. ], batch size: 59, lr: 3.35e-03, grad_scale: 32.0 2024-09-16 11:26:23,717 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.793e+02 2.109e+02 2.259e+02 2.389e+02 2.874e+02, threshold=4.518e+02, percent-clipped=0.0 2024-09-16 11:26:27,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=437636.6666666667, ans=0.0 2024-09-16 11:26:37,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=437636.6666666667, ans=0.125 2024-09-16 11:26:58,793 INFO [train.py:1198] (1/2) Epoch 25, batch 1100, loss[loss=0.2402, ctc_loss=0.1614, cr_loss=0.3944, over 21089.00 frames. ], tot_loss[loss=0.2294, ctc_loss=0.1542, cr_loss=0.3757, over 4077882.24 frames. ], batch size: 59, lr: 3.35e-03, grad_scale: 32.0 2024-09-16 11:27:17,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=437721.6666666667, ans=0.125 2024-09-16 11:27:43,692 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 11:27:48,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=437778.3333333333, ans=0.125 2024-09-16 11:27:51,629 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.75 vs. limit=15.0 2024-09-16 11:28:00,632 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.45 vs. limit=22.5 2024-09-16 11:28:15,013 INFO [train.py:1198] (1/2) Epoch 25, batch 1150, loss[loss=0.2217, ctc_loss=0.1496, cr_loss=0.3607, over 20766.00 frames. ], tot_loss[loss=0.229, ctc_loss=0.1541, cr_loss=0.3748, over 4078225.72 frames. 
], batch size: 56, lr: 3.35e-03, grad_scale: 32.0 2024-09-16 11:28:26,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=437835.0, ans=0.05 2024-09-16 11:28:26,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=437835.0, ans=0.07 2024-09-16 11:28:42,852 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 11:28:55,869 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.864e+02 2.095e+02 2.231e+02 2.395e+02 4.651e+02, threshold=4.462e+02, percent-clipped=1.0 2024-09-16 11:29:14,535 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=437948.3333333333, ans=0.1 2024-09-16 11:29:31,011 INFO [train.py:1198] (1/2) Epoch 25, batch 1200, loss[loss=0.2244, ctc_loss=0.1487, cr_loss=0.3783, over 20867.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.1549, cr_loss=0.3762, over 4089191.82 frames. ], batch size: 54, lr: 3.35e-03, grad_scale: 32.0 2024-09-16 11:29:48,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=438005.0, ans=0.125 2024-09-16 11:30:12,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=438033.3333333333, ans=0.0 2024-09-16 11:30:16,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=438061.6666666667, ans=0.1 2024-09-16 11:30:34,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=438090.0, ans=0.025 2024-09-16 11:30:46,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=438090.0, ans=0.125 2024-09-16 11:30:49,249 INFO [train.py:1198] (1/2) Epoch 25, batch 1250, loss[loss=0.239, ctc_loss=0.1629, cr_loss=0.3803, over 20965.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1549, cr_loss=0.3758, over 4090737.27 frames. 
], batch size: 58, lr: 3.34e-03, grad_scale: 32.0 2024-09-16 11:30:52,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=438118.3333333333, ans=0.125 2024-09-16 11:30:57,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=438118.3333333333, ans=0.125 2024-09-16 11:31:00,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=438118.3333333333, ans=0.0 2024-09-16 11:31:18,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=438146.6666666667, ans=0.125 2024-09-16 11:31:33,253 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.880e+02 2.109e+02 2.249e+02 2.363e+02 4.221e+02, threshold=4.497e+02, percent-clipped=0.0 2024-09-16 11:31:38,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=438203.3333333333, ans=0.125 2024-09-16 11:31:45,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=438203.3333333333, ans=0.05 2024-09-16 11:32:08,138 INFO [train.py:1198] (1/2) Epoch 25, batch 1300, loss[loss=0.2693, ctc_loss=0.1801, cr_loss=0.4458, over 20846.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1552, cr_loss=0.3768, over 4098663.32 frames. ], batch size: 65, lr: 3.34e-03, grad_scale: 32.0 2024-09-16 11:32:30,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=438288.3333333333, ans=0.125 2024-09-16 11:32:32,947 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.57 vs. limit=15.0 2024-09-16 11:32:41,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=438316.6666666667, ans=0.2 2024-09-16 11:32:44,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=438316.6666666667, ans=0.125 2024-09-16 11:32:56,597 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=438345.0, ans=0.125 2024-09-16 11:33:04,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=438345.0, ans=0.5 2024-09-16 11:33:23,581 INFO [train.py:1198] (1/2) Epoch 25, batch 1350, loss[loss=0.2348, ctc_loss=0.1589, cr_loss=0.3795, over 19532.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1552, cr_loss=0.3766, over 4091406.36 frames. ], batch size: 90, lr: 3.34e-03, grad_scale: 32.0 2024-09-16 11:34:04,460 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.901e+02 2.089e+02 2.196e+02 2.363e+02 3.106e+02, threshold=4.393e+02, percent-clipped=0.0 2024-09-16 11:34:21,829 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.96 vs. limit=15.0 2024-09-16 11:34:30,828 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.31 vs. 
limit=22.5 2024-09-16 11:34:39,009 INFO [train.py:1198] (1/2) Epoch 25, batch 1400, loss[loss=0.1951, ctc_loss=0.1291, cr_loss=0.33, over 20965.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1551, cr_loss=0.3761, over 4082785.91 frames. ], batch size: 49, lr: 3.34e-03, grad_scale: 32.0 2024-09-16 11:34:39,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=438543.3333333333, ans=0.125 2024-09-16 11:34:56,338 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.20 vs. limit=10.0 2024-09-16 11:34:56,553 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.26 vs. limit=12.0 2024-09-16 11:35:08,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=438600.0, ans=0.125 2024-09-16 11:35:26,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=438628.3333333333, ans=0.1 2024-09-16 11:35:55,142 INFO [train.py:1198] (1/2) Epoch 25, batch 1450, loss[loss=0.2425, ctc_loss=0.1619, cr_loss=0.4034, over 20828.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1554, cr_loss=0.3768, over 4085136.12 frames. ], batch size: 65, lr: 3.34e-03, grad_scale: 32.0 2024-09-16 11:35:55,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=438685.0, ans=0.1 2024-09-16 11:36:04,618 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 11:36:09,353 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.30 vs. limit=15.0 2024-09-16 11:36:10,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=438713.3333333333, ans=0.2 2024-09-16 11:36:35,433 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.36 vs. limit=15.0 2024-09-16 11:36:38,991 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.692e+02 2.070e+02 2.226e+02 2.366e+02 7.202e+02, threshold=4.452e+02, percent-clipped=1.0 2024-09-16 11:36:50,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=438770.0, ans=0.1 2024-09-16 11:37:11,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=438798.3333333333, ans=0.04949747468305833 2024-09-16 11:37:15,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=438826.6666666667, ans=0.2 2024-09-16 11:37:15,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=438826.6666666667, ans=0.0 2024-09-16 11:37:16,840 INFO [train.py:1198] (1/2) Epoch 25, batch 1500, loss[loss=0.2452, ctc_loss=0.1664, cr_loss=0.3938, over 20724.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1547, cr_loss=0.3756, over 4094061.08 frames. 
], batch size: 68, lr: 3.34e-03, grad_scale: 32.0 2024-09-16 11:37:37,095 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.98 vs. limit=15.0 2024-09-16 11:37:55,101 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.24 vs. limit=15.0 2024-09-16 11:38:08,351 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.03 vs. limit=15.0 2024-09-16 11:38:27,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=438940.0, ans=0.125 2024-09-16 11:38:29,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=438940.0, ans=0.2 2024-09-16 11:38:31,989 INFO [train.py:1198] (1/2) Epoch 25, batch 1550, loss[loss=0.2592, ctc_loss=0.1806, cr_loss=0.3929, over 19489.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1551, cr_loss=0.3767, over 4092037.69 frames. ], batch size: 90, lr: 3.34e-03, grad_scale: 32.0 2024-09-16 11:39:10,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=439025.0, ans=0.1 2024-09-16 11:39:12,919 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.870e+02 2.109e+02 2.219e+02 2.398e+02 2.849e+02, threshold=4.438e+02, percent-clipped=0.0 2024-09-16 11:39:21,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=439053.3333333333, ans=0.125 2024-09-16 11:39:39,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=439081.6666666667, ans=0.125 2024-09-16 11:39:47,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=439110.0, ans=0.0 2024-09-16 11:39:48,415 INFO [train.py:1198] (1/2) Epoch 25, batch 1600, loss[loss=0.2189, ctc_loss=0.148, cr_loss=0.3543, over 20975.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1557, cr_loss=0.3776, over 4094153.02 frames. ], batch size: 58, lr: 3.34e-03, grad_scale: 32.0 2024-09-16 11:39:54,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=439110.0, ans=0.0 2024-09-16 11:40:29,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=439166.6666666667, ans=0.125 2024-09-16 11:41:04,791 INFO [train.py:1198] (1/2) Epoch 25, batch 1650, loss[loss=0.2593, ctc_loss=0.1758, cr_loss=0.4171, over 20834.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1561, cr_loss=0.3787, over 4098994.78 frames. ], batch size: 65, lr: 3.34e-03, grad_scale: 32.0 2024-09-16 11:41:08,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=439251.6666666667, ans=0.0 2024-09-16 11:41:26,679 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.06 vs. 
limit=15.0 2024-09-16 11:41:45,819 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.797e+02 2.122e+02 2.248e+02 2.389e+02 3.064e+02, threshold=4.495e+02, percent-clipped=0.0 2024-09-16 11:41:50,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=439336.6666666667, ans=0.025 2024-09-16 11:41:56,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=439336.6666666667, ans=0.2 2024-09-16 11:42:23,681 INFO [train.py:1198] (1/2) Epoch 25, batch 1700, loss[loss=0.2353, ctc_loss=0.1588, cr_loss=0.3825, over 20948.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1556, cr_loss=0.3777, over 4099167.53 frames. ], batch size: 49, lr: 3.34e-03, grad_scale: 32.0 2024-09-16 11:42:25,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=439393.3333333333, ans=0.125 2024-09-16 11:43:00,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=439450.0, ans=0.2 2024-09-16 11:43:03,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=439450.0, ans=0.125 2024-09-16 11:43:03,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=439450.0, ans=0.025 2024-09-16 11:43:12,750 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.29 vs. limit=10.0 2024-09-16 11:43:21,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=439478.3333333333, ans=0.125 2024-09-16 11:43:28,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=439506.6666666667, ans=0.125 2024-09-16 11:43:41,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=439535.0, ans=0.0 2024-09-16 11:43:42,317 INFO [train.py:1198] (1/2) Epoch 25, batch 1750, loss[loss=0.2559, ctc_loss=0.1771, cr_loss=0.3942, over 18139.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1551, cr_loss=0.3766, over 4084645.48 frames. ], batch size: 108, lr: 3.34e-03, grad_scale: 32.0 2024-09-16 11:43:57,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=439563.3333333333, ans=0.125 2024-09-16 11:44:21,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=439591.6666666667, ans=0.0 2024-09-16 11:44:23,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=439591.6666666667, ans=0.125 2024-09-16 11:44:24,559 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.709e+02 2.091e+02 2.217e+02 2.370e+02 3.258e+02, threshold=4.434e+02, percent-clipped=0.0 2024-09-16 11:44:58,088 INFO [train.py:1198] (1/2) Epoch 25, batch 1800, loss[loss=0.2455, ctc_loss=0.168, cr_loss=0.3878, over 20044.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1556, cr_loss=0.3769, over 4089022.37 frames. 
], batch size: 80, lr: 3.34e-03, grad_scale: 32.0 2024-09-16 11:45:13,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=439705.0, ans=0.125 2024-09-16 11:45:30,861 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.39 vs. limit=22.5 2024-09-16 11:45:38,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=439733.3333333333, ans=0.0 2024-09-16 11:45:46,477 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.49 vs. limit=10.0 2024-09-16 11:46:00,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=439790.0, ans=0.0 2024-09-16 11:46:03,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=439790.0, ans=0.125 2024-09-16 11:46:14,149 INFO [train.py:1198] (1/2) Epoch 25, batch 1850, loss[loss=0.2648, ctc_loss=0.1832, cr_loss=0.4079, over 20871.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1554, cr_loss=0.3762, over 4077639.59 frames. ], batch size: 65, lr: 3.34e-03, grad_scale: 32.0 2024-09-16 11:46:56,567 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.877e+02 2.129e+02 2.241e+02 2.368e+02 4.107e+02, threshold=4.483e+02, percent-clipped=0.0 2024-09-16 11:47:29,946 INFO [train.py:1198] (1/2) Epoch 25, batch 1900, loss[loss=0.2329, ctc_loss=0.1607, cr_loss=0.3611, over 19407.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1555, cr_loss=0.3767, over 4084864.61 frames. ], batch size: 90, lr: 3.34e-03, grad_scale: 32.0 2024-09-16 11:48:09,356 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.71 vs. limit=12.0 2024-09-16 11:48:14,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=440016.6666666667, ans=0.125 2024-09-16 11:48:47,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=440073.3333333333, ans=0.125 2024-09-16 11:48:50,758 INFO [train.py:1198] (1/2) Epoch 25, batch 1950, loss[loss=0.2788, ctc_loss=0.1976, cr_loss=0.406, over 19474.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1553, cr_loss=0.3771, over 4088938.71 frames. ], batch size: 90, lr: 3.34e-03, grad_scale: 32.0 2024-09-16 11:49:07,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=440130.0, ans=0.0 2024-09-16 11:49:27,736 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.03 vs. limit=22.5 2024-09-16 11:49:32,896 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.849e+02 2.107e+02 2.217e+02 2.374e+02 3.462e+02, threshold=4.434e+02, percent-clipped=0.0 2024-09-16 11:50:05,902 INFO [train.py:1198] (1/2) Epoch 25, batch 2000, loss[loss=0.2321, ctc_loss=0.1571, cr_loss=0.3746, over 21051.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1555, cr_loss=0.378, over 4087659.95 frames. 
], batch size: 56, lr: 3.34e-03, grad_scale: 32.0 2024-09-16 11:50:09,770 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.86 vs. limit=10.0 2024-09-16 11:50:22,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=440271.6666666667, ans=0.1 2024-09-16 11:50:44,761 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.14 vs. limit=22.5 2024-09-16 11:50:51,970 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.36 vs. limit=10.0 2024-09-16 11:50:53,328 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.61 vs. limit=15.0 2024-09-16 11:51:21,448 INFO [train.py:1198] (1/2) Epoch 25, batch 2050, loss[loss=0.2604, ctc_loss=0.1832, cr_loss=0.3862, over 14835.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1554, cr_loss=0.3778, over 4087140.10 frames. ], batch size: 149, lr: 3.34e-03, grad_scale: 32.0 2024-09-16 11:51:23,815 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.41 vs. limit=15.0 2024-09-16 11:51:32,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=440385.0, ans=0.125 2024-09-16 11:51:33,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=440385.0, ans=0.0 2024-09-16 11:52:03,709 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.819e+02 2.081e+02 2.211e+02 2.389e+02 8.247e+02, threshold=4.422e+02, percent-clipped=1.0 2024-09-16 11:52:10,448 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.70 vs. limit=15.0 2024-09-16 11:52:36,731 INFO [train.py:1198] (1/2) Epoch 25, batch 2100, loss[loss=0.2207, ctc_loss=0.1488, cr_loss=0.3595, over 20775.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1561, cr_loss=0.378, over 4074157.36 frames. ], batch size: 56, lr: 3.34e-03, grad_scale: 32.0 2024-09-16 11:53:05,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=440583.3333333333, ans=0.2 2024-09-16 11:53:42,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=440640.0, ans=0.0 2024-09-16 11:53:54,015 INFO [train.py:1198] (1/2) Epoch 25, batch 2150, loss[loss=0.2273, ctc_loss=0.1532, cr_loss=0.3704, over 20963.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1557, cr_loss=0.3773, over 4079024.10 frames. 
], batch size: 50, lr: 3.33e-03, grad_scale: 32.0 2024-09-16 11:54:08,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=440696.6666666667, ans=0.0 2024-09-16 11:54:15,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=440696.6666666667, ans=0.125 2024-09-16 11:54:36,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=440725.0, ans=0.0 2024-09-16 11:54:37,091 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. limit=6.0 2024-09-16 11:54:38,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=440725.0, ans=0.2 2024-09-16 11:54:39,183 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.757e+02 2.100e+02 2.245e+02 2.441e+02 3.108e+02, threshold=4.490e+02, percent-clipped=0.0 2024-09-16 11:55:05,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=440781.6666666667, ans=0.1 2024-09-16 11:55:12,579 INFO [train.py:1198] (1/2) Epoch 25, batch 2200, loss[loss=0.2289, ctc_loss=0.1526, cr_loss=0.3815, over 20841.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.156, cr_loss=0.3779, over 4076329.61 frames. ], batch size: 65, lr: 3.33e-03, grad_scale: 32.0 2024-09-16 11:55:29,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=440838.3333333333, ans=0.0 2024-09-16 11:55:57,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=440895.0, ans=0.2 2024-09-16 11:55:58,932 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.44 vs. limit=15.0 2024-09-16 11:56:00,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=440895.0, ans=0.125 2024-09-16 11:56:28,268 INFO [train.py:1198] (1/2) Epoch 25, batch 2250, loss[loss=0.2339, ctc_loss=0.1575, cr_loss=0.382, over 21050.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.1548, cr_loss=0.3764, over 4090526.14 frames. ], batch size: 56, lr: 3.33e-03, grad_scale: 32.0 2024-09-16 11:56:34,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=440951.6666666667, ans=0.125 2024-09-16 11:56:36,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=440951.6666666667, ans=0.2 2024-09-16 11:57:08,936 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.91 vs. limit=10.0 2024-09-16 11:57:11,201 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.848e+02 2.095e+02 2.232e+02 2.386e+02 3.697e+02, threshold=4.463e+02, percent-clipped=0.0 2024-09-16 11:57:37,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=441065.0, ans=0.1 2024-09-16 11:57:44,496 INFO [train.py:1198] (1/2) Epoch 25, batch 2300, loss[loss=0.2287, ctc_loss=0.152, cr_loss=0.3834, over 21065.00 frames. 
], tot_loss[loss=0.2303, ctc_loss=0.1549, cr_loss=0.377, over 4091088.10 frames. ], batch size: 56, lr: 3.33e-03, grad_scale: 32.0 2024-09-16 11:57:53,058 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.37 vs. limit=15.0 2024-09-16 11:57:58,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=441121.6666666667, ans=0.125 2024-09-16 11:58:37,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=441178.3333333333, ans=0.125 2024-09-16 11:58:49,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=441206.6666666667, ans=0.0 2024-09-16 11:58:54,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=441206.6666666667, ans=0.125 2024-09-16 11:59:00,365 INFO [train.py:1198] (1/2) Epoch 25, batch 2350, loss[loss=0.3035, ctc_loss=0.2187, cr_loss=0.424, over 14437.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1555, cr_loss=0.3771, over 4068986.02 frames. ], batch size: 152, lr: 3.33e-03, grad_scale: 32.0 2024-09-16 11:59:02,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=441235.0, ans=0.025 2024-09-16 11:59:30,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=441263.3333333333, ans=0.2 2024-09-16 11:59:33,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=441291.6666666667, ans=0.04949747468305833 2024-09-16 11:59:36,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=441291.6666666667, ans=0.0 2024-09-16 11:59:39,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=441291.6666666667, ans=0.125 2024-09-16 11:59:42,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=441291.6666666667, ans=0.0 2024-09-16 11:59:42,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=441291.6666666667, ans=0.0 2024-09-16 11:59:45,195 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.880e+02 2.115e+02 2.271e+02 2.439e+02 3.151e+02, threshold=4.543e+02, percent-clipped=0.0 2024-09-16 12:00:21,659 INFO [train.py:1198] (1/2) Epoch 25, batch 2400, loss[loss=0.196, ctc_loss=0.1312, cr_loss=0.3244, over 20938.00 frames. ], tot_loss[loss=0.2291, ctc_loss=0.1541, cr_loss=0.3748, over 4090471.47 frames. ], batch size: 48, lr: 3.33e-03, grad_scale: 32.0 2024-09-16 12:00:45,355 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.99 vs. 
limit=12.0 2024-09-16 12:00:51,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=441433.3333333333, ans=0.125 2024-09-16 12:01:27,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=441490.0, ans=0.125 2024-09-16 12:01:36,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=441518.3333333333, ans=0.125 2024-09-16 12:01:37,869 INFO [train.py:1198] (1/2) Epoch 25, batch 2450, loss[loss=0.2635, ctc_loss=0.1824, cr_loss=0.4058, over 18427.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1551, cr_loss=0.376, over 4073597.46 frames. ], batch size: 108, lr: 3.33e-03, grad_scale: 32.0 2024-09-16 12:02:20,216 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.895e+02 2.147e+02 2.262e+02 2.417e+02 3.603e+02, threshold=4.523e+02, percent-clipped=0.0 2024-09-16 12:02:29,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=441603.3333333333, ans=0.035 2024-09-16 12:02:53,836 INFO [train.py:1198] (1/2) Epoch 25, batch 2500, loss[loss=0.1899, ctc_loss=0.1245, cr_loss=0.3269, over 20900.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1551, cr_loss=0.3757, over 4076356.44 frames. ], batch size: 54, lr: 3.33e-03, grad_scale: 32.0 2024-09-16 12:02:55,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=441660.0, ans=0.125 2024-09-16 12:03:06,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=441660.0, ans=0.125 2024-09-16 12:03:15,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=441688.3333333333, ans=0.1 2024-09-16 12:03:19,951 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 12:03:24,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=441716.6666666667, ans=0.5 2024-09-16 12:03:40,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=441745.0, ans=0.125 2024-09-16 12:04:09,686 INFO [train.py:1198] (1/2) Epoch 25, batch 2550, loss[loss=0.2608, ctc_loss=0.1764, cr_loss=0.4217, over 19584.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1554, cr_loss=0.376, over 4084153.30 frames. 
], batch size: 90, lr: 3.33e-03, grad_scale: 32.0 2024-09-16 12:04:17,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=441801.6666666667, ans=0.125 2024-09-16 12:04:54,530 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.868e+02 2.101e+02 2.226e+02 2.396e+02 3.958e+02, threshold=4.452e+02, percent-clipped=0.0 2024-09-16 12:05:17,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=441915.0, ans=0.125 2024-09-16 12:05:25,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=441915.0, ans=0.125 2024-09-16 12:05:28,253 INFO [train.py:1198] (1/2) Epoch 25, batch 2600, loss[loss=0.2748, ctc_loss=0.1888, cr_loss=0.4303, over 20010.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1557, cr_loss=0.377, over 4081581.41 frames. ], batch size: 80, lr: 3.33e-03, grad_scale: 32.0 2024-09-16 12:05:50,091 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.53 vs. limit=12.0 2024-09-16 12:05:55,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=441971.6666666667, ans=0.125 2024-09-16 12:05:57,342 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.76 vs. limit=12.0 2024-09-16 12:06:44,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=442056.6666666667, ans=0.0 2024-09-16 12:06:47,605 INFO [train.py:1198] (1/2) Epoch 25, batch 2650, loss[loss=0.2535, ctc_loss=0.1775, cr_loss=0.3798, over 20725.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.1549, cr_loss=0.3758, over 4087222.05 frames. ], batch size: 68, lr: 3.33e-03, grad_scale: 32.0 2024-09-16 12:07:01,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=442113.3333333333, ans=0.035 2024-09-16 12:07:03,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=442113.3333333333, ans=0.025 2024-09-16 12:07:29,777 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.796e+02 2.116e+02 2.227e+02 2.422e+02 3.317e+02, threshold=4.454e+02, percent-clipped=0.0 2024-09-16 12:07:33,623 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.57 vs. limit=22.5 2024-09-16 12:08:02,702 INFO [train.py:1198] (1/2) Epoch 25, batch 2700, loss[loss=0.2485, ctc_loss=0.1659, cr_loss=0.4129, over 20611.00 frames. ], tot_loss[loss=0.2295, ctc_loss=0.1546, cr_loss=0.3749, over 4083599.54 frames. 
], batch size: 71, lr: 3.33e-03, grad_scale: 32.0 2024-09-16 12:08:27,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=442255.0, ans=0.2 2024-09-16 12:08:33,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=442283.3333333333, ans=0.2 2024-09-16 12:08:53,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=442311.6666666667, ans=0.2 2024-09-16 12:08:56,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=442311.6666666667, ans=0.125 2024-09-16 12:09:03,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=442340.0, ans=0.0 2024-09-16 12:09:08,854 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.99 vs. limit=15.0 2024-09-16 12:09:12,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=442340.0, ans=0.04949747468305833 2024-09-16 12:09:18,660 INFO [train.py:1198] (1/2) Epoch 25, batch 2750, loss[loss=0.2276, ctc_loss=0.152, cr_loss=0.3775, over 20818.00 frames. ], tot_loss[loss=0.2284, ctc_loss=0.1536, cr_loss=0.3741, over 4088370.15 frames. ], batch size: 59, lr: 3.33e-03, grad_scale: 32.0 2024-09-16 12:09:38,687 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 12:10:00,707 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.706e+02 2.054e+02 2.140e+02 2.316e+02 5.148e+02, threshold=4.279e+02, percent-clipped=1.0 2024-09-16 12:10:33,967 INFO [train.py:1198] (1/2) Epoch 25, batch 2800, loss[loss=0.2776, ctc_loss=0.1924, cr_loss=0.4259, over 18003.00 frames. ], tot_loss[loss=0.2285, ctc_loss=0.1536, cr_loss=0.3742, over 4096035.25 frames. ], batch size: 108, lr: 3.33e-03, grad_scale: 32.0 2024-09-16 12:11:30,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=442595.0, ans=0.0 2024-09-16 12:11:32,494 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.40 vs. limit=15.0 2024-09-16 12:11:51,941 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.54 vs. limit=12.0 2024-09-16 12:11:55,873 INFO [train.py:1198] (1/2) Epoch 25, batch 2850, loss[loss=0.2031, ctc_loss=0.1325, cr_loss=0.3529, over 19854.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1547, cr_loss=0.3763, over 4087542.08 frames. ], batch size: 44, lr: 3.33e-03, grad_scale: 32.0 2024-09-16 12:11:56,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=442651.6666666667, ans=0.125 2024-09-16 12:12:08,612 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.51 vs. limit=15.0 2024-09-16 12:12:19,480 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.05 vs. 
limit=15.0 2024-09-16 12:12:29,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=442708.3333333333, ans=0.025 2024-09-16 12:12:37,221 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.94 vs. limit=15.0 2024-09-16 12:12:37,819 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.909e+02 2.093e+02 2.267e+02 2.452e+02 5.204e+02, threshold=4.534e+02, percent-clipped=1.0 2024-09-16 12:13:10,844 INFO [train.py:1198] (1/2) Epoch 25, batch 2900, loss[loss=0.2798, ctc_loss=0.1941, cr_loss=0.4283, over 18007.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1557, cr_loss=0.3781, over 4089618.71 frames. ], batch size: 108, lr: 3.33e-03, grad_scale: 32.0 2024-09-16 12:13:48,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=442850.0, ans=0.0 2024-09-16 12:13:53,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=442850.0, ans=0.0 2024-09-16 12:13:55,029 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.18 vs. limit=10.0 2024-09-16 12:13:56,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=442878.3333333333, ans=0.025 2024-09-16 12:14:09,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=442906.6666666667, ans=0.0 2024-09-16 12:14:17,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=442906.6666666667, ans=0.0 2024-09-16 12:14:26,102 INFO [train.py:1198] (1/2) Epoch 25, batch 2950, loss[loss=0.1934, ctc_loss=0.1274, cr_loss=0.3297, over 20974.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1555, cr_loss=0.3779, over 4083386.78 frames. ], batch size: 49, lr: 3.33e-03, grad_scale: 32.0 2024-09-16 12:14:44,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=442963.3333333333, ans=0.0 2024-09-16 12:14:57,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=442991.6666666667, ans=0.2 2024-09-16 12:15:08,237 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.818e+02 2.045e+02 2.167e+02 2.296e+02 5.762e+02, threshold=4.335e+02, percent-clipped=1.0 2024-09-16 12:15:11,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=443020.0, ans=0.0 2024-09-16 12:15:20,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=443020.0, ans=0.0 2024-09-16 12:15:36,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=443048.3333333333, ans=0.0 2024-09-16 12:15:41,121 INFO [train.py:1198] (1/2) Epoch 25, batch 3000, loss[loss=0.2046, ctc_loss=0.1334, cr_loss=0.3561, over 20980.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.1547, cr_loss=0.3768, over 4097186.60 frames. 
], batch size: 48, lr: 3.33e-03, grad_scale: 32.0 2024-09-16 12:15:41,121 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-16 12:15:53,530 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([6.1506, 5.6238, 5.6485, 5.9258], device='cuda:1') 2024-09-16 12:16:05,256 INFO [train.py:1230] (1/2) Epoch 25, validation: loss=0.0425, ctc_loss=0.0425, cr_loss=1.166e-14, over 944034.00 frames. 2024-09-16 12:16:05,257 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-16 12:16:19,912 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.78 vs. limit=12.0 2024-09-16 12:16:21,471 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.41 vs. limit=22.5 2024-09-16 12:16:58,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=443161.6666666667, ans=0.025 2024-09-16 12:17:05,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=443161.6666666667, ans=0.1 2024-09-16 12:17:06,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=443161.6666666667, ans=0.125 2024-09-16 12:17:27,398 INFO [train.py:1198] (1/2) Epoch 25, batch 3050, loss[loss=0.2928, ctc_loss=0.2071, cr_loss=0.4283, over 14236.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1554, cr_loss=0.3775, over 4080816.64 frames. ], batch size: 151, lr: 3.33e-03, grad_scale: 32.0 2024-09-16 12:17:46,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=443246.6666666667, ans=0.2 2024-09-16 12:17:56,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=443275.0, ans=0.0 2024-09-16 12:17:58,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=443275.0, ans=0.0 2024-09-16 12:18:02,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=443275.0, ans=0.125 2024-09-16 12:18:07,872 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.26 vs. limit=12.0 2024-09-16 12:18:09,367 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.29 vs. limit=22.5 2024-09-16 12:18:09,988 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.728e+02 2.144e+02 2.277e+02 2.434e+02 3.518e+02, threshold=4.553e+02, percent-clipped=0.0 2024-09-16 12:18:43,444 INFO [train.py:1198] (1/2) Epoch 25, batch 3100, loss[loss=0.2307, ctc_loss=0.1556, cr_loss=0.3758, over 20999.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1564, cr_loss=0.3794, over 4086617.66 frames. 
], batch size: 52, lr: 3.32e-03, grad_scale: 32.0 2024-09-16 12:19:22,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=443416.6666666667, ans=0.1 2024-09-16 12:19:28,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=443445.0, ans=0.09899494936611666 2024-09-16 12:19:50,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=443473.3333333333, ans=0.0 2024-09-16 12:19:53,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=443473.3333333333, ans=0.1 2024-09-16 12:19:58,613 INFO [train.py:1198] (1/2) Epoch 25, batch 3150, loss[loss=0.2406, ctc_loss=0.1615, cr_loss=0.3956, over 19983.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1565, cr_loss=0.3799, over 4091463.73 frames. ], batch size: 44, lr: 3.32e-03, grad_scale: 32.0 2024-09-16 12:20:13,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=443530.0, ans=0.125 2024-09-16 12:20:35,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=443558.3333333333, ans=0.125 2024-09-16 12:20:40,910 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.911e+02 2.079e+02 2.229e+02 2.428e+02 4.413e+02, threshold=4.457e+02, percent-clipped=0.0 2024-09-16 12:20:51,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=443586.6666666667, ans=0.2 2024-09-16 12:20:53,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=443586.6666666667, ans=0.125 2024-09-16 12:21:11,884 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.31 vs. limit=15.0 2024-09-16 12:21:14,333 INFO [train.py:1198] (1/2) Epoch 25, batch 3200, loss[loss=0.2654, ctc_loss=0.1787, cr_loss=0.4333, over 20873.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1564, cr_loss=0.38, over 4096817.09 frames. ], batch size: 65, lr: 3.32e-03, grad_scale: 32.0 2024-09-16 12:21:25,700 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.02 vs. limit=15.0 2024-09-16 12:21:42,047 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.95 vs. limit=6.0 2024-09-16 12:22:32,977 INFO [train.py:1198] (1/2) Epoch 25, batch 3250, loss[loss=0.2718, ctc_loss=0.1852, cr_loss=0.4329, over 20677.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1561, cr_loss=0.3794, over 4106800.07 frames. 
], batch size: 66, lr: 3.32e-03, grad_scale: 32.0 2024-09-16 12:22:36,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=443785.0, ans=0.0 2024-09-16 12:22:37,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=443785.0, ans=0.1 2024-09-16 12:22:58,249 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.97 vs. limit=10.0 2024-09-16 12:23:15,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=443841.6666666667, ans=0.2 2024-09-16 12:23:18,449 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.831e+02 2.065e+02 2.203e+02 2.349e+02 3.896e+02, threshold=4.406e+02, percent-clipped=0.0 2024-09-16 12:23:20,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=443870.0, ans=0.0 2024-09-16 12:23:51,737 INFO [train.py:1198] (1/2) Epoch 25, batch 3300, loss[loss=0.2297, ctc_loss=0.1524, cr_loss=0.3861, over 20982.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1554, cr_loss=0.3785, over 4109652.43 frames. ], batch size: 63, lr: 3.32e-03, grad_scale: 32.0 2024-09-16 12:23:55,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=443926.6666666667, ans=0.125 2024-09-16 12:24:02,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=443926.6666666667, ans=0.125 2024-09-16 12:24:13,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=443955.0, ans=0.1 2024-09-16 12:24:38,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=444011.6666666667, ans=0.025 2024-09-16 12:24:56,261 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.26 vs. limit=15.0 2024-09-16 12:24:59,518 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.17 vs. limit=15.0 2024-09-16 12:25:07,660 INFO [train.py:1198] (1/2) Epoch 25, batch 3350, loss[loss=0.2497, ctc_loss=0.1722, cr_loss=0.3871, over 20698.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.155, cr_loss=0.3775, over 4108233.68 frames. ], batch size: 68, lr: 3.32e-03, grad_scale: 32.0 2024-09-16 12:25:17,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=444068.3333333333, ans=0.04949747468305833 2024-09-16 12:25:19,136 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.66 vs. 
limit=10.0 2024-09-16 12:25:30,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=444096.6666666667, ans=0.125 2024-09-16 12:25:50,289 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.854e+02 2.098e+02 2.234e+02 2.376e+02 4.221e+02, threshold=4.469e+02, percent-clipped=0.0 2024-09-16 12:26:24,055 INFO [train.py:1198] (1/2) Epoch 25, batch 3400, loss[loss=0.2384, ctc_loss=0.1599, cr_loss=0.3926, over 20355.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1546, cr_loss=0.3772, over 4114715.60 frames. ], batch size: 74, lr: 3.32e-03, grad_scale: 32.0 2024-09-16 12:27:09,330 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.04 vs. limit=15.0 2024-09-16 12:27:34,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=444323.3333333333, ans=0.1 2024-09-16 12:27:40,325 INFO [train.py:1198] (1/2) Epoch 25, batch 3450, loss[loss=0.246, ctc_loss=0.1658, cr_loss=0.401, over 20868.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1549, cr_loss=0.3775, over 4118301.07 frames. ], batch size: 57, lr: 3.32e-03, grad_scale: 32.0 2024-09-16 12:28:12,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=444408.3333333333, ans=0.0 2024-09-16 12:28:20,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=444408.3333333333, ans=0.0 2024-09-16 12:28:25,551 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.847e+02 2.081e+02 2.215e+02 2.442e+02 5.986e+02, threshold=4.430e+02, percent-clipped=1.0 2024-09-16 12:28:30,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=444436.6666666667, ans=0.125 2024-09-16 12:28:42,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=444465.0, ans=0.07 2024-09-16 12:29:01,821 INFO [train.py:1198] (1/2) Epoch 25, batch 3500, loss[loss=0.2352, ctc_loss=0.1594, cr_loss=0.3792, over 21001.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.155, cr_loss=0.3775, over 4112468.25 frames. ], batch size: 61, lr: 3.32e-03, grad_scale: 32.0 2024-09-16 12:29:13,110 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.85 vs. limit=15.0 2024-09-16 12:29:13,274 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.90 vs. limit=15.0 2024-09-16 12:29:27,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=444521.6666666667, ans=0.0 2024-09-16 12:29:39,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=444550.0, ans=0.0 2024-09-16 12:30:03,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=444606.6666666667, ans=0.025 2024-09-16 12:30:17,204 INFO [train.py:1198] (1/2) Epoch 25, batch 3550, loss[loss=0.2213, ctc_loss=0.1461, cr_loss=0.3759, over 21062.00 frames. 
], tot_loss[loss=0.229, ctc_loss=0.1538, cr_loss=0.3759, over 4118641.32 frames. ], batch size: 56, lr: 3.32e-03, grad_scale: 32.0 2024-09-16 12:30:29,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=444635.0, ans=0.1 2024-09-16 12:30:51,494 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 12:31:00,006 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.815e+02 2.123e+02 2.257e+02 2.460e+02 4.749e+02, threshold=4.514e+02, percent-clipped=1.0 2024-09-16 12:31:17,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=444748.3333333333, ans=0.125 2024-09-16 12:31:33,425 INFO [train.py:1198] (1/2) Epoch 25, batch 3600, loss[loss=0.2275, ctc_loss=0.1541, cr_loss=0.367, over 21044.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1548, cr_loss=0.3774, over 4103797.08 frames. ], batch size: 53, lr: 3.32e-03, grad_scale: 32.0 2024-09-16 12:31:41,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=444776.6666666667, ans=0.125 2024-09-16 12:32:08,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=444833.3333333333, ans=0.125 2024-09-16 12:32:19,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=444861.6666666667, ans=0.0 2024-09-16 12:32:43,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=444890.0, ans=0.05 2024-09-16 12:32:49,380 INFO [train.py:1198] (1/2) Epoch 25, batch 3650, loss[loss=0.2302, ctc_loss=0.155, cr_loss=0.3762, over 20650.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1551, cr_loss=0.3772, over 4097933.77 frames. ], batch size: 71, lr: 3.32e-03, grad_scale: 32.0 2024-09-16 12:32:52,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=444918.3333333333, ans=0.125 2024-09-16 12:33:07,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=444946.6666666667, ans=0.1 2024-09-16 12:33:34,692 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.797e+02 2.094e+02 2.222e+02 2.356e+02 2.929e+02, threshold=4.444e+02, percent-clipped=0.0 2024-09-16 12:34:08,497 INFO [train.py:1198] (1/2) Epoch 25, batch 3700, loss[loss=0.2781, ctc_loss=0.1862, cr_loss=0.4593, over 18254.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1549, cr_loss=0.377, over 4097131.71 frames. 
], batch size: 108, lr: 3.32e-03, grad_scale: 64.0 2024-09-16 12:34:11,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=445060.0, ans=0.125 2024-09-16 12:34:22,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=445088.3333333333, ans=0.125 2024-09-16 12:34:33,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=445088.3333333333, ans=0.125 2024-09-16 12:35:12,661 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 12:35:15,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=445173.3333333333, ans=0.1 2024-09-16 12:35:27,549 INFO [train.py:1198] (1/2) Epoch 25, batch 3750, loss[loss=0.2303, ctc_loss=0.153, cr_loss=0.3865, over 20973.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1555, cr_loss=0.378, over 4104089.59 frames. ], batch size: 55, lr: 3.32e-03, grad_scale: 64.0 2024-09-16 12:35:37,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=445201.6666666667, ans=0.025 2024-09-16 12:36:10,134 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.903e+02 2.048e+02 2.174e+02 2.276e+02 2.887e+02, threshold=4.348e+02, percent-clipped=0.0 2024-09-16 12:36:36,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=445315.0, ans=0.2 2024-09-16 12:36:43,495 INFO [train.py:1198] (1/2) Epoch 25, batch 3800, loss[loss=0.2361, ctc_loss=0.1588, cr_loss=0.3863, over 20228.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1555, cr_loss=0.3775, over 4095972.24 frames. ], batch size: 74, lr: 3.32e-03, grad_scale: 64.0 2024-09-16 12:36:57,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=445371.6666666667, ans=0.2 2024-09-16 12:37:00,918 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.58 vs. limit=12.0 2024-09-16 12:37:04,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=445371.6666666667, ans=0.125 2024-09-16 12:37:24,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=445400.0, ans=0.125 2024-09-16 12:37:59,384 INFO [train.py:1198] (1/2) Epoch 25, batch 3850, loss[loss=0.1961, ctc_loss=0.1261, cr_loss=0.3498, over 20995.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.1547, cr_loss=0.3768, over 4102601.47 frames. 
], batch size: 52, lr: 3.32e-03, grad_scale: 64.0 2024-09-16 12:38:33,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=445541.6666666667, ans=0.125 2024-09-16 12:38:43,654 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.844e+02 2.127e+02 2.257e+02 2.453e+02 4.128e+02, threshold=4.514e+02, percent-clipped=0.0 2024-09-16 12:38:44,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=445570.0, ans=0.0 2024-09-16 12:39:08,538 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.15 vs. limit=22.5 2024-09-16 12:39:18,233 INFO [train.py:1198] (1/2) Epoch 25, batch 3900, loss[loss=0.2024, ctc_loss=0.1353, cr_loss=0.3353, over 20952.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1556, cr_loss=0.378, over 4102416.96 frames. ], batch size: 50, lr: 3.32e-03, grad_scale: 32.0 2024-09-16 12:39:27,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=445626.6666666667, ans=0.125 2024-09-16 12:39:29,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=445626.6666666667, ans=0.09899494936611666 2024-09-16 12:39:51,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=445683.3333333333, ans=0.2 2024-09-16 12:40:36,450 INFO [train.py:1198] (1/2) Epoch 25, batch 3950, loss[loss=0.2569, ctc_loss=0.1751, cr_loss=0.4091, over 20957.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1564, cr_loss=0.3795, over 4089821.28 frames. ], batch size: 64, lr: 3.32e-03, grad_scale: 32.0 2024-09-16 12:41:05,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=445825.0, ans=0.0 2024-09-16 12:41:16,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=445825.0, ans=0.125 2024-09-16 12:41:20,323 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.820e+02 2.095e+02 2.214e+02 2.396e+02 3.784e+02, threshold=4.428e+02, percent-clipped=0.0 2024-09-16 12:41:31,655 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.26 vs. limit=10.0 2024-09-16 12:41:52,097 INFO [train.py:1198] (1/2) Epoch 25, batch 4000, loss[loss=0.2612, ctc_loss=0.1766, cr_loss=0.4229, over 20644.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1565, cr_loss=0.3796, over 4078384.71 frames. ], batch size: 66, lr: 3.32e-03, grad_scale: 32.0 2024-09-16 12:43:05,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=446023.3333333333, ans=0.125 2024-09-16 12:43:08,185 INFO [train.py:1198] (1/2) Epoch 25, batch 4050, loss[loss=0.2442, ctc_loss=0.1637, cr_loss=0.4028, over 21047.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1565, cr_loss=0.3797, over 4088867.37 frames. ], batch size: 62, lr: 3.31e-03, grad_scale: 32.0 2024-09-16 12:43:16,931 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.40 vs. 
limit=15.0 2024-09-16 12:43:35,925 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.69 vs. limit=15.0 2024-09-16 12:43:37,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=446108.3333333333, ans=0.125 2024-09-16 12:43:51,906 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.779e+02 2.099e+02 2.220e+02 2.385e+02 4.787e+02, threshold=4.440e+02, percent-clipped=1.0 2024-09-16 12:43:58,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=446136.6666666667, ans=0.0 2024-09-16 12:44:16,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=446165.0, ans=0.2 2024-09-16 12:44:23,850 INFO [train.py:1198] (1/2) Epoch 25, batch 4100, loss[loss=0.2166, ctc_loss=0.1461, cr_loss=0.3526, over 20961.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1557, cr_loss=0.379, over 4097243.35 frames. ], batch size: 52, lr: 3.31e-03, grad_scale: 32.0 2024-09-16 12:44:27,604 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.13 vs. limit=15.0 2024-09-16 12:44:32,197 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.23 vs. limit=22.5 2024-09-16 12:44:55,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=446250.0, ans=0.125 2024-09-16 12:45:04,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=446250.0, ans=0.0 2024-09-16 12:45:41,697 INFO [train.py:1198] (1/2) Epoch 25, batch 4150, loss[loss=0.2241, ctc_loss=0.1507, cr_loss=0.3674, over 20971.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.157, cr_loss=0.3801, over 4089427.48 frames. ], batch size: 55, lr: 3.31e-03, grad_scale: 32.0 2024-09-16 12:46:16,696 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.42 vs. limit=22.5 2024-09-16 12:46:27,808 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.896e+02 2.112e+02 2.214e+02 2.356e+02 4.198e+02, threshold=4.429e+02, percent-clipped=0.0 2024-09-16 12:46:32,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=446420.0, ans=0.025 2024-09-16 12:46:32,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=446420.0, ans=0.125 2024-09-16 12:46:45,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=446448.3333333333, ans=0.125 2024-09-16 12:46:47,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=446448.3333333333, ans=0.125 2024-09-16 12:46:51,518 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.73 vs. limit=15.0 2024-09-16 12:47:00,053 INFO [train.py:1198] (1/2) Epoch 25, batch 4200, loss[loss=0.2719, ctc_loss=0.1873, cr_loss=0.4234, over 20735.00 frames. 
], tot_loss[loss=0.2334, ctc_loss=0.1573, cr_loss=0.3807, over 4089344.29 frames. ], batch size: 71, lr: 3.31e-03, grad_scale: 32.0 2024-09-16 12:47:21,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=446505.0, ans=0.1 2024-09-16 12:48:16,084 INFO [train.py:1198] (1/2) Epoch 25, batch 4250, loss[loss=0.2568, ctc_loss=0.1729, cr_loss=0.4195, over 20673.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1572, cr_loss=0.3808, over 4099258.82 frames. ], batch size: 68, lr: 3.31e-03, grad_scale: 32.0 2024-09-16 12:48:51,894 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.77 vs. limit=15.0 2024-09-16 12:49:00,336 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.859e+02 2.107e+02 2.242e+02 2.391e+02 4.206e+02, threshold=4.483e+02, percent-clipped=0.0 2024-09-16 12:49:26,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=446731.6666666667, ans=0.125 2024-09-16 12:49:32,059 INFO [train.py:1198] (1/2) Epoch 25, batch 4300, loss[loss=0.233, ctc_loss=0.1547, cr_loss=0.3917, over 20642.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1561, cr_loss=0.3791, over 4100331.87 frames. ], batch size: 68, lr: 3.31e-03, grad_scale: 32.0 2024-09-16 12:50:03,077 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.69 vs. limit=12.0 2024-09-16 12:50:20,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=446845.0, ans=0.0 2024-09-16 12:50:50,644 INFO [train.py:1198] (1/2) Epoch 25, batch 4350, loss[loss=0.2909, ctc_loss=0.2036, cr_loss=0.4367, over 14608.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1558, cr_loss=0.3784, over 4100123.70 frames. ], batch size: 149, lr: 3.31e-03, grad_scale: 32.0 2024-09-16 12:51:18,963 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.29 vs. limit=12.0 2024-09-16 12:51:29,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=446958.3333333333, ans=0.5 2024-09-16 12:51:34,923 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.941e+02 2.281e+02 2.487e+02 2.742e+02 4.213e+02, threshold=4.974e+02, percent-clipped=0.0 2024-09-16 12:51:42,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=446986.6666666667, ans=0.125 2024-09-16 12:51:46,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=446986.6666666667, ans=0.1 2024-09-16 12:51:55,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=447015.0, ans=0.0 2024-09-16 12:51:58,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=447015.0, ans=0.125 2024-09-16 12:51:58,449 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.78 vs. limit=22.5 2024-09-16 12:52:10,158 INFO [train.py:1198] (1/2) Epoch 25, batch 4400, loss[loss=0.2236, ctc_loss=0.1502, cr_loss=0.3666, over 20848.00 frames. 
], tot_loss[loss=0.2314, ctc_loss=0.1557, cr_loss=0.3786, over 4100764.54 frames. ], batch size: 65, lr: 3.31e-03, grad_scale: 32.0 2024-09-16 12:52:33,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=447071.6666666667, ans=0.025 2024-09-16 12:52:37,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=447071.6666666667, ans=0.125 2024-09-16 12:53:01,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=447128.3333333333, ans=0.1 2024-09-16 12:53:01,449 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.69 vs. limit=15.0 2024-09-16 12:53:04,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=447128.3333333333, ans=0.0 2024-09-16 12:53:26,407 INFO [train.py:1198] (1/2) Epoch 25, batch 4450, loss[loss=0.2592, ctc_loss=0.1755, cr_loss=0.4186, over 20304.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1562, cr_loss=0.3795, over 4096825.19 frames. ], batch size: 74, lr: 3.31e-03, grad_scale: 32.0 2024-09-16 12:53:49,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=447213.3333333333, ans=0.0 2024-09-16 12:54:10,159 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.837e+02 2.107e+02 2.220e+02 2.363e+02 4.587e+02, threshold=4.439e+02, percent-clipped=0.0 2024-09-16 12:54:15,360 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.45 vs. limit=22.5 2024-09-16 12:54:22,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=447270.0, ans=0.0 2024-09-16 12:54:27,416 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.14 vs. limit=10.0 2024-09-16 12:54:39,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=447298.3333333333, ans=0.125 2024-09-16 12:54:42,004 INFO [train.py:1198] (1/2) Epoch 25, batch 4500, loss[loss=0.2528, ctc_loss=0.1757, cr_loss=0.3856, over 19323.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1568, cr_loss=0.381, over 4101461.83 frames. ], batch size: 90, lr: 3.31e-03, grad_scale: 32.0 2024-09-16 12:54:54,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=447326.6666666667, ans=0.0 2024-09-16 12:55:00,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=447355.0, ans=0.0 2024-09-16 12:55:39,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=447411.6666666667, ans=0.125 2024-09-16 12:55:43,257 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.79 vs. 
limit=15.0 2024-09-16 12:55:51,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=447440.0, ans=0.125 2024-09-16 12:55:57,368 INFO [train.py:1198] (1/2) Epoch 25, batch 4550, loss[loss=0.2594, ctc_loss=0.178, cr_loss=0.4069, over 20634.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1563, cr_loss=0.3796, over 4098377.02 frames. ], batch size: 71, lr: 3.31e-03, grad_scale: 32.0 2024-09-16 12:56:23,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=447496.6666666667, ans=0.125 2024-09-16 12:56:23,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=447496.6666666667, ans=0.0 2024-09-16 12:56:42,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=447525.0, ans=0.125 2024-09-16 12:56:44,779 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.839e+02 2.085e+02 2.226e+02 2.370e+02 5.298e+02, threshold=4.452e+02, percent-clipped=1.0 2024-09-16 12:57:16,794 INFO [train.py:1198] (1/2) Epoch 25, batch 4600, loss[loss=0.2356, ctc_loss=0.1575, cr_loss=0.3905, over 21026.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1559, cr_loss=0.3787, over 4097988.83 frames. ], batch size: 61, lr: 3.31e-03, grad_scale: 32.0 2024-09-16 12:57:18,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=447610.0, ans=0.125 2024-09-16 12:57:32,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=447638.3333333333, ans=0.125 2024-09-16 12:57:56,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=447666.6666666667, ans=0.0 2024-09-16 12:58:03,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=447695.0, ans=0.125 2024-09-16 12:58:35,450 INFO [train.py:1198] (1/2) Epoch 25, batch 4650, loss[loss=0.2242, ctc_loss=0.1503, cr_loss=0.3694, over 21024.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1553, cr_loss=0.3778, over 4107578.31 frames. ], batch size: 62, lr: 3.31e-03, grad_scale: 32.0 2024-09-16 12:58:43,727 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=16.06 vs. limit=15.0 2024-09-16 12:59:19,502 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.865e+02 2.129e+02 2.221e+02 2.386e+02 4.102e+02, threshold=4.442e+02, percent-clipped=0.0 2024-09-16 12:59:50,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=447893.3333333333, ans=0.125 2024-09-16 12:59:52,000 INFO [train.py:1198] (1/2) Epoch 25, batch 4700, loss[loss=0.2422, ctc_loss=0.1658, cr_loss=0.3822, over 19413.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1556, cr_loss=0.3786, over 4098514.08 frames. 
], batch size: 90, lr: 3.31e-03, grad_scale: 32.0 2024-09-16 12:59:52,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=447893.3333333333, ans=0.1 2024-09-16 13:00:02,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=447893.3333333333, ans=0.125 2024-09-16 13:00:06,128 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 13:00:22,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=447950.0, ans=0.035 2024-09-16 13:00:25,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=447950.0, ans=0.0 2024-09-16 13:00:26,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=447950.0, ans=0.0 2024-09-16 13:00:41,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=447978.3333333333, ans=0.125 2024-09-16 13:01:07,906 INFO [train.py:1198] (1/2) Epoch 25, batch 4750, loss[loss=0.2441, ctc_loss=0.1639, cr_loss=0.4007, over 20858.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1561, cr_loss=0.3786, over 4080133.09 frames. ], batch size: 57, lr: 3.31e-03, grad_scale: 32.0 2024-09-16 13:01:20,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=448035.0, ans=0.2 2024-09-16 13:01:46,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=448091.6666666667, ans=0.125 2024-09-16 13:01:54,671 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.867e+02 2.131e+02 2.228e+02 2.410e+02 5.492e+02, threshold=4.457e+02, percent-clipped=2.0 2024-09-16 13:02:07,589 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.07 vs. limit=15.0 2024-09-16 13:02:22,887 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.65 vs. limit=15.0 2024-09-16 13:02:26,645 INFO [train.py:1198] (1/2) Epoch 25, batch 4800, loss[loss=0.2324, ctc_loss=0.1584, cr_loss=0.3701, over 20826.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1559, cr_loss=0.3789, over 4091813.55 frames. ], batch size: 59, lr: 3.31e-03, grad_scale: 32.0 2024-09-16 13:02:51,367 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.17 vs. limit=12.0 2024-09-16 13:03:09,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=448233.3333333333, ans=0.125 2024-09-16 13:03:45,068 INFO [train.py:1198] (1/2) Epoch 25, batch 4850, loss[loss=0.2892, ctc_loss=0.2047, cr_loss=0.4227, over 14094.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1566, cr_loss=0.3795, over 4067602.36 frames. ], batch size: 150, lr: 3.31e-03, grad_scale: 32.0 2024-09-16 13:03:45,925 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.60 vs. 
limit=15.0 2024-09-16 13:04:29,492 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.865e+02 2.092e+02 2.219e+02 2.385e+02 3.498e+02, threshold=4.437e+02, percent-clipped=0.0 2024-09-16 13:04:39,790 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.61 vs. limit=15.0 2024-09-16 13:04:56,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=448431.6666666667, ans=0.125 2024-09-16 13:04:57,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=448431.6666666667, ans=0.025 2024-09-16 13:05:01,120 INFO [train.py:1198] (1/2) Epoch 25, batch 4900, loss[loss=0.2414, ctc_loss=0.1643, cr_loss=0.3855, over 20604.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1564, cr_loss=0.3797, over 4076846.35 frames. ], batch size: 75, lr: 3.31e-03, grad_scale: 32.0 2024-09-16 13:05:15,215 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=448488.3333333333, ans=0.0 2024-09-16 13:05:28,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=448488.3333333333, ans=0.0 2024-09-16 13:05:34,681 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 13:05:42,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=448516.6666666667, ans=0.125 2024-09-16 13:05:46,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=448545.0, ans=0.2 2024-09-16 13:05:49,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=448545.0, ans=0.125 2024-09-16 13:06:13,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=448573.3333333333, ans=0.125 2024-09-16 13:06:16,113 INFO [train.py:1198] (1/2) Epoch 25, batch 4950, loss[loss=0.2033, ctc_loss=0.134, cr_loss=0.3465, over 20997.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.156, cr_loss=0.3794, over 4077434.30 frames. ], batch size: 48, lr: 3.31e-03, grad_scale: 32.0 2024-09-16 13:06:19,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=448601.6666666667, ans=0.0 2024-09-16 13:06:59,840 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.940e+02 2.109e+02 2.215e+02 2.399e+02 3.808e+02, threshold=4.429e+02, percent-clipped=0.0 2024-09-16 13:07:31,343 INFO [train.py:1198] (1/2) Epoch 25, batch 5000, loss[loss=0.2647, ctc_loss=0.1812, cr_loss=0.4173, over 20417.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1558, cr_loss=0.3789, over 4085054.59 frames. 
], batch size: 74, lr: 3.30e-03, grad_scale: 16.0 2024-09-16 13:07:34,719 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=448743.3333333333, ans=0.125 2024-09-16 13:07:55,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=448771.6666666667, ans=0.125 2024-09-16 13:08:13,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=448800.0, ans=0.0 2024-09-16 13:08:20,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=448828.3333333333, ans=0.2 2024-09-16 13:08:45,681 INFO [train.py:1198] (1/2) Epoch 25, batch 5050, loss[loss=0.2293, ctc_loss=0.1529, cr_loss=0.3821, over 19399.00 frames. ], tot_loss[loss=0.2295, ctc_loss=0.1541, cr_loss=0.3766, over 4093712.80 frames. ], batch size: 90, lr: 3.30e-03, grad_scale: 16.0 2024-09-16 13:08:47,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=448885.0, ans=6.0 2024-09-16 13:08:49,056 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 13:08:51,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=448885.0, ans=0.2 2024-09-16 13:09:05,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=448913.3333333333, ans=0.1 2024-09-16 13:09:11,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=448913.3333333333, ans=0.125 2024-09-16 13:09:25,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=448941.6666666667, ans=0.1 2024-09-16 13:09:31,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=448970.0, ans=0.07 2024-09-16 13:09:31,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=448970.0, ans=0.0 2024-09-16 13:09:32,821 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.823e+02 2.077e+02 2.215e+02 2.352e+02 3.095e+02, threshold=4.431e+02, percent-clipped=0.0 2024-09-16 13:09:40,794 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.28 vs. limit=10.0 2024-09-16 13:09:50,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=448998.3333333333, ans=0.0 2024-09-16 13:09:55,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=448998.3333333333, ans=0.125 2024-09-16 13:09:59,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=448998.3333333333, ans=0.125 2024-09-16 13:10:02,551 INFO [train.py:1198] (1/2) Epoch 25, batch 5100, loss[loss=0.2234, ctc_loss=0.1496, cr_loss=0.3689, over 20826.00 frames. ], tot_loss[loss=0.2287, ctc_loss=0.1537, cr_loss=0.3752, over 4101795.10 frames. 
2024-09-16 13:10:14,478 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=449026.6666666667, ans=0.125
2024-09-16 13:10:16,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=449055.0, ans=0.125
2024-09-16 13:10:47,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=449111.6666666667, ans=0.0
2024-09-16 13:11:03,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=449140.0, ans=0.125
2024-09-16 13:11:16,595 INFO [train.py:1198] (1/2) Epoch 25, batch 5150, loss[loss=0.2124, ctc_loss=0.1447, cr_loss=0.3386, over 21058.00 frames. ], tot_loss[loss=0.2283, ctc_loss=0.1534, cr_loss=0.3742, over 4104535.97 frames. ], batch size: 62, lr: 3.30e-03, grad_scale: 16.0
2024-09-16 13:11:36,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=449196.6666666667, ans=0.0
2024-09-16 13:11:39,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=449196.6666666667, ans=0.125
2024-09-16 13:11:40,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=449196.6666666667, ans=0.025
2024-09-16 13:12:04,638 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.835e+02 2.149e+02 2.272e+02 2.416e+02 4.273e+02, threshold=4.543e+02, percent-clipped=0.0
2024-09-16 13:12:34,447 INFO [train.py:1198] (1/2) Epoch 25, batch 5200, loss[loss=0.2091, ctc_loss=0.1407, cr_loss=0.3421, over 20966.00 frames. ], tot_loss[loss=0.2291, ctc_loss=0.154, cr_loss=0.3759, over 4114052.58 frames. ], batch size: 49, lr: 3.30e-03, grad_scale: 32.0
2024-09-16 13:12:46,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=449310.0, ans=0.125
2024-09-16 13:13:04,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=449366.6666666667, ans=0.95
2024-09-16 13:13:10,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=449366.6666666667, ans=0.0
2024-09-16 13:13:44,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=449423.3333333333, ans=0.125
2024-09-16 13:13:48,969 INFO [train.py:1198] (1/2) Epoch 25, batch 5250, loss[loss=0.2174, ctc_loss=0.1444, cr_loss=0.3648, over 21003.00 frames. ], tot_loss[loss=0.2289, ctc_loss=0.1538, cr_loss=0.3757, over 4114413.92 frames. ], batch size: 52, lr: 3.30e-03, grad_scale: 32.0
2024-09-16 13:14:12,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=449480.0, ans=0.125
2024-09-16 13:14:33,886 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.792e+02 2.126e+02 2.233e+02 2.413e+02 3.021e+02, threshold=4.466e+02, percent-clipped=0.0
2024-09-16 13:14:35,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=449536.6666666667, ans=0.025
2024-09-16 13:14:40,773 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.71 vs. limit=6.0
2024-09-16 13:15:02,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=449593.3333333333, ans=0.0
2024-09-16 13:15:03,830 INFO [train.py:1198] (1/2) Epoch 25, batch 5300, loss[loss=0.2356, ctc_loss=0.1571, cr_loss=0.3925, over 20842.00 frames. ], tot_loss[loss=0.228, ctc_loss=0.1531, cr_loss=0.3745, over 4118825.40 frames. ], batch size: 59, lr: 3.30e-03, grad_scale: 32.0
2024-09-16 13:15:19,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=449621.6666666667, ans=0.125
2024-09-16 13:15:20,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=449621.6666666667, ans=0.125
2024-09-16 13:15:23,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=449621.6666666667, ans=0.1
2024-09-16 13:15:29,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=449621.6666666667, ans=0.04949747468305833
2024-09-16 13:15:47,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=449678.3333333333, ans=0.025
2024-09-16 13:15:52,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=449678.3333333333, ans=0.025
2024-09-16 13:16:18,503 INFO [train.py:1198] (1/2) Epoch 25, batch 5350, loss[loss=0.2403, ctc_loss=0.1655, cr_loss=0.3737, over 18225.00 frames. ], tot_loss[loss=0.2286, ctc_loss=0.1535, cr_loss=0.3753, over 4119441.80 frames. ], batch size: 108, lr: 3.30e-03, grad_scale: 32.0
2024-09-16 13:16:39,777 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.85 vs. limit=6.0
2024-09-16 13:16:40,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=449763.3333333333, ans=0.125
2024-09-16 13:17:03,099 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.902e+02 2.149e+02 2.241e+02 2.476e+02 2.922e+02, threshold=4.483e+02, percent-clipped=0.0
2024-09-16 13:17:15,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=449820.0, ans=0.125
2024-09-16 13:17:25,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=449848.3333333333, ans=15.0
2024-09-16 13:17:33,383 INFO [train.py:1198] (1/2) Epoch 25, batch 5400, loss[loss=0.2035, ctc_loss=0.1356, cr_loss=0.3396, over 21061.00 frames. ], tot_loss[loss=0.2283, ctc_loss=0.1533, cr_loss=0.3755, over 4123245.81 frames. ], batch size: 53, lr: 3.30e-03, grad_scale: 32.0
2024-09-16 13:17:35,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=449876.6666666667, ans=0.2
2024-09-16 13:17:47,607 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.74 vs. limit=15.0
2024-09-16 13:17:58,382 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.73 vs. limit=15.0
2024-09-16 13:18:26,487 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=449961.6666666667, ans=0.0
2024-09-16 13:18:27,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=449961.6666666667, ans=0.125
2024-09-16 13:18:42,559 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 13:18:50,593 INFO [train.py:1198] (1/2) Epoch 25, batch 5450, loss[loss=0.2468, ctc_loss=0.1667, cr_loss=0.4006, over 19957.00 frames. ], tot_loss[loss=0.2293, ctc_loss=0.1539, cr_loss=0.3768, over 4110183.44 frames. ], batch size: 80, lr: 3.30e-03, grad_scale: 32.0
2024-09-16 13:18:53,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=450018.3333333333, ans=0.0
2024-09-16 13:19:11,250 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.20 vs. limit=15.0
2024-09-16 13:19:18,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=450046.6666666667, ans=0.2
2024-09-16 13:19:19,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=450075.0, ans=0.2
2024-09-16 13:19:35,288 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.915e+02 2.122e+02 2.248e+02 2.443e+02 5.856e+02, threshold=4.495e+02, percent-clipped=2.0
2024-09-16 13:19:50,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=450131.6666666667, ans=0.125
2024-09-16 13:20:04,802 INFO [train.py:1198] (1/2) Epoch 25, batch 5500, loss[loss=0.2242, ctc_loss=0.1518, cr_loss=0.3617, over 20992.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1544, cr_loss=0.378, over 4115849.64 frames. ], batch size: 58, lr: 3.30e-03, grad_scale: 32.0
2024-09-16 13:20:07,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=450160.0, ans=0.0
2024-09-16 13:20:38,206 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.00 vs. limit=15.0
2024-09-16 13:21:07,534 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.59 vs. limit=15.0
2024-09-16 13:21:18,088 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.46 vs. limit=15.0
2024-09-16 13:21:18,287 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.50 vs. limit=22.5
2024-09-16 13:21:21,970 INFO [train.py:1198] (1/2) Epoch 25, batch 5550, loss[loss=0.2027, ctc_loss=0.1331, cr_loss=0.3483, over 21064.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.1545, cr_loss=0.378, over 4104934.85 frames. ], batch size: 59, lr: 3.30e-03, grad_scale: 32.0
2024-09-16 13:21:43,443 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=450330.0, ans=0.125
2024-09-16 13:22:06,591 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.839e+02 2.103e+02 2.265e+02 2.441e+02 4.541e+02, threshold=4.530e+02, percent-clipped=1.0
2024-09-16 13:22:17,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=450386.6666666667, ans=0.125
2024-09-16 13:22:19,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=450386.6666666667, ans=0.05
2024-09-16 13:22:22,582 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.83 vs. limit=15.0
2024-09-16 13:22:33,221 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.81 vs. limit=15.0
2024-09-16 13:22:36,848 INFO [train.py:1198] (1/2) Epoch 25, batch 5600, loss[loss=0.2307, ctc_loss=0.1519, cr_loss=0.3939, over 21034.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1551, cr_loss=0.3792, over 4097345.62 frames. ], batch size: 56, lr: 3.30e-03, grad_scale: 32.0
2024-09-16 13:23:17,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=450500.0, ans=0.0
2024-09-16 13:23:51,957 INFO [train.py:1198] (1/2) Epoch 25, batch 5650, loss[loss=0.2674, ctc_loss=0.1793, cr_loss=0.4407, over 19365.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1566, cr_loss=0.3809, over 4079795.95 frames. ], batch size: 90, lr: 3.30e-03, grad_scale: 32.0
2024-09-16 13:23:58,518 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.57 vs. limit=15.0
2024-09-16 13:24:12,957 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 13:24:25,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=450641.6666666667, ans=0.0
2024-09-16 13:24:29,814 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 13:24:36,776 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.820e+02 2.096e+02 2.208e+02 2.368e+02 3.546e+02, threshold=4.416e+02, percent-clipped=0.0
2024-09-16 13:25:06,376 INFO [train.py:1198] (1/2) Epoch 25, batch 5700, loss[loss=0.2148, ctc_loss=0.1423, cr_loss=0.3624, over 20958.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1561, cr_loss=0.3798, over 4087242.56 frames. ], batch size: 51, lr: 3.30e-03, grad_scale: 32.0
2024-09-16 13:25:13,911 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=450726.6666666667, ans=0.0
2024-09-16 13:25:30,267 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 13:25:33,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=450755.0, ans=0.2
2024-09-16 13:25:38,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=450783.3333333333, ans=0.0
2024-09-16 13:25:51,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=450811.6666666667, ans=0.125
2024-09-16 13:25:57,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=450811.6666666667, ans=0.0
2024-09-16 13:26:16,755 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.97 vs. limit=12.0
2024-09-16 13:26:17,768 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=450840.0, ans=0.125
2024-09-16 13:26:20,379 INFO [train.py:1198] (1/2) Epoch 25, batch 5750, loss[loss=0.2512, ctc_loss=0.1728, cr_loss=0.3924, over 19507.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1552, cr_loss=0.3779, over 4087238.56 frames. ], batch size: 90, lr: 3.30e-03, grad_scale: 32.0
2024-09-16 13:26:41,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=450896.6666666667, ans=0.0
2024-09-16 13:27:05,243 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.700e+02 2.115e+02 2.237e+02 2.403e+02 6.193e+02, threshold=4.473e+02, percent-clipped=1.0
2024-09-16 13:27:37,702 INFO [train.py:1198] (1/2) Epoch 25, batch 5800, loss[loss=0.2372, ctc_loss=0.1569, cr_loss=0.4012, over 20984.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1548, cr_loss=0.3773, over 4092324.21 frames. ], batch size: 55, lr: 3.30e-03, grad_scale: 32.0
2024-09-16 13:28:03,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=451038.3333333333, ans=0.2
2024-09-16 13:28:39,995 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.78 vs. limit=15.0
2024-09-16 13:28:48,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=451123.3333333333, ans=0.125
2024-09-16 13:28:52,517 INFO [train.py:1198] (1/2) Epoch 25, batch 5850, loss[loss=0.1988, ctc_loss=0.129, cr_loss=0.3489, over 20951.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1552, cr_loss=0.3775, over 4095767.69 frames. ], batch size: 49, lr: 3.30e-03, grad_scale: 32.0
2024-09-16 13:29:03,780 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.22 vs. limit=15.0
2024-09-16 13:29:09,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=451180.0, ans=0.1
2024-09-16 13:29:22,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=451208.3333333333, ans=0.125
2024-09-16 13:29:31,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=451208.3333333333, ans=0.2
2024-09-16 13:29:39,741 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.890e+02 2.148e+02 2.347e+02 2.593e+02 4.296e+02, threshold=4.693e+02, percent-clipped=0.0
2024-09-16 13:29:40,510 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.94 vs. limit=12.0
2024-09-16 13:29:44,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=451236.6666666667, ans=0.125
2024-09-16 13:29:55,203 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=451265.0, ans=0.95
2024-09-16 13:29:55,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=451265.0, ans=0.2
2024-09-16 13:29:58,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=451265.0, ans=0.0
2024-09-16 13:29:59,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=451265.0, ans=0.125
2024-09-16 13:30:09,794 INFO [train.py:1198] (1/2) Epoch 25, batch 5900, loss[loss=0.2736, ctc_loss=0.187, cr_loss=0.4328, over 18103.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1559, cr_loss=0.3788, over 4093263.35 frames. ], batch size: 108, lr: 3.30e-03, grad_scale: 32.0
2024-09-16 13:30:15,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=451293.3333333333, ans=0.125
2024-09-16 13:30:30,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=451321.6666666667, ans=0.0
2024-09-16 13:30:38,756 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.50 vs. limit=22.5
2024-09-16 13:31:03,774 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.52 vs. limit=12.0
2024-09-16 13:31:24,052 INFO [train.py:1198] (1/2) Epoch 25, batch 5950, loss[loss=0.1721, ctc_loss=0.1116, cr_loss=0.3023, over 19965.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1556, cr_loss=0.378, over 4093699.95 frames. ], batch size: 44, lr: 3.29e-03, grad_scale: 32.0
2024-09-16 13:31:43,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=451463.3333333333, ans=0.125
2024-09-16 13:32:02,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=451491.6666666667, ans=0.1
2024-09-16 13:32:03,011 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.81 vs. limit=10.0
2024-09-16 13:32:08,542 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.829e+02 2.067e+02 2.207e+02 2.317e+02 3.232e+02, threshold=4.413e+02, percent-clipped=0.0
2024-09-16 13:32:38,378 INFO [train.py:1198] (1/2) Epoch 25, batch 6000, loss[loss=0.2347, ctc_loss=0.1589, cr_loss=0.3791, over 20934.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1558, cr_loss=0.3788, over 4093448.10 frames. ], batch size: 60, lr: 3.29e-03, grad_scale: 32.0
2024-09-16 13:32:38,378 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-16 13:32:50,308 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.9343, 5.5867, 5.3653, 5.0614], device='cuda:1')
2024-09-16 13:32:52,689 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.7190, 4.4257, 3.3307, 3.8875], device='cuda:1')
2024-09-16 13:33:02,496 INFO [train.py:1230] (1/2) Epoch 25, validation: loss=0.04198, ctc_loss=0.04198, cr_loss=1.159e-14, over 944034.00 frames.
2024-09-16 13:33:02,497 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB
2024-09-16 13:33:19,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=451605.0, ans=0.125
2024-09-16 13:34:02,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=451690.0, ans=0.0
2024-09-16 13:34:16,873 INFO [train.py:1198] (1/2) Epoch 25, batch 6050, loss[loss=0.1904, ctc_loss=0.1259, cr_loss=0.3225, over 20946.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1558, cr_loss=0.3788, over 4083360.84 frames. ], batch size: 51, lr: 3.29e-03, grad_scale: 32.0
2024-09-16 13:34:37,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=451746.6666666667, ans=0.2
2024-09-16 13:35:02,527 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.794e+02 2.106e+02 2.238e+02 2.471e+02 5.382e+02, threshold=4.476e+02, percent-clipped=1.0
2024-09-16 13:35:13,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=451803.3333333333, ans=0.0
2024-09-16 13:35:16,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=451803.3333333333, ans=0.04949747468305833
2024-09-16 13:35:33,559 INFO [train.py:1198] (1/2) Epoch 25, batch 6100, loss[loss=0.1934, ctc_loss=0.1287, cr_loss=0.3234, over 19000.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1552, cr_loss=0.3786, over 4092922.75 frames. ], batch size: 42, lr: 3.29e-03, grad_scale: 32.0
2024-09-16 13:35:47,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=451888.3333333333, ans=0.04949747468305833
2024-09-16 13:36:09,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=451916.6666666667, ans=0.0
2024-09-16 13:36:45,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=451973.3333333333, ans=0.125
2024-09-16 13:36:47,704 INFO [train.py:1198] (1/2) Epoch 25, batch 6150, loss[loss=0.2224, ctc_loss=0.1447, cr_loss=0.3883, over 20889.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1554, cr_loss=0.3786, over 4086882.00 frames. ], batch size: 57, lr: 3.29e-03, grad_scale: 32.0
2024-09-16 13:36:50,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=452001.6666666667, ans=0.05
2024-09-16 13:37:15,529 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=452030.0, ans=0.1
2024-09-16 13:37:33,215 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.668e+02 2.108e+02 2.276e+02 2.462e+02 3.641e+02, threshold=4.552e+02, percent-clipped=0.0
2024-09-16 13:38:02,948 INFO [train.py:1198] (1/2) Epoch 25, batch 6200, loss[loss=0.2468, ctc_loss=0.1641, cr_loss=0.4136, over 20657.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1558, cr_loss=0.3786, over 4076548.03 frames. ], batch size: 68, lr: 3.29e-03, grad_scale: 32.0
2024-09-16 13:38:34,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=452200.0, ans=0.02
2024-09-16 13:38:57,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=452228.3333333333, ans=0.125
2024-09-16 13:39:16,046 INFO [train.py:1198] (1/2) Epoch 25, batch 6250, loss[loss=0.2394, ctc_loss=0.1616, cr_loss=0.389, over 21034.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1551, cr_loss=0.3763, over 4054353.72 frames. ], batch size: 62, lr: 3.29e-03, grad_scale: 32.0
2024-09-16 13:39:27,132 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.11 vs. limit=15.0
2024-09-16 13:39:49,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=452341.6666666667, ans=0.2
2024-09-16 13:39:58,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=452341.6666666667, ans=0.2
2024-09-16 13:40:01,201 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.846e+02 2.156e+02 2.310e+02 2.565e+02 4.257e+02, threshold=4.620e+02, percent-clipped=0.0
2024-09-16 13:40:29,803 INFO [train.py:1198] (1/2) Epoch 25, batch 6300, loss[loss=0.2378, ctc_loss=0.1607, cr_loss=0.3855, over 18428.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1575, cr_loss=0.3786, over 4009565.06 frames. ], batch size: 108, lr: 3.29e-03, grad_scale: 32.0
2024-09-16 13:40:37,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=452426.6666666667, ans=0.125
2024-09-16 13:40:46,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=452455.0, ans=0.025
2024-09-16 13:40:46,095 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=452455.0, ans=0.0
2024-09-16 13:40:55,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=452455.0, ans=0.0
2024-09-16 13:40:55,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=452455.0, ans=0.125
2024-09-16 13:41:13,659 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.35 vs. limit=15.0
2024-09-16 13:41:14,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=452511.6666666667, ans=0.125
2024-09-16 13:41:29,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=452540.0, ans=0.04949747468305833
2024-09-16 13:41:41,485 INFO [train.py:1198] (1/2) Epoch 25, batch 6350, loss[loss=0.2701, ctc_loss=0.193, cr_loss=0.3855, over 14298.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1633, cr_loss=0.3831, over 3841563.56 frames. ], batch size: 149, lr: 3.29e-03, grad_scale: 32.0
2024-09-16 13:41:44,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=452568.3333333333, ans=0.125
2024-09-16 13:42:05,182 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.72 vs. limit=15.0
2024-09-16 13:42:07,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=452596.6666666667, ans=0.0
2024-09-16 13:42:24,621 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.006e+02 2.439e+02 2.653e+02 2.916e+02 4.379e+02, threshold=5.307e+02, percent-clipped=0.0
2024-09-16 13:42:32,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=452653.3333333333, ans=0.1
2024-09-16 13:42:36,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=452653.3333333333, ans=0.05
2024-09-16 13:43:34,254 INFO [train.py:1198] (1/2) Epoch 26, batch 0, loss[loss=0.1947, ctc_loss=0.127, cr_loss=0.3382, over 20982.00 frames. ], tot_loss[loss=0.1947, ctc_loss=0.127, cr_loss=0.3382, over 20982.00 frames. ], batch size: 52, lr: 3.23e-03, grad_scale: 32.0
2024-09-16 13:43:34,255 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-16 13:43:55,345 INFO [train.py:1230] (1/2) Epoch 26, validation: loss=0.04202, ctc_loss=0.04202, cr_loss=1.14e-14, over 944034.00 frames.
2024-09-16 13:43:55,346 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB
2024-09-16 13:43:57,569 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.13 vs. limit=15.0
2024-09-16 13:44:05,509 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.75 vs. limit=15.0
2024-09-16 13:44:16,795 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.74 vs. limit=12.0
2024-09-16 13:44:25,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=452741.1666666667, ans=0.95
2024-09-16 13:44:58,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=452797.8333333333, ans=0.04949747468305833
2024-09-16 13:45:10,598 INFO [train.py:1198] (1/2) Epoch 26, batch 50, loss[loss=0.2317, ctc_loss=0.1562, cr_loss=0.3776, over 21069.00 frames. ], tot_loss[loss=0.229, ctc_loss=0.1544, cr_loss=0.3729, over 927859.66 frames. ], batch size: 59, lr: 3.22e-03, grad_scale: 32.0
2024-09-16 13:45:13,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=452826.1666666667, ans=0.0
2024-09-16 13:45:21,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=452826.1666666667, ans=0.025
2024-09-16 13:45:24,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=452854.5, ans=0.2
2024-09-16 13:46:02,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=452911.1666666667, ans=0.125
2024-09-16 13:46:09,435 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.770e+02 2.095e+02 2.264e+02 2.491e+02 4.123e+02, threshold=4.528e+02, percent-clipped=0.0
2024-09-16 13:46:17,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=452939.5, ans=0.0
2024-09-16 13:46:22,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=452939.5, ans=0.0
2024-09-16 13:46:25,748 INFO [train.py:1198] (1/2) Epoch 26, batch 100, loss[loss=0.2333, ctc_loss=0.1588, cr_loss=0.3726, over 19292.00 frames. ], tot_loss[loss=0.2262, ctc_loss=0.1521, cr_loss=0.3706, over 1636487.87 frames. ], batch size: 90, lr: 3.22e-03, grad_scale: 32.0
2024-09-16 13:46:58,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=453024.5, ans=0.125
2024-09-16 13:47:10,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=453024.5, ans=0.125
2024-09-16 13:47:28,549 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.03 vs. limit=15.0
2024-09-16 13:47:44,148 INFO [train.py:1198] (1/2) Epoch 26, batch 150, loss[loss=0.1769, ctc_loss=0.1142, cr_loss=0.3131, over 20956.00 frames. ], tot_loss[loss=0.2286, ctc_loss=0.1539, cr_loss=0.3736, over 2178830.85 frames. ], batch size: 49, lr: 3.22e-03, grad_scale: 32.0
2024-09-16 13:48:15,074 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.70 vs. limit=15.0
2024-09-16 13:48:19,547 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.20 vs. limit=10.0
2024-09-16 13:48:22,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=453166.1666666667, ans=0.0
2024-09-16 13:48:31,148 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=453194.5, ans=0.0
2024-09-16 13:48:42,706 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.779e+02 2.079e+02 2.215e+02 2.385e+02 3.003e+02, threshold=4.430e+02, percent-clipped=0.0
2024-09-16 13:48:49,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=453222.8333333333, ans=0.1
2024-09-16 13:48:55,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=453222.8333333333, ans=15.0
2024-09-16 13:48:56,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=453222.8333333333, ans=0.125
2024-09-16 13:48:59,345 INFO [train.py:1198] (1/2) Epoch 26, batch 200, loss[loss=0.2067, ctc_loss=0.1374, cr_loss=0.3461, over 20757.00 frames. ], tot_loss[loss=0.2279, ctc_loss=0.1531, cr_loss=0.3744, over 2609976.69 frames. ], batch size: 53, lr: 3.22e-03, grad_scale: 32.0
2024-09-16 13:49:54,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.whiten.whitening_limit, batch_count=453336.1666666667, ans=15.0
2024-09-16 13:50:01,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=453336.1666666667, ans=0.125
2024-09-16 13:50:19,161 INFO [train.py:1198] (1/2) Epoch 26, batch 250, loss[loss=0.2979, ctc_loss=0.2124, cr_loss=0.4273, over 14752.00 frames. ], tot_loss[loss=0.2288, ctc_loss=0.1536, cr_loss=0.3757, over 2929307.24 frames. ], batch size: 149, lr: 3.22e-03, grad_scale: 32.0
2024-09-16 13:50:19,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=453392.8333333333, ans=0.125
2024-09-16 13:50:22,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=453392.8333333333, ans=0.125
2024-09-16 13:51:01,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=453449.5, ans=0.0
2024-09-16 13:51:10,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=453477.8333333333, ans=0.125
2024-09-16 13:51:13,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=453477.8333333333, ans=0.125
2024-09-16 13:51:18,231 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.768e+02 2.099e+02 2.209e+02 2.337e+02 3.735e+02, threshold=4.418e+02, percent-clipped=0.0
2024-09-16 13:51:34,885 INFO [train.py:1198] (1/2) Epoch 26, batch 300, loss[loss=0.2465, ctc_loss=0.1634, cr_loss=0.4155, over 20988.00 frames. ], tot_loss[loss=0.2278, ctc_loss=0.153, cr_loss=0.374, over 3183393.15 frames. ], batch size: 55, lr: 3.22e-03, grad_scale: 32.0
2024-09-16 13:51:38,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=453534.5, ans=0.0
2024-09-16 13:51:40,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=453534.5, ans=0.125
2024-09-16 13:52:20,897 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.26 vs. limit=15.0
2024-09-16 13:52:38,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=453647.8333333333, ans=0.1
2024-09-16 13:52:44,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=453647.8333333333, ans=0.0
2024-09-16 13:52:49,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=453647.8333333333, ans=0.1
2024-09-16 13:52:53,453 INFO [train.py:1198] (1/2) Epoch 26, batch 350, loss[loss=0.2187, ctc_loss=0.1443, cr_loss=0.3717, over 20894.00 frames. ], tot_loss[loss=0.2271, ctc_loss=0.1525, cr_loss=0.3732, over 3375019.77 frames. ], batch size: 54, lr: 3.22e-03, grad_scale: 32.0
2024-09-16 13:52:59,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=453676.1666666667, ans=0.125
2024-09-16 13:53:00,404 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.83 vs. limit=15.0
2024-09-16 13:53:51,510 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.899e+02 2.099e+02 2.203e+02 2.392e+02 3.716e+02, threshold=4.405e+02, percent-clipped=0.0
2024-09-16 13:54:07,992 INFO [train.py:1198] (1/2) Epoch 26, batch 400, loss[loss=0.2327, ctc_loss=0.1578, cr_loss=0.3749, over 20971.00 frames. ], tot_loss[loss=0.229, ctc_loss=0.1538, cr_loss=0.3763, over 3539034.58 frames. ], batch size: 58, lr: 3.22e-03, grad_scale: 32.0
2024-09-16 13:54:18,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=453817.8333333333, ans=0.125
2024-09-16 13:55:13,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=453931.1666666667, ans=0.125
2024-09-16 13:55:22,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=453931.1666666667, ans=10.0
2024-09-16 13:55:26,721 INFO [train.py:1198] (1/2) Epoch 26, batch 450, loss[loss=0.2328, ctc_loss=0.1546, cr_loss=0.3909, over 20640.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.1548, cr_loss=0.3767, over 3659222.66 frames. ], batch size: 71, lr: 3.22e-03, grad_scale: 32.0
2024-09-16 13:55:31,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=453959.5, ans=0.04949747468305833
2024-09-16 13:55:49,893 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.32 vs. limit=22.5
2024-09-16 13:56:25,187 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.757e+02 2.074e+02 2.167e+02 2.316e+02 2.993e+02, threshold=4.334e+02, percent-clipped=0.0
2024-09-16 13:56:34,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=454072.8333333333, ans=0.2
2024-09-16 13:56:40,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=454101.1666666667, ans=0.0
2024-09-16 13:56:42,000 INFO [train.py:1198] (1/2) Epoch 26, batch 500, loss[loss=0.2651, ctc_loss=0.1809, cr_loss=0.4209, over 20701.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1553, cr_loss=0.377, over 3749807.64 frames. ], batch size: 66, lr: 3.22e-03, grad_scale: 32.0
2024-09-16 13:56:58,147 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.75 vs. limit=15.0
2024-09-16 13:57:08,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=454129.5, ans=0.0
2024-09-16 13:57:32,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=454186.1666666667, ans=0.2
2024-09-16 13:57:58,512 INFO [train.py:1198] (1/2) Epoch 26, batch 550, loss[loss=0.2358, ctc_loss=0.1609, cr_loss=0.3746, over 20883.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1555, cr_loss=0.3774, over 3829152.44 frames. ], batch size: 57, lr: 3.22e-03, grad_scale: 32.0
2024-09-16 13:58:51,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=454327.8333333333, ans=0.07
2024-09-16 13:58:52,015 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.16 vs. limit=15.0
2024-09-16 13:59:00,447 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.820e+02 2.132e+02 2.359e+02 2.534e+02 5.840e+02, threshold=4.717e+02, percent-clipped=1.0
2024-09-16 13:59:17,475 INFO [train.py:1198] (1/2) Epoch 26, batch 600, loss[loss=0.2097, ctc_loss=0.1383, cr_loss=0.3567, over 21013.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1549, cr_loss=0.3772, over 3896990.83 frames. ], batch size: 52, lr: 3.22e-03, grad_scale: 64.0
2024-09-16 13:59:46,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=454441.1666666667, ans=0.125
2024-09-16 14:00:26,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=454497.8333333333, ans=0.1
2024-09-16 14:00:33,470 INFO [train.py:1198] (1/2) Epoch 26, batch 650, loss[loss=0.2186, ctc_loss=0.1449, cr_loss=0.3681, over 21061.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1548, cr_loss=0.3776, over 3953250.66 frames. ], batch size: 56, lr: 3.22e-03, grad_scale: 32.0
2024-09-16 14:00:53,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=454554.5, ans=10.0
2024-09-16 14:00:59,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=454554.5, ans=15.0
2024-09-16 14:01:08,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=454582.8333333333, ans=0.125
2024-09-16 14:01:11,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=454582.8333333333, ans=0.125
2024-09-16 14:01:26,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=454611.1666666667, ans=0.07
2024-09-16 14:01:36,708 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.794e+02 2.079e+02 2.247e+02 2.388e+02 2.917e+02, threshold=4.493e+02, percent-clipped=0.0
2024-09-16 14:01:37,320 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.80 vs. limit=15.0
2024-09-16 14:01:43,525 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.99 vs. limit=6.0
2024-09-16 14:01:51,878 INFO [train.py:1198] (1/2) Epoch 26, batch 700, loss[loss=0.2075, ctc_loss=0.1395, cr_loss=0.34, over 20870.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.155, cr_loss=0.3784, over 3987122.49 frames. ], batch size: 57, lr: 3.22e-03, grad_scale: 32.0
2024-09-16 14:02:13,340 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 14:03:07,609 INFO [train.py:1198] (1/2) Epoch 26, batch 750, loss[loss=0.207, ctc_loss=0.1388, cr_loss=0.341, over 20679.00 frames. ], tot_loss[loss=0.2296, ctc_loss=0.1543, cr_loss=0.3769, over 4024098.35 frames. ], batch size: 46, lr: 3.22e-03, grad_scale: 32.0
2024-09-16 14:03:09,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=454809.5, ans=0.0
2024-09-16 14:03:21,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=454837.8333333333, ans=0.1
2024-09-16 14:03:25,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=454837.8333333333, ans=0.125
2024-09-16 14:03:40,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.whiten.whitening_limit, batch_count=454866.1666666667, ans=12.0
2024-09-16 14:04:07,780 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.934e+02 2.154e+02 2.284e+02 2.459e+02 3.150e+02, threshold=4.567e+02, percent-clipped=0.0
2024-09-16 14:04:15,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=454922.8333333333, ans=0.09899494936611666
2024-09-16 14:04:15,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=454922.8333333333, ans=0.125
2024-09-16 14:04:19,057 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.37 vs. limit=22.5
2024-09-16 14:04:25,943 INFO [train.py:1198] (1/2) Epoch 26, batch 800, loss[loss=0.2399, ctc_loss=0.163, cr_loss=0.3845, over 21009.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1551, cr_loss=0.3776, over 4039323.01 frames. ], batch size: 61, lr: 3.22e-03, grad_scale: 32.0
2024-09-16 14:05:02,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=455007.8333333333, ans=0.0
2024-09-16 14:05:08,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=455007.8333333333, ans=0.05
2024-09-16 14:05:30,095 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=455064.5, ans=0.1
2024-09-16 14:05:41,530 INFO [train.py:1198] (1/2) Epoch 26, batch 850, loss[loss=0.2514, ctc_loss=0.1745, cr_loss=0.3846, over 14254.00 frames. ], tot_loss[loss=0.2296, ctc_loss=0.1543, cr_loss=0.3764, over 4048542.13 frames. ], batch size: 151, lr: 3.22e-03, grad_scale: 32.0
2024-09-16 14:05:54,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=455092.8333333333, ans=0.0
2024-09-16 14:06:15,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=455149.5, ans=0.025
2024-09-16 14:06:38,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=455177.8333333333, ans=0.125
2024-09-16 14:06:45,221 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.821e+02 2.151e+02 2.266e+02 2.484e+02 4.339e+02, threshold=4.531e+02, percent-clipped=0.0
2024-09-16 14:07:00,624 INFO [train.py:1198] (1/2) Epoch 26, batch 900, loss[loss=0.2178, ctc_loss=0.1501, cr_loss=0.3386, over 20972.00 frames. ], tot_loss[loss=0.2293, ctc_loss=0.1541, cr_loss=0.376, over 4055977.38 frames. ], batch size: 55, lr: 3.22e-03, grad_scale: 32.0
2024-09-16 14:07:01,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=455234.5, ans=0.1
2024-09-16 14:07:02,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=455234.5, ans=0.125
2024-09-16 14:08:01,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=455347.8333333333, ans=0.2
2024-09-16 14:08:01,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=455347.8333333333, ans=0.125
2024-09-16 14:08:10,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=455347.8333333333, ans=0.0
2024-09-16 14:08:16,235 INFO [train.py:1198] (1/2) Epoch 26, batch 950, loss[loss=0.2361, ctc_loss=0.1603, cr_loss=0.3791, over 20867.00 frames. ], tot_loss[loss=0.2292, ctc_loss=0.154, cr_loss=0.3763, over 4070127.27 frames. ], batch size: 57, lr: 3.22e-03, grad_scale: 32.0
2024-09-16 14:08:34,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=455404.5, ans=0.0
2024-09-16 14:08:36,824 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.32 vs. limit=15.0
2024-09-16 14:09:17,113 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.818e+02 2.059e+02 2.176e+02 2.334e+02 8.044e+02, threshold=4.353e+02, percent-clipped=2.0
2024-09-16 14:09:32,391 INFO [train.py:1198] (1/2) Epoch 26, batch 1000, loss[loss=0.2332, ctc_loss=0.1567, cr_loss=0.3826, over 21008.00 frames. ], tot_loss[loss=0.2294, ctc_loss=0.154, cr_loss=0.3769, over 4090160.59 frames. ], batch size: 63, lr: 3.22e-03, grad_scale: 32.0
2024-09-16 14:10:25,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=455602.8333333333, ans=22.5
2024-09-16 14:10:32,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=455602.8333333333, ans=0.035
2024-09-16 14:10:50,377 INFO [train.py:1198] (1/2) Epoch 26, batch 1050, loss[loss=0.2646, ctc_loss=0.185, cr_loss=0.3982, over 18451.00 frames. ], tot_loss[loss=0.2295, ctc_loss=0.1542, cr_loss=0.3767, over 4084460.75 frames. ], batch size: 108, lr: 3.21e-03, grad_scale: 32.0
2024-09-16 14:10:51,163 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.70 vs. limit=12.0
2024-09-16 14:11:01,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=455659.5, ans=0.1
2024-09-16 14:11:20,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=455716.1666666667, ans=0.2
2024-09-16 14:11:37,498 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 14:11:40,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=455744.5, ans=0.0
2024-09-16 14:11:50,910 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.911e+02 2.137e+02 2.261e+02 2.471e+02 3.167e+02, threshold=4.521e+02, percent-clipped=0.0
2024-09-16 14:12:06,265 INFO [train.py:1198] (1/2) Epoch 26, batch 1100, loss[loss=0.2265, ctc_loss=0.1507, cr_loss=0.379, over 20868.00 frames. ], tot_loss[loss=0.2292, ctc_loss=0.154, cr_loss=0.376, over 4092656.12 frames. ], batch size: 57, lr: 3.21e-03, grad_scale: 16.0
2024-09-16 14:12:17,488 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.98 vs. limit=12.0
2024-09-16 14:13:17,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=455914.5, ans=0.0
2024-09-16 14:13:20,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=455914.5, ans=0.2
2024-09-16 14:13:24,581 INFO [train.py:1198] (1/2) Epoch 26, batch 1150, loss[loss=0.2007, ctc_loss=0.1329, cr_loss=0.3392, over 20959.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1544, cr_loss=0.3771, over 4095598.97 frames. ], batch size: 48, lr: 3.21e-03, grad_scale: 16.0
2024-09-16 14:13:26,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=455942.8333333333, ans=0.1
2024-09-16 14:13:42,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=455971.1666666667, ans=0.0
2024-09-16 14:14:26,116 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.857e+02 2.109e+02 2.299e+02 2.482e+02 5.988e+02, threshold=4.598e+02, percent-clipped=1.0
2024-09-16 14:14:32,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=456056.1666666667, ans=0.125
2024-09-16 14:14:39,910 INFO [train.py:1198] (1/2) Epoch 26, batch 1200, loss[loss=0.2521, ctc_loss=0.1724, cr_loss=0.3984, over 21011.00 frames. ], tot_loss[loss=0.2295, ctc_loss=0.1541, cr_loss=0.377, over 4097513.57 frames. ], batch size: 61, lr: 3.21e-03, grad_scale: 32.0
2024-09-16 14:15:13,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=456141.1666666667, ans=0.125
2024-09-16 14:15:37,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=456169.5, ans=0.2
2024-09-16 14:15:48,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=456197.8333333333, ans=0.125
2024-09-16 14:15:57,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=456226.1666666667, ans=0.125
2024-09-16 14:15:58,316 INFO [train.py:1198] (1/2) Epoch 26, batch 1250, loss[loss=0.2016, ctc_loss=0.1301, cr_loss=0.3576, over 20967.00 frames. ], tot_loss[loss=0.2293, ctc_loss=0.1539, cr_loss=0.3768, over 4107076.11 frames. ], batch size: 52, lr: 3.21e-03, grad_scale: 32.0
2024-09-16 14:16:21,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.whiten.whitening_limit, batch_count=456254.5, ans=12.0
2024-09-16 14:16:50,127 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 14:17:00,346 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.812e+02 2.063e+02 2.198e+02 2.380e+02 3.087e+02, threshold=4.396e+02, percent-clipped=0.0
2024-09-16 14:17:13,812 INFO [train.py:1198] (1/2) Epoch 26, batch 1300, loss[loss=0.2512, ctc_loss=0.1694, cr_loss=0.4087, over 19482.00 frames. ], tot_loss[loss=0.2291, ctc_loss=0.1539, cr_loss=0.3764, over 4106150.27 frames. ], batch size: 90, lr: 3.21e-03, grad_scale: 32.0
2024-09-16 14:17:15,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=456367.8333333333, ans=0.1
2024-09-16 14:18:13,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=456452.8333333333, ans=0.0
2024-09-16 14:18:22,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=456481.1666666667, ans=0.1
2024-09-16 14:18:32,954 INFO [train.py:1198] (1/2) Epoch 26, batch 1350, loss[loss=0.2304, ctc_loss=0.1553, cr_loss=0.3755, over 20972.00 frames. ], tot_loss[loss=0.2297, ctc_loss=0.1543, cr_loss=0.3769, over 4101846.60 frames. ], batch size: 50, lr: 3.21e-03, grad_scale: 32.0
2024-09-16 14:19:35,316 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.781e+02 2.159e+02 2.328e+02 2.603e+02 5.755e+02, threshold=4.657e+02, percent-clipped=2.0
2024-09-16 14:19:43,594 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 14:19:48,948 INFO [train.py:1198] (1/2) Epoch 26, batch 1400, loss[loss=0.216, ctc_loss=0.1462, cr_loss=0.3491, over 21081.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.155, cr_loss=0.3776, over 4088759.76 frames. ], batch size: 59, lr: 3.21e-03, grad_scale: 32.0
], batch size: 59, lr: 3.21e-03, grad_scale: 32.0 2024-09-16 14:19:55,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=456651.1666666667, ans=0.125 2024-09-16 14:20:07,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=456679.5, ans=0.1 2024-09-16 14:20:54,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=456764.5, ans=0.125 2024-09-16 14:21:04,574 INFO [train.py:1198] (1/2) Epoch 26, batch 1450, loss[loss=0.2098, ctc_loss=0.14, cr_loss=0.3493, over 20784.00 frames. ], tot_loss[loss=0.2293, ctc_loss=0.1541, cr_loss=0.3761, over 4080740.89 frames. ], batch size: 56, lr: 3.21e-03, grad_scale: 32.0 2024-09-16 14:21:15,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=456792.8333333333, ans=0.1 2024-09-16 14:21:15,816 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=24.59 vs. limit=15.0 2024-09-16 14:21:18,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=456821.1666666667, ans=0.125 2024-09-16 14:21:24,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=456821.1666666667, ans=0.0 2024-09-16 14:21:30,666 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 14:21:35,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=456821.1666666667, ans=0.035 2024-09-16 14:21:41,950 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.47 vs. limit=15.0 2024-09-16 14:22:09,856 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.902e+02 2.102e+02 2.215e+02 2.442e+02 3.788e+02, threshold=4.430e+02, percent-clipped=0.0 2024-09-16 14:22:23,406 INFO [train.py:1198] (1/2) Epoch 26, batch 1500, loss[loss=0.2143, ctc_loss=0.143, cr_loss=0.3568, over 20795.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1549, cr_loss=0.3776, over 4084798.21 frames. ], batch size: 56, lr: 3.21e-03, grad_scale: 32.0 2024-09-16 14:22:55,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=456991.1666666667, ans=0.0 2024-09-16 14:23:22,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=457047.8333333333, ans=0.125 2024-09-16 14:23:41,533 INFO [train.py:1198] (1/2) Epoch 26, batch 1550, loss[loss=0.2362, ctc_loss=0.1572, cr_loss=0.3948, over 20841.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1554, cr_loss=0.3778, over 4075451.50 frames. ], batch size: 59, lr: 3.21e-03, grad_scale: 32.0 2024-09-16 14:24:00,587 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.81 vs. 
limit=15.0 2024-09-16 14:24:03,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=457104.5, ans=0.07 2024-09-16 14:24:36,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=457161.1666666667, ans=0.07 2024-09-16 14:24:39,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=457161.1666666667, ans=0.025 2024-09-16 14:24:43,626 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.872e+02 2.098e+02 2.271e+02 2.395e+02 3.546e+02, threshold=4.541e+02, percent-clipped=0.0 2024-09-16 14:24:44,549 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.46 vs. limit=15.0 2024-09-16 14:24:45,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=457189.5, ans=0.0 2024-09-16 14:24:57,063 INFO [train.py:1198] (1/2) Epoch 26, batch 1600, loss[loss=0.2104, ctc_loss=0.1406, cr_loss=0.3492, over 20946.00 frames. ], tot_loss[loss=0.2299, ctc_loss=0.1546, cr_loss=0.3766, over 4074582.96 frames. ], batch size: 50, lr: 3.21e-03, grad_scale: 32.0 2024-09-16 14:25:05,707 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.53 vs. limit=22.5 2024-09-16 14:25:35,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=457274.5, ans=0.125 2024-09-16 14:25:45,280 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.60 vs. limit=15.0 2024-09-16 14:25:47,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=457302.8333333333, ans=0.0 2024-09-16 14:25:47,904 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.98 vs. limit=10.0 2024-09-16 14:25:52,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=457302.8333333333, ans=0.0 2024-09-16 14:26:11,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=457359.5, ans=0.125 2024-09-16 14:26:12,597 INFO [train.py:1198] (1/2) Epoch 26, batch 1650, loss[loss=0.2347, ctc_loss=0.1565, cr_loss=0.3911, over 21071.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1549, cr_loss=0.3776, over 4066765.05 frames. ], batch size: 59, lr: 3.21e-03, grad_scale: 32.0 2024-09-16 14:26:17,904 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.28 vs. limit=22.5 2024-09-16 14:26:20,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=457359.5, ans=0.0 2024-09-16 14:26:23,671 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.28 vs. 
limit=10.0 2024-09-16 14:27:10,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=457444.5, ans=0.125 2024-09-16 14:27:17,379 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.808e+02 2.106e+02 2.247e+02 2.451e+02 3.783e+02, threshold=4.494e+02, percent-clipped=0.0 2024-09-16 14:27:19,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=457472.8333333333, ans=0.1 2024-09-16 14:27:30,930 INFO [train.py:1198] (1/2) Epoch 26, batch 1700, loss[loss=0.2341, ctc_loss=0.1634, cr_loss=0.3534, over 13877.00 frames. ], tot_loss[loss=0.2293, ctc_loss=0.1541, cr_loss=0.376, over 4073952.63 frames. ], batch size: 149, lr: 3.21e-03, grad_scale: 32.0 2024-09-16 14:27:41,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=457501.1666666667, ans=0.1 2024-09-16 14:27:41,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=457501.1666666667, ans=0.0 2024-09-16 14:28:21,533 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.68 vs. limit=15.0 2024-09-16 14:28:34,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=457614.5, ans=0.2 2024-09-16 14:28:46,464 INFO [train.py:1198] (1/2) Epoch 26, batch 1750, loss[loss=0.2484, ctc_loss=0.1697, cr_loss=0.3937, over 20665.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1547, cr_loss=0.3767, over 4075694.34 frames. ], batch size: 66, lr: 3.21e-03, grad_scale: 32.0 2024-09-16 14:28:51,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=457642.8333333333, ans=0.0 2024-09-16 14:29:21,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=457699.5, ans=0.1 2024-09-16 14:29:27,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=457699.5, ans=0.125 2024-09-16 14:29:51,915 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.855e+02 2.071e+02 2.231e+02 2.467e+02 4.552e+02, threshold=4.462e+02, percent-clipped=1.0 2024-09-16 14:30:00,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=457756.1666666667, ans=0.125 2024-09-16 14:30:05,749 INFO [train.py:1198] (1/2) Epoch 26, batch 1800, loss[loss=0.218, ctc_loss=0.1439, cr_loss=0.3705, over 20790.00 frames. ], tot_loss[loss=0.2282, ctc_loss=0.1532, cr_loss=0.3747, over 4095954.12 frames. 
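The `scaling.py:1024` records (`Whitening: name=..., metric=X vs. limit=Y`) compare a whitening metric of a module's activations against a (possibly scheduled) limit; the log shows metrics both below and above their limits, and presumably the whitening penalty engages only when the limit is exceeded. One plausible formulation of such a metric, given here as an assumption rather than the recipe's exact code, measures how far the channel covariance C is from a multiple of the identity: dim * tr(C^2) / tr(C)^2, which is exactly 1.0 for perfectly whitened features and grows as the spectrum becomes uneven.

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    """Sketch: x has shape (N, num_channels). Returns a scalar >= 1 that
    equals 1.0 when each group's channel covariance is proportional to
    the identity. Formulation is an assumption for illustration."""
    num_channels = x.shape[-1]
    x = x.reshape(-1, num_groups, num_channels // num_groups)
    x = x - x.mean(dim=0, keepdim=True)
    # Per-group channel covariance.
    cov = torch.einsum("ngc,ngd->gcd", x, x) / x.shape[0]
    dim = cov.shape[-1]
    trace_c = cov.diagonal(dim1=-2, dim2=-1).sum(-1)        # tr(C)
    trace_c2 = (cov * cov.transpose(-2, -1)).sum((-2, -1))  # tr(C @ C)
    return (dim * trace_c2 / trace_c.clamp(min=1e-20) ** 2).mean()

# White noise scores ~1.0; strongly correlated channels score much higher.
print(whitening_metric(torch.randn(10000, 256)).item())
```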
], batch size: 53, lr: 3.21e-03, grad_scale: 32.0 2024-09-16 14:30:30,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=457812.8333333333, ans=0.0 2024-09-16 14:30:34,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=457841.1666666667, ans=0.125 2024-09-16 14:30:36,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=457841.1666666667, ans=0.1 2024-09-16 14:30:55,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=457869.5, ans=0.125 2024-09-16 14:31:03,707 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.16 vs. limit=15.0 2024-09-16 14:31:21,357 INFO [train.py:1198] (1/2) Epoch 26, batch 1850, loss[loss=0.2273, ctc_loss=0.1525, cr_loss=0.3742, over 21043.00 frames. ], tot_loss[loss=0.2262, ctc_loss=0.1517, cr_loss=0.3722, over 4104399.79 frames. ], batch size: 56, lr: 3.21e-03, grad_scale: 32.0 2024-09-16 14:31:59,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=457982.8333333333, ans=0.0 2024-09-16 14:32:11,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=458011.1666666667, ans=0.125 2024-09-16 14:32:23,320 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.858e+02 2.117e+02 2.290e+02 2.418e+02 4.215e+02, threshold=4.579e+02, percent-clipped=0.0 2024-09-16 14:32:34,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=458039.5, ans=0.125 2024-09-16 14:32:37,038 INFO [train.py:1198] (1/2) Epoch 26, batch 1900, loss[loss=0.2304, ctc_loss=0.1555, cr_loss=0.3742, over 20937.00 frames. ], tot_loss[loss=0.2274, ctc_loss=0.1526, cr_loss=0.3738, over 4097014.59 frames. ], batch size: 60, lr: 3.21e-03, grad_scale: 32.0 2024-09-16 14:33:16,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=458124.5, ans=0.125 2024-09-16 14:33:21,345 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=458124.5, ans=0.0 2024-09-16 14:33:25,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=458152.8333333333, ans=0.125 2024-09-16 14:33:30,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=458152.8333333333, ans=0.2 2024-09-16 14:33:45,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=458181.1666666667, ans=0.0 2024-09-16 14:33:55,723 INFO [train.py:1198] (1/2) Epoch 26, batch 1950, loss[loss=0.1976, ctc_loss=0.1297, cr_loss=0.3396, over 20992.00 frames. ], tot_loss[loss=0.2286, ctc_loss=0.1536, cr_loss=0.3753, over 4098380.86 frames. 
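The periodic `optim.py:487` warnings summarize the recent distribution of gradient norms: the five numbers after `grad-norm quartiles` are the min / 25% / median / 75% / max of recent norms, `threshold` is the clipping threshold currently in effect, and `percent-clipped` is the fraction of recent batches whose gradients exceeded it (mostly 0.0 here, occasionally 1.0-2.0). A minimal sketch of history-based clipping in this spirit follows, assuming the threshold is `Clipping_scale` times a robust statistic of recent norms; the optimizer's exact rule may differ.

```python
import collections
import torch

class GradNormClipper:
    """Sketch: clip gradients by a threshold derived from the recent
    history of gradient norms (assumed rule: scale * running median)."""

    def __init__(self, clipping_scale: float = 2.0, history: int = 1000):
        self.clipping_scale = clipping_scale
        self.norms = collections.deque(maxlen=history)
        self.clipped = 0
        self.total = 0

    def __call__(self, parameters) -> None:
        params = [p for p in parameters if p.grad is not None]
        norm = torch.stack([p.grad.detach().norm() for p in params]).norm()
        self.norms.append(norm.item())
        q = sorted(self.norms)
        quartiles = [q[int(f * (len(q) - 1))] for f in (0, 0.25, 0.5, 0.75, 1)]
        threshold = self.clipping_scale * quartiles[2]  # scale * median
        self.total += 1
        if norm.item() > threshold:
            self.clipped += 1
            for p in params:  # shrink all grads by a common factor
                p.grad.mul_(threshold / norm.item())
        print(f"grad-norm quartiles {quartiles}, threshold={threshold:.3e}, "
              f"percent-clipped={100.0 * self.clipped / self.total:.1f}")
```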
], batch size: 48, lr: 3.21e-03, grad_scale: 32.0 2024-09-16 14:34:06,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=458209.5, ans=0.125 2024-09-16 14:34:13,026 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.70 vs. limit=15.0 2024-09-16 14:34:18,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=458237.8333333333, ans=0.07 2024-09-16 14:34:44,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=458294.5, ans=0.125 2024-09-16 14:34:57,589 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.798e+02 2.102e+02 2.228e+02 2.440e+02 6.596e+02, threshold=4.457e+02, percent-clipped=1.0 2024-09-16 14:35:14,250 INFO [train.py:1198] (1/2) Epoch 26, batch 2000, loss[loss=0.2206, ctc_loss=0.1502, cr_loss=0.3522, over 21034.00 frames. ], tot_loss[loss=0.2275, ctc_loss=0.1527, cr_loss=0.3739, over 4105017.53 frames. ], batch size: 62, lr: 3.21e-03, grad_scale: 32.0 2024-09-16 14:35:54,606 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.40 vs. limit=15.0 2024-09-16 14:36:28,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=458464.5, ans=10.0 2024-09-16 14:36:28,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=458464.5, ans=0.125 2024-09-16 14:36:30,880 INFO [train.py:1198] (1/2) Epoch 26, batch 2050, loss[loss=0.2492, ctc_loss=0.1691, cr_loss=0.4007, over 20845.00 frames. ], tot_loss[loss=0.228, ctc_loss=0.153, cr_loss=0.3748, over 4102875.79 frames. ], batch size: 65, lr: 3.20e-03, grad_scale: 32.0 2024-09-16 14:36:31,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=458492.8333333333, ans=0.0 2024-09-16 14:36:40,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=458492.8333333333, ans=0.0 2024-09-16 14:37:12,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=458549.5, ans=0.1 2024-09-16 14:37:33,046 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.928e+02 2.127e+02 2.251e+02 2.406e+02 3.527e+02, threshold=4.503e+02, percent-clipped=0.0 2024-09-16 14:37:40,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=458606.1666666667, ans=0.125 2024-09-16 14:37:46,633 INFO [train.py:1198] (1/2) Epoch 26, batch 2100, loss[loss=0.2061, ctc_loss=0.1372, cr_loss=0.3443, over 20974.00 frames. ], tot_loss[loss=0.2286, ctc_loss=0.1534, cr_loss=0.376, over 4102315.48 frames. ], batch size: 50, lr: 3.20e-03, grad_scale: 32.0 2024-09-16 14:38:20,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=458691.1666666667, ans=0.125 2024-09-16 14:38:20,954 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.59 vs. 
limit=15.0 2024-09-16 14:38:27,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=458691.1666666667, ans=0.125 2024-09-16 14:38:32,893 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.80 vs. limit=12.0 2024-09-16 14:39:05,710 INFO [train.py:1198] (1/2) Epoch 26, batch 2150, loss[loss=0.2005, ctc_loss=0.1325, cr_loss=0.34, over 19855.00 frames. ], tot_loss[loss=0.2293, ctc_loss=0.154, cr_loss=0.3767, over 4101065.06 frames. ], batch size: 44, lr: 3.20e-03, grad_scale: 32.0 2024-09-16 14:39:09,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=458776.1666666667, ans=0.1 2024-09-16 14:39:41,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=458832.8333333333, ans=0.07 2024-09-16 14:40:00,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=458861.1666666667, ans=0.0 2024-09-16 14:40:07,870 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.864e+02 2.062e+02 2.213e+02 2.329e+02 3.338e+02, threshold=4.427e+02, percent-clipped=0.0 2024-09-16 14:40:21,575 INFO [train.py:1198] (1/2) Epoch 26, batch 2200, loss[loss=0.2476, ctc_loss=0.1686, cr_loss=0.3951, over 20777.00 frames. ], tot_loss[loss=0.228, ctc_loss=0.153, cr_loss=0.375, over 4100284.34 frames. ], batch size: 71, lr: 3.20e-03, grad_scale: 32.0 2024-09-16 14:40:28,215 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=458917.8333333333, ans=0.125 2024-09-16 14:40:55,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=458974.5, ans=0.1 2024-09-16 14:40:56,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=458974.5, ans=0.0 2024-09-16 14:41:19,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=459002.8333333333, ans=0.2 2024-09-16 14:41:21,544 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.14 vs. limit=22.5 2024-09-16 14:41:27,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=459031.1666666667, ans=0.125 2024-09-16 14:41:40,398 INFO [train.py:1198] (1/2) Epoch 26, batch 2250, loss[loss=0.2552, ctc_loss=0.1737, cr_loss=0.4078, over 20648.00 frames. ], tot_loss[loss=0.2287, ctc_loss=0.1535, cr_loss=0.3756, over 4102979.89 frames. 
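The `grad_scale` field in the batch records (moving between 16.0 and 32.0 in this stretch) is the loss scale used for mixed-precision training: the loss is scaled up before `backward()` so that fp16 gradients do not underflow, the scale is halved when an overflow is detected, and it is grown again after a run of clean steps. Here is a self-contained sketch with PyTorch's stock `GradScaler` (the training script may manage scaling differently; running this requires a CUDA device):

```python
import torch

model = torch.nn.Linear(80, 500).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler(growth_factor=2.0,   # double after clean runs
                                   backoff_factor=0.5,  # halve on overflow
                                   growth_interval=2000)

x = torch.randn(8, 80, device="cuda")
y = torch.randint(0, 500, (8,), device="cuda")
with torch.cuda.amp.autocast():
    loss = torch.nn.functional.cross_entropy(model(x), y)
scaler.scale(loss).backward()   # backward on the scaled loss
scaler.step(optimizer)          # unscales grads; skips the step on inf/nan
scaler.update()                 # adjusts the scale for the next step
print("grad_scale:", scaler.get_scale())
```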
], batch size: 66, lr: 3.20e-03, grad_scale: 32.0 2024-09-16 14:42:21,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=459116.1666666667, ans=0.0 2024-09-16 14:42:42,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=459172.8333333333, ans=0.2 2024-09-16 14:42:43,952 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.877e+02 2.074e+02 2.201e+02 2.373e+02 4.119e+02, threshold=4.401e+02, percent-clipped=0.0 2024-09-16 14:42:48,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=459172.8333333333, ans=15.0 2024-09-16 14:42:50,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=459172.8333333333, ans=0.0 2024-09-16 14:42:56,051 INFO [train.py:1198] (1/2) Epoch 26, batch 2300, loss[loss=0.2478, ctc_loss=0.168, cr_loss=0.3991, over 20021.00 frames. ], tot_loss[loss=0.2275, ctc_loss=0.1527, cr_loss=0.3744, over 4101558.52 frames. ], batch size: 80, lr: 3.20e-03, grad_scale: 16.0 2024-09-16 14:42:59,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=459201.1666666667, ans=0.125 2024-09-16 14:43:02,677 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 14:43:11,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=459229.5, ans=0.125 2024-09-16 14:44:12,600 INFO [train.py:1198] (1/2) Epoch 26, batch 2350, loss[loss=0.2128, ctc_loss=0.1413, cr_loss=0.3577, over 20961.00 frames. ], tot_loss[loss=0.2278, ctc_loss=0.1529, cr_loss=0.3741, over 4080646.60 frames. ], batch size: 51, lr: 3.20e-03, grad_scale: 16.0 2024-09-16 14:44:17,901 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.27 vs. limit=10.0 2024-09-16 14:44:25,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=459342.8333333333, ans=0.125 2024-09-16 14:45:19,893 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.769e+02 2.064e+02 2.189e+02 2.370e+02 5.401e+02, threshold=4.377e+02, percent-clipped=1.0 2024-09-16 14:45:32,232 INFO [train.py:1198] (1/2) Epoch 26, batch 2400, loss[loss=0.2104, ctc_loss=0.139, cr_loss=0.3571, over 20979.00 frames. ], tot_loss[loss=0.2281, ctc_loss=0.153, cr_loss=0.3753, over 4094054.16 frames. ], batch size: 52, lr: 3.20e-03, grad_scale: 32.0 2024-09-16 14:45:56,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=459512.8333333333, ans=0.125 2024-09-16 14:46:51,617 INFO [train.py:1198] (1/2) Epoch 26, batch 2450, loss[loss=0.2707, ctc_loss=0.1847, cr_loss=0.4302, over 20096.00 frames. ], tot_loss[loss=0.2286, ctc_loss=0.1534, cr_loss=0.3761, over 4096708.62 frames. 
], batch size: 80, lr: 3.20e-03, grad_scale: 32.0 2024-09-16 14:46:51,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=459626.1666666667, ans=0.0 2024-09-16 14:47:17,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=459654.5, ans=0.0 2024-09-16 14:47:19,615 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.57 vs. limit=10.0 2024-09-16 14:47:22,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=459682.8333333333, ans=0.0 2024-09-16 14:47:28,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=459682.8333333333, ans=0.05 2024-09-16 14:47:35,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=459711.1666666667, ans=0.125 2024-09-16 14:47:47,191 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.17 vs. limit=15.0 2024-09-16 14:47:55,427 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.925e+02 2.096e+02 2.219e+02 2.449e+02 4.273e+02, threshold=4.439e+02, percent-clipped=0.0 2024-09-16 14:48:06,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=459767.8333333333, ans=10.0 2024-09-16 14:48:07,572 INFO [train.py:1198] (1/2) Epoch 26, batch 2500, loss[loss=0.1955, ctc_loss=0.1278, cr_loss=0.3385, over 20825.00 frames. ], tot_loss[loss=0.2287, ctc_loss=0.1534, cr_loss=0.3765, over 4091827.30 frames. ], batch size: 59, lr: 3.20e-03, grad_scale: 32.0 2024-09-16 14:48:24,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=459796.1666666667, ans=0.0 2024-09-16 14:48:57,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=459852.8333333333, ans=0.025 2024-09-16 14:48:59,815 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.31 vs. limit=15.0 2024-09-16 14:49:06,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=459881.1666666667, ans=0.2 2024-09-16 14:49:20,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=459881.1666666667, ans=0.2 2024-09-16 14:49:23,158 INFO [train.py:1198] (1/2) Epoch 26, batch 2550, loss[loss=0.2224, ctc_loss=0.149, cr_loss=0.3669, over 20802.00 frames. ], tot_loss[loss=0.2279, ctc_loss=0.1529, cr_loss=0.375, over 4091451.78 frames. ], batch size: 53, lr: 3.20e-03, grad_scale: 32.0 2024-09-16 14:49:29,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=459909.5, ans=0.0 2024-09-16 14:50:00,737 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.73 vs. 
limit=22.5 2024-09-16 14:50:29,886 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.753e+02 2.150e+02 2.315e+02 2.557e+02 3.804e+02, threshold=4.629e+02, percent-clipped=0.0 2024-09-16 14:50:33,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=460022.8333333333, ans=0.125 2024-09-16 14:50:39,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=460022.8333333333, ans=0.125 2024-09-16 14:50:42,039 INFO [train.py:1198] (1/2) Epoch 26, batch 2600, loss[loss=0.1951, ctc_loss=0.1268, cr_loss=0.3412, over 20984.00 frames. ], tot_loss[loss=0.2269, ctc_loss=0.1522, cr_loss=0.3733, over 4094658.44 frames. ], batch size: 51, lr: 3.20e-03, grad_scale: 32.0 2024-09-16 14:51:31,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=460136.1666666667, ans=0.125 2024-09-16 14:51:40,102 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.38 vs. limit=22.5 2024-09-16 14:51:41,625 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.83 vs. limit=15.0 2024-09-16 14:51:53,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=460164.5, ans=0.125 2024-09-16 14:51:57,399 INFO [train.py:1198] (1/2) Epoch 26, batch 2650, loss[loss=0.2681, ctc_loss=0.1929, cr_loss=0.3759, over 14322.00 frames. ], tot_loss[loss=0.2279, ctc_loss=0.1529, cr_loss=0.3749, over 4095565.70 frames. ], batch size: 149, lr: 3.20e-03, grad_scale: 32.0 2024-09-16 14:52:00,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=460192.8333333333, ans=0.1 2024-09-16 14:52:05,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=460192.8333333333, ans=0.0 2024-09-16 14:52:05,564 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.32 vs. limit=15.0 2024-09-16 14:52:05,612 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.84 vs. limit=15.0 2024-09-16 14:52:14,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=460221.1666666667, ans=0.0 2024-09-16 14:52:29,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=460249.5, ans=0.2 2024-09-16 14:53:03,974 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.767e+02 2.148e+02 2.328e+02 2.504e+02 3.630e+02, threshold=4.656e+02, percent-clipped=0.0 2024-09-16 14:53:04,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=460306.1666666667, ans=0.07 2024-09-16 14:53:16,071 INFO [train.py:1198] (1/2) Epoch 26, batch 2700, loss[loss=0.2111, ctc_loss=0.1404, cr_loss=0.3533, over 20831.00 frames. ], tot_loss[loss=0.2279, ctc_loss=0.1529, cr_loss=0.3748, over 4091718.35 frames. 
], batch size: 59, lr: 3.20e-03, grad_scale: 32.0 2024-09-16 14:53:17,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=460334.5, ans=0.1 2024-09-16 14:53:24,260 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.69 vs. limit=22.5 2024-09-16 14:53:46,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=460391.1666666667, ans=0.0 2024-09-16 14:54:19,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=460447.8333333333, ans=0.125 2024-09-16 14:54:24,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=460447.8333333333, ans=0.125 2024-09-16 14:54:31,821 INFO [train.py:1198] (1/2) Epoch 26, batch 2750, loss[loss=0.2407, ctc_loss=0.1597, cr_loss=0.405, over 20698.00 frames. ], tot_loss[loss=0.2293, ctc_loss=0.154, cr_loss=0.3764, over 4086851.64 frames. ], batch size: 71, lr: 3.20e-03, grad_scale: 16.0 2024-09-16 14:54:47,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=460504.5, ans=0.125 2024-09-16 14:55:14,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=460532.8333333333, ans=0.2 2024-09-16 14:55:36,833 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.814e+02 2.176e+02 2.274e+02 2.489e+02 3.965e+02, threshold=4.548e+02, percent-clipped=0.0 2024-09-16 14:55:40,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=460589.5, ans=0.125 2024-09-16 14:55:40,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=460589.5, ans=0.2 2024-09-16 14:55:50,346 INFO [train.py:1198] (1/2) Epoch 26, batch 2800, loss[loss=0.2321, ctc_loss=0.1558, cr_loss=0.3817, over 20995.00 frames. ], tot_loss[loss=0.2297, ctc_loss=0.1543, cr_loss=0.377, over 4092837.44 frames. ], batch size: 61, lr: 3.20e-03, grad_scale: 32.0 2024-09-16 14:55:50,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=460617.8333333333, ans=0.0 2024-09-16 14:56:28,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=460674.5, ans=0.5 2024-09-16 14:56:46,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=460702.8333333333, ans=0.125 2024-09-16 14:57:04,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=460759.5, ans=0.0 2024-09-16 14:57:06,129 INFO [train.py:1198] (1/2) Epoch 26, batch 2850, loss[loss=0.2011, ctc_loss=0.1356, cr_loss=0.3279, over 20980.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.1545, cr_loss=0.378, over 4088088.07 frames. ], batch size: 51, lr: 3.20e-03, grad_scale: 32.0 2024-09-16 14:57:09,997 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.07 vs. 
limit=22.5 2024-09-16 14:57:26,310 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.14 vs. limit=15.0 2024-09-16 14:57:36,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=460816.1666666667, ans=0.2 2024-09-16 14:58:04,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=460844.5, ans=0.1 2024-09-16 14:58:05,378 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.58 vs. limit=6.0 2024-09-16 14:58:06,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=460844.5, ans=0.125 2024-09-16 14:58:08,400 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.85 vs. limit=12.0 2024-09-16 14:58:12,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=460872.8333333333, ans=10.0 2024-09-16 14:58:13,585 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.858e+02 2.101e+02 2.214e+02 2.418e+02 3.989e+02, threshold=4.429e+02, percent-clipped=0.0 2024-09-16 14:58:18,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=460872.8333333333, ans=0.125 2024-09-16 14:58:18,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=460872.8333333333, ans=0.125 2024-09-16 14:58:24,445 INFO [train.py:1198] (1/2) Epoch 26, batch 2900, loss[loss=0.2333, ctc_loss=0.1576, cr_loss=0.3783, over 20820.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1547, cr_loss=0.3782, over 4084368.47 frames. ], batch size: 59, lr: 3.20e-03, grad_scale: 32.0 2024-09-16 14:58:33,166 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.63 vs. limit=12.0 2024-09-16 14:59:40,284 INFO [train.py:1198] (1/2) Epoch 26, batch 2950, loss[loss=0.2136, ctc_loss=0.1447, cr_loss=0.3447, over 21000.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1547, cr_loss=0.3779, over 4092652.16 frames. 
], batch size: 61, lr: 3.20e-03, grad_scale: 32.0 2024-09-16 15:00:08,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=461071.1666666667, ans=0.0 2024-09-16 15:00:23,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=461099.5, ans=0.125 2024-09-16 15:00:24,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=461127.8333333333, ans=0.125 2024-09-16 15:00:38,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=461127.8333333333, ans=0.125 2024-09-16 15:00:45,479 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.922e+02 2.151e+02 2.273e+02 2.442e+02 6.156e+02, threshold=4.546e+02, percent-clipped=1.0 2024-09-16 15:00:49,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=461156.1666666667, ans=0.0 2024-09-16 15:00:56,234 INFO [train.py:1198] (1/2) Epoch 26, batch 3000, loss[loss=0.2424, ctc_loss=0.1655, cr_loss=0.3845, over 20521.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1553, cr_loss=0.3787, over 4072607.01 frames. ], batch size: 75, lr: 3.20e-03, grad_scale: 32.0 2024-09-16 15:00:56,234 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-16 15:01:22,059 INFO [train.py:1230] (1/2) Epoch 26, validation: loss=0.04181, ctc_loss=0.04181, cr_loss=1.185e-14, over 944034.00 frames. 2024-09-16 15:01:22,059 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-16 15:01:48,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=461212.8333333333, ans=0.125 2024-09-16 15:01:57,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=461241.1666666667, ans=0.0 2024-09-16 15:02:12,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=461269.5, ans=0.125 2024-09-16 15:02:27,076 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.63 vs. limit=22.5 2024-09-16 15:02:28,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=461297.8333333333, ans=0.0 2024-09-16 15:02:37,119 INFO [train.py:1198] (1/2) Epoch 26, batch 3050, loss[loss=0.1911, ctc_loss=0.1275, cr_loss=0.3179, over 19927.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1553, cr_loss=0.3794, over 4081715.04 frames. ], batch size: 44, lr: 3.19e-03, grad_scale: 16.0 2024-09-16 15:03:03,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=461354.5, ans=0.04949747468305833 2024-09-16 15:03:47,162 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.869e+02 2.072e+02 2.240e+02 2.442e+02 3.474e+02, threshold=4.481e+02, percent-clipped=0.0 2024-09-16 15:03:56,066 INFO [train.py:1198] (1/2) Epoch 26, batch 3100, loss[loss=0.227, ctc_loss=0.1537, cr_loss=0.3668, over 20881.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1553, cr_loss=0.3792, over 4086620.41 frames. 
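On the validation pass recorded above, `cr_loss` collapses to roughly 1e-14 and `loss` coincides with `ctc_loss`. This is what one would expect if the consistency term is computed between two views that become identical once augmentation is disabled in eval mode, leaving only floating-point noise. Similarly, the recurring `WithLoss: name=...self_attn_weights, loss-sum=0.000e+00` records track what appears to be an auxiliary loss attached to the attention weights, and it reports zero throughout this stretch of training.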
], batch size: 65, lr: 3.19e-03, grad_scale: 16.0 2024-09-16 15:03:59,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=461467.8333333333, ans=0.125 2024-09-16 15:04:00,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=461467.8333333333, ans=0.2 2024-09-16 15:04:35,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=461524.5, ans=0.025 2024-09-16 15:04:37,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=461524.5, ans=0.125 2024-09-16 15:05:11,838 INFO [train.py:1198] (1/2) Epoch 26, batch 3150, loss[loss=0.2176, ctc_loss=0.1441, cr_loss=0.3674, over 20934.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1545, cr_loss=0.3776, over 4092278.18 frames. ], batch size: 60, lr: 3.19e-03, grad_scale: 16.0 2024-09-16 15:05:18,441 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.74 vs. limit=10.0 2024-09-16 15:05:27,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=461637.8333333333, ans=0.0 2024-09-16 15:05:28,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=461637.8333333333, ans=0.0 2024-09-16 15:05:29,368 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.90 vs. limit=10.0 2024-09-16 15:05:31,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=461637.8333333333, ans=0.125 2024-09-16 15:06:12,120 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.33 vs. limit=12.0 2024-09-16 15:06:18,785 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.801e+02 2.175e+02 2.280e+02 2.487e+02 8.229e+02, threshold=4.561e+02, percent-clipped=2.0 2024-09-16 15:06:28,058 INFO [train.py:1198] (1/2) Epoch 26, batch 3200, loss[loss=0.2518, ctc_loss=0.1701, cr_loss=0.4084, over 20824.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1553, cr_loss=0.3787, over 4097390.70 frames. ], batch size: 59, lr: 3.19e-03, grad_scale: 32.0 2024-09-16 15:06:39,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=461751.1666666667, ans=0.2 2024-09-16 15:07:26,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=461836.1666666667, ans=0.0 2024-09-16 15:07:46,856 INFO [train.py:1198] (1/2) Epoch 26, batch 3250, loss[loss=0.2147, ctc_loss=0.1404, cr_loss=0.3713, over 20890.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1549, cr_loss=0.3778, over 4097853.17 frames. ], batch size: 54, lr: 3.19e-03, grad_scale: 32.0 2024-09-16 15:08:02,690 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.40 vs. 
limit=12.0 2024-09-16 15:08:12,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=461921.1666666667, ans=0.125 2024-09-16 15:08:35,644 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.20 vs. limit=15.0 2024-09-16 15:08:45,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=462006.1666666667, ans=0.125 2024-09-16 15:08:53,458 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.849e+02 2.126e+02 2.282e+02 2.441e+02 3.355e+02, threshold=4.565e+02, percent-clipped=0.0 2024-09-16 15:09:02,405 INFO [train.py:1198] (1/2) Epoch 26, batch 3300, loss[loss=0.1973, ctc_loss=0.1282, cr_loss=0.3453, over 21059.00 frames. ], tot_loss[loss=0.2294, ctc_loss=0.1541, cr_loss=0.3765, over 4100203.16 frames. ], batch size: 59, lr: 3.19e-03, grad_scale: 32.0 2024-09-16 15:09:07,889 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.82 vs. limit=22.5 2024-09-16 15:09:16,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=462034.5, ans=0.025 2024-09-16 15:09:19,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=462062.8333333333, ans=0.125 2024-09-16 15:09:21,188 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.31 vs. limit=15.0 2024-09-16 15:09:26,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=462062.8333333333, ans=0.0 2024-09-16 15:09:40,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=462091.1666666667, ans=0.125 2024-09-16 15:09:52,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=462119.5, ans=0.125 2024-09-16 15:10:20,727 INFO [train.py:1198] (1/2) Epoch 26, batch 3350, loss[loss=0.2008, ctc_loss=0.1332, cr_loss=0.338, over 20972.00 frames. ], tot_loss[loss=0.2291, ctc_loss=0.1539, cr_loss=0.376, over 4102425.27 frames. ], batch size: 48, lr: 3.19e-03, grad_scale: 16.0 2024-09-16 15:10:21,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=462176.1666666667, ans=0.04949747468305833 2024-09-16 15:11:13,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=462261.1666666667, ans=0.2 2024-09-16 15:11:28,504 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.761e+02 2.084e+02 2.230e+02 2.427e+02 3.051e+02, threshold=4.459e+02, percent-clipped=0.0 2024-09-16 15:11:35,999 INFO [train.py:1198] (1/2) Epoch 26, batch 3400, loss[loss=0.2728, ctc_loss=0.1946, cr_loss=0.3909, over 14521.00 frames. ], tot_loss[loss=0.2291, ctc_loss=0.1539, cr_loss=0.3763, over 4100975.36 frames. 
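The `lr:` field decays smoothly with training progress, moving only from 3.21e-03 to 3.18e-03 across the roughly nine thousand batches covered by this stretch of the log. Zipformer recipes typically drive the learning rate with an Eden-style schedule that decays with both batch count and epoch count; the sketch below shows that general shape, with the formula, the omitted warmup factor, and the reference constants all treated as assumptions.

```python
def eden_lr(base_lr: float, batch: float, epoch: float,
            lr_batches: float, lr_epochs: float) -> float:
    """Sketch of an Eden-style schedule (warmup omitted): roughly flat
    while batch < lr_batches and epoch < lr_epochs, then decaying about
    as fast as (batch * epoch) ** -0.5."""
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor
```

Late in training both factors change very slowly, which matches the third-significant-digit movement of the logged rate here.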
], batch size: 149, lr: 3.19e-03, grad_scale: 16.0 2024-09-16 15:11:39,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=462317.8333333333, ans=0.125 2024-09-16 15:11:47,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=462317.8333333333, ans=0.0 2024-09-16 15:12:00,939 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 15:12:31,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=462402.8333333333, ans=0.1 2024-09-16 15:12:52,455 INFO [train.py:1198] (1/2) Epoch 26, batch 3450, loss[loss=0.2006, ctc_loss=0.1352, cr_loss=0.3274, over 20959.00 frames. ], tot_loss[loss=0.2277, ctc_loss=0.1528, cr_loss=0.3744, over 4104013.83 frames. ], batch size: 50, lr: 3.19e-03, grad_scale: 16.0 2024-09-16 15:13:46,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=462544.5, ans=0.07 2024-09-16 15:13:51,332 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 15:13:55,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=462572.8333333333, ans=0.2 2024-09-16 15:13:59,535 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.23 vs. limit=15.0 2024-09-16 15:14:00,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=462572.8333333333, ans=0.0 2024-09-16 15:14:03,194 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.780e+02 2.154e+02 2.307e+02 2.519e+02 3.827e+02, threshold=4.613e+02, percent-clipped=0.0 2024-09-16 15:14:10,909 INFO [train.py:1198] (1/2) Epoch 26, batch 3500, loss[loss=0.2864, ctc_loss=0.2031, cr_loss=0.4166, over 14075.00 frames. ], tot_loss[loss=0.2299, ctc_loss=0.1545, cr_loss=0.3769, over 4091047.38 frames. ], batch size: 149, lr: 3.19e-03, grad_scale: 16.0 2024-09-16 15:14:29,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=462629.5, ans=0.125 2024-09-16 15:14:31,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=462629.5, ans=0.125 2024-09-16 15:15:29,942 INFO [train.py:1198] (1/2) Epoch 26, batch 3550, loss[loss=0.2837, ctc_loss=0.1928, cr_loss=0.4545, over 20711.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1553, cr_loss=0.3786, over 4088356.03 frames. ], batch size: 71, lr: 3.19e-03, grad_scale: 16.0 2024-09-16 15:15:33,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=462742.8333333333, ans=0.125 2024-09-16 15:16:37,869 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.718e+02 2.125e+02 2.236e+02 2.400e+02 6.141e+02, threshold=4.472e+02, percent-clipped=1.0 2024-09-16 15:16:45,234 INFO [train.py:1198] (1/2) Epoch 26, batch 3600, loss[loss=0.2285, ctc_loss=0.1481, cr_loss=0.4019, over 20870.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1554, cr_loss=0.3784, over 4089112.41 frames. 
], batch size: 54, lr: 3.19e-03, grad_scale: 32.0 2024-09-16 15:17:12,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=462912.8333333333, ans=0.125 2024-09-16 15:17:32,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=462969.5, ans=0.1 2024-09-16 15:17:44,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=462997.8333333333, ans=0.1 2024-09-16 15:17:56,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=462997.8333333333, ans=0.125 2024-09-16 15:18:01,037 INFO [train.py:1198] (1/2) Epoch 26, batch 3650, loss[loss=0.2361, ctc_loss=0.1579, cr_loss=0.3908, over 20776.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1557, cr_loss=0.3791, over 4091569.77 frames. ], batch size: 53, lr: 3.19e-03, grad_scale: 16.0 2024-09-16 15:18:05,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=463026.1666666667, ans=0.1 2024-09-16 15:18:24,374 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.64 vs. limit=12.0 2024-09-16 15:18:26,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=463054.5, ans=0.0 2024-09-16 15:18:43,189 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.22 vs. limit=8.0 2024-09-16 15:19:13,813 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.856e+02 2.145e+02 2.278e+02 2.496e+02 3.091e+02, threshold=4.555e+02, percent-clipped=0.0 2024-09-16 15:19:19,979 INFO [train.py:1198] (1/2) Epoch 26, batch 3700, loss[loss=0.2157, ctc_loss=0.1439, cr_loss=0.3588, over 20997.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.156, cr_loss=0.3801, over 4098760.64 frames. ], batch size: 52, lr: 3.19e-03, grad_scale: 16.0 2024-09-16 15:19:23,746 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.60 vs. limit=22.5 2024-09-16 15:19:26,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=463167.8333333333, ans=0.125 2024-09-16 15:20:02,825 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.98 vs. limit=10.0 2024-09-16 15:20:23,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=463281.1666666667, ans=0.025 2024-09-16 15:20:33,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=463309.5, ans=0.1 2024-09-16 15:20:34,909 INFO [train.py:1198] (1/2) Epoch 26, batch 3750, loss[loss=0.2053, ctc_loss=0.1339, cr_loss=0.3569, over 20938.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1548, cr_loss=0.3787, over 4096423.92 frames. ], batch size: 50, lr: 3.19e-03, grad_scale: 16.0 2024-09-16 15:20:40,313 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.67 vs. 
limit=22.5 2024-09-16 15:20:53,973 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.67 vs. limit=15.0 2024-09-16 15:21:28,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=463394.5, ans=0.025 2024-09-16 15:21:47,256 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.907e+02 2.062e+02 2.213e+02 2.366e+02 2.977e+02, threshold=4.426e+02, percent-clipped=0.0 2024-09-16 15:21:53,247 INFO [train.py:1198] (1/2) Epoch 26, batch 3800, loss[loss=0.2198, ctc_loss=0.1464, cr_loss=0.3667, over 20796.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1554, cr_loss=0.3799, over 4095301.42 frames. ], batch size: 53, lr: 3.19e-03, grad_scale: 16.0 2024-09-16 15:21:57,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=463451.1666666667, ans=0.125 2024-09-16 15:22:12,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=463479.5, ans=0.0 2024-09-16 15:22:16,413 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.75 vs. limit=15.0 2024-09-16 15:23:08,660 INFO [train.py:1198] (1/2) Epoch 26, batch 3850, loss[loss=0.2203, ctc_loss=0.1466, cr_loss=0.3685, over 20870.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1549, cr_loss=0.3792, over 4099070.84 frames. ], batch size: 54, lr: 3.19e-03, grad_scale: 16.0 2024-09-16 15:23:31,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=463621.1666666667, ans=0.0 2024-09-16 15:23:45,243 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=463649.5, ans=0.0 2024-09-16 15:23:50,467 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.60 vs. limit=12.0 2024-09-16 15:23:56,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=463677.8333333333, ans=0.04949747468305833 2024-09-16 15:24:03,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=463677.8333333333, ans=0.125 2024-09-16 15:24:12,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=463706.1666666667, ans=0.1 2024-09-16 15:24:21,268 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.875e+02 2.136e+02 2.274e+02 2.464e+02 7.165e+02, threshold=4.548e+02, percent-clipped=1.0 2024-09-16 15:24:27,161 INFO [train.py:1198] (1/2) Epoch 26, batch 3900, loss[loss=0.1864, ctc_loss=0.123, cr_loss=0.3174, over 20965.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1552, cr_loss=0.3794, over 4082076.29 frames. 
], batch size: 50, lr: 3.19e-03, grad_scale: 16.0 2024-09-16 15:24:33,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=463734.5, ans=0.125 2024-09-16 15:24:44,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=463762.8333333333, ans=0.0 2024-09-16 15:25:29,036 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.71 vs. limit=22.5 2024-09-16 15:25:43,473 INFO [train.py:1198] (1/2) Epoch 26, batch 3950, loss[loss=0.2143, ctc_loss=0.1432, cr_loss=0.3557, over 21085.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1547, cr_loss=0.3779, over 4077249.89 frames. ], batch size: 59, lr: 3.19e-03, grad_scale: 16.0 2024-09-16 15:25:59,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=463904.5, ans=0.2 2024-09-16 15:26:48,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=463989.5, ans=0.125 2024-09-16 15:26:56,223 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.808e+02 2.103e+02 2.186e+02 2.306e+02 3.813e+02, threshold=4.373e+02, percent-clipped=0.0 2024-09-16 15:27:02,321 INFO [train.py:1198] (1/2) Epoch 26, batch 4000, loss[loss=0.2538, ctc_loss=0.1677, cr_loss=0.4302, over 20966.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1548, cr_loss=0.3787, over 4094252.74 frames. ], batch size: 64, lr: 3.19e-03, grad_scale: 32.0 2024-09-16 15:27:16,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=464046.1666666667, ans=0.025 2024-09-16 15:27:31,966 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.92 vs. limit=15.0 2024-09-16 15:27:41,133 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.78 vs. limit=15.0 2024-09-16 15:27:46,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=464102.8333333333, ans=0.125 2024-09-16 15:28:00,055 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.77 vs. limit=22.5 2024-09-16 15:28:04,504 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.12 vs. limit=15.0 2024-09-16 15:28:06,011 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.96 vs. limit=10.0 2024-09-16 15:28:08,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=464131.1666666667, ans=0.1 2024-09-16 15:28:18,836 INFO [train.py:1198] (1/2) Epoch 26, batch 4050, loss[loss=0.265, ctc_loss=0.1876, cr_loss=0.3871, over 14010.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1546, cr_loss=0.378, over 4094281.93 frames. 
], batch size: 149, lr: 3.19e-03, grad_scale: 16.0 2024-09-16 15:28:32,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=464187.8333333333, ans=0.1 2024-09-16 15:28:33,298 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.63 vs. limit=12.0 2024-09-16 15:28:42,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=464187.8333333333, ans=0.125 2024-09-16 15:28:51,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=464216.1666666667, ans=0.0 2024-09-16 15:28:51,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=464216.1666666667, ans=0.025 2024-09-16 15:28:58,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=464216.1666666667, ans=0.0 2024-09-16 15:29:10,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=464244.5, ans=0.04949747468305833 2024-09-16 15:29:25,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=464272.8333333333, ans=0.2 2024-09-16 15:29:29,879 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.817e+02 2.121e+02 2.242e+02 2.394e+02 2.748e+02, threshold=4.485e+02, percent-clipped=0.0 2024-09-16 15:29:34,335 INFO [train.py:1198] (1/2) Epoch 26, batch 4100, loss[loss=0.2192, ctc_loss=0.1491, cr_loss=0.3507, over 21020.00 frames. ], tot_loss[loss=0.229, ctc_loss=0.1536, cr_loss=0.3767, over 4102068.74 frames. ], batch size: 61, lr: 3.18e-03, grad_scale: 16.0 2024-09-16 15:29:56,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=464329.5, ans=0.0 2024-09-16 15:29:57,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=464329.5, ans=0.015 2024-09-16 15:30:26,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=464386.1666666667, ans=0.0 2024-09-16 15:30:32,391 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=464386.1666666667, ans=0.025 2024-09-16 15:30:50,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=464414.5, ans=0.04949747468305833 2024-09-16 15:30:53,253 INFO [train.py:1198] (1/2) Epoch 26, batch 4150, loss[loss=0.2452, ctc_loss=0.1662, cr_loss=0.3953, over 21095.00 frames. ], tot_loss[loss=0.2293, ctc_loss=0.1539, cr_loss=0.3769, over 4085330.72 frames. ], batch size: 59, lr: 3.18e-03, grad_scale: 16.0 2024-09-16 15:31:00,023 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.52 vs. limit=6.0 2024-09-16 15:31:40,515 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.83 vs. 
limit=15.0 2024-09-16 15:32:03,749 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.829e+02 2.115e+02 2.247e+02 2.423e+02 3.297e+02, threshold=4.495e+02, percent-clipped=0.0 2024-09-16 15:32:08,319 INFO [train.py:1198] (1/2) Epoch 26, batch 4200, loss[loss=0.2362, ctc_loss=0.1584, cr_loss=0.3887, over 20851.00 frames. ], tot_loss[loss=0.229, ctc_loss=0.1537, cr_loss=0.3765, over 4087040.33 frames. ], batch size: 65, lr: 3.18e-03, grad_scale: 16.0 2024-09-16 15:32:28,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=464612.8333333333, ans=10.0 2024-09-16 15:32:43,891 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0 2024-09-16 15:32:54,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=464641.1666666667, ans=0.0 2024-09-16 15:33:04,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=464669.5, ans=0.1 2024-09-16 15:33:07,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=464669.5, ans=0.125 2024-09-16 15:33:07,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=464669.5, ans=0.2 2024-09-16 15:33:26,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=464726.1666666667, ans=0.125 2024-09-16 15:33:27,987 INFO [train.py:1198] (1/2) Epoch 26, batch 4250, loss[loss=0.214, ctc_loss=0.1408, cr_loss=0.3661, over 20986.00 frames. ], tot_loss[loss=0.2294, ctc_loss=0.154, cr_loss=0.3769, over 4085820.58 frames. ], batch size: 52, lr: 3.18e-03, grad_scale: 16.0 2024-09-16 15:33:28,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=464726.1666666667, ans=0.2 2024-09-16 15:33:28,874 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.10 vs. limit=15.0 2024-09-16 15:33:54,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=464754.5, ans=0.0 2024-09-16 15:34:36,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=464839.5, ans=0.125 2024-09-16 15:34:39,007 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.916e+02 2.159e+02 2.292e+02 2.417e+02 4.857e+02, threshold=4.584e+02, percent-clipped=1.0 2024-09-16 15:34:40,985 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.59 vs. limit=15.0 2024-09-16 15:34:41,310 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.52 vs. limit=15.0 2024-09-16 15:34:43,338 INFO [train.py:1198] (1/2) Epoch 26, batch 4300, loss[loss=0.2227, ctc_loss=0.1486, cr_loss=0.3707, over 20962.00 frames. ], tot_loss[loss=0.2286, ctc_loss=0.1534, cr_loss=0.3761, over 4092463.10 frames. 
], batch size: 55, lr: 3.18e-03, grad_scale: 16.0 2024-09-16 15:34:45,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=464867.8333333333, ans=0.07 2024-09-16 15:35:18,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=464924.5, ans=0.0 2024-09-16 15:35:20,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=464924.5, ans=0.025 2024-09-16 15:35:29,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=464952.8333333333, ans=0.125 2024-09-16 15:35:30,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=464952.8333333333, ans=0.125 2024-09-16 15:35:43,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=464981.1666666667, ans=0.125 2024-09-16 15:36:02,489 INFO [train.py:1198] (1/2) Epoch 26, batch 4350, loss[loss=0.2607, ctc_loss=0.1735, cr_loss=0.4361, over 20671.00 frames. ], tot_loss[loss=0.2285, ctc_loss=0.1534, cr_loss=0.3755, over 4087773.63 frames. ], batch size: 66, lr: 3.18e-03, grad_scale: 16.0 2024-09-16 15:36:16,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=465037.8333333333, ans=0.125 2024-09-16 15:36:28,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=465037.8333333333, ans=0.125 2024-09-16 15:36:47,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=465094.5, ans=0.125 2024-09-16 15:37:00,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=465094.5, ans=0.125 2024-09-16 15:37:13,807 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.742e+02 2.115e+02 2.278e+02 2.465e+02 4.083e+02, threshold=4.556e+02, percent-clipped=0.0 2024-09-16 15:37:17,674 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.99 vs. limit=12.0 2024-09-16 15:37:18,451 INFO [train.py:1198] (1/2) Epoch 26, batch 4400, loss[loss=0.2486, ctc_loss=0.166, cr_loss=0.4129, over 20867.00 frames. ], tot_loss[loss=0.2283, ctc_loss=0.1531, cr_loss=0.3757, over 4106960.71 frames. 
], batch size: 57, lr: 3.18e-03, grad_scale: 32.0 2024-09-16 15:37:18,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=465151.1666666667, ans=0.1 2024-09-16 15:37:35,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=465179.5, ans=0.0 2024-09-16 15:37:51,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=465207.8333333333, ans=0.1 2024-09-16 15:37:51,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=465207.8333333333, ans=0.125 2024-09-16 15:37:55,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=465207.8333333333, ans=0.0 2024-09-16 15:38:01,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=465207.8333333333, ans=0.0 2024-09-16 15:38:07,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=465236.1666666667, ans=0.1 2024-09-16 15:38:35,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=465264.5, ans=0.125 2024-09-16 15:38:37,948 INFO [train.py:1198] (1/2) Epoch 26, batch 4450, loss[loss=0.1944, ctc_loss=0.1296, cr_loss=0.3236, over 20951.00 frames. ], tot_loss[loss=0.2273, ctc_loss=0.1524, cr_loss=0.3745, over 4104419.42 frames. ], batch size: 49, lr: 3.18e-03, grad_scale: 32.0 2024-09-16 15:38:44,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=465292.8333333333, ans=0.04949747468305833 2024-09-16 15:39:14,031 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.98 vs. limit=10.0 2024-09-16 15:39:14,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=465349.5, ans=0.125 2024-09-16 15:39:22,996 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.66 vs. limit=15.0 2024-09-16 15:39:49,127 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.783e+02 2.121e+02 2.254e+02 2.431e+02 3.646e+02, threshold=4.507e+02, percent-clipped=0.0 2024-09-16 15:39:53,808 INFO [train.py:1198] (1/2) Epoch 26, batch 4500, loss[loss=0.2358, ctc_loss=0.1589, cr_loss=0.3846, over 21069.00 frames. ], tot_loss[loss=0.2276, ctc_loss=0.1526, cr_loss=0.3748, over 4105569.47 frames. ], batch size: 59, lr: 3.18e-03, grad_scale: 32.0 2024-09-16 15:40:01,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=465434.5, ans=0.125 2024-09-16 15:40:16,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=465462.8333333333, ans=0.125 2024-09-16 15:40:42,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=465519.5, ans=0.125 2024-09-16 15:41:09,453 INFO [train.py:1198] (1/2) Epoch 26, batch 4550, loss[loss=0.2259, ctc_loss=0.1523, cr_loss=0.3678, over 20841.00 frames. 
], tot_loss[loss=0.2281, ctc_loss=0.153, cr_loss=0.3753, over 4110511.70 frames. ], batch size: 59, lr: 3.18e-03, grad_scale: 32.0 2024-09-16 15:41:16,312 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=24.95 vs. limit=15.0 2024-09-16 15:41:28,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=465604.5, ans=0.2 2024-09-16 15:41:48,360 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.72 vs. limit=15.0 2024-09-16 15:42:23,829 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.871e+02 2.165e+02 2.290e+02 2.435e+02 3.795e+02, threshold=4.580e+02, percent-clipped=0.0 2024-09-16 15:42:28,286 INFO [train.py:1198] (1/2) Epoch 26, batch 4600, loss[loss=0.2499, ctc_loss=0.1671, cr_loss=0.414, over 21002.00 frames. ], tot_loss[loss=0.2291, ctc_loss=0.1538, cr_loss=0.3767, over 4101631.29 frames. ], batch size: 61, lr: 3.18e-03, grad_scale: 32.0 2024-09-16 15:42:54,764 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=7.54 vs. limit=22.5 2024-09-16 15:43:10,103 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.77 vs. limit=22.5 2024-09-16 15:43:26,483 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 15:43:44,113 INFO [train.py:1198] (1/2) Epoch 26, batch 4650, loss[loss=0.2428, ctc_loss=0.1656, cr_loss=0.3861, over 21045.00 frames. ], tot_loss[loss=0.2287, ctc_loss=0.1533, cr_loss=0.3766, over 4103152.29 frames. ], batch size: 62, lr: 3.18e-03, grad_scale: 32.0 2024-09-16 15:44:34,943 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.64 vs. limit=22.5 2024-09-16 15:44:48,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=465972.8333333333, ans=0.025 2024-09-16 15:44:58,467 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.838e+02 2.162e+02 2.291e+02 2.463e+02 5.803e+02, threshold=4.581e+02, percent-clipped=1.0 2024-09-16 15:45:03,080 INFO [train.py:1198] (1/2) Epoch 26, batch 4700, loss[loss=0.2092, ctc_loss=0.139, cr_loss=0.3511, over 20964.00 frames. ], tot_loss[loss=0.2281, ctc_loss=0.153, cr_loss=0.3755, over 4092235.22 frames. ], batch size: 49, lr: 3.18e-03, grad_scale: 32.0 2024-09-16 15:45:12,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=466001.1666666667, ans=0.2 2024-09-16 15:46:05,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=466114.5, ans=0.125 2024-09-16 15:46:10,682 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.42 vs. limit=6.0 2024-09-16 15:46:19,020 INFO [train.py:1198] (1/2) Epoch 26, batch 4750, loss[loss=0.2088, ctc_loss=0.1405, cr_loss=0.3412, over 21068.00 frames. ], tot_loss[loss=0.2279, ctc_loss=0.153, cr_loss=0.3747, over 4087503.93 frames. 
], batch size: 53, lr: 3.18e-03, grad_scale: 32.0 2024-09-16 15:46:20,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=466142.8333333333, ans=0.125 2024-09-16 15:46:26,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=466142.8333333333, ans=0.0 2024-09-16 15:46:27,152 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.57 vs. limit=22.5 2024-09-16 15:46:51,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=466199.5, ans=0.0 2024-09-16 15:47:29,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=466256.1666666667, ans=0.1 2024-09-16 15:47:33,264 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.817e+02 2.136e+02 2.273e+02 2.486e+02 3.452e+02, threshold=4.546e+02, percent-clipped=0.0 2024-09-16 15:47:37,848 INFO [train.py:1198] (1/2) Epoch 26, batch 4800, loss[loss=0.2499, ctc_loss=0.1665, cr_loss=0.4172, over 21064.00 frames. ], tot_loss[loss=0.2277, ctc_loss=0.1528, cr_loss=0.3747, over 4090469.84 frames. ], batch size: 56, lr: 3.18e-03, grad_scale: 32.0 2024-09-16 15:47:45,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=466284.5, ans=0.0 2024-09-16 15:48:02,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=466312.8333333333, ans=0.125 2024-09-16 15:48:54,224 INFO [train.py:1198] (1/2) Epoch 26, batch 4850, loss[loss=0.1781, ctc_loss=0.1176, cr_loss=0.3027, over 20956.00 frames. ], tot_loss[loss=0.2267, ctc_loss=0.152, cr_loss=0.3737, over 4093581.58 frames. ], batch size: 48, lr: 3.18e-03, grad_scale: 32.0 2024-09-16 15:48:56,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=466426.1666666667, ans=0.125 2024-09-16 15:49:13,282 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.67 vs. limit=15.0 2024-09-16 15:49:31,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=466482.8333333333, ans=0.1 2024-09-16 15:49:31,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=466482.8333333333, ans=0.2 2024-09-16 15:49:56,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=466539.5, ans=0.2 2024-09-16 15:50:08,409 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.764e+02 2.094e+02 2.202e+02 2.350e+02 3.828e+02, threshold=4.405e+02, percent-clipped=0.0 2024-09-16 15:50:12,899 INFO [train.py:1198] (1/2) Epoch 26, batch 4900, loss[loss=0.1989, ctc_loss=0.1303, cr_loss=0.3428, over 20987.00 frames. ], tot_loss[loss=0.2285, ctc_loss=0.1533, cr_loss=0.3756, over 4089578.49 frames. 
], batch size: 51, lr: 3.18e-03, grad_scale: 32.0 2024-09-16 15:50:22,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=466567.8333333333, ans=0.125 2024-09-16 15:51:13,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=466681.1666666667, ans=0.04949747468305833 2024-09-16 15:51:26,622 INFO [train.py:1198] (1/2) Epoch 26, batch 4950, loss[loss=0.1919, ctc_loss=0.1257, cr_loss=0.3312, over 20925.00 frames. ], tot_loss[loss=0.229, ctc_loss=0.1537, cr_loss=0.3764, over 4088157.90 frames. ], batch size: 48, lr: 3.18e-03, grad_scale: 32.0 2024-09-16 15:52:02,858 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.47 vs. limit=22.5 2024-09-16 15:52:35,980 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.908e+02 2.122e+02 2.242e+02 2.372e+02 3.200e+02, threshold=4.484e+02, percent-clipped=0.0 2024-09-16 15:52:40,657 INFO [train.py:1198] (1/2) Epoch 26, batch 5000, loss[loss=0.2501, ctc_loss=0.1662, cr_loss=0.4196, over 20665.00 frames. ], tot_loss[loss=0.2294, ctc_loss=0.1541, cr_loss=0.3763, over 4075254.15 frames. ], batch size: 68, lr: 3.18e-03, grad_scale: 32.0 2024-09-16 15:53:25,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=466936.1666666667, ans=0.125 2024-09-16 15:53:26,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=466936.1666666667, ans=0.125 2024-09-16 15:53:49,890 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.03 vs. limit=15.0 2024-09-16 15:53:55,275 INFO [train.py:1198] (1/2) Epoch 26, batch 5050, loss[loss=0.2326, ctc_loss=0.1537, cr_loss=0.3942, over 21012.00 frames. ], tot_loss[loss=0.2296, ctc_loss=0.1543, cr_loss=0.3767, over 4073678.22 frames. 
], batch size: 63, lr: 3.18e-03, grad_scale: 32.0 2024-09-16 15:53:55,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=466992.8333333333, ans=0.025 2024-09-16 15:54:02,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=466992.8333333333, ans=0.05 2024-09-16 15:54:09,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=467021.1666666667, ans=0.125 2024-09-16 15:54:42,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=467077.8333333333, ans=0.125 2024-09-16 15:54:51,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=467077.8333333333, ans=10.0 2024-09-16 15:54:54,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=467106.1666666667, ans=0.0 2024-09-16 15:55:04,765 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.878e+02 2.108e+02 2.244e+02 2.429e+02 3.114e+02, threshold=4.489e+02, percent-clipped=0.0 2024-09-16 15:55:06,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=467106.1666666667, ans=0.2 2024-09-16 15:55:09,362 INFO [train.py:1198] (1/2) Epoch 26, batch 5100, loss[loss=0.2369, ctc_loss=0.1589, cr_loss=0.3903, over 20888.00 frames. ], tot_loss[loss=0.2292, ctc_loss=0.1539, cr_loss=0.3764, over 4084933.14 frames. ], batch size: 57, lr: 3.18e-03, grad_scale: 32.0 2024-09-16 15:55:40,076 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.03 vs. limit=15.0 2024-09-16 15:55:52,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=467191.1666666667, ans=0.0 2024-09-16 15:56:26,769 INFO [train.py:1198] (1/2) Epoch 26, batch 5150, loss[loss=0.2322, ctc_loss=0.154, cr_loss=0.3907, over 20704.00 frames. ], tot_loss[loss=0.2299, ctc_loss=0.1544, cr_loss=0.3772, over 4085762.39 frames. ], batch size: 68, lr: 3.17e-03, grad_scale: 32.0 2024-09-16 15:56:42,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=467304.5, ans=0.025 2024-09-16 15:56:51,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=467304.5, ans=0.1 2024-09-16 15:56:51,301 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.19 vs. 
limit=22.5 2024-09-16 15:56:52,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=467304.5, ans=0.125 2024-09-16 15:56:53,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=467304.5, ans=0.125 2024-09-16 15:56:56,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=467332.8333333333, ans=0.125 2024-09-16 15:57:01,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=467332.8333333333, ans=0.1 2024-09-16 15:57:29,724 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.72 vs. limit=12.0 2024-09-16 15:57:32,355 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=467389.5, ans=0.125 2024-09-16 15:57:36,422 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.772e+02 2.070e+02 2.195e+02 2.383e+02 3.364e+02, threshold=4.389e+02, percent-clipped=0.0 2024-09-16 15:57:40,999 INFO [train.py:1198] (1/2) Epoch 26, batch 5200, loss[loss=0.2256, ctc_loss=0.1522, cr_loss=0.3671, over 21033.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1546, cr_loss=0.377, over 4087315.43 frames. ], batch size: 62, lr: 3.17e-03, grad_scale: 32.0 2024-09-16 15:58:11,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=467474.5, ans=0.0 2024-09-16 15:58:47,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=467531.1666666667, ans=0.0 2024-09-16 15:58:55,831 INFO [train.py:1198] (1/2) Epoch 26, batch 5250, loss[loss=0.2912, ctc_loss=0.2023, cr_loss=0.4446, over 14618.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1547, cr_loss=0.3765, over 4074332.43 frames. ], batch size: 149, lr: 3.17e-03, grad_scale: 32.0 2024-09-16 15:59:23,048 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 15:59:39,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=467616.1666666667, ans=0.125 2024-09-16 15:59:58,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=467672.8333333333, ans=0.125 2024-09-16 16:00:08,683 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.886e+02 2.171e+02 2.295e+02 2.457e+02 5.717e+02, threshold=4.590e+02, percent-clipped=1.0 2024-09-16 16:00:13,237 INFO [train.py:1198] (1/2) Epoch 26, batch 5300, loss[loss=0.2059, ctc_loss=0.1389, cr_loss=0.3353, over 20845.00 frames. ], tot_loss[loss=0.2297, ctc_loss=0.1544, cr_loss=0.3768, over 4077188.68 frames. 
], batch size: 59, lr: 3.17e-03, grad_scale: 32.0 2024-09-16 16:00:16,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=467701.1666666667, ans=0.0 2024-09-16 16:00:28,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=467729.5, ans=0.2 2024-09-16 16:00:54,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=467757.8333333333, ans=0.5 2024-09-16 16:01:00,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=467786.1666666667, ans=0.125 2024-09-16 16:01:27,959 INFO [train.py:1198] (1/2) Epoch 26, batch 5350, loss[loss=0.2375, ctc_loss=0.1578, cr_loss=0.3986, over 20945.00 frames. ], tot_loss[loss=0.2284, ctc_loss=0.1532, cr_loss=0.3759, over 4092143.07 frames. ], batch size: 60, lr: 3.17e-03, grad_scale: 32.0 2024-09-16 16:02:05,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=467899.5, ans=15.0 2024-09-16 16:02:30,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=467956.1666666667, ans=0.125 2024-09-16 16:02:37,689 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.822e+02 2.128e+02 2.209e+02 2.382e+02 3.490e+02, threshold=4.419e+02, percent-clipped=0.0 2024-09-16 16:02:38,454 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.50 vs. limit=15.0 2024-09-16 16:02:42,232 INFO [train.py:1198] (1/2) Epoch 26, batch 5400, loss[loss=0.2107, ctc_loss=0.1399, cr_loss=0.354, over 21049.00 frames. ], tot_loss[loss=0.228, ctc_loss=0.1529, cr_loss=0.3755, over 4102693.11 frames. ], batch size: 53, lr: 3.17e-03, grad_scale: 32.0 2024-09-16 16:02:59,757 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.39 vs. limit=15.0 2024-09-16 16:03:04,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=468012.8333333333, ans=0.1 2024-09-16 16:03:11,218 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.20 vs. limit=10.0 2024-09-16 16:03:19,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=468041.1666666667, ans=0.125 2024-09-16 16:03:57,090 INFO [train.py:1198] (1/2) Epoch 26, batch 5450, loss[loss=0.231, ctc_loss=0.1531, cr_loss=0.3898, over 21003.00 frames. ], tot_loss[loss=0.2278, ctc_loss=0.1528, cr_loss=0.3752, over 4102235.02 frames. 
], batch size: 61, lr: 3.17e-03, grad_scale: 32.0 2024-09-16 16:04:39,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=468182.8333333333, ans=0.125 2024-09-16 16:05:08,759 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.823e+02 2.091e+02 2.226e+02 2.391e+02 3.014e+02, threshold=4.452e+02, percent-clipped=0.0 2024-09-16 16:05:09,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=468239.5, ans=0.125 2024-09-16 16:05:12,003 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 16:05:13,221 INFO [train.py:1198] (1/2) Epoch 26, batch 5500, loss[loss=0.2569, ctc_loss=0.1745, cr_loss=0.4119, over 20854.00 frames. ], tot_loss[loss=0.2279, ctc_loss=0.1527, cr_loss=0.3756, over 4113345.57 frames. ], batch size: 65, lr: 3.17e-03, grad_scale: 32.0 2024-09-16 16:05:52,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=468324.5, ans=0.125 2024-09-16 16:06:03,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=468352.8333333333, ans=0.1 2024-09-16 16:06:17,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=468381.1666666667, ans=0.2 2024-09-16 16:06:27,202 INFO [train.py:1198] (1/2) Epoch 26, batch 5550, loss[loss=0.2857, ctc_loss=0.2017, cr_loss=0.4198, over 14768.00 frames. ], tot_loss[loss=0.2289, ctc_loss=0.1536, cr_loss=0.3764, over 4099260.86 frames. ], batch size: 149, lr: 3.17e-03, grad_scale: 32.0 2024-09-16 16:06:45,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=468437.8333333333, ans=0.015 2024-09-16 16:07:14,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=468494.5, ans=0.0 2024-09-16 16:07:34,643 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.01 vs. limit=10.0 2024-09-16 16:07:36,783 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.840e+02 2.102e+02 2.239e+02 2.400e+02 3.458e+02, threshold=4.477e+02, percent-clipped=0.0 2024-09-16 16:07:41,356 INFO [train.py:1198] (1/2) Epoch 26, batch 5600, loss[loss=0.2217, ctc_loss=0.1471, cr_loss=0.3729, over 20988.00 frames. ], tot_loss[loss=0.2274, ctc_loss=0.1526, cr_loss=0.3742, over 4102481.42 frames. 
], batch size: 55, lr: 3.17e-03, grad_scale: 32.0 2024-09-16 16:07:43,232 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 16:07:49,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=468551.1666666667, ans=0.0 2024-09-16 16:07:55,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=468579.5, ans=0.125 2024-09-16 16:08:04,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=468579.5, ans=0.0 2024-09-16 16:08:30,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=468636.1666666667, ans=0.1 2024-09-16 16:08:33,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=468636.1666666667, ans=0.0 2024-09-16 16:08:52,720 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.21 vs. limit=22.5 2024-09-16 16:08:58,188 INFO [train.py:1198] (1/2) Epoch 26, batch 5650, loss[loss=0.1771, ctc_loss=0.1143, cr_loss=0.3138, over 20968.00 frames. ], tot_loss[loss=0.2278, ctc_loss=0.1529, cr_loss=0.3746, over 4090289.99 frames. ], batch size: 50, lr: 3.17e-03, grad_scale: 32.0 2024-09-16 16:08:58,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=468692.8333333333, ans=0.125 2024-09-16 16:09:38,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=468749.5, ans=0.1 2024-09-16 16:09:44,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=468777.8333333333, ans=22.5 2024-09-16 16:09:59,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=468806.1666666667, ans=0.125 2024-09-16 16:10:08,277 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.936e+02 2.097e+02 2.214e+02 2.376e+02 4.979e+02, threshold=4.428e+02, percent-clipped=2.0 2024-09-16 16:10:12,679 INFO [train.py:1198] (1/2) Epoch 26, batch 5700, loss[loss=0.2685, ctc_loss=0.1851, cr_loss=0.4168, over 20073.00 frames. ], tot_loss[loss=0.2286, ctc_loss=0.1536, cr_loss=0.375, over 4075779.89 frames. ], batch size: 80, lr: 3.17e-03, grad_scale: 32.0 2024-09-16 16:10:14,747 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.28 vs. 
limit=15.0 2024-09-16 16:10:18,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=468834.5, ans=0.125 2024-09-16 16:10:51,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=468891.1666666667, ans=0.0 2024-09-16 16:11:07,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=468919.5, ans=0.125 2024-09-16 16:11:09,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=468919.5, ans=0.0 2024-09-16 16:11:18,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=468947.8333333333, ans=0.125 2024-09-16 16:11:26,924 INFO [train.py:1198] (1/2) Epoch 26, batch 5750, loss[loss=0.2197, ctc_loss=0.1459, cr_loss=0.3689, over 21019.00 frames. ], tot_loss[loss=0.2274, ctc_loss=0.1527, cr_loss=0.3736, over 4084357.23 frames. ], batch size: 61, lr: 3.17e-03, grad_scale: 32.0 2024-09-16 16:11:52,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=469004.5, ans=0.125 2024-09-16 16:11:53,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=469004.5, ans=0.1 2024-09-16 16:12:36,447 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.861e+02 2.114e+02 2.240e+02 2.384e+02 3.175e+02, threshold=4.480e+02, percent-clipped=0.0 2024-09-16 16:12:40,980 INFO [train.py:1198] (1/2) Epoch 26, batch 5800, loss[loss=0.2679, ctc_loss=0.1856, cr_loss=0.4116, over 19511.00 frames. ], tot_loss[loss=0.2286, ctc_loss=0.1536, cr_loss=0.3749, over 4073405.28 frames. ], batch size: 90, lr: 3.17e-03, grad_scale: 32.0 2024-09-16 16:12:47,195 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 16:12:55,112 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.17 vs. limit=22.5 2024-09-16 16:13:20,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=469174.5, ans=0.125 2024-09-16 16:13:43,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=469231.1666666667, ans=0.125 2024-09-16 16:13:57,609 INFO [train.py:1198] (1/2) Epoch 26, batch 5850, loss[loss=0.1952, ctc_loss=0.1298, cr_loss=0.3269, over 19901.00 frames. ], tot_loss[loss=0.2291, ctc_loss=0.1539, cr_loss=0.3757, over 4075570.48 frames. ], batch size: 44, lr: 3.17e-03, grad_scale: 32.0 2024-09-16 16:15:07,818 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.798e+02 2.069e+02 2.237e+02 2.386e+02 3.435e+02, threshold=4.473e+02, percent-clipped=0.0 2024-09-16 16:15:08,603 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.79 vs. limit=15.0 2024-09-16 16:15:12,152 INFO [train.py:1198] (1/2) Epoch 26, batch 5900, loss[loss=0.2355, ctc_loss=0.1571, cr_loss=0.3923, over 20980.00 frames. ], tot_loss[loss=0.228, ctc_loss=0.1531, cr_loss=0.3747, over 4088079.67 frames. 
], batch size: 55, lr: 3.17e-03, grad_scale: 32.0 2024-09-16 16:15:17,670 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.02 vs. limit=15.0 2024-09-16 16:15:32,096 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 16:15:36,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=469429.5, ans=0.0 2024-09-16 16:16:07,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=469486.1666666667, ans=0.125 2024-09-16 16:16:09,253 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.32 vs. limit=15.0 2024-09-16 16:16:10,814 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.41 vs. limit=15.0 2024-09-16 16:16:26,419 INFO [train.py:1198] (1/2) Epoch 26, batch 5950, loss[loss=0.2723, ctc_loss=0.1887, cr_loss=0.418, over 18574.00 frames. ], tot_loss[loss=0.2281, ctc_loss=0.1531, cr_loss=0.3747, over 4075705.40 frames. ], batch size: 108, lr: 3.17e-03, grad_scale: 32.0 2024-09-16 16:16:28,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=469542.8333333333, ans=0.0 2024-09-16 16:16:35,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=469542.8333333333, ans=0.125 2024-09-16 16:16:41,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=469571.1666666667, ans=0.2 2024-09-16 16:16:48,196 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.70 vs. limit=15.0 2024-09-16 16:17:35,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=469656.1666666667, ans=0.0 2024-09-16 16:17:38,526 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.751e+02 2.121e+02 2.246e+02 2.355e+02 2.859e+02, threshold=4.491e+02, percent-clipped=0.0 2024-09-16 16:17:43,104 INFO [train.py:1198] (1/2) Epoch 26, batch 6000, loss[loss=0.2586, ctc_loss=0.1742, cr_loss=0.4221, over 20965.00 frames. ], tot_loss[loss=0.2292, ctc_loss=0.154, cr_loss=0.376, over 4076625.25 frames. ], batch size: 64, lr: 3.17e-03, grad_scale: 32.0 2024-09-16 16:17:43,104 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-16 16:18:05,668 INFO [train.py:1230] (1/2) Epoch 26, validation: loss=0.0423, ctc_loss=0.0423, cr_loss=1.197e-14, over 944034.00 frames. 2024-09-16 16:18:05,669 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-16 16:18:10,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=469684.5, ans=0.125 2024-09-16 16:18:16,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=469684.5, ans=0.0 2024-09-16 16:19:20,372 INFO [train.py:1198] (1/2) Epoch 26, batch 6050, loss[loss=0.1969, ctc_loss=0.1303, cr_loss=0.333, over 20985.00 frames. ], tot_loss[loss=0.2296, ctc_loss=0.1543, cr_loss=0.3764, over 4066896.20 frames. 
], batch size: 52, lr: 3.17e-03, grad_scale: 64.0 2024-09-16 16:19:21,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=469826.1666666667, ans=0.125 2024-09-16 16:19:50,546 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.79 vs. limit=22.5 2024-09-16 16:20:03,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=469882.8333333333, ans=0.0 2024-09-16 16:20:04,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=469911.1666666667, ans=0.0 2024-09-16 16:20:29,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=469939.5, ans=0.0 2024-09-16 16:20:31,747 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.870e+02 2.107e+02 2.248e+02 2.460e+02 4.227e+02, threshold=4.495e+02, percent-clipped=0.0 2024-09-16 16:20:36,267 INFO [train.py:1198] (1/2) Epoch 26, batch 6100, loss[loss=0.2124, ctc_loss=0.1391, cr_loss=0.3666, over 20986.00 frames. ], tot_loss[loss=0.2285, ctc_loss=0.1534, cr_loss=0.3754, over 4075323.78 frames. ], batch size: 55, lr: 3.17e-03, grad_scale: 64.0 2024-09-16 16:21:51,274 INFO [train.py:1198] (1/2) Epoch 26, batch 6150, loss[loss=0.2467, ctc_loss=0.1705, cr_loss=0.3812, over 14000.00 frames. ], tot_loss[loss=0.2279, ctc_loss=0.153, cr_loss=0.3745, over 4072325.33 frames. ], batch size: 149, lr: 3.16e-03, grad_scale: 64.0 2024-09-16 16:22:12,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=470137.8333333333, ans=0.025 2024-09-16 16:22:32,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=470166.1666666667, ans=0.0 2024-09-16 16:23:00,416 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.749e+02 2.124e+02 2.281e+02 2.478e+02 3.468e+02, threshold=4.562e+02, percent-clipped=0.0 2024-09-16 16:23:03,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=470251.1666666667, ans=0.0 2024-09-16 16:23:04,723 INFO [train.py:1198] (1/2) Epoch 26, batch 6200, loss[loss=0.284, ctc_loss=0.1955, cr_loss=0.4424, over 18233.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1554, cr_loss=0.3784, over 4060863.51 frames. ], batch size: 108, lr: 3.16e-03, grad_scale: 64.0 2024-09-16 16:23:45,123 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.99 vs. limit=22.5 2024-09-16 16:24:15,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=470364.5, ans=0.125 2024-09-16 16:24:17,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=470392.8333333333, ans=15.0 2024-09-16 16:24:18,211 INFO [train.py:1198] (1/2) Epoch 26, batch 6250, loss[loss=0.1977, ctc_loss=0.1311, cr_loss=0.3328, over 20889.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1571, cr_loss=0.3811, over 4049389.45 frames. 
], batch size: 54, lr: 3.16e-03, grad_scale: 32.0 2024-09-16 16:24:27,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=470392.8333333333, ans=0.125 2024-09-16 16:24:27,680 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.07 vs. limit=15.0 2024-09-16 16:24:29,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=470392.8333333333, ans=0.0 2024-09-16 16:24:34,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=470421.1666666667, ans=0.0 2024-09-16 16:24:53,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=470449.5, ans=0.025 2024-09-16 16:25:07,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=470477.8333333333, ans=0.2 2024-09-16 16:25:27,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=470506.1666666667, ans=0.125 2024-09-16 16:25:30,093 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.863e+02 2.173e+02 2.338e+02 2.531e+02 3.280e+02, threshold=4.676e+02, percent-clipped=0.0 2024-09-16 16:25:30,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=470506.1666666667, ans=0.0 2024-09-16 16:25:32,084 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.89 vs. limit=22.5 2024-09-16 16:25:32,973 INFO [train.py:1198] (1/2) Epoch 26, batch 6300, loss[loss=0.2429, ctc_loss=0.1638, cr_loss=0.3954, over 20880.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1587, cr_loss=0.3821, over 4010610.46 frames. ], batch size: 57, lr: 3.16e-03, grad_scale: 32.0 2024-09-16 16:25:38,130 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.79 vs. limit=15.0 2024-09-16 16:26:18,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=470619.5, ans=0.125 2024-09-16 16:26:40,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=470647.8333333333, ans=0.0 2024-09-16 16:26:42,691 INFO [train.py:1198] (1/2) Epoch 26, batch 6350, loss[loss=0.2712, ctc_loss=0.192, cr_loss=0.3959, over 14288.00 frames. ], tot_loss[loss=0.2407, ctc_loss=0.1636, cr_loss=0.3854, over 3861151.49 frames. ], batch size: 150, lr: 3.16e-03, grad_scale: 32.0 2024-09-16 16:26:43,514 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.49 vs. limit=15.0 2024-09-16 16:26:58,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=470704.5, ans=0.125 2024-09-16 16:27:17,424 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.45 vs. 
limit=15.0 2024-09-16 16:27:18,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=470732.8333333333, ans=0.0 2024-09-16 16:28:30,738 INFO [train.py:1198] (1/2) Epoch 27, batch 0, loss[loss=0.2241, ctc_loss=0.1509, cr_loss=0.3657, over 19499.00 frames. ], tot_loss[loss=0.2241, ctc_loss=0.1509, cr_loss=0.3657, over 19499.00 frames. ], batch size: 43, lr: 3.10e-03, grad_scale: 32.0 2024-09-16 16:28:30,738 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-16 16:28:39,708 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.7650, 4.3322, 3.2911, 3.8400], device='cuda:1') 2024-09-16 16:28:49,100 INFO [train.py:1230] (1/2) Epoch 27, validation: loss=0.04179, ctc_loss=0.04179, cr_loss=1.195e-14, over 944034.00 frames. 2024-09-16 16:28:49,100 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-16 16:29:00,836 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.940e+02 2.404e+02 2.581e+02 2.766e+02 3.710e+02, threshold=5.163e+02, percent-clipped=0.0 2024-09-16 16:30:04,701 INFO [train.py:1198] (1/2) Epoch 27, batch 50, loss[loss=0.2077, ctc_loss=0.1378, cr_loss=0.3496, over 20869.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1562, cr_loss=0.38, over 916431.29 frames. ], batch size: 57, lr: 3.10e-03, grad_scale: 32.0 2024-09-16 16:30:14,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=470931.1666666667, ans=0.125 2024-09-16 16:31:20,292 INFO [train.py:1198] (1/2) Epoch 27, batch 100, loss[loss=0.194, ctc_loss=0.1271, cr_loss=0.3345, over 20982.00 frames. ], tot_loss[loss=0.2285, ctc_loss=0.1535, cr_loss=0.3752, over 1615244.34 frames. ], batch size: 50, lr: 3.10e-03, grad_scale: 32.0 2024-09-16 16:31:29,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=471072.8333333333, ans=0.125 2024-09-16 16:31:31,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=471072.8333333333, ans=0.125 2024-09-16 16:31:32,686 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.737e+02 2.089e+02 2.219e+02 2.330e+02 2.882e+02, threshold=4.437e+02, percent-clipped=0.0 2024-09-16 16:31:46,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=471101.1666666667, ans=0.0 2024-09-16 16:32:03,214 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.25 vs. limit=15.0 2024-09-16 16:32:04,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=471129.5, ans=0.0 2024-09-16 16:32:41,981 INFO [train.py:1198] (1/2) Epoch 27, batch 150, loss[loss=0.1854, ctc_loss=0.122, cr_loss=0.317, over 20972.00 frames. ], tot_loss[loss=0.2293, ctc_loss=0.154, cr_loss=0.3768, over 2161983.56 frames. 
], batch size: 50, lr: 3.10e-03, grad_scale: 32.0 2024-09-16 16:32:42,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=471214.5, ans=0.0 2024-09-16 16:32:52,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=471214.5, ans=0.125 2024-09-16 16:33:23,659 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.71 vs. limit=15.0 2024-09-16 16:33:29,805 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.80 vs. limit=22.5 2024-09-16 16:33:34,548 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.37 vs. limit=12.0 2024-09-16 16:33:58,084 INFO [train.py:1198] (1/2) Epoch 27, batch 200, loss[loss=0.2516, ctc_loss=0.1712, cr_loss=0.4022, over 18279.00 frames. ], tot_loss[loss=0.2278, ctc_loss=0.1528, cr_loss=0.375, over 2587679.65 frames. ], batch size: 108, lr: 3.10e-03, grad_scale: 32.0 2024-09-16 16:33:59,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=471356.1666666667, ans=0.0 2024-09-16 16:34:10,084 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.883e+02 2.154e+02 2.299e+02 2.446e+02 3.804e+02, threshold=4.598e+02, percent-clipped=0.0 2024-09-16 16:34:27,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=471412.8333333333, ans=0.0 2024-09-16 16:34:34,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=471412.8333333333, ans=0.125 2024-09-16 16:34:39,391 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.15 vs. limit=22.5 2024-09-16 16:35:07,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=471469.5, ans=0.125 2024-09-16 16:35:13,754 INFO [train.py:1198] (1/2) Epoch 27, batch 250, loss[loss=0.2144, ctc_loss=0.141, cr_loss=0.3673, over 21078.00 frames. ], tot_loss[loss=0.228, ctc_loss=0.153, cr_loss=0.3752, over 2921977.17 frames. ], batch size: 53, lr: 3.10e-03, grad_scale: 32.0 2024-09-16 16:35:23,956 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.78 vs. limit=22.5 2024-09-16 16:35:32,551 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.58 vs. limit=22.5 2024-09-16 16:35:57,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=471582.8333333333, ans=0.0 2024-09-16 16:36:00,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=471582.8333333333, ans=0.125 2024-09-16 16:36:08,529 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.11 vs. limit=12.0 2024-09-16 16:36:29,078 INFO [train.py:1198] (1/2) Epoch 27, batch 300, loss[loss=0.219, ctc_loss=0.1449, cr_loss=0.3708, over 20868.00 frames. 
], tot_loss[loss=0.229, ctc_loss=0.1538, cr_loss=0.3761, over 3175162.97 frames. ], batch size: 57, lr: 3.10e-03, grad_scale: 32.0 2024-09-16 16:36:31,547 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.20 vs. limit=15.0 2024-09-16 16:36:41,319 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.791e+02 2.087e+02 2.225e+02 2.331e+02 3.250e+02, threshold=4.449e+02, percent-clipped=0.0 2024-09-16 16:37:13,219 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 16:37:27,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.whiten.whitening_limit, batch_count=471724.5, ans=12.0 2024-09-16 16:37:46,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=471781.1666666667, ans=0.09899494936611666 2024-09-16 16:37:47,519 INFO [train.py:1198] (1/2) Epoch 27, batch 350, loss[loss=0.2624, ctc_loss=0.1768, cr_loss=0.4278, over 20974.00 frames. ], tot_loss[loss=0.2293, ctc_loss=0.1541, cr_loss=0.3762, over 3387052.71 frames. ], batch size: 64, lr: 3.10e-03, grad_scale: 32.0 2024-09-16 16:38:05,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=471809.5, ans=0.125 2024-09-16 16:38:05,987 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 16:38:12,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=471809.5, ans=0.125 2024-09-16 16:38:21,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=471837.8333333333, ans=0.2 2024-09-16 16:38:30,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=471837.8333333333, ans=0.125 2024-09-16 16:38:30,718 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.39 vs. limit=6.0 2024-09-16 16:38:49,026 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.26 vs. limit=15.0 2024-09-16 16:39:01,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=471894.5, ans=0.125 2024-09-16 16:39:05,953 INFO [train.py:1198] (1/2) Epoch 27, batch 400, loss[loss=0.198, ctc_loss=0.1331, cr_loss=0.3247, over 20959.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1554, cr_loss=0.3782, over 3546209.83 frames. 
2024-09-16 16:39:05,953 INFO [train.py:1198] (1/2) Epoch 27, batch 400, loss[loss=0.198, ctc_loss=0.1331, cr_loss=0.3247, over 20959.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1554, cr_loss=0.3782, over 3546209.83 frames. ], batch size: 48, lr: 3.10e-03, grad_scale: 32.0
2024-09-16 16:39:17,627 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.747e+02 2.163e+02 2.278e+02 2.427e+02 3.875e+02, threshold=4.556e+02, percent-clipped=0.0
2024-09-16 16:39:18,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=471922.8333333333, ans=0.2
2024-09-16 16:39:44,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=471979.5, ans=0.125
2024-09-16 16:39:44,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=471979.5, ans=0.2
2024-09-16 16:40:00,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=472007.8333333333, ans=0.125
2024-09-16 16:40:20,505 INFO [train.py:1198] (1/2) Epoch 27, batch 450, loss[loss=0.2181, ctc_loss=0.145, cr_loss=0.3655, over 21000.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1555, cr_loss=0.3787, over 3669443.26 frames. ], batch size: 61, lr: 3.10e-03, grad_scale: 32.0
2024-09-16 16:40:25,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=472064.5, ans=0.125
2024-09-16 16:40:34,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=472092.8333333333, ans=0.0
2024-09-16 16:40:47,129 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.96 vs. limit=15.0
2024-09-16 16:41:16,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=472149.5, ans=0.025
2024-09-16 16:41:36,078 INFO [train.py:1198] (1/2) Epoch 27, batch 500, loss[loss=0.2127, ctc_loss=0.1409, cr_loss=0.3588, over 21062.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1553, cr_loss=0.379, over 3754177.52 frames. ], batch size: 53, lr: 3.10e-03, grad_scale: 32.0
2024-09-16 16:41:47,942 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.816e+02 2.127e+02 2.285e+02 2.456e+02 4.358e+02, threshold=4.570e+02, percent-clipped=0.0
2024-09-16 16:41:57,924 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.86 vs. limit=15.0
2024-09-16 16:42:11,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=472262.8333333333, ans=0.1
2024-09-16 16:42:26,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=472291.1666666667, ans=0.125
2024-09-16 16:42:39,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=472319.5, ans=0.125
2024-09-16 16:42:42,911 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=472319.5, ans=0.0
2024-09-16 16:42:51,562 INFO [train.py:1198] (1/2) Epoch 27, batch 550, loss[loss=0.2272, ctc_loss=0.1507, cr_loss=0.3827, over 21045.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1545, cr_loss=0.3775, over 3842313.89 frames. ], batch size: 56, lr: 3.10e-03, grad_scale: 16.0
2024-09-16 16:43:02,801 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.76 vs. limit=15.0
2024-09-16 16:43:09,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=472376.1666666667, ans=0.0
2024-09-16 16:43:15,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=472376.1666666667, ans=0.1
2024-09-16 16:43:52,943 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.47 vs. limit=15.0
2024-09-16 16:44:00,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=472461.1666666667, ans=0.025
2024-09-16 16:44:13,511 INFO [train.py:1198] (1/2) Epoch 27, batch 600, loss[loss=0.2425, ctc_loss=0.1625, cr_loss=0.3998, over 21021.00 frames. ], tot_loss[loss=0.2296, ctc_loss=0.1542, cr_loss=0.3773, over 3903946.92 frames. ], batch size: 61, lr: 3.10e-03, grad_scale: 16.0
2024-09-16 16:44:27,295 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.757e+02 2.149e+02 2.267e+02 2.509e+02 3.416e+02, threshold=4.534e+02, percent-clipped=0.0
2024-09-16 16:44:44,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=472546.1666666667, ans=0.1
2024-09-16 16:44:50,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=472546.1666666667, ans=0.125
2024-09-16 16:45:29,101 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.77 vs. limit=22.5
2024-09-16 16:45:29,874 INFO [train.py:1198] (1/2) Epoch 27, batch 650, loss[loss=0.2379, ctc_loss=0.1645, cr_loss=0.3668, over 20639.00 frames. ], tot_loss[loss=0.2292, ctc_loss=0.1538, cr_loss=0.3766, over 3943708.81 frames. ], batch size: 71, lr: 3.10e-03, grad_scale: 16.0
2024-09-16 16:45:30,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=472631.1666666667, ans=0.125
2024-09-16 16:45:31,162 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.71 vs. limit=5.0
2024-09-16 16:45:34,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=472631.1666666667, ans=0.1
2024-09-16 16:46:11,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=472687.8333333333, ans=0.125
2024-09-16 16:46:23,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=472716.1666666667, ans=0.0
2024-09-16 16:46:34,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=472744.5, ans=0.0
2024-09-16 16:46:46,022 INFO [train.py:1198] (1/2) Epoch 27, batch 700, loss[loss=0.2325, ctc_loss=0.1566, cr_loss=0.3796, over 20275.00 frames. ], tot_loss[loss=0.2278, ctc_loss=0.1528, cr_loss=0.3747, over 3975145.89 frames. ], batch size: 74, lr: 3.10e-03, grad_scale: 16.0
2024-09-16 16:46:59,552 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.770e+02 2.094e+02 2.211e+02 2.360e+02 4.010e+02, threshold=4.422e+02, percent-clipped=0.0
2024-09-16 16:47:03,277 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.10 vs. limit=15.0
2024-09-16 16:47:53,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=472886.1666666667, ans=0.125
2024-09-16 16:47:56,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=472886.1666666667, ans=0.125
2024-09-16 16:48:00,310 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.87 vs. limit=15.0
2024-09-16 16:48:02,352 INFO [train.py:1198] (1/2) Epoch 27, batch 750, loss[loss=0.25, ctc_loss=0.1665, cr_loss=0.4174, over 20997.00 frames. ], tot_loss[loss=0.2281, ctc_loss=0.153, cr_loss=0.3753, over 4003345.13 frames. ], batch size: 55, lr: 3.10e-03, grad_scale: 16.0
2024-09-16 16:48:45,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=472999.5, ans=0.015
2024-09-16 16:48:52,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=472999.5, ans=0.05
2024-09-16 16:49:21,020 INFO [train.py:1198] (1/2) Epoch 27, batch 800, loss[loss=0.2001, ctc_loss=0.1311, cr_loss=0.3451, over 20978.00 frames. ], tot_loss[loss=0.2277, ctc_loss=0.1528, cr_loss=0.3745, over 4024036.24 frames. ], batch size: 51, lr: 3.09e-03, grad_scale: 32.0
2024-09-16 16:49:30,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=473056.1666666667, ans=0.2
2024-09-16 16:49:34,708 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.888e+02 2.116e+02 2.263e+02 2.433e+02 6.148e+02, threshold=4.526e+02, percent-clipped=1.0
2024-09-16 16:49:42,399 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=473084.5, ans=0.125
2024-09-16 16:50:01,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=473112.8333333333, ans=0.125
2024-09-16 16:50:13,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=473141.1666666667, ans=0.125
2024-09-16 16:50:14,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=473141.1666666667, ans=0.0
2024-09-16 16:50:40,198 INFO [train.py:1198] (1/2) Epoch 27, batch 850, loss[loss=0.1854, ctc_loss=0.1221, cr_loss=0.3167, over 20985.00 frames. ], tot_loss[loss=0.227, ctc_loss=0.1522, cr_loss=0.3741, over 4050501.74 frames. ], batch size: 51, lr: 3.09e-03, grad_scale: 32.0
2024-09-16 16:50:55,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=473226.1666666667, ans=0.125
2024-09-16 16:50:59,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=473226.1666666667, ans=0.025
2024-09-16 16:51:07,171 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.20 vs. limit=22.5
2024-09-16 16:51:24,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=473282.8333333333, ans=0.125
2024-09-16 16:51:24,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=473282.8333333333, ans=0.125
2024-09-16 16:51:32,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=473282.8333333333, ans=0.125
2024-09-16 16:51:39,491 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.31 vs. limit=12.0
2024-09-16 16:51:56,263 INFO [train.py:1198] (1/2) Epoch 27, batch 900, loss[loss=0.1908, ctc_loss=0.1261, cr_loss=0.3233, over 20995.00 frames. ], tot_loss[loss=0.2273, ctc_loss=0.1524, cr_loss=0.3743, over 4062025.29 frames. ], batch size: 50, lr: 3.09e-03, grad_scale: 32.0
2024-09-16 16:52:09,866 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.845e+02 2.068e+02 2.184e+02 2.300e+02 3.115e+02, threshold=4.368e+02, percent-clipped=0.0
2024-09-16 16:52:20,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=473367.8333333333, ans=0.125
2024-09-16 16:52:20,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=473367.8333333333, ans=0.2
2024-09-16 16:52:30,508 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.61 vs. limit=10.0
2024-09-16 16:52:31,287 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 16:52:46,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=473424.5, ans=0.0
2024-09-16 16:52:52,937 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.16 vs. limit=6.0
2024-09-16 16:53:00,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=473452.8333333333, ans=0.0
2024-09-16 16:53:12,039 INFO [train.py:1198] (1/2) Epoch 27, batch 950, loss[loss=0.2106, ctc_loss=0.1373, cr_loss=0.3665, over 19873.00 frames. ], tot_loss[loss=0.2252, ctc_loss=0.1508, cr_loss=0.3719, over 4074420.14 frames. ], batch size: 44, lr: 3.09e-03, grad_scale: 32.0
2024-09-16 16:53:16,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=473481.1666666667, ans=0.0
2024-09-16 16:54:27,010 INFO [train.py:1198] (1/2) Epoch 27, batch 1000, loss[loss=0.216, ctc_loss=0.1464, cr_loss=0.3482, over 21060.00 frames. ], tot_loss[loss=0.2259, ctc_loss=0.1513, cr_loss=0.3727, over 4082300.09 frames. ], batch size: 56, lr: 3.09e-03, grad_scale: 16.0
2024-09-16 16:54:33,699 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.55 vs. limit=22.5
2024-09-16 16:54:42,185 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.938e+02 2.157e+02 2.304e+02 2.519e+02 3.560e+02, threshold=4.609e+02, percent-clipped=0.0
2024-09-16 16:55:35,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=473736.1666666667, ans=0.0
2024-09-16 16:55:48,923 INFO [train.py:1198] (1/2) Epoch 27, batch 1050, loss[loss=0.254, ctc_loss=0.1718, cr_loss=0.4109, over 20657.00 frames. ], tot_loss[loss=0.2266, ctc_loss=0.1518, cr_loss=0.3737, over 4090229.13 frames. ], batch size: 68, lr: 3.09e-03, grad_scale: 16.0
2024-09-16 16:55:49,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=473764.5, ans=0.125
2024-09-16 16:56:14,727 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.69 vs. limit=15.0
2024-09-16 16:56:59,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=473877.8333333333, ans=0.125
2024-09-16 16:56:59,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=473877.8333333333, ans=0.2
2024-09-16 16:57:03,651 INFO [train.py:1198] (1/2) Epoch 27, batch 1100, loss[loss=0.2402, ctc_loss=0.1574, cr_loss=0.414, over 20996.00 frames. ], tot_loss[loss=0.2276, ctc_loss=0.1526, cr_loss=0.3753, over 4088080.01 frames. ], batch size: 52, lr: 3.09e-03, grad_scale: 16.0
2024-09-16 16:57:18,433 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.901e+02 2.177e+02 2.310e+02 2.412e+02 3.391e+02, threshold=4.620e+02, percent-clipped=0.0
2024-09-16 16:57:48,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=473991.1666666667, ans=0.0
2024-09-16 16:58:18,494 INFO [train.py:1198] (1/2) Epoch 27, batch 1150, loss[loss=0.1992, ctc_loss=0.1331, cr_loss=0.3304, over 21073.00 frames. ], tot_loss[loss=0.2274, ctc_loss=0.1524, cr_loss=0.3749, over 4102419.65 frames. ], batch size: 53, lr: 3.09e-03, grad_scale: 16.0
2024-09-16 16:59:09,176 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.25 vs. limit=8.0
2024-09-16 16:59:09,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=474132.8333333333, ans=0.125
2024-09-16 16:59:11,791 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.22 vs. limit=15.0
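Each scaling.py:214 record prints a ScheduledFloat: a hyperparameter (skip rates, balancer probabilities, dropout_p, whitening limits, ...) whose value `ans` is a deterministic function of the global batch_count. A minimal sketch, assuming piecewise-linear interpolation between (batch_count, value) breakpoints with the value held flat past the last one; the breakpoints below are hypothetical, and the real class in scaling.py carries extra machinery (named logging, arithmetic operators).

class ScheduledFloat:
    def __init__(self, *points):
        self.points = sorted(points)  # (batch_count, value) breakpoints
        self.batch_count = 0.0

    def __float__(self):
        x = self.batch_count
        x0, y0 = self.points[0]
        if x <= x0:
            return float(y0)
        for x1, y1 in self.points[1:]:
            if x <= x1:
                # linear interpolation inside [x0, x1]
                return float(y0 + (y1 - y0) * (x - x0) / (x1 - x0))
            x0, y0 = x1, y1
        return float(y0)  # constant past the last breakpoint

# Hypothetical schedule: a skip rate decaying from 0.5 to 0.0 over the
# first 20k batches; by batch_count=473736.17 it reads 0.0, matching the
# conv_skip_rate record above.
conv_skip_rate = ScheduledFloat((0.0, 0.5), (20000.0, 0.0))
conv_skip_rate.batch_count = 473736.1666666667
assert float(conv_skip_rate) == 0.0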
2024-09-16 16:59:33,822 INFO [train.py:1198] (1/2) Epoch 27, batch 1200, loss[loss=0.2448, ctc_loss=0.1668, cr_loss=0.3897, over 20713.00 frames. ], tot_loss[loss=0.2288, ctc_loss=0.1534, cr_loss=0.3769, over 4103154.82 frames. ], batch size: 71, lr: 3.09e-03, grad_scale: 32.0
2024-09-16 16:59:41,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=474189.5, ans=0.025
2024-09-16 16:59:49,084 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.857e+02 2.106e+02 2.226e+02 2.407e+02 2.911e+02, threshold=4.451e+02, percent-clipped=0.0
2024-09-16 17:00:28,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=474274.5, ans=0.0
2024-09-16 17:00:33,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=474302.8333333333, ans=0.0
2024-09-16 17:00:52,593 INFO [train.py:1198] (1/2) Epoch 27, batch 1250, loss[loss=0.218, ctc_loss=0.1434, cr_loss=0.3731, over 20346.00 frames. ], tot_loss[loss=0.2273, ctc_loss=0.1523, cr_loss=0.3749, over 4120647.20 frames. ], batch size: 45, lr: 3.09e-03, grad_scale: 32.0
2024-09-16 17:01:15,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=474359.5, ans=0.125
2024-09-16 17:01:57,635 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.52 vs. limit=12.0
2024-09-16 17:02:00,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=474444.5, ans=0.0
2024-09-16 17:02:01,719 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=474444.5, ans=0.125
2024-09-16 17:02:10,564 INFO [train.py:1198] (1/2) Epoch 27, batch 1300, loss[loss=0.2389, ctc_loss=0.1575, cr_loss=0.4071, over 20970.00 frames. ], tot_loss[loss=0.2277, ctc_loss=0.1525, cr_loss=0.3757, over 4127169.18 frames. ], batch size: 58, lr: 3.09e-03, grad_scale: 32.0
2024-09-16 17:02:25,749 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.870e+02 2.106e+02 2.284e+02 2.442e+02 3.899e+02, threshold=4.567e+02, percent-clipped=0.0
2024-09-16 17:02:26,411 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.40 vs. limit=15.0
2024-09-16 17:02:50,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=474529.5, ans=0.0
2024-09-16 17:03:26,327 INFO [train.py:1198] (1/2) Epoch 27, batch 1350, loss[loss=0.2128, ctc_loss=0.1415, cr_loss=0.3566, over 20961.00 frames. ], tot_loss[loss=0.2275, ctc_loss=0.1525, cr_loss=0.3749, over 4103956.08 frames. ], batch size: 51, lr: 3.09e-03, grad_scale: 32.0
2024-09-16 17:03:38,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=474614.5, ans=0.05
2024-09-16 17:04:11,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=474699.5, ans=0.0
2024-09-16 17:04:39,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=474727.8333333333, ans=0.125
2024-09-16 17:04:41,905 INFO [train.py:1198] (1/2) Epoch 27, batch 1400, loss[loss=0.2335, ctc_loss=0.1567, cr_loss=0.384, over 20954.00 frames. ], tot_loss[loss=0.2265, ctc_loss=0.1518, cr_loss=0.3738, over 4102763.12 frames. ], batch size: 67, lr: 3.09e-03, grad_scale: 32.0
2024-09-16 17:04:56,954 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.906e+02 2.119e+02 2.233e+02 2.429e+02 4.066e+02, threshold=4.466e+02, percent-clipped=0.0
2024-09-16 17:05:09,682 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.41 vs. limit=22.5
2024-09-16 17:05:19,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=474812.8333333333, ans=0.1
2024-09-16 17:05:57,158 INFO [train.py:1198] (1/2) Epoch 27, batch 1450, loss[loss=0.2211, ctc_loss=0.1504, cr_loss=0.3536, over 21070.00 frames. ], tot_loss[loss=0.2268, ctc_loss=0.1519, cr_loss=0.3744, over 4107408.44 frames. ], batch size: 59, lr: 3.09e-03, grad_scale: 16.0
2024-09-16 17:07:18,898 INFO [train.py:1198] (1/2) Epoch 27, batch 1500, loss[loss=0.2372, ctc_loss=0.158, cr_loss=0.3958, over 20769.00 frames. ], tot_loss[loss=0.2281, ctc_loss=0.153, cr_loss=0.3754, over 4092927.26 frames. ], batch size: 56, lr: 3.09e-03, grad_scale: 16.0
2024-09-16 17:07:35,605 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.927e+02 2.125e+02 2.242e+02 2.361e+02 4.299e+02, threshold=4.483e+02, percent-clipped=0.0
2024-09-16 17:07:47,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=475096.1666666667, ans=0.125
2024-09-16 17:07:54,286 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.74 vs. limit=15.0
2024-09-16 17:08:03,253 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.89 vs. limit=22.5
2024-09-16 17:08:25,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=475152.8333333333, ans=0.125
2024-09-16 17:08:31,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=475152.8333333333, ans=0.2
2024-09-16 17:08:34,402 INFO [train.py:1198] (1/2) Epoch 27, batch 1550, loss[loss=0.2187, ctc_loss=0.1463, cr_loss=0.362, over 20832.00 frames. ], tot_loss[loss=0.229, ctc_loss=0.1537, cr_loss=0.3766, over 4076357.09 frames. ], batch size: 59, lr: 3.09e-03, grad_scale: 16.0
2024-09-16 17:08:46,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=475181.1666666667, ans=0.5
2024-09-16 17:08:47,144 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.17 vs. limit=15.0
2024-09-16 17:09:00,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=475209.5, ans=0.125
2024-09-16 17:09:15,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=475237.8333333333, ans=0.125
2024-09-16 17:09:49,878 INFO [train.py:1198] (1/2) Epoch 27, batch 1600, loss[loss=0.1797, ctc_loss=0.1188, cr_loss=0.3045, over 19914.00 frames. ], tot_loss[loss=0.2295, ctc_loss=0.154, cr_loss=0.3776, over 4080163.37 frames. ], batch size: 44, lr: 3.09e-03, grad_scale: 32.0
2024-09-16 17:09:57,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=475322.8333333333, ans=0.0
2024-09-16 17:09:59,441 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.25 vs. limit=22.5
2024-09-16 17:10:02,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=475322.8333333333, ans=0.125
2024-09-16 17:10:06,382 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.812e+02 2.111e+02 2.242e+02 2.482e+02 4.649e+02, threshold=4.483e+02, percent-clipped=1.0
2024-09-16 17:10:06,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=475351.1666666667, ans=0.0
2024-09-16 17:10:31,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=475379.5, ans=0.125
2024-09-16 17:10:33,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=475379.5, ans=0.04949747468305833
2024-09-16 17:11:05,818 INFO [train.py:1198] (1/2) Epoch 27, batch 1650, loss[loss=0.2426, ctc_loss=0.1619, cr_loss=0.4033, over 20692.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1543, cr_loss=0.3775, over 4076180.84 frames. ], batch size: 66, lr: 3.09e-03, grad_scale: 32.0
2024-09-16 17:11:10,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=475464.5, ans=0.125
2024-09-16 17:11:39,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=475521.1666666667, ans=0.1
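The grad_scale field in the train.py:1198 records oscillates between 32.0 and 16.0 (16.0 from batch 550, back to 32.0 at batch 800, down again at batch 1000, up at batch 1200, and so on). That is ordinary float16 AMP loss scaling: the scale is halved when a scaled gradient overflows and doubled again after a run of overflow-free steps. A sketch using torch.cuda.amp.GradScaler; the init_scale and growth_interval values here are illustrative, not the settings of this run.

import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0,       # illustrative starting scale
    growth_factor=2.0,     # doubles after growth_interval clean steps
    backoff_factor=0.5,    # halves on an inf/nan gradient
    growth_interval=2000,  # illustrative
)

# Typical loop (sketch):
#   with torch.cuda.amp.autocast(dtype=torch.float16):
#       loss = compute_loss(model, batch)   # compute_loss is hypothetical
#   scaler.scale(loss).backward()
#   scaler.step(optimizer)
#   scaler.update()   # this is where grad_scale moves 32 -> 16 -> 32 ...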
2024-09-16 17:12:23,740 INFO [train.py:1198] (1/2) Epoch 27, batch 1700, loss[loss=0.2408, ctc_loss=0.1637, cr_loss=0.3854, over 20973.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1555, cr_loss=0.3793, over 4067507.77 frames. ], batch size: 58, lr: 3.09e-03, grad_scale: 32.0
2024-09-16 17:12:28,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=475606.1666666667, ans=0.0
2024-09-16 17:12:42,902 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.911e+02 2.149e+02 2.268e+02 2.466e+02 3.529e+02, threshold=4.537e+02, percent-clipped=0.0
2024-09-16 17:12:55,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=475662.8333333333, ans=0.125
2024-09-16 17:13:10,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=475691.1666666667, ans=0.125
2024-09-16 17:13:14,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=475691.1666666667, ans=0.125
2024-09-16 17:13:14,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=475691.1666666667, ans=0.1
2024-09-16 17:13:23,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=475691.1666666667, ans=0.125
2024-09-16 17:13:25,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=475719.5, ans=0.125
2024-09-16 17:13:26,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=475719.5, ans=0.125
2024-09-16 17:13:41,854 INFO [train.py:1198] (1/2) Epoch 27, batch 1750, loss[loss=0.2293, ctc_loss=0.156, cr_loss=0.3667, over 21076.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1546, cr_loss=0.3784, over 4071112.82 frames. ], batch size: 59, lr: 3.09e-03, grad_scale: 32.0
2024-09-16 17:14:06,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=475776.1666666667, ans=15.0
2024-09-16 17:14:09,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=475776.1666666667, ans=0.125
2024-09-16 17:14:14,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=475804.5, ans=0.125
2024-09-16 17:14:45,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=475861.1666666667, ans=10.0
2024-09-16 17:14:51,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=475861.1666666667, ans=0.125
2024-09-16 17:14:57,496 INFO [train.py:1198] (1/2) Epoch 27, batch 1800, loss[loss=0.2509, ctc_loss=0.1703, cr_loss=0.4031, over 20674.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.1546, cr_loss=0.3774, over 4047914.37 frames. ], batch size: 68, lr: 3.09e-03, grad_scale: 32.0
2024-09-16 17:14:59,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=475889.5, ans=0.125
2024-09-16 17:15:14,085 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.831e+02 2.149e+02 2.275e+02 2.481e+02 4.210e+02, threshold=4.550e+02, percent-clipped=0.0
2024-09-16 17:15:25,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=475917.8333333333, ans=0.125
2024-09-16 17:15:44,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=475974.5, ans=0.125
2024-09-16 17:16:01,389 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.06 vs. limit=15.0
2024-09-16 17:16:08,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=476002.8333333333, ans=0.1
2024-09-16 17:16:13,837 INFO [train.py:1198] (1/2) Epoch 27, batch 1850, loss[loss=0.2012, ctc_loss=0.133, cr_loss=0.3408, over 20985.00 frames. ], tot_loss[loss=0.2296, ctc_loss=0.1542, cr_loss=0.3771, over 4056144.12 frames. ], batch size: 55, lr: 3.09e-03, grad_scale: 32.0
2024-09-16 17:17:30,041 INFO [train.py:1198] (1/2) Epoch 27, batch 1900, loss[loss=0.256, ctc_loss=0.1734, cr_loss=0.4132, over 20981.00 frames. ], tot_loss[loss=0.2284, ctc_loss=0.1532, cr_loss=0.376, over 4079006.18 frames. ], batch size: 64, lr: 3.08e-03, grad_scale: 32.0
2024-09-16 17:17:34,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=476172.8333333333, ans=0.0
2024-09-16 17:17:39,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=476172.8333333333, ans=0.125
2024-09-16 17:17:50,864 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.833e+02 2.122e+02 2.285e+02 2.570e+02 3.510e+02, threshold=4.569e+02, percent-clipped=0.0
2024-09-16 17:18:03,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=476229.5, ans=0.1
2024-09-16 17:18:18,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=476229.5, ans=0.0
2024-09-16 17:18:50,776 INFO [train.py:1198] (1/2) Epoch 27, batch 1950, loss[loss=0.2413, ctc_loss=0.1606, cr_loss=0.4035, over 21041.00 frames. ], tot_loss[loss=0.2288, ctc_loss=0.1535, cr_loss=0.3765, over 4077101.23 frames. ], batch size: 62, lr: 3.08e-03, grad_scale: 16.0
2024-09-16 17:18:58,794 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.47 vs. limit=22.5
2024-09-16 17:19:25,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=476371.1666666667, ans=0.125
2024-09-16 17:19:42,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=476399.5, ans=0.0
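The scaling.py:1024 Whitening records compare a per-module whiteness metric against a limit (some limits are themselves scheduled, e.g. the whiten.whitening_limit record above); the penalty that keeps feature covariances close to isotropic only engages once the metric exceeds its limit, which is why the logged metrics mostly sit below them. A sketch of one plausible metric, assuming it measures how uneven the eigenvalue spectrum of the per-group feature covariance is (exactly 1.0 for a perfectly white covariance); the precise formula in scaling.py may differ.

import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
    # x: (..., num_channels) activations -> scalar >= 1.0; larger means
    # the covariance spectrum is more concentrated, i.e. less "white".
    x = x.reshape(-1, x.shape[-1])
    num_frames, num_channels = x.shape
    assert num_channels % num_groups == 0
    x = x.reshape(num_frames, num_groups, num_channels // num_groups)
    x = x - x.mean(dim=0, keepdim=True)
    cov = torch.einsum("tgc,tgd->gcd", x, x) / num_frames
    eigs = torch.linalg.eigvalsh(cov)  # per-group eigenvalue spectra
    return (eigs ** 2).mean() / (eigs.mean() ** 2)

# whitening_metric(torch.randn(10000, 256), num_groups=1) comes out near
# 1.0; a training-time hook would add a penalty only when the returned
# value exceeds the logged limit.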
2024-09-16 17:19:42,737 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.04 vs. limit=15.0
2024-09-16 17:20:06,240 INFO [train.py:1198] (1/2) Epoch 27, batch 2000, loss[loss=0.2331, ctc_loss=0.1542, cr_loss=0.3944, over 21067.00 frames. ], tot_loss[loss=0.2291, ctc_loss=0.1536, cr_loss=0.3775, over 4088622.68 frames. ], batch size: 59, lr: 3.08e-03, grad_scale: 32.0
2024-09-16 17:20:24,301 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.800e+02 2.168e+02 2.267e+02 2.349e+02 3.904e+02, threshold=4.535e+02, percent-clipped=0.0
2024-09-16 17:20:26,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=476484.5, ans=0.2
2024-09-16 17:20:32,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=476484.5, ans=0.125
2024-09-16 17:20:35,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=476512.8333333333, ans=0.0
2024-09-16 17:21:11,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=476569.5, ans=0.09899494936611666
2024-09-16 17:21:20,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=476597.8333333333, ans=0.1
2024-09-16 17:21:21,806 INFO [train.py:1198] (1/2) Epoch 27, batch 2050, loss[loss=0.2203, ctc_loss=0.1502, cr_loss=0.3504, over 21077.00 frames. ], tot_loss[loss=0.2292, ctc_loss=0.1538, cr_loss=0.3769, over 4092240.55 frames. ], batch size: 56, lr: 3.08e-03, grad_scale: 32.0
2024-09-16 17:21:37,478 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=476626.1666666667, ans=0.125
2024-09-16 17:22:37,523 INFO [train.py:1198] (1/2) Epoch 27, batch 2100, loss[loss=0.2383, ctc_loss=0.1565, cr_loss=0.409, over 20954.00 frames. ], tot_loss[loss=0.2289, ctc_loss=0.1537, cr_loss=0.3763, over 4084501.10 frames. ], batch size: 64, lr: 3.08e-03, grad_scale: 32.0
2024-09-16 17:22:55,337 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.816e+02 2.174e+02 2.342e+02 2.611e+02 5.265e+02, threshold=4.684e+02, percent-clipped=1.0
2024-09-16 17:22:55,792 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=476767.8333333333, ans=0.2
2024-09-16 17:22:58,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=476767.8333333333, ans=0.125
2024-09-16 17:23:04,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=476767.8333333333, ans=0.1
2024-09-16 17:23:09,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=476796.1666666667, ans=0.1
2024-09-16 17:23:17,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=476796.1666666667, ans=0.1
2024-09-16 17:23:18,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=476796.1666666667, ans=0.1
2024-09-16 17:23:21,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=476824.5, ans=0.025
2024-09-16 17:23:24,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=476824.5, ans=0.125
2024-09-16 17:23:27,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=476824.5, ans=0.125
2024-09-16 17:23:56,174 INFO [train.py:1198] (1/2) Epoch 27, batch 2150, loss[loss=0.1841, ctc_loss=0.12, cr_loss=0.3202, over 20945.00 frames. ], tot_loss[loss=0.2291, ctc_loss=0.1538, cr_loss=0.3767, over 4084586.46 frames. ], batch size: 49, lr: 3.08e-03, grad_scale: 32.0
2024-09-16 17:24:08,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=476881.1666666667, ans=0.5
2024-09-16 17:24:26,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=476909.5, ans=0.07
2024-09-16 17:24:31,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=476937.8333333333, ans=0.125
2024-09-16 17:24:54,310 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.64 vs. limit=15.0
2024-09-16 17:25:14,295 INFO [train.py:1198] (1/2) Epoch 27, batch 2200, loss[loss=0.218, ctc_loss=0.147, cr_loss=0.355, over 20988.00 frames. ], tot_loss[loss=0.2289, ctc_loss=0.1536, cr_loss=0.3768, over 4092360.39 frames. ], batch size: 55, lr: 3.08e-03, grad_scale: 32.0
2024-09-16 17:25:29,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=477051.1666666667, ans=0.0
2024-09-16 17:25:31,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=477051.1666666667, ans=0.0
2024-09-16 17:25:32,282 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.863e+02 2.158e+02 2.310e+02 2.590e+02 4.056e+02, threshold=4.620e+02, percent-clipped=0.0
2024-09-16 17:26:29,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=477164.5, ans=0.2
2024-09-16 17:26:30,172 INFO [train.py:1198] (1/2) Epoch 27, batch 2250, loss[loss=0.2428, ctc_loss=0.1613, cr_loss=0.4076, over 21058.00 frames. ], tot_loss[loss=0.2288, ctc_loss=0.1533, cr_loss=0.3772, over 4086040.82 frames. ], batch size: 62, lr: 3.08e-03, grad_scale: 32.0
2024-09-16 17:26:57,391 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=477192.8333333333, ans=0.125
2024-09-16 17:27:10,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=477221.1666666667, ans=0.025
2024-09-16 17:27:36,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=477277.8333333333, ans=0.125
2024-09-16 17:27:45,136 INFO [train.py:1198] (1/2) Epoch 27, batch 2300, loss[loss=0.2108, ctc_loss=0.1403, cr_loss=0.3526, over 20798.00 frames. ], tot_loss[loss=0.229, ctc_loss=0.1536, cr_loss=0.3771, over 4082114.15 frames. ], batch size: 53, lr: 3.08e-03, grad_scale: 32.0
2024-09-16 17:27:50,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=477306.1666666667, ans=0.0
2024-09-16 17:28:03,702 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.792e+02 2.086e+02 2.228e+02 2.361e+02 3.205e+02, threshold=4.457e+02, percent-clipped=0.0
2024-09-16 17:28:29,284 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.86 vs. limit=22.5
2024-09-16 17:29:01,279 INFO [train.py:1198] (1/2) Epoch 27, batch 2350, loss[loss=0.1745, ctc_loss=0.1139, cr_loss=0.3033, over 20981.00 frames. ], tot_loss[loss=0.2283, ctc_loss=0.153, cr_loss=0.3764, over 4092051.91 frames. ], batch size: 52, lr: 3.08e-03, grad_scale: 32.0
2024-09-16 17:29:35,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=477504.5, ans=0.2
2024-09-16 17:29:58,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=477532.8333333333, ans=0.0
2024-09-16 17:30:01,476 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.30 vs. limit=15.0
2024-09-16 17:30:14,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=477561.1666666667, ans=0.125
2024-09-16 17:30:23,481 INFO [train.py:1198] (1/2) Epoch 27, batch 2400, loss[loss=0.1898, ctc_loss=0.1248, cr_loss=0.3253, over 20988.00 frames. ], tot_loss[loss=0.2279, ctc_loss=0.1527, cr_loss=0.3759, over 4096422.64 frames. ], batch size: 51, lr: 3.08e-03, grad_scale: 32.0
2024-09-16 17:30:29,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=477589.5, ans=0.0
2024-09-16 17:30:34,691 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 17:30:41,618 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.866e+02 2.123e+02 2.202e+02 2.335e+02 3.360e+02, threshold=4.404e+02, percent-clipped=0.0
2024-09-16 17:30:43,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=477617.8333333333, ans=0.0
2024-09-16 17:31:03,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=477646.1666666667, ans=0.125
2024-09-16 17:31:06,557 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.73 vs. limit=15.0
2024-09-16 17:31:07,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=477674.5, ans=0.1
2024-09-16 17:31:19,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=477674.5, ans=0.2
2024-09-16 17:31:39,024 INFO [train.py:1198] (1/2) Epoch 27, batch 2450, loss[loss=0.2308, ctc_loss=0.1566, cr_loss=0.3709, over 20607.00 frames. ], tot_loss[loss=0.2283, ctc_loss=0.153, cr_loss=0.3767, over 4102482.84 frames. ], batch size: 75, lr: 3.08e-03, grad_scale: 32.0
2024-09-16 17:32:39,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=477844.5, ans=0.04949747468305833
2024-09-16 17:32:44,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=477844.5, ans=0.2
2024-09-16 17:32:51,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=477844.5, ans=0.1
2024-09-16 17:32:54,486 INFO [train.py:1198] (1/2) Epoch 27, batch 2500, loss[loss=0.2385, ctc_loss=0.16, cr_loss=0.3928, over 20680.00 frames. ], tot_loss[loss=0.2285, ctc_loss=0.153, cr_loss=0.3771, over 4100211.66 frames. ], batch size: 68, lr: 3.08e-03, grad_scale: 32.0
2024-09-16 17:32:57,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=477872.8333333333, ans=0.1
2024-09-16 17:33:12,437 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.881e+02 2.125e+02 2.282e+02 2.490e+02 4.054e+02, threshold=4.564e+02, percent-clipped=0.0
2024-09-16 17:33:45,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=477957.8333333333, ans=0.0
2024-09-16 17:33:50,417 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.60 vs. limit=12.0
2024-09-16 17:34:09,025 INFO [train.py:1198] (1/2) Epoch 27, batch 2550, loss[loss=0.2438, ctc_loss=0.1667, cr_loss=0.3858, over 20931.00 frames. ], tot_loss[loss=0.2285, ctc_loss=0.1532, cr_loss=0.3763, over 4101223.84 frames. ], batch size: 60, lr: 3.08e-03, grad_scale: 32.0
2024-09-16 17:34:24,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=478042.8333333333, ans=0.0
2024-09-16 17:34:35,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=478042.8333333333, ans=0.125
2024-09-16 17:34:58,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=478099.5, ans=0.2
2024-09-16 17:35:07,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=478099.5, ans=0.125
2024-09-16 17:35:08,947 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.79 vs. limit=15.0
2024-09-16 17:35:27,457 INFO [train.py:1198] (1/2) Epoch 27, batch 2600, loss[loss=0.1956, ctc_loss=0.129, cr_loss=0.333, over 20964.00 frames. ], tot_loss[loss=0.2281, ctc_loss=0.153, cr_loss=0.3756, over 4099625.54 frames. ], batch size: 51, lr: 3.08e-03, grad_scale: 32.0
2024-09-16 17:35:48,422 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.829e+02 2.148e+02 2.229e+02 2.374e+02 2.756e+02, threshold=4.458e+02, percent-clipped=0.0
2024-09-16 17:36:02,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=478212.8333333333, ans=0.125
2024-09-16 17:36:45,979 INFO [train.py:1198] (1/2) Epoch 27, batch 2650, loss[loss=0.2031, ctc_loss=0.1343, cr_loss=0.3439, over 20987.00 frames. ], tot_loss[loss=0.2268, ctc_loss=0.152, cr_loss=0.3742, over 4113609.05 frames. ], batch size: 52, lr: 3.08e-03, grad_scale: 32.0
2024-09-16 17:36:58,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=478297.8333333333, ans=0.0
2024-09-16 17:37:13,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=478326.1666666667, ans=0.1
2024-09-16 17:37:16,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=478354.5, ans=0.2
2024-09-16 17:38:01,817 INFO [train.py:1198] (1/2) Epoch 27, batch 2700, loss[loss=0.2262, ctc_loss=0.1526, cr_loss=0.3679, over 20770.00 frames. ], tot_loss[loss=0.2272, ctc_loss=0.1522, cr_loss=0.3747, over 4110787.88 frames. ], batch size: 56, lr: 3.08e-03, grad_scale: 16.0
2024-09-16 17:38:21,557 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.823e+02 2.120e+02 2.240e+02 2.366e+02 5.113e+02, threshold=4.479e+02, percent-clipped=1.0
2024-09-16 17:38:38,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=478496.1666666667, ans=0.2
2024-09-16 17:39:17,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=478581.1666666667, ans=0.125
2024-09-16 17:39:18,135 INFO [train.py:1198] (1/2) Epoch 27, batch 2750, loss[loss=0.2179, ctc_loss=0.1446, cr_loss=0.3664, over 21007.00 frames. ], tot_loss[loss=0.2271, ctc_loss=0.1522, cr_loss=0.3746, over 4105313.18 frames. ], batch size: 55, lr: 3.08e-03, grad_scale: 16.0
2024-09-16 17:40:02,901 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.71 vs. limit=15.0
2024-09-16 17:40:02,927 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.70 vs. limit=15.0
2024-09-16 17:40:29,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=478694.5, ans=0.125
2024-09-16 17:40:33,754 INFO [train.py:1198] (1/2) Epoch 27, batch 2800, loss[loss=0.2589, ctc_loss=0.1752, cr_loss=0.4186, over 20264.00 frames. ], tot_loss[loss=0.2281, ctc_loss=0.1529, cr_loss=0.376, over 4098513.59 frames. ], batch size: 74, lr: 3.08e-03, grad_scale: 32.0
2024-09-16 17:40:56,145 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.802e+02 2.107e+02 2.245e+02 2.397e+02 3.647e+02, threshold=4.491e+02, percent-clipped=0.0
2024-09-16 17:41:34,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=478807.8333333333, ans=0.1
2024-09-16 17:41:55,643 INFO [train.py:1198] (1/2) Epoch 27, batch 2850, loss[loss=0.1941, ctc_loss=0.1289, cr_loss=0.3261, over 20927.00 frames. ], tot_loss[loss=0.2285, ctc_loss=0.1533, cr_loss=0.3758, over 4082532.01 frames. ], batch size: 50, lr: 3.08e-03, grad_scale: 32.0
2024-09-16 17:42:04,691 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=478864.5, ans=0.0
2024-09-16 17:42:17,645 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.78 vs. limit=5.0
2024-09-16 17:42:19,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=478892.8333333333, ans=0.125
2024-09-16 17:42:59,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=478977.8333333333, ans=0.125
2024-09-16 17:43:04,108 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.87 vs. limit=15.0
2024-09-16 17:43:05,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=478977.8333333333, ans=0.125
2024-09-16 17:43:10,903 INFO [train.py:1198] (1/2) Epoch 27, batch 2900, loss[loss=0.1865, ctc_loss=0.1225, cr_loss=0.32, over 20974.00 frames. ], tot_loss[loss=0.2284, ctc_loss=0.1532, cr_loss=0.3757, over 4079247.54 frames. ], batch size: 51, lr: 3.08e-03, grad_scale: 32.0
2024-09-16 17:43:30,653 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.844e+02 2.088e+02 2.226e+02 2.373e+02 7.586e+02, threshold=4.452e+02, percent-clipped=1.0
2024-09-16 17:43:32,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=479034.5, ans=0.1
2024-09-16 17:44:26,289 INFO [train.py:1198] (1/2) Epoch 27, batch 2950, loss[loss=0.2212, ctc_loss=0.1494, cr_loss=0.3591, over 21025.00 frames. ], tot_loss[loss=0.228, ctc_loss=0.153, cr_loss=0.3753, over 4084736.79 frames. ], batch size: 62, lr: 3.08e-03, grad_scale: 32.0
2024-09-16 17:44:26,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=479147.8333333333, ans=0.125
2024-09-16 17:44:52,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=479176.1666666667, ans=0.125
2024-09-16 17:45:00,527 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.38 vs. limit=15.0
2024-09-16 17:45:03,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=479204.5, ans=0.1
2024-09-16 17:45:09,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=479204.5, ans=0.04949747468305833
2024-09-16 17:45:39,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=479261.1666666667, ans=0.125
2024-09-16 17:45:41,963 INFO [train.py:1198] (1/2) Epoch 27, batch 3000, loss[loss=0.2003, ctc_loss=0.1328, cr_loss=0.3374, over 21052.00 frames. ], tot_loss[loss=0.2273, ctc_loss=0.1525, cr_loss=0.3742, over 4081637.23 frames. ], batch size: 53, lr: 3.07e-03, grad_scale: 32.0
2024-09-16 17:45:41,964 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-16 17:46:01,848 INFO [train.py:1230] (1/2) Epoch 27, validation: loss=0.04167, ctc_loss=0.04167, cr_loss=1.171e-14, over 944034.00 frames.
2024-09-16 17:46:01,849 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB
2024-09-16 17:46:17,929 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.64 vs. limit=15.0
2024-09-16 17:46:24,816 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.877e+02 2.138e+02 2.279e+02 2.473e+02 3.607e+02, threshold=4.558e+02, percent-clipped=0.0
2024-09-16 17:47:08,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=479402.8333333333, ans=0.1
2024-09-16 17:47:12,480 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.54 vs. limit=15.0
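Two things happen around batch 3000 above: the loop pauses to compute validation loss over a fixed 944034 frames, where cr_loss collapses to numerical noise (1.171e-14) since the consistency term compares two differently masked views that only exist in training; and the tot_loss[...] summaries behave like frame-weighted running averages whose effective window hovers near four million frames. A sketch of such a tracker, assuming decayed running sums of frame-weighted losses and of the frame count; the decay constant is a guess for illustration.

class MetricsTracker(dict):
    # Printing divides each decayed loss sum by the decayed frame count,
    # yielding lines like `tot_loss[loss=..., over N frames]`.

    def accumulate(self, batch: dict, decay: float = 0.999):
        # `batch` holds frame-weighted loss sums for one batch plus "frames".
        for k in set(self) | set(batch):
            self[k] = self.get(k, 0.0) * decay + batch.get(k, 0.0)

    def __str__(self):
        frames = self.get("frames", 1.0)
        body = ", ".join(f"{k}={self[k] / frames:.4g}"
                         for k in self if k != "frames")
        return f"[{body}, over {frames:.2f} frames]"

tot_loss = MetricsTracker()
# per batch: tot_loss.accumulate({"loss": ..., "ctc_loss": ..., "frames": n})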
2024-09-16 17:47:23,530 INFO [train.py:1198] (1/2) Epoch 27, batch 3050, loss[loss=0.2211, ctc_loss=0.1459, cr_loss=0.3761, over 21016.00 frames. ], tot_loss[loss=0.2268, ctc_loss=0.1521, cr_loss=0.3734, over 4089320.71 frames. ], batch size: 63, lr: 3.07e-03, grad_scale: 32.0
2024-09-16 17:47:29,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=479431.1666666667, ans=0.1
2024-09-16 17:47:40,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=479459.5, ans=0.5
2024-09-16 17:48:09,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=479516.1666666667, ans=0.125
2024-09-16 17:48:31,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=479544.5, ans=0.0
2024-09-16 17:48:40,123 INFO [train.py:1198] (1/2) Epoch 27, batch 3100, loss[loss=0.2432, ctc_loss=0.164, cr_loss=0.3961, over 20951.00 frames. ], tot_loss[loss=0.2269, ctc_loss=0.1523, cr_loss=0.3729, over 4079164.63 frames. ], batch size: 60, lr: 3.07e-03, grad_scale: 32.0
2024-09-16 17:48:54,824 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.44 vs. limit=15.0
2024-09-16 17:48:59,641 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.763e+02 2.123e+02 2.334e+02 2.478e+02 3.629e+02, threshold=4.669e+02, percent-clipped=0.0
2024-09-16 17:49:01,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=479601.1666666667, ans=0.1
2024-09-16 17:49:56,121 INFO [train.py:1198] (1/2) Epoch 27, batch 3150, loss[loss=0.2521, ctc_loss=0.1733, cr_loss=0.394, over 20631.00 frames. ], tot_loss[loss=0.227, ctc_loss=0.1523, cr_loss=0.3736, over 4094406.27 frames. ], batch size: 68, lr: 3.07e-03, grad_scale: 32.0
2024-09-16 17:49:59,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=479714.5, ans=0.125
2024-09-16 17:50:07,588 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.05 vs. limit=15.0
2024-09-16 17:50:10,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=479742.8333333333, ans=0.0
2024-09-16 17:50:16,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=479742.8333333333, ans=0.125
2024-09-16 17:50:34,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=479771.1666666667, ans=0.125
2024-09-16 17:50:34,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=479771.1666666667, ans=0.125
2024-09-16 17:50:43,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=479799.5, ans=0.125
2024-09-16 17:50:45,286 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.12 vs. limit=15.0
], batch size: 59, lr: 3.07e-03, grad_scale: 32.0 2024-09-16 17:51:12,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=479856.1666666667, ans=0.125 2024-09-16 17:51:31,342 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.795e+02 2.156e+02 2.280e+02 2.490e+02 4.987e+02, threshold=4.560e+02, percent-clipped=1.0 2024-09-16 17:51:51,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=479912.8333333333, ans=0.0 2024-09-16 17:52:07,951 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.98 vs. limit=22.5 2024-09-16 17:52:32,465 INFO [train.py:1198] (1/2) Epoch 27, batch 3250, loss[loss=0.2214, ctc_loss=0.1494, cr_loss=0.3601, over 21032.00 frames. ], tot_loss[loss=0.2285, ctc_loss=0.1534, cr_loss=0.3758, over 4081976.95 frames. ], batch size: 56, lr: 3.07e-03, grad_scale: 16.0 2024-09-16 17:52:40,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=479997.8333333333, ans=0.025 2024-09-16 17:52:51,596 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.14 vs. limit=15.0 2024-09-16 17:52:59,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=480026.1666666667, ans=0.0 2024-09-16 17:53:01,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=480054.5, ans=0.0 2024-09-16 17:53:02,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=480054.5, ans=0.05 2024-09-16 17:53:27,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=480082.8333333333, ans=0.1 2024-09-16 17:53:40,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=480111.1666666667, ans=0.1 2024-09-16 17:53:47,836 INFO [train.py:1198] (1/2) Epoch 27, batch 3300, loss[loss=0.2298, ctc_loss=0.1555, cr_loss=0.3713, over 20786.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1543, cr_loss=0.3774, over 4074468.04 frames. ], batch size: 56, lr: 3.07e-03, grad_scale: 16.0 2024-09-16 17:53:55,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=480139.5, ans=0.125 2024-09-16 17:54:08,816 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.813e+02 2.111e+02 2.240e+02 2.473e+02 4.220e+02, threshold=4.480e+02, percent-clipped=0.0 2024-09-16 17:54:10,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=480167.8333333333, ans=0.09899494936611666 2024-09-16 17:54:27,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=480196.1666666667, ans=0.125 2024-09-16 17:54:29,300 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.92 vs. 
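Each WARNING above prints five order statistics of recent gradient norms (min, the three quartiles, max), a clipping threshold, and how often clipping fired; the logged thresholds sit at Clipping_scale (2.0) times the logged median. A rough sketch of that mechanism, with illustrative names and window size (the actual optim.py logic may differ):

```python
# Sketch of median-based gradient clipping consistent with the WARNING
# lines: track a window of recent gradient norms, clip at
# clipping_scale * median, and count how often clipping fires.
# Illustrative only; not the actual optim.py implementation.
from collections import deque
from statistics import median

import torch

class GradNormClipper:
    def __init__(self, clipping_scale=2.0, window=128):
        self.scale = clipping_scale
        self.norms = deque(maxlen=window)
        self.num_clipped = 0
        self.num_steps = 0

    def clip_(self, parameters):
        grads = [p.grad for p in parameters if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
        self.norms.append(norm)
        self.num_steps += 1
        threshold = self.scale * median(self.norms)
        if norm > threshold:
            self.num_clipped += 1   # feeds the logged percent-clipped
            for g in grads:
                g.mul_(threshold / norm)
        return norm
```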
limit=15.0 2024-09-16 17:54:36,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=480224.5, ans=0.125 2024-09-16 17:55:00,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=480252.8333333333, ans=0.5 2024-09-16 17:55:03,725 INFO [train.py:1198] (1/2) Epoch 27, batch 3350, loss[loss=0.2352, ctc_loss=0.1595, cr_loss=0.3784, over 21008.00 frames. ], tot_loss[loss=0.2294, ctc_loss=0.154, cr_loss=0.377, over 4086814.55 frames. ], batch size: 61, lr: 3.07e-03, grad_scale: 16.0 2024-09-16 17:55:05,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=480281.1666666667, ans=0.2 2024-09-16 17:55:08,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=480281.1666666667, ans=0.2 2024-09-16 17:55:40,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=480337.8333333333, ans=0.125 2024-09-16 17:55:44,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=480337.8333333333, ans=0.2 2024-09-16 17:56:18,540 INFO [train.py:1198] (1/2) Epoch 27, batch 3400, loss[loss=0.2401, ctc_loss=0.1596, cr_loss=0.4028, over 21024.00 frames. ], tot_loss[loss=0.2291, ctc_loss=0.1538, cr_loss=0.377, over 4083369.53 frames. ], batch size: 63, lr: 3.07e-03, grad_scale: 16.0 2024-09-16 17:56:25,427 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.78 vs. limit=15.0 2024-09-16 17:56:39,588 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.884e+02 2.130e+02 2.268e+02 2.467e+02 3.043e+02, threshold=4.536e+02, percent-clipped=0.0 2024-09-16 17:57:00,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=480479.5, ans=0.125 2024-09-16 17:57:16,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=480507.8333333333, ans=0.1 2024-09-16 17:57:34,159 INFO [train.py:1198] (1/2) Epoch 27, batch 3450, loss[loss=0.237, ctc_loss=0.16, cr_loss=0.3849, over 20701.00 frames. ], tot_loss[loss=0.2289, ctc_loss=0.1536, cr_loss=0.3766, over 4081141.42 frames. ], batch size: 71, lr: 3.07e-03, grad_scale: 16.0 2024-09-16 17:57:34,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=480564.5, ans=0.04949747468305833 2024-09-16 17:57:57,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=480592.8333333333, ans=0.125 2024-09-16 17:58:01,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=480592.8333333333, ans=0.125 2024-09-16 17:58:16,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=480621.1666666667, ans=0.07 2024-09-16 17:58:18,226 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.16 vs. 
limit=15.0 2024-09-16 17:58:22,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=480621.1666666667, ans=0.2 2024-09-16 17:58:31,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=480649.5, ans=0.0 2024-09-16 17:58:55,443 INFO [train.py:1198] (1/2) Epoch 27, batch 3500, loss[loss=0.2002, ctc_loss=0.1327, cr_loss=0.3378, over 20930.00 frames. ], tot_loss[loss=0.2281, ctc_loss=0.1529, cr_loss=0.3757, over 4093673.24 frames. ], batch size: 49, lr: 3.07e-03, grad_scale: 16.0 2024-09-16 17:58:56,038 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. limit=6.0 2024-09-16 17:59:10,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=480734.5, ans=0.5 2024-09-16 17:59:16,365 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.788e+02 2.133e+02 2.252e+02 2.402e+02 4.500e+02, threshold=4.504e+02, percent-clipped=0.0 2024-09-16 17:59:28,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=480762.8333333333, ans=0.125 2024-09-16 17:59:31,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=480762.8333333333, ans=0.125 2024-09-16 17:59:37,014 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.27 vs. limit=6.0 2024-09-16 17:59:57,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=480819.5, ans=0.0 2024-09-16 18:00:10,905 INFO [train.py:1198] (1/2) Epoch 27, batch 3550, loss[loss=0.2477, ctc_loss=0.1683, cr_loss=0.3966, over 19662.00 frames. ], tot_loss[loss=0.2279, ctc_loss=0.1527, cr_loss=0.3757, over 4097411.67 frames. ], batch size: 90, lr: 3.07e-03, grad_scale: 16.0 2024-09-16 18:00:18,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=480847.8333333333, ans=10.0 2024-09-16 18:00:49,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=480904.5, ans=0.0 2024-09-16 18:00:56,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=480932.8333333333, ans=0.1 2024-09-16 18:00:56,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=480932.8333333333, ans=0.0 2024-09-16 18:00:58,488 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=480932.8333333333, ans=0.125 2024-09-16 18:01:27,041 INFO [train.py:1198] (1/2) Epoch 27, batch 3600, loss[loss=0.2297, ctc_loss=0.1569, cr_loss=0.364, over 20976.00 frames. ], tot_loss[loss=0.2283, ctc_loss=0.1531, cr_loss=0.3762, over 4098810.67 frames. 
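The Whitening lines compare a per-module whiteness statistic of the activations ("metric") against a limit and appear when the metric approaches it. The exact statistic lives in scaling.py; one plausible version, sketched here, is the eigenvalue dispersion E[lambda^2]/(E[lambda])^2 of the channel covariance, which equals 1.0 for perfectly white features and grows with anisotropy:

```python
# Sketch of a whiteness diagnostic in the spirit of the Whitening log
# lines. The metric here (eigenvalue dispersion of the channel covariance)
# is an assumption; scaling.py's exact formula may differ.
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """x: (num_frames, num_channels) activations; 1.0 means perfectly white."""
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]          # channel covariance, (C, C)
    eigs = torch.linalg.eigvalsh(cov)     # real eigenvalues, ascending
    return ((eigs ** 2).mean() / eigs.mean() ** 2).item()

x = torch.randn(2000, 256)                # near-white activations
print(whitening_metric(x))                # ~1.1, well under e.g. limit=15.0
```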
], batch size: 58, lr: 3.07e-03, grad_scale: 32.0 2024-09-16 18:01:48,301 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.874e+02 2.089e+02 2.227e+02 2.380e+02 4.273e+02, threshold=4.455e+02, percent-clipped=0.0 2024-09-16 18:01:53,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=481017.8333333333, ans=0.05 2024-09-16 18:02:25,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=481074.5, ans=0.05 2024-09-16 18:02:26,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=481102.8333333333, ans=0.0 2024-09-16 18:02:42,657 INFO [train.py:1198] (1/2) Epoch 27, batch 3650, loss[loss=0.2746, ctc_loss=0.1938, cr_loss=0.404, over 14311.00 frames. ], tot_loss[loss=0.2279, ctc_loss=0.1527, cr_loss=0.376, over 4083624.95 frames. ], batch size: 149, lr: 3.07e-03, grad_scale: 32.0 2024-09-16 18:02:53,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=481131.1666666667, ans=0.125 2024-09-16 18:02:58,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=481159.5, ans=0.2 2024-09-16 18:03:04,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=481159.5, ans=0.1 2024-09-16 18:03:15,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=481187.8333333333, ans=0.0 2024-09-16 18:03:18,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=481187.8333333333, ans=0.0 2024-09-16 18:03:34,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=481216.1666666667, ans=0.125 2024-09-16 18:03:41,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=481216.1666666667, ans=0.02 2024-09-16 18:03:52,260 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.91 vs. limit=10.0 2024-09-16 18:03:54,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=481244.5, ans=0.1 2024-09-16 18:04:00,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=481244.5, ans=0.1 2024-09-16 18:04:05,029 INFO [train.py:1198] (1/2) Epoch 27, batch 3700, loss[loss=0.2344, ctc_loss=0.1587, cr_loss=0.3785, over 21026.00 frames. ], tot_loss[loss=0.2278, ctc_loss=0.1526, cr_loss=0.3757, over 4099582.93 frames. 
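The grad_scale values in the loss lines (stepping between 32.0 and 16.0 in this stretch) come from dynamic loss scaling for float16 mixed precision: the scale is halved when a step produces inf/NaN gradients and doubled again after a run of clean steps. A minimal sketch of that policy; the constants mirror torch.cuda.amp.GradScaler's documented defaults:

```python
# Sketch of dynamic loss scaling, matching the grad_scale 32.0 <-> 16.0
# pattern above: back off by 0.5 on overflow, grow by 2.0 after
# growth_interval clean steps (0.5 / 2.0 / 2000 are GradScaler defaults).
class LossScaleSketch:
    def __init__(self, init_scale=32.0, backoff=0.5, growth=2.0,
                 growth_interval=2000):
        self.scale = init_scale
        self.backoff = backoff
        self.growth = growth
        self.growth_interval = growth_interval
        self._clean_steps = 0

    def update(self, found_inf: bool) -> None:
        if found_inf:
            self.scale *= self.backoff     # e.g. 32.0 -> 16.0
            self._clean_steps = 0
        else:
            self._clean_steps += 1
            if self._clean_steps == self.growth_interval:
                self.scale *= self.growth  # e.g. 16.0 -> 32.0
                self._clean_steps = 0
```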
], batch size: 63, lr: 3.07e-03, grad_scale: 32.0 2024-09-16 18:04:15,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=481272.8333333333, ans=0.025 2024-09-16 18:04:26,276 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.846e+02 2.106e+02 2.281e+02 2.483e+02 8.071e+02, threshold=4.561e+02, percent-clipped=1.0 2024-09-16 18:04:47,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=481329.5, ans=0.025 2024-09-16 18:05:19,956 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.33 vs. limit=15.0 2024-09-16 18:05:20,564 INFO [train.py:1198] (1/2) Epoch 27, batch 3750, loss[loss=0.2597, ctc_loss=0.1771, cr_loss=0.4132, over 21069.00 frames. ], tot_loss[loss=0.2283, ctc_loss=0.153, cr_loss=0.3769, over 4101060.41 frames. ], batch size: 59, lr: 3.07e-03, grad_scale: 32.0 2024-09-16 18:05:25,986 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.97 vs. limit=15.0 2024-09-16 18:05:33,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=481414.5, ans=0.05 2024-09-16 18:05:36,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=481442.8333333333, ans=0.125 2024-09-16 18:05:57,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=481471.1666666667, ans=0.1 2024-09-16 18:06:05,140 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.83 vs. limit=15.0 2024-09-16 18:06:16,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=481499.5, ans=0.1 2024-09-16 18:06:34,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=481556.1666666667, ans=0.125 2024-09-16 18:06:36,104 INFO [train.py:1198] (1/2) Epoch 27, batch 3800, loss[loss=0.235, ctc_loss=0.1575, cr_loss=0.3877, over 20964.00 frames. ], tot_loss[loss=0.2279, ctc_loss=0.1527, cr_loss=0.3758, over 4100442.53 frames. ], batch size: 58, lr: 3.07e-03, grad_scale: 32.0 2024-09-16 18:06:57,075 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.886e+02 2.125e+02 2.236e+02 2.403e+02 2.968e+02, threshold=4.472e+02, percent-clipped=0.0 2024-09-16 18:06:57,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=481584.5, ans=0.04949747468305833 2024-09-16 18:06:57,892 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.70 vs. 
limit=22.5 2024-09-16 18:07:21,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=481641.1666666667, ans=0.125 2024-09-16 18:07:26,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=481641.1666666667, ans=0.125 2024-09-16 18:07:29,547 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.48 vs. limit=22.5 2024-09-16 18:07:32,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=481641.1666666667, ans=0.125 2024-09-16 18:07:52,052 INFO [train.py:1198] (1/2) Epoch 27, batch 3850, loss[loss=0.2604, ctc_loss=0.1759, cr_loss=0.4222, over 20957.00 frames. ], tot_loss[loss=0.2279, ctc_loss=0.1527, cr_loss=0.3759, over 4110936.27 frames. ], batch size: 64, lr: 3.07e-03, grad_scale: 32.0 2024-09-16 18:07:52,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=481697.8333333333, ans=0.0 2024-09-16 18:07:55,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=481697.8333333333, ans=0.1 2024-09-16 18:07:58,487 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=481697.8333333333, ans=0.125 2024-09-16 18:08:04,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=481697.8333333333, ans=0.1 2024-09-16 18:08:54,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=481811.1666666667, ans=0.125 2024-09-16 18:09:00,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=481811.1666666667, ans=0.0 2024-09-16 18:09:07,950 INFO [train.py:1198] (1/2) Epoch 27, batch 3900, loss[loss=0.221, ctc_loss=0.1449, cr_loss=0.3805, over 20792.00 frames. ], tot_loss[loss=0.2284, ctc_loss=0.1531, cr_loss=0.3761, over 4103139.62 frames. ], batch size: 53, lr: 3.07e-03, grad_scale: 32.0 2024-09-16 18:09:08,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=481839.5, ans=0.125 2024-09-16 18:09:31,917 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.667e+02 2.176e+02 2.309e+02 2.481e+02 3.316e+02, threshold=4.618e+02, percent-clipped=0.0 2024-09-16 18:10:29,182 INFO [train.py:1198] (1/2) Epoch 27, batch 3950, loss[loss=0.2559, ctc_loss=0.1785, cr_loss=0.3873, over 14654.00 frames. ], tot_loss[loss=0.2296, ctc_loss=0.1541, cr_loss=0.3775, over 4086569.43 frames. 
], batch size: 149, lr: 3.07e-03, grad_scale: 32.0 2024-09-16 18:10:38,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=481981.1666666667, ans=0.1 2024-09-16 18:10:46,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=482009.5, ans=0.0 2024-09-16 18:10:50,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=482009.5, ans=0.015 2024-09-16 18:11:04,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=482037.8333333333, ans=0.2 2024-09-16 18:11:04,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=482037.8333333333, ans=0.0 2024-09-16 18:11:14,525 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.98 vs. limit=15.0 2024-09-16 18:11:45,305 INFO [train.py:1198] (1/2) Epoch 27, batch 4000, loss[loss=0.1823, ctc_loss=0.1188, cr_loss=0.3176, over 20362.00 frames. ], tot_loss[loss=0.2279, ctc_loss=0.1528, cr_loss=0.3754, over 4092740.74 frames. ], batch size: 45, lr: 3.07e-03, grad_scale: 32.0 2024-09-16 18:11:51,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=482122.8333333333, ans=0.125 2024-09-16 18:12:00,908 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.33 vs. limit=6.0 2024-09-16 18:12:06,116 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.926e+02 2.138e+02 2.269e+02 2.399e+02 2.868e+02, threshold=4.538e+02, percent-clipped=0.0 2024-09-16 18:12:11,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=482151.1666666667, ans=0.125 2024-09-16 18:12:12,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=482151.1666666667, ans=0.04949747468305833 2024-09-16 18:12:13,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=482179.5, ans=0.2 2024-09-16 18:12:14,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=482179.5, ans=0.125 2024-09-16 18:12:23,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=482179.5, ans=0.125 2024-09-16 18:12:30,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=482207.8333333333, ans=0.0 2024-09-16 18:12:32,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=482207.8333333333, ans=0.2 2024-09-16 18:12:52,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=482236.1666666667, ans=0.2 2024-09-16 18:13:00,951 INFO [train.py:1198] (1/2) Epoch 27, batch 4050, loss[loss=0.2093, ctc_loss=0.1377, cr_loss=0.3581, over 20981.00 frames. ], tot_loss[loss=0.2291, ctc_loss=0.1538, cr_loss=0.3763, over 4088150.14 frames. 
], batch size: 52, lr: 3.07e-03, grad_scale: 32.0 2024-09-16 18:13:10,565 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.71 vs. limit=15.0 2024-09-16 18:13:38,185 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.96 vs. limit=22.5 2024-09-16 18:14:04,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=482377.8333333333, ans=0.125 2024-09-16 18:14:07,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=482377.8333333333, ans=0.0 2024-09-16 18:14:10,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=482377.8333333333, ans=0.95 2024-09-16 18:14:11,355 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.24 vs. limit=15.0 2024-09-16 18:14:16,191 INFO [train.py:1198] (1/2) Epoch 27, batch 4100, loss[loss=0.2328, ctc_loss=0.155, cr_loss=0.3889, over 20967.00 frames. ], tot_loss[loss=0.2288, ctc_loss=0.1535, cr_loss=0.3763, over 4092814.21 frames. ], batch size: 58, lr: 3.06e-03, grad_scale: 32.0 2024-09-16 18:14:36,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=482434.5, ans=0.125 2024-09-16 18:14:36,493 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.92 vs. limit=15.0 2024-09-16 18:14:38,648 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.890e+02 2.110e+02 2.285e+02 2.392e+02 3.456e+02, threshold=4.570e+02, percent-clipped=0.0 2024-09-16 18:14:43,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=482434.5, ans=0.125 2024-09-16 18:15:25,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=482519.5, ans=0.1 2024-09-16 18:15:37,236 INFO [train.py:1198] (1/2) Epoch 27, batch 4150, loss[loss=0.2325, ctc_loss=0.1537, cr_loss=0.3938, over 20949.00 frames. ], tot_loss[loss=0.2287, ctc_loss=0.1534, cr_loss=0.3764, over 4099770.50 frames. ], batch size: 67, lr: 3.06e-03, grad_scale: 16.0 2024-09-16 18:15:38,039 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.89 vs. limit=15.0 2024-09-16 18:15:46,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=482547.8333333333, ans=0.1 2024-09-16 18:15:51,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=482576.1666666667, ans=0.125 2024-09-16 18:16:10,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=482604.5, ans=0.0 2024-09-16 18:16:43,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=482661.1666666667, ans=0.2 2024-09-16 18:16:53,642 INFO [train.py:1198] (1/2) Epoch 27, batch 4200, loss[loss=0.2567, ctc_loss=0.1747, cr_loss=0.4098, over 20967.00 frames. 
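The learning rate creeps down from 3.07e-03 toward 3.06e-03 across these batches. The values are consistent with an Eden-style schedule that decays with both the batch index and the epoch; a hedged sketch follows, where base_lr=0.04, lr_batches=7500, and lr_epochs=3.5 are illustrative constants chosen to reproduce the logged numbers rather than read from the training code:

```python
# Sketch of an Eden-style LR schedule consistent with the logged lr values.
# The constants are assumptions tuned to match the log, not authoritative.
def eden_lr(batch: int, epoch: float, base_lr=0.04,
            lr_batches=7500.0, lr_epochs=3.5) -> float:
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# Around epoch 27 with ~163k total batches this lands near the logged value:
print(f"{eden_lr(163000, 27):.2e}")  # ~3.07e-03
```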
], tot_loss[loss=0.2275, ctc_loss=0.1526, cr_loss=0.3747, over 4101998.60 frames. ], batch size: 64, lr: 3.06e-03, grad_scale: 16.0 2024-09-16 18:16:57,584 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.04 vs. limit=15.0 2024-09-16 18:17:06,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=482689.5, ans=0.035 2024-09-16 18:17:16,710 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.852e+02 2.112e+02 2.267e+02 2.413e+02 2.899e+02, threshold=4.535e+02, percent-clipped=0.0 2024-09-16 18:17:42,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=482774.5, ans=0.0 2024-09-16 18:18:08,899 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.17 vs. limit=22.5 2024-09-16 18:18:09,741 INFO [train.py:1198] (1/2) Epoch 27, batch 4250, loss[loss=0.2092, ctc_loss=0.1412, cr_loss=0.3397, over 20996.00 frames. ], tot_loss[loss=0.2279, ctc_loss=0.1529, cr_loss=0.3755, over 4115067.12 frames. ], batch size: 63, lr: 3.06e-03, grad_scale: 16.0 2024-09-16 18:18:22,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=482831.1666666667, ans=0.1 2024-09-16 18:18:41,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=482887.8333333333, ans=0.125 2024-09-16 18:18:59,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=482916.1666666667, ans=0.0 2024-09-16 18:19:25,091 INFO [train.py:1198] (1/2) Epoch 27, batch 4300, loss[loss=0.2249, ctc_loss=0.1507, cr_loss=0.3712, over 21045.00 frames. ], tot_loss[loss=0.228, ctc_loss=0.1528, cr_loss=0.3759, over 4117728.39 frames. ], batch size: 62, lr: 3.06e-03, grad_scale: 16.0 2024-09-16 18:19:25,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=482972.8333333333, ans=0.125 2024-09-16 18:19:47,750 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.925e+02 2.156e+02 2.311e+02 2.523e+02 4.886e+02, threshold=4.622e+02, percent-clipped=1.0 2024-09-16 18:20:06,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=483029.5, ans=15.0 2024-09-16 18:20:40,962 INFO [train.py:1198] (1/2) Epoch 27, batch 4350, loss[loss=0.268, ctc_loss=0.1805, cr_loss=0.4371, over 19369.00 frames. ], tot_loss[loss=0.2283, ctc_loss=0.153, cr_loss=0.3765, over 4107359.18 frames. 
], batch size: 90, lr: 3.06e-03, grad_scale: 16.0 2024-09-16 18:20:59,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=483142.8333333333, ans=0.1 2024-09-16 18:21:00,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=483142.8333333333, ans=0.0 2024-09-16 18:21:11,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=483142.8333333333, ans=0.1 2024-09-16 18:21:41,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=483199.5, ans=0.0 2024-09-16 18:22:02,133 INFO [train.py:1198] (1/2) Epoch 27, batch 4400, loss[loss=0.1918, ctc_loss=0.1247, cr_loss=0.3358, over 21039.00 frames. ], tot_loss[loss=0.2263, ctc_loss=0.1515, cr_loss=0.3736, over 4103466.09 frames. ], batch size: 53, lr: 3.06e-03, grad_scale: 32.0 2024-09-16 18:22:05,605 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 18:22:20,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=483284.5, ans=0.0 2024-09-16 18:22:24,726 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.722e+02 2.129e+02 2.248e+02 2.540e+02 4.104e+02, threshold=4.496e+02, percent-clipped=0.0 2024-09-16 18:22:59,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=483341.1666666667, ans=0.125 2024-09-16 18:23:17,566 INFO [train.py:1198] (1/2) Epoch 27, batch 4450, loss[loss=0.2445, ctc_loss=0.1645, cr_loss=0.4, over 21046.00 frames. ], tot_loss[loss=0.2267, ctc_loss=0.1519, cr_loss=0.3741, over 4107697.56 frames. ], batch size: 59, lr: 3.06e-03, grad_scale: 32.0 2024-09-16 18:23:25,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=483397.8333333333, ans=0.0 2024-09-16 18:23:34,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=483426.1666666667, ans=0.125 2024-09-16 18:23:52,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=483454.5, ans=0.035 2024-09-16 18:23:58,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=483454.5, ans=0.125 2024-09-16 18:24:12,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=483482.8333333333, ans=0.0 2024-09-16 18:24:33,680 INFO [train.py:1198] (1/2) Epoch 27, batch 4500, loss[loss=0.2534, ctc_loss=0.1715, cr_loss=0.4093, over 20059.00 frames. ], tot_loss[loss=0.2268, ctc_loss=0.1519, cr_loss=0.3746, over 4116894.20 frames. 
], batch size: 80, lr: 3.06e-03, grad_scale: 32.0 2024-09-16 18:24:56,490 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.924e+02 2.085e+02 2.250e+02 2.356e+02 2.921e+02, threshold=4.500e+02, percent-clipped=0.0 2024-09-16 18:25:22,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=483624.5, ans=0.0 2024-09-16 18:25:38,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=483652.8333333333, ans=0.09899494936611666 2024-09-16 18:25:49,119 INFO [train.py:1198] (1/2) Epoch 27, batch 4550, loss[loss=0.267, ctc_loss=0.182, cr_loss=0.4252, over 20681.00 frames. ], tot_loss[loss=0.2274, ctc_loss=0.1523, cr_loss=0.3753, over 4120703.54 frames. ], batch size: 66, lr: 3.06e-03, grad_scale: 32.0 2024-09-16 18:26:02,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=483709.5, ans=0.1 2024-09-16 18:26:14,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=483709.5, ans=0.05 2024-09-16 18:26:54,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=483794.5, ans=0.125 2024-09-16 18:26:56,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=483794.5, ans=0.125 2024-09-16 18:27:09,736 INFO [train.py:1198] (1/2) Epoch 27, batch 4600, loss[loss=0.2477, ctc_loss=0.1644, cr_loss=0.4167, over 20668.00 frames. ], tot_loss[loss=0.2276, ctc_loss=0.1525, cr_loss=0.3756, over 4125193.99 frames. ], batch size: 68, lr: 3.06e-03, grad_scale: 32.0 2024-09-16 18:27:13,529 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.12 vs. limit=22.5 2024-09-16 18:27:32,228 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.902e+02 2.168e+02 2.267e+02 2.436e+02 3.097e+02, threshold=4.533e+02, percent-clipped=0.0 2024-09-16 18:27:50,529 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=483879.5, ans=0.1 2024-09-16 18:28:24,827 INFO [train.py:1198] (1/2) Epoch 27, batch 4650, loss[loss=0.2604, ctc_loss=0.1743, cr_loss=0.4301, over 21002.00 frames. ], tot_loss[loss=0.2287, ctc_loss=0.1534, cr_loss=0.3768, over 4109997.84 frames. ], batch size: 63, lr: 3.06e-03, grad_scale: 32.0 2024-09-16 18:28:29,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=483964.5, ans=0.125 2024-09-16 18:28:39,508 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.62 vs. limit=15.0 2024-09-16 18:28:54,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=484021.1666666667, ans=0.125 2024-09-16 18:29:25,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=484077.8333333333, ans=0.0 2024-09-16 18:29:28,006 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.00 vs. 
limit=22.5 2024-09-16 18:29:32,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=484077.8333333333, ans=0.0 2024-09-16 18:29:40,972 INFO [train.py:1198] (1/2) Epoch 27, batch 4700, loss[loss=0.1855, ctc_loss=0.1208, cr_loss=0.3232, over 20984.00 frames. ], tot_loss[loss=0.2285, ctc_loss=0.1532, cr_loss=0.3763, over 4106625.33 frames. ], batch size: 51, lr: 3.06e-03, grad_scale: 32.0 2024-09-16 18:30:03,537 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.787e+02 2.157e+02 2.324e+02 2.567e+02 3.307e+02, threshold=4.649e+02, percent-clipped=0.0 2024-09-16 18:30:08,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=484134.5, ans=0.125 2024-09-16 18:30:18,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=484162.8333333333, ans=0.2 2024-09-16 18:30:56,647 INFO [train.py:1198] (1/2) Epoch 27, batch 4750, loss[loss=0.2287, ctc_loss=0.1547, cr_loss=0.3701, over 20660.00 frames. ], tot_loss[loss=0.2281, ctc_loss=0.1529, cr_loss=0.3761, over 4111467.57 frames. ], batch size: 66, lr: 3.06e-03, grad_scale: 32.0 2024-09-16 18:31:07,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=484247.8333333333, ans=0.0 2024-09-16 18:31:13,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=484276.1666666667, ans=0.125 2024-09-16 18:31:27,380 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.89 vs. limit=15.0 2024-09-16 18:32:11,967 INFO [train.py:1198] (1/2) Epoch 27, batch 4800, loss[loss=0.2059, ctc_loss=0.1346, cr_loss=0.3561, over 20783.00 frames. ], tot_loss[loss=0.2279, ctc_loss=0.1528, cr_loss=0.3755, over 4090133.37 frames. ], batch size: 53, lr: 3.06e-03, grad_scale: 32.0 2024-09-16 18:32:25,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=484417.8333333333, ans=0.0 2024-09-16 18:32:25,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=484417.8333333333, ans=0.125 2024-09-16 18:32:36,600 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.13 vs. limit=15.0 2024-09-16 18:32:40,117 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.907e+02 2.084e+02 2.249e+02 2.422e+02 7.487e+02, threshold=4.497e+02, percent-clipped=2.0 2024-09-16 18:32:48,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=484446.1666666667, ans=0.125 2024-09-16 18:33:09,942 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.73 vs. limit=12.0 2024-09-16 18:33:27,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=484502.8333333333, ans=0.0 2024-09-16 18:33:33,380 INFO [train.py:1198] (1/2) Epoch 27, batch 4850, loss[loss=0.2159, ctc_loss=0.1426, cr_loss=0.3665, over 20989.00 frames. ], tot_loss[loss=0.2272, ctc_loss=0.1522, cr_loss=0.3748, over 4095326.27 frames. 
], batch size: 52, lr: 3.06e-03, grad_scale: 32.0 2024-09-16 18:34:08,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=484587.8333333333, ans=0.125 2024-09-16 18:34:49,180 INFO [train.py:1198] (1/2) Epoch 27, batch 4900, loss[loss=0.1915, ctc_loss=0.1253, cr_loss=0.331, over 20954.00 frames. ], tot_loss[loss=0.2267, ctc_loss=0.1519, cr_loss=0.3741, over 4107026.86 frames. ], batch size: 51, lr: 3.06e-03, grad_scale: 32.0 2024-09-16 18:35:04,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=484701.1666666667, ans=0.1 2024-09-16 18:35:08,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=484701.1666666667, ans=0.1 2024-09-16 18:35:11,570 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.830e+02 2.096e+02 2.248e+02 2.383e+02 4.576e+02, threshold=4.495e+02, percent-clipped=1.0 2024-09-16 18:35:13,914 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.11 vs. limit=6.0 2024-09-16 18:35:14,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=484701.1666666667, ans=0.1 2024-09-16 18:35:20,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=484729.5, ans=0.0 2024-09-16 18:35:41,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=484757.8333333333, ans=0.1 2024-09-16 18:35:49,192 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.47 vs. limit=22.5 2024-09-16 18:35:56,513 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.92 vs. limit=15.0 2024-09-16 18:36:03,191 INFO [train.py:1198] (1/2) Epoch 27, batch 4950, loss[loss=0.2497, ctc_loss=0.1667, cr_loss=0.415, over 20206.00 frames. ], tot_loss[loss=0.2277, ctc_loss=0.1525, cr_loss=0.376, over 4110717.80 frames. ], batch size: 80, lr: 3.06e-03, grad_scale: 32.0 2024-09-16 18:36:11,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=484814.5, ans=0.125 2024-09-16 18:36:42,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=484871.1666666667, ans=0.1 2024-09-16 18:37:16,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=484956.1666666667, ans=0.0 2024-09-16 18:37:17,566 INFO [train.py:1198] (1/2) Epoch 27, batch 5000, loss[loss=0.2795, ctc_loss=0.1907, cr_loss=0.4437, over 18191.00 frames. ], tot_loss[loss=0.2267, ctc_loss=0.1518, cr_loss=0.3745, over 4105751.95 frames. 
], batch size: 108, lr: 3.06e-03, grad_scale: 32.0 2024-09-16 18:37:21,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=484956.1666666667, ans=0.125 2024-09-16 18:37:39,893 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.893e+02 2.098e+02 2.266e+02 2.407e+02 4.241e+02, threshold=4.532e+02, percent-clipped=0.0 2024-09-16 18:37:53,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=485012.8333333333, ans=0.125 2024-09-16 18:37:53,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=485012.8333333333, ans=0.0 2024-09-16 18:37:55,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=485012.8333333333, ans=0.125 2024-09-16 18:37:56,948 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.98 vs. limit=15.0 2024-09-16 18:38:06,320 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.25 vs. limit=22.5 2024-09-16 18:38:23,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=485069.5, ans=0.0 2024-09-16 18:38:31,957 INFO [train.py:1198] (1/2) Epoch 27, batch 5050, loss[loss=0.2504, ctc_loss=0.1687, cr_loss=0.4085, over 20693.00 frames. ], tot_loss[loss=0.2267, ctc_loss=0.1518, cr_loss=0.3745, over 4090471.03 frames. ], batch size: 68, lr: 3.06e-03, grad_scale: 32.0 2024-09-16 18:38:46,300 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.27 vs. limit=15.0 2024-09-16 18:39:33,534 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.03 vs. limit=15.0 2024-09-16 18:39:45,971 INFO [train.py:1198] (1/2) Epoch 27, batch 5100, loss[loss=0.2399, ctc_loss=0.161, cr_loss=0.3945, over 20964.00 frames. ], tot_loss[loss=0.2277, ctc_loss=0.1526, cr_loss=0.3754, over 4089824.36 frames. ], batch size: 67, lr: 3.06e-03, grad_scale: 32.0 2024-09-16 18:40:08,340 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.807e+02 2.101e+02 2.228e+02 2.381e+02 5.913e+02, threshold=4.457e+02, percent-clipped=1.0 2024-09-16 18:40:10,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=485267.8333333333, ans=0.2 2024-09-16 18:40:20,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=485296.1666666667, ans=0.125 2024-09-16 18:40:25,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=485296.1666666667, ans=0.125 2024-09-16 18:41:05,680 INFO [train.py:1198] (1/2) Epoch 27, batch 5150, loss[loss=0.2144, ctc_loss=0.1428, cr_loss=0.3582, over 20663.00 frames. ], tot_loss[loss=0.2289, ctc_loss=0.1534, cr_loss=0.3771, over 4086215.38 frames. 
], batch size: 66, lr: 3.06e-03, grad_scale: 32.0 2024-09-16 18:41:13,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=485381.1666666667, ans=0.025 2024-09-16 18:41:27,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=485409.5, ans=0.0 2024-09-16 18:41:27,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=485409.5, ans=0.125 2024-09-16 18:41:37,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=485437.8333333333, ans=0.2 2024-09-16 18:41:48,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=485437.8333333333, ans=0.125 2024-09-16 18:41:56,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=485466.1666666667, ans=0.0 2024-09-16 18:42:02,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=485466.1666666667, ans=0.1 2024-09-16 18:42:19,973 INFO [train.py:1198] (1/2) Epoch 27, batch 5200, loss[loss=0.1891, ctc_loss=0.1241, cr_loss=0.325, over 20998.00 frames. ], tot_loss[loss=0.2277, ctc_loss=0.1526, cr_loss=0.3755, over 4100297.09 frames. ], batch size: 48, lr: 3.06e-03, grad_scale: 32.0 2024-09-16 18:42:20,562 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.68 vs. limit=22.5 2024-09-16 18:42:21,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=485522.8333333333, ans=0.125 2024-09-16 18:42:30,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=485522.8333333333, ans=0.1 2024-09-16 18:42:41,663 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.923e+02 2.075e+02 2.213e+02 2.363e+02 2.743e+02, threshold=4.426e+02, percent-clipped=0.0 2024-09-16 18:42:43,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=485551.1666666667, ans=0.125 2024-09-16 18:42:58,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=485579.5, ans=0.125 2024-09-16 18:43:09,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=485607.8333333333, ans=0.125 2024-09-16 18:43:21,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=485636.1666666667, ans=0.125 2024-09-16 18:43:26,272 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.67 vs. limit=6.0 2024-09-16 18:43:34,238 INFO [train.py:1198] (1/2) Epoch 27, batch 5250, loss[loss=0.2385, ctc_loss=0.1616, cr_loss=0.3841, over 20830.00 frames. ], tot_loss[loss=0.2276, ctc_loss=0.1526, cr_loss=0.3754, over 4100914.48 frames. 
], batch size: 59, lr: 3.05e-03, grad_scale: 32.0 2024-09-16 18:44:01,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=485692.8333333333, ans=0.0 2024-09-16 18:44:13,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=485721.1666666667, ans=0.125 2024-09-16 18:44:19,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=485749.5, ans=0.0 2024-09-16 18:44:19,478 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=485749.5, ans=0.1 2024-09-16 18:44:48,602 INFO [train.py:1198] (1/2) Epoch 27, batch 5300, loss[loss=0.2608, ctc_loss=0.1783, cr_loss=0.4124, over 18458.00 frames. ], tot_loss[loss=0.2287, ctc_loss=0.1535, cr_loss=0.3762, over 4089154.70 frames. ], batch size: 108, lr: 3.05e-03, grad_scale: 32.0 2024-09-16 18:45:10,864 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.896e+02 2.129e+02 2.269e+02 2.416e+02 3.705e+02, threshold=4.537e+02, percent-clipped=0.0 2024-09-16 18:45:32,203 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.21 vs. limit=10.0 2024-09-16 18:45:36,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=485891.1666666667, ans=0.0 2024-09-16 18:45:53,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=485919.5, ans=0.0 2024-09-16 18:46:02,485 INFO [train.py:1198] (1/2) Epoch 27, batch 5350, loss[loss=0.187, ctc_loss=0.1215, cr_loss=0.3274, over 20956.00 frames. ], tot_loss[loss=0.2293, ctc_loss=0.1539, cr_loss=0.3768, over 4084941.92 frames. ], batch size: 48, lr: 3.05e-03, grad_scale: 32.0 2024-09-16 18:46:07,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=485947.8333333333, ans=0.125 2024-09-16 18:46:16,850 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.49 vs. limit=15.0 2024-09-16 18:46:28,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=485976.1666666667, ans=0.125 2024-09-16 18:47:17,269 INFO [train.py:1198] (1/2) Epoch 27, batch 5400, loss[loss=0.2502, ctc_loss=0.1707, cr_loss=0.3976, over 20968.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1542, cr_loss=0.378, over 4095106.63 frames. ], batch size: 64, lr: 3.05e-03, grad_scale: 32.0 2024-09-16 18:47:34,446 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.85 vs. limit=10.0 2024-09-16 18:47:39,761 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.822e+02 2.139e+02 2.231e+02 2.415e+02 3.061e+02, threshold=4.463e+02, percent-clipped=0.0 2024-09-16 18:47:46,377 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.03 vs. limit=22.5 2024-09-16 18:48:22,207 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.76 vs. 
limit=15.0 2024-09-16 18:48:31,479 INFO [train.py:1198] (1/2) Epoch 27, batch 5450, loss[loss=0.2715, ctc_loss=0.1825, cr_loss=0.445, over 20631.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1543, cr_loss=0.3775, over 4093052.69 frames. ], batch size: 66, lr: 3.05e-03, grad_scale: 32.0 2024-09-16 18:48:39,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=486231.1666666667, ans=0.125 2024-09-16 18:49:25,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=486316.1666666667, ans=10.0 2024-09-16 18:49:27,563 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.38 vs. limit=15.0 2024-09-16 18:49:34,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=486344.5, ans=0.125 2024-09-16 18:49:43,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=486344.5, ans=0.1 2024-09-16 18:49:46,053 INFO [train.py:1198] (1/2) Epoch 27, batch 5500, loss[loss=0.2378, ctc_loss=0.1599, cr_loss=0.3895, over 20937.00 frames. ], tot_loss[loss=0.2296, ctc_loss=0.1541, cr_loss=0.3776, over 4102207.73 frames. ], batch size: 60, lr: 3.05e-03, grad_scale: 32.0 2024-09-16 18:50:11,134 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.890e+02 2.091e+02 2.230e+02 2.356e+02 4.028e+02, threshold=4.460e+02, percent-clipped=0.0 2024-09-16 18:50:24,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=486429.5, ans=0.125 2024-09-16 18:50:58,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=486486.1666666667, ans=0.2 2024-09-16 18:51:05,458 INFO [train.py:1198] (1/2) Epoch 27, batch 5550, loss[loss=0.2245, ctc_loss=0.1504, cr_loss=0.3706, over 19402.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1543, cr_loss=0.3777, over 4098597.42 frames. ], batch size: 90, lr: 3.05e-03, grad_scale: 32.0 2024-09-16 18:51:13,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=486514.5, ans=0.2 2024-09-16 18:51:31,344 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.21 vs. limit=10.0 2024-09-16 18:51:53,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=486599.5, ans=0.125 2024-09-16 18:52:06,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=486627.8333333333, ans=0.1 2024-09-16 18:52:19,808 INFO [train.py:1198] (1/2) Epoch 27, batch 5600, loss[loss=0.219, ctc_loss=0.1464, cr_loss=0.3628, over 20993.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1552, cr_loss=0.3786, over 4081887.94 frames. 
], batch size: 52, lr: 3.05e-03, grad_scale: 32.0 2024-09-16 18:52:30,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=486656.1666666667, ans=0.0 2024-09-16 18:52:41,969 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.899e+02 2.128e+02 2.236e+02 2.434e+02 4.346e+02, threshold=4.473e+02, percent-clipped=0.0 2024-09-16 18:52:52,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=486712.8333333333, ans=0.1 2024-09-16 18:52:55,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=486712.8333333333, ans=0.1 2024-09-16 18:52:57,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=486712.8333333333, ans=0.125 2024-09-16 18:53:18,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=486769.5, ans=0.2 2024-09-16 18:53:34,444 INFO [train.py:1198] (1/2) Epoch 27, batch 5650, loss[loss=0.2024, ctc_loss=0.1327, cr_loss=0.3486, over 20768.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1547, cr_loss=0.378, over 4079748.17 frames. ], batch size: 53, lr: 3.05e-03, grad_scale: 32.0 2024-09-16 18:53:51,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=486826.1666666667, ans=0.125 2024-09-16 18:53:52,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=486826.1666666667, ans=0.0 2024-09-16 18:54:05,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=486854.5, ans=0.2 2024-09-16 18:54:14,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=486854.5, ans=0.1 2024-09-16 18:54:43,803 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.36 vs. limit=15.0 2024-09-16 18:54:49,213 INFO [train.py:1198] (1/2) Epoch 27, batch 5700, loss[loss=0.1945, ctc_loss=0.1274, cr_loss=0.3357, over 20962.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1551, cr_loss=0.378, over 4064659.14 frames. ], batch size: 49, lr: 3.05e-03, grad_scale: 32.0 2024-09-16 18:55:11,756 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.877e+02 2.191e+02 2.338e+02 2.537e+02 5.088e+02, threshold=4.676e+02, percent-clipped=1.0 2024-09-16 18:55:19,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=486996.1666666667, ans=0.125 2024-09-16 18:55:30,127 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 18:55:49,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=487052.8333333333, ans=0.125 2024-09-16 18:55:52,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=487052.8333333333, ans=0.125 2024-09-16 18:56:04,084 INFO [train.py:1198] (1/2) Epoch 27, batch 5750, loss[loss=0.2499, ctc_loss=0.1663, cr_loss=0.4182, over 20841.00 frames. 
], tot_loss[loss=0.2299, ctc_loss=0.1545, cr_loss=0.377, over 4073321.02 frames. ], batch size: 65, lr: 3.05e-03, grad_scale: 32.0 2024-09-16 18:56:09,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=487081.1666666667, ans=0.1 2024-09-16 18:56:09,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=487081.1666666667, ans=0.025 2024-09-16 18:56:31,867 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.53 vs. limit=22.5 2024-09-16 18:56:35,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=487137.8333333333, ans=0.1 2024-09-16 18:56:43,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=487137.8333333333, ans=0.125 2024-09-16 18:57:00,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=487166.1666666667, ans=0.035 2024-09-16 18:57:01,760 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.36 vs. limit=22.5 2024-09-16 18:57:06,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=487194.5, ans=0.125 2024-09-16 18:57:06,457 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.93 vs. limit=10.0 2024-09-16 18:57:11,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=487194.5, ans=0.025 2024-09-16 18:57:19,109 INFO [train.py:1198] (1/2) Epoch 27, batch 5800, loss[loss=0.2478, ctc_loss=0.1698, cr_loss=0.3898, over 21070.00 frames. ], tot_loss[loss=0.2283, ctc_loss=0.1533, cr_loss=0.3751, over 4078902.14 frames. ], batch size: 59, lr: 3.05e-03, grad_scale: 32.0 2024-09-16 18:57:41,102 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.848e+02 2.107e+02 2.242e+02 2.438e+02 6.579e+02, threshold=4.484e+02, percent-clipped=1.0 2024-09-16 18:58:20,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=487336.1666666667, ans=0.2 2024-09-16 18:58:33,843 INFO [train.py:1198] (1/2) Epoch 27, batch 5850, loss[loss=0.2983, ctc_loss=0.2116, cr_loss=0.4335, over 14023.00 frames. ], tot_loss[loss=0.2276, ctc_loss=0.1527, cr_loss=0.3742, over 4078500.88 frames. ], batch size: 151, lr: 3.05e-03, grad_scale: 32.0 2024-09-16 18:59:01,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=487392.8333333333, ans=0.1 2024-09-16 18:59:26,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=487449.5, ans=0.0 2024-09-16 18:59:53,454 INFO [train.py:1198] (1/2) Epoch 27, batch 5900, loss[loss=0.2373, ctc_loss=0.1586, cr_loss=0.3934, over 20841.00 frames. ], tot_loss[loss=0.2268, ctc_loss=0.152, cr_loss=0.3741, over 4085314.33 frames. 
], batch size: 65, lr: 3.05e-03, grad_scale: 32.0 2024-09-16 18:59:53,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=487506.1666666667, ans=0.2 2024-09-16 19:00:11,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=487534.5, ans=0.125 2024-09-16 19:00:11,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=487534.5, ans=0.1 2024-09-16 19:00:16,058 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.791e+02 2.096e+02 2.262e+02 2.458e+02 4.394e+02, threshold=4.524e+02, percent-clipped=0.0 2024-09-16 19:00:29,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=487562.8333333333, ans=0.0 2024-09-16 19:00:29,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=487562.8333333333, ans=0.2 2024-09-16 19:00:52,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=487619.5, ans=0.0 2024-09-16 19:00:53,678 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 19:01:08,240 INFO [train.py:1198] (1/2) Epoch 27, batch 5950, loss[loss=0.1998, ctc_loss=0.1332, cr_loss=0.3327, over 19895.00 frames. ], tot_loss[loss=0.2269, ctc_loss=0.1521, cr_loss=0.3742, over 4100140.08 frames. ], batch size: 44, lr: 3.05e-03, grad_scale: 32.0 2024-09-16 19:01:32,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=487676.1666666667, ans=0.125 2024-09-16 19:02:04,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=487732.8333333333, ans=0.025 2024-09-16 19:02:22,683 INFO [train.py:1198] (1/2) Epoch 27, batch 6000, loss[loss=0.2234, ctc_loss=0.1483, cr_loss=0.3756, over 21004.00 frames. ], tot_loss[loss=0.2263, ctc_loss=0.1516, cr_loss=0.3736, over 4104666.54 frames. ], batch size: 61, lr: 3.05e-03, grad_scale: 32.0 2024-09-16 19:02:22,683 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-16 19:02:44,556 INFO [train.py:1230] (1/2) Epoch 27, validation: loss=0.04154, ctc_loss=0.04154, cr_loss=1.252e-14, over 944034.00 frames. 2024-09-16 19:02:44,557 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-16 19:03:07,051 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.849e+02 2.126e+02 2.233e+02 2.350e+02 2.661e+02, threshold=4.466e+02, percent-clipped=0.0 2024-09-16 19:03:35,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=487874.5, ans=0.125 2024-09-16 19:03:36,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=487874.5, ans=0.125 2024-09-16 19:03:59,399 INFO [train.py:1198] (1/2) Epoch 27, batch 6050, loss[loss=0.23, ctc_loss=0.1533, cr_loss=0.3833, over 20768.00 frames. ], tot_loss[loss=0.226, ctc_loss=0.1515, cr_loss=0.373, over 4101980.07 frames. 
], batch size: 56, lr: 3.05e-03, grad_scale: 32.0 2024-09-16 19:04:23,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=487959.5, ans=0.125 2024-09-16 19:04:39,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=487987.8333333333, ans=0.125 2024-09-16 19:05:14,295 INFO [train.py:1198] (1/2) Epoch 27, batch 6100, loss[loss=0.2247, ctc_loss=0.1474, cr_loss=0.3866, over 21069.00 frames. ], tot_loss[loss=0.2272, ctc_loss=0.1524, cr_loss=0.3742, over 4099454.60 frames. ], batch size: 56, lr: 3.05e-03, grad_scale: 32.0 2024-09-16 19:05:38,037 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.846e+02 2.139e+02 2.338e+02 2.495e+02 3.055e+02, threshold=4.677e+02, percent-clipped=0.0 2024-09-16 19:06:27,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=488214.5, ans=0.0 2024-09-16 19:06:28,648 INFO [train.py:1198] (1/2) Epoch 27, batch 6150, loss[loss=0.2358, ctc_loss=0.1607, cr_loss=0.3755, over 21059.00 frames. ], tot_loss[loss=0.2262, ctc_loss=0.1517, cr_loss=0.3726, over 4094453.52 frames. ], batch size: 62, lr: 3.05e-03, grad_scale: 32.0 2024-09-16 19:06:57,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=488242.8333333333, ans=0.125 2024-09-16 19:07:02,488 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.79 vs. limit=22.5 2024-09-16 19:07:22,423 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 19:07:43,592 INFO [train.py:1198] (1/2) Epoch 27, batch 6200, loss[loss=0.2217, ctc_loss=0.1487, cr_loss=0.365, over 21038.00 frames. ], tot_loss[loss=0.2293, ctc_loss=0.154, cr_loss=0.3763, over 4058339.46 frames. ], batch size: 62, lr: 3.05e-03, grad_scale: 32.0 2024-09-16 19:07:53,543 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.96 vs. limit=10.0 2024-09-16 19:08:06,833 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.881e+02 2.164e+02 2.279e+02 2.489e+02 3.203e+02, threshold=4.557e+02, percent-clipped=0.0 2024-09-16 19:08:21,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=488412.8333333333, ans=0.07 2024-09-16 19:08:29,426 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 19:08:32,529 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=488441.1666666667, ans=0.1 2024-09-16 19:08:34,166 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.50 vs. limit=15.0 2024-09-16 19:08:51,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=488469.5, ans=0.1 2024-09-16 19:08:57,344 INFO [train.py:1198] (1/2) Epoch 27, batch 6250, loss[loss=0.2429, ctc_loss=0.1634, cr_loss=0.3973, over 20662.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1563, cr_loss=0.3786, over 4019605.15 frames. 
], batch size: 71, lr: 3.05e-03, grad_scale: 32.0 2024-09-16 19:09:05,445 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.49 vs. limit=15.0 2024-09-16 19:09:10,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=488526.1666666667, ans=0.1 2024-09-16 19:09:16,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=488526.1666666667, ans=0.125 2024-09-16 19:10:09,094 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.27 vs. limit=15.0 2024-09-16 19:10:10,367 INFO [train.py:1198] (1/2) Epoch 27, batch 6300, loss[loss=0.249, ctc_loss=0.1657, cr_loss=0.4167, over 20701.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1577, cr_loss=0.3808, over 3983113.84 frames. ], batch size: 71, lr: 3.05e-03, grad_scale: 32.0 2024-09-16 19:10:33,815 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.889e+02 2.246e+02 2.457e+02 2.703e+02 3.753e+02, threshold=4.914e+02, percent-clipped=0.0 2024-09-16 19:10:50,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=488696.1666666667, ans=0.125 2024-09-16 19:11:00,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=488724.5, ans=0.125 2024-09-16 19:11:08,095 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.80 vs. limit=15.0 2024-09-16 19:11:23,013 INFO [train.py:1198] (1/2) Epoch 27, batch 6350, loss[loss=0.2743, ctc_loss=0.1962, cr_loss=0.3902, over 14761.00 frames. ], tot_loss[loss=0.2392, ctc_loss=0.1625, cr_loss=0.3837, over 3813827.43 frames. ], batch size: 150, lr: 3.04e-03, grad_scale: 32.0 2024-09-16 19:11:31,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=488781.1666666667, ans=0.2 2024-09-16 19:11:37,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=488809.5, ans=0.125 2024-09-16 19:12:06,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=488866.1666666667, ans=0.5 2024-09-16 19:13:12,512 INFO [train.py:1198] (1/2) Epoch 28, batch 0, loss[loss=0.2504, ctc_loss=0.1665, cr_loss=0.4191, over 21069.00 frames. ], tot_loss[loss=0.2504, ctc_loss=0.1665, cr_loss=0.4191, over 21069.00 frames. ], batch size: 59, lr: 2.99e-03, grad_scale: 32.0 2024-09-16 19:13:12,513 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-16 19:13:30,797 INFO [train.py:1230] (1/2) Epoch 28, validation: loss=0.04112, ctc_loss=0.04112, cr_loss=1.192e-14, over 944034.00 frames. 2024-09-16 19:13:30,798 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-16 19:13:47,804 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.19 vs. 
limit=15.0 2024-09-16 19:13:56,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=488925.6666666667, ans=0.5 2024-09-16 19:14:07,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=488954.0, ans=0.0 2024-09-16 19:14:08,364 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.821e+02 2.339e+02 2.616e+02 2.805e+02 4.235e+02, threshold=5.232e+02, percent-clipped=0.0 2024-09-16 19:14:25,866 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.84 vs. limit=15.0 2024-09-16 19:14:34,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=489010.6666666667, ans=0.125 2024-09-16 19:14:46,358 INFO [train.py:1198] (1/2) Epoch 28, batch 50, loss[loss=0.2308, ctc_loss=0.1542, cr_loss=0.3833, over 21039.00 frames. ], tot_loss[loss=0.2302, ctc_loss=0.1546, cr_loss=0.378, over 933443.59 frames. ], batch size: 62, lr: 2.99e-03, grad_scale: 32.0 2024-09-16 19:14:52,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=489039.0, ans=0.0 2024-09-16 19:15:13,437 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 19:15:40,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=489124.0, ans=0.1 2024-09-16 19:15:55,943 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. limit=6.0 2024-09-16 19:16:00,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=489180.6666666667, ans=0.1 2024-09-16 19:16:01,213 INFO [train.py:1198] (1/2) Epoch 28, batch 100, loss[loss=0.2178, ctc_loss=0.1449, cr_loss=0.3646, over 21047.00 frames. ], tot_loss[loss=0.2279, ctc_loss=0.1528, cr_loss=0.3754, over 1637177.09 frames. ], batch size: 56, lr: 2.99e-03, grad_scale: 32.0 2024-09-16 19:16:01,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=489180.6666666667, ans=0.125 2024-09-16 19:16:13,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=489180.6666666667, ans=0.125 2024-09-16 19:16:38,665 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.850e+02 2.049e+02 2.186e+02 2.346e+02 2.862e+02, threshold=4.372e+02, percent-clipped=0.0 2024-09-16 19:16:53,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=489265.6666666667, ans=0.2 2024-09-16 19:17:01,959 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.48 vs. limit=15.0 2024-09-16 19:17:02,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=489294.0, ans=0.2 2024-09-16 19:17:16,261 INFO [train.py:1198] (1/2) Epoch 28, batch 150, loss[loss=0.2447, ctc_loss=0.1625, cr_loss=0.4108, over 20683.00 frames. 
], tot_loss[loss=0.2285, ctc_loss=0.1531, cr_loss=0.3772, over 2182653.04 frames. ], batch size: 66, lr: 2.99e-03, grad_scale: 32.0 2024-09-16 19:18:14,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=489407.3333333333, ans=0.1 2024-09-16 19:18:38,056 INFO [train.py:1198] (1/2) Epoch 28, batch 200, loss[loss=0.232, ctc_loss=0.1552, cr_loss=0.384, over 20939.00 frames. ], tot_loss[loss=0.2281, ctc_loss=0.1527, cr_loss=0.3767, over 2613878.75 frames. ], batch size: 60, lr: 2.99e-03, grad_scale: 32.0 2024-09-16 19:18:38,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=489464.0, ans=0.125 2024-09-16 19:18:41,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=489464.0, ans=0.125 2024-09-16 19:19:08,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=489520.6666666667, ans=0.1 2024-09-16 19:19:15,604 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.841e+02 2.113e+02 2.202e+02 2.382e+02 3.605e+02, threshold=4.405e+02, percent-clipped=0.0 2024-09-16 19:19:28,479 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.04 vs. limit=12.0 2024-09-16 19:19:37,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=489577.3333333333, ans=0.125 2024-09-16 19:19:53,343 INFO [train.py:1198] (1/2) Epoch 28, batch 250, loss[loss=0.235, ctc_loss=0.1572, cr_loss=0.3891, over 20925.00 frames. ], tot_loss[loss=0.2285, ctc_loss=0.1531, cr_loss=0.3767, over 2935558.28 frames. ], batch size: 60, lr: 2.99e-03, grad_scale: 32.0 2024-09-16 19:20:06,149 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.18 vs. limit=15.0 2024-09-16 19:20:07,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=489634.0, ans=0.0 2024-09-16 19:20:17,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=489634.0, ans=0.0 2024-09-16 19:20:26,933 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.78 vs. limit=22.5 2024-09-16 19:20:40,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=489690.6666666667, ans=0.1 2024-09-16 19:20:53,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=489719.0, ans=0.2 2024-09-16 19:20:59,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=489719.0, ans=0.0 2024-09-16 19:21:08,511 INFO [train.py:1198] (1/2) Epoch 28, batch 300, loss[loss=0.2615, ctc_loss=0.1796, cr_loss=0.4094, over 18083.00 frames. ], tot_loss[loss=0.2278, ctc_loss=0.1527, cr_loss=0.3754, over 3188438.82 frames. 
], batch size: 108, lr: 2.99e-03, grad_scale: 32.0 2024-09-16 19:21:14,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=489747.3333333333, ans=0.125 2024-09-16 19:21:36,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=489775.6666666667, ans=0.125 2024-09-16 19:21:37,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=489804.0, ans=0.1 2024-09-16 19:21:46,339 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.790e+02 2.146e+02 2.308e+02 2.546e+02 4.169e+02, threshold=4.616e+02, percent-clipped=0.0 2024-09-16 19:21:46,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=489804.0, ans=0.125 2024-09-16 19:22:00,935 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.32 vs. limit=15.0 2024-09-16 19:22:13,143 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.31 vs. limit=12.0 2024-09-16 19:22:24,322 INFO [train.py:1198] (1/2) Epoch 28, batch 350, loss[loss=0.2062, ctc_loss=0.1359, cr_loss=0.3518, over 20962.00 frames. ], tot_loss[loss=0.2281, ctc_loss=0.1529, cr_loss=0.3764, over 3397825.14 frames. ], batch size: 51, lr: 2.99e-03, grad_scale: 32.0 2024-09-16 19:22:24,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=489889.0, ans=0.125 2024-09-16 19:22:47,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=489917.3333333333, ans=0.125 2024-09-16 19:22:53,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=489945.6666666667, ans=0.125 2024-09-16 19:23:28,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=490002.3333333333, ans=0.0 2024-09-16 19:23:40,126 INFO [train.py:1198] (1/2) Epoch 28, batch 400, loss[loss=0.2495, ctc_loss=0.168, cr_loss=0.4074, over 20684.00 frames. ], tot_loss[loss=0.2278, ctc_loss=0.1527, cr_loss=0.3758, over 3550084.10 frames. 
], batch size: 66, lr: 2.99e-03, grad_scale: 32.0 2024-09-16 19:23:44,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=490030.6666666667, ans=0.2 2024-09-16 19:24:07,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=490059.0, ans=0.0 2024-09-16 19:24:24,051 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.799e+02 2.058e+02 2.177e+02 2.307e+02 4.460e+02, threshold=4.355e+02, percent-clipped=0.0 2024-09-16 19:24:25,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=490087.3333333333, ans=0.0 2024-09-16 19:24:27,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=490087.3333333333, ans=0.2 2024-09-16 19:24:30,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=490115.6666666667, ans=0.1 2024-09-16 19:24:36,821 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.13 vs. limit=15.0 2024-09-16 19:24:43,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=490115.6666666667, ans=0.2 2024-09-16 19:25:01,609 INFO [train.py:1198] (1/2) Epoch 28, batch 450, loss[loss=0.1996, ctc_loss=0.1308, cr_loss=0.3437, over 21002.00 frames. ], tot_loss[loss=0.2273, ctc_loss=0.1523, cr_loss=0.3751, over 3663914.35 frames. ], batch size: 48, lr: 2.98e-03, grad_scale: 32.0 2024-09-16 19:25:18,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=490200.6666666667, ans=0.125 2024-09-16 19:25:56,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=490257.3333333333, ans=0.2 2024-09-16 19:26:15,094 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.63 vs. limit=15.0 2024-09-16 19:26:17,348 INFO [train.py:1198] (1/2) Epoch 28, batch 500, loss[loss=0.2623, ctc_loss=0.1801, cr_loss=0.4106, over 20007.00 frames. ], tot_loss[loss=0.2274, ctc_loss=0.1524, cr_loss=0.3754, over 3769481.88 frames. ], batch size: 80, lr: 2.98e-03, grad_scale: 32.0 2024-09-16 19:26:54,827 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.926e+02 2.113e+02 2.221e+02 2.416e+02 4.040e+02, threshold=4.442e+02, percent-clipped=0.0 2024-09-16 19:27:02,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=490399.0, ans=0.125 2024-09-16 19:27:08,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=490399.0, ans=0.0 2024-09-16 19:27:25,752 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.93 vs. limit=15.0 2024-09-16 19:27:32,675 INFO [train.py:1198] (1/2) Epoch 28, batch 550, loss[loss=0.2683, ctc_loss=0.1865, cr_loss=0.4092, over 18165.00 frames. ], tot_loss[loss=0.2279, ctc_loss=0.1527, cr_loss=0.3758, over 3835489.94 frames. 
], batch size: 108, lr: 2.98e-03, grad_scale: 32.0 2024-09-16 19:28:39,910 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 19:28:41,711 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.96 vs. limit=12.0 2024-09-16 19:28:47,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=490597.3333333333, ans=0.125 2024-09-16 19:28:48,517 INFO [train.py:1198] (1/2) Epoch 28, batch 600, loss[loss=0.1994, ctc_loss=0.1313, cr_loss=0.3406, over 20988.00 frames. ], tot_loss[loss=0.2266, ctc_loss=0.1517, cr_loss=0.3745, over 3904841.75 frames. ], batch size: 52, lr: 2.98e-03, grad_scale: 32.0 2024-09-16 19:28:57,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=490597.3333333333, ans=0.2 2024-09-16 19:29:08,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=490625.6666666667, ans=15.0 2024-09-16 19:29:20,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=490654.0, ans=0.0 2024-09-16 19:29:26,164 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.835e+02 2.094e+02 2.223e+02 2.358e+02 3.482e+02, threshold=4.446e+02, percent-clipped=0.0 2024-09-16 19:29:30,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=490654.0, ans=0.2 2024-09-16 19:30:06,554 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.96 vs. limit=22.5 2024-09-16 19:30:10,158 INFO [train.py:1198] (1/2) Epoch 28, batch 650, loss[loss=0.2059, ctc_loss=0.1376, cr_loss=0.3417, over 20824.00 frames. ], tot_loss[loss=0.2263, ctc_loss=0.1516, cr_loss=0.3737, over 3938064.70 frames. ], batch size: 59, lr: 2.98e-03, grad_scale: 32.0 2024-09-16 19:30:24,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=490767.3333333333, ans=0.0 2024-09-16 19:31:26,247 INFO [train.py:1198] (1/2) Epoch 28, batch 700, loss[loss=0.1994, ctc_loss=0.1289, cr_loss=0.3524, over 20963.00 frames. ], tot_loss[loss=0.2258, ctc_loss=0.1511, cr_loss=0.3734, over 3984506.71 frames. ], batch size: 51, lr: 2.98e-03, grad_scale: 32.0 2024-09-16 19:31:40,228 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 19:31:49,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=490909.0, ans=0.0 2024-09-16 19:31:56,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=490937.3333333333, ans=0.125 2024-09-16 19:32:03,893 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.849e+02 2.093e+02 2.264e+02 2.478e+02 2.916e+02, threshold=4.528e+02, percent-clipped=0.0 2024-09-16 19:32:41,686 INFO [train.py:1198] (1/2) Epoch 28, batch 750, loss[loss=0.2138, ctc_loss=0.1403, cr_loss=0.3675, over 21045.00 frames. ], tot_loss[loss=0.2256, ctc_loss=0.1509, cr_loss=0.3732, over 4007707.61 frames. 
], batch size: 56, lr: 2.98e-03, grad_scale: 32.0 2024-09-16 19:32:52,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=491022.3333333333, ans=0.1 2024-09-16 19:33:00,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=491050.6666666667, ans=0.0 2024-09-16 19:33:19,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=491079.0, ans=0.2 2024-09-16 19:33:50,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=491135.6666666667, ans=0.1 2024-09-16 19:33:57,416 INFO [train.py:1198] (1/2) Epoch 28, batch 800, loss[loss=0.1921, ctc_loss=0.1261, cr_loss=0.33, over 20995.00 frames. ], tot_loss[loss=0.2256, ctc_loss=0.1509, cr_loss=0.3732, over 4019927.78 frames. ], batch size: 51, lr: 2.98e-03, grad_scale: 32.0 2024-09-16 19:33:59,345 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 19:34:02,740 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.23 vs. limit=22.5 2024-09-16 19:34:08,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=491164.0, ans=0.0 2024-09-16 19:34:35,371 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.835e+02 2.101e+02 2.245e+02 2.423e+02 3.043e+02, threshold=4.490e+02, percent-clipped=0.0 2024-09-16 19:34:47,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=491249.0, ans=0.035 2024-09-16 19:34:49,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=491249.0, ans=0.125 2024-09-16 19:34:50,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=491249.0, ans=0.5 2024-09-16 19:35:13,316 INFO [train.py:1198] (1/2) Epoch 28, batch 850, loss[loss=0.2114, ctc_loss=0.1419, cr_loss=0.3478, over 20790.00 frames. ], tot_loss[loss=0.2264, ctc_loss=0.1515, cr_loss=0.3745, over 4045846.69 frames. ], batch size: 53, lr: 2.98e-03, grad_scale: 32.0 2024-09-16 19:35:48,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=491362.3333333333, ans=0.035 2024-09-16 19:36:35,781 INFO [train.py:1198] (1/2) Epoch 28, batch 900, loss[loss=0.2006, ctc_loss=0.1323, cr_loss=0.3412, over 20960.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.1503, cr_loss=0.372, over 4066158.09 frames. 
], batch size: 50, lr: 2.98e-03, grad_scale: 32.0 2024-09-16 19:36:49,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=491475.6666666667, ans=0.125 2024-09-16 19:37:13,913 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.914e+02 2.131e+02 2.254e+02 2.374e+02 2.915e+02, threshold=4.507e+02, percent-clipped=0.0 2024-09-16 19:37:25,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=491532.3333333333, ans=0.2 2024-09-16 19:37:38,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=491560.6666666667, ans=0.125 2024-09-16 19:37:51,806 INFO [train.py:1198] (1/2) Epoch 28, batch 950, loss[loss=0.2177, ctc_loss=0.1467, cr_loss=0.3553, over 20879.00 frames. ], tot_loss[loss=0.2256, ctc_loss=0.151, cr_loss=0.3733, over 4073262.69 frames. ], batch size: 54, lr: 2.98e-03, grad_scale: 32.0 2024-09-16 19:38:20,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=491645.6666666667, ans=0.025 2024-09-16 19:38:35,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=491674.0, ans=0.015 2024-09-16 19:38:39,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=491674.0, ans=0.125 2024-09-16 19:38:44,178 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.29 vs. limit=15.0 2024-09-16 19:39:01,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=491702.3333333333, ans=0.0 2024-09-16 19:39:07,125 INFO [train.py:1198] (1/2) Epoch 28, batch 1000, loss[loss=0.2372, ctc_loss=0.1563, cr_loss=0.4043, over 20963.00 frames. ], tot_loss[loss=0.225, ctc_loss=0.1505, cr_loss=0.3724, over 4073487.47 frames. ], batch size: 58, lr: 2.98e-03, grad_scale: 32.0 2024-09-16 19:39:07,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=491730.6666666667, ans=0.025 2024-09-16 19:39:13,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=491730.6666666667, ans=0.02 2024-09-16 19:39:23,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=491759.0, ans=0.125 2024-09-16 19:39:45,065 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.860e+02 2.136e+02 2.281e+02 2.453e+02 5.009e+02, threshold=4.561e+02, percent-clipped=2.0 2024-09-16 19:39:46,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=491787.3333333333, ans=0.07 2024-09-16 19:40:22,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=491872.3333333333, ans=0.125 2024-09-16 19:40:23,360 INFO [train.py:1198] (1/2) Epoch 28, batch 1050, loss[loss=0.263, ctc_loss=0.1813, cr_loss=0.4089, over 19597.00 frames. ], tot_loss[loss=0.225, ctc_loss=0.1505, cr_loss=0.3724, over 4072978.20 frames. 
], batch size: 90, lr: 2.98e-03, grad_scale: 32.0 2024-09-16 19:40:40,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=491900.6666666667, ans=0.07 2024-09-16 19:40:59,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=491929.0, ans=0.0 2024-09-16 19:41:25,593 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 19:41:44,453 INFO [train.py:1198] (1/2) Epoch 28, batch 1100, loss[loss=0.2144, ctc_loss=0.14, cr_loss=0.3721, over 21077.00 frames. ], tot_loss[loss=0.2254, ctc_loss=0.1508, cr_loss=0.3727, over 4080323.85 frames. ], batch size: 56, lr: 2.98e-03, grad_scale: 32.0 2024-09-16 19:41:49,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=492014.0, ans=0.125 2024-09-16 19:42:04,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=492042.3333333333, ans=0.0 2024-09-16 19:42:18,706 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.24 vs. limit=15.0 2024-09-16 19:42:21,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=492070.6666666667, ans=0.1 2024-09-16 19:42:22,247 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.840e+02 2.136e+02 2.231e+02 2.490e+02 3.661e+02, threshold=4.462e+02, percent-clipped=0.0 2024-09-16 19:42:22,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=492070.6666666667, ans=0.125 2024-09-16 19:42:28,519 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 19:42:32,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=492099.0, ans=0.1 2024-09-16 19:42:48,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=492127.3333333333, ans=0.125 2024-09-16 19:42:57,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=492127.3333333333, ans=0.0 2024-09-16 19:42:59,863 INFO [train.py:1198] (1/2) Epoch 28, batch 1150, loss[loss=0.2037, ctc_loss=0.1321, cr_loss=0.3581, over 20885.00 frames. ], tot_loss[loss=0.2249, ctc_loss=0.1504, cr_loss=0.3722, over 4096467.31 frames. ], batch size: 54, lr: 2.98e-03, grad_scale: 16.0 2024-09-16 19:43:16,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=492184.0, ans=0.125 2024-09-16 19:43:37,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=492212.3333333333, ans=0.125 2024-09-16 19:43:58,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=492269.0, ans=0.125 2024-09-16 19:44:15,287 INFO [train.py:1198] (1/2) Epoch 28, batch 1200, loss[loss=0.2132, ctc_loss=0.1444, cr_loss=0.3442, over 20828.00 frames. ], tot_loss[loss=0.226, ctc_loss=0.1512, cr_loss=0.3736, over 4096452.14 frames. 
], batch size: 59, lr: 2.98e-03, grad_scale: 32.0 2024-09-16 19:44:16,000 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.54 vs. limit=22.5 2024-09-16 19:44:54,505 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.807e+02 2.207e+02 2.302e+02 2.494e+02 3.514e+02, threshold=4.604e+02, percent-clipped=0.0 2024-09-16 19:45:03,010 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.81 vs. limit=10.0 2024-09-16 19:45:07,187 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.90 vs. limit=12.0 2024-09-16 19:45:30,913 INFO [train.py:1198] (1/2) Epoch 28, batch 1250, loss[loss=0.2342, ctc_loss=0.1552, cr_loss=0.3951, over 20963.00 frames. ], tot_loss[loss=0.2269, ctc_loss=0.1519, cr_loss=0.3749, over 4095783.71 frames. ], batch size: 52, lr: 2.98e-03, grad_scale: 32.0 2024-09-16 19:45:40,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=492439.0, ans=0.125 2024-09-16 19:45:49,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=492467.3333333333, ans=0.125 2024-09-16 19:46:22,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=492524.0, ans=0.0 2024-09-16 19:46:46,587 INFO [train.py:1198] (1/2) Epoch 28, batch 1300, loss[loss=0.2213, ctc_loss=0.1488, cr_loss=0.3622, over 20834.00 frames. ], tot_loss[loss=0.2273, ctc_loss=0.1522, cr_loss=0.3754, over 4100186.22 frames. ], batch size: 59, lr: 2.98e-03, grad_scale: 32.0 2024-09-16 19:47:28,840 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.823e+02 2.102e+02 2.236e+02 2.427e+02 7.756e+02, threshold=4.473e+02, percent-clipped=1.0 2024-09-16 19:47:38,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=492665.6666666667, ans=0.125 2024-09-16 19:47:56,800 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 19:48:05,282 INFO [train.py:1198] (1/2) Epoch 28, batch 1350, loss[loss=0.1829, ctc_loss=0.1204, cr_loss=0.3125, over 19952.00 frames. ], tot_loss[loss=0.2275, ctc_loss=0.1525, cr_loss=0.3751, over 4099463.37 frames. 
], batch size: 44, lr: 2.98e-03, grad_scale: 32.0 2024-09-16 19:48:25,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=492750.6666666667, ans=0.02 2024-09-16 19:48:40,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=492779.0, ans=0.2 2024-09-16 19:48:48,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=492779.0, ans=0.0 2024-09-16 19:49:00,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=492807.3333333333, ans=0.2 2024-09-16 19:49:11,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=492835.6666666667, ans=0.2 2024-09-16 19:49:21,317 INFO [train.py:1198] (1/2) Epoch 28, batch 1400, loss[loss=0.2127, ctc_loss=0.1384, cr_loss=0.3714, over 20984.00 frames. ], tot_loss[loss=0.2278, ctc_loss=0.1526, cr_loss=0.3758, over 4093666.49 frames. ], batch size: 48, lr: 2.98e-03, grad_scale: 32.0 2024-09-16 19:49:26,647 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.18 vs. limit=22.5 2024-09-16 19:49:32,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=492864.0, ans=0.0 2024-09-16 19:49:43,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=492892.3333333333, ans=0.1 2024-09-16 19:49:55,865 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.95 vs. limit=6.0 2024-09-16 19:50:01,213 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.852e+02 2.130e+02 2.235e+02 2.387e+02 3.813e+02, threshold=4.471e+02, percent-clipped=0.0 2024-09-16 19:50:07,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=492949.0, ans=0.1 2024-09-16 19:50:13,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=492949.0, ans=0.09899494936611666 2024-09-16 19:50:30,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=492977.3333333333, ans=0.1 2024-09-16 19:50:33,684 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.51 vs. limit=15.0 2024-09-16 19:50:37,463 INFO [train.py:1198] (1/2) Epoch 28, batch 1450, loss[loss=0.206, ctc_loss=0.137, cr_loss=0.3449, over 20793.00 frames. ], tot_loss[loss=0.2283, ctc_loss=0.1529, cr_loss=0.3768, over 4090198.56 frames. ], batch size: 53, lr: 2.98e-03, grad_scale: 32.0 2024-09-16 19:50:55,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=493034.0, ans=0.0 2024-09-16 19:51:03,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=493034.0, ans=0.0 2024-09-16 19:51:23,277 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.34 vs. 
limit=22.5 2024-09-16 19:51:39,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=493119.0, ans=0.2 2024-09-16 19:51:52,875 INFO [train.py:1198] (1/2) Epoch 28, batch 1500, loss[loss=0.2039, ctc_loss=0.1339, cr_loss=0.35, over 19872.00 frames. ], tot_loss[loss=0.2277, ctc_loss=0.1524, cr_loss=0.3764, over 4099753.53 frames. ], batch size: 44, lr: 2.98e-03, grad_scale: 32.0 2024-09-16 19:52:20,602 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 19:52:32,105 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.843e+02 2.152e+02 2.248e+02 2.416e+02 3.527e+02, threshold=4.497e+02, percent-clipped=0.0 2024-09-16 19:53:04,854 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.35 vs. limit=15.0 2024-09-16 19:53:14,710 INFO [train.py:1198] (1/2) Epoch 28, batch 1550, loss[loss=0.1878, ctc_loss=0.1238, cr_loss=0.3198, over 20983.00 frames. ], tot_loss[loss=0.2276, ctc_loss=0.1524, cr_loss=0.376, over 4101507.30 frames. ], batch size: 52, lr: 2.98e-03, grad_scale: 32.0 2024-09-16 19:53:28,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=493317.3333333333, ans=0.0 2024-09-16 19:53:35,012 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.76 vs. limit=15.0 2024-09-16 19:53:51,685 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.14 vs. limit=6.0 2024-09-16 19:53:54,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=493345.6666666667, ans=0.125 2024-09-16 19:54:22,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=493402.3333333333, ans=0.2 2024-09-16 19:54:30,056 INFO [train.py:1198] (1/2) Epoch 28, batch 1600, loss[loss=0.2082, ctc_loss=0.1367, cr_loss=0.3579, over 20888.00 frames. ], tot_loss[loss=0.227, ctc_loss=0.1519, cr_loss=0.3754, over 4102284.52 frames. ], batch size: 54, lr: 2.97e-03, grad_scale: 32.0 2024-09-16 19:55:09,177 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.787e+02 2.057e+02 2.231e+02 2.372e+02 4.901e+02, threshold=4.462e+02, percent-clipped=0.0 2024-09-16 19:55:14,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=493515.6666666667, ans=0.0 2024-09-16 19:55:29,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=493544.0, ans=0.05 2024-09-16 19:55:44,535 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=493572.3333333333, ans=0.125 2024-09-16 19:55:45,740 INFO [train.py:1198] (1/2) Epoch 28, batch 1650, loss[loss=0.2108, ctc_loss=0.141, cr_loss=0.3492, over 20957.00 frames. ], tot_loss[loss=0.2273, ctc_loss=0.1521, cr_loss=0.3758, over 4097910.35 frames. 
], batch size: 55, lr: 2.97e-03, grad_scale: 32.0 2024-09-16 19:55:52,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=493572.3333333333, ans=0.1 2024-09-16 19:56:05,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=493600.6666666667, ans=0.0 2024-09-16 19:56:52,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=493685.6666666667, ans=15.0 2024-09-16 19:56:57,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=493685.6666666667, ans=0.125 2024-09-16 19:56:57,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=493685.6666666667, ans=0.2 2024-09-16 19:57:01,438 INFO [train.py:1198] (1/2) Epoch 28, batch 1700, loss[loss=0.183, ctc_loss=0.1203, cr_loss=0.3133, over 20962.00 frames. ], tot_loss[loss=0.2278, ctc_loss=0.1526, cr_loss=0.376, over 4089747.65 frames. ], batch size: 49, lr: 2.97e-03, grad_scale: 32.0 2024-09-16 19:57:11,149 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.35 vs. limit=6.0 2024-09-16 19:57:27,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=493742.3333333333, ans=0.125 2024-09-16 19:57:39,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=493770.6666666667, ans=0.125 2024-09-16 19:57:40,166 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.855e+02 2.190e+02 2.320e+02 2.535e+02 3.922e+02, threshold=4.640e+02, percent-clipped=1.0 2024-09-16 19:57:46,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=493799.0, ans=0.1 2024-09-16 19:57:49,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=493799.0, ans=0.125 2024-09-16 19:57:54,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=493799.0, ans=0.125 2024-09-16 19:58:06,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=493827.3333333333, ans=0.0 2024-09-16 19:58:06,799 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. limit=6.0 2024-09-16 19:58:15,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=493855.6666666667, ans=0.1 2024-09-16 19:58:16,310 INFO [train.py:1198] (1/2) Epoch 28, batch 1750, loss[loss=0.2692, ctc_loss=0.1825, cr_loss=0.4336, over 20019.00 frames. ], tot_loss[loss=0.2282, ctc_loss=0.153, cr_loss=0.3763, over 4088313.28 frames. 
], batch size: 80, lr: 2.97e-03, grad_scale: 32.0 2024-09-16 19:58:39,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=493884.0, ans=0.0 2024-09-16 19:58:43,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=493884.0, ans=0.2 2024-09-16 19:59:02,397 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.10 vs. limit=15.0 2024-09-16 19:59:05,462 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.85 vs. limit=15.0 2024-09-16 19:59:08,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=493940.6666666667, ans=0.2 2024-09-16 19:59:23,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=493969.0, ans=0.1 2024-09-16 19:59:38,645 INFO [train.py:1198] (1/2) Epoch 28, batch 1800, loss[loss=0.2369, ctc_loss=0.1615, cr_loss=0.3772, over 20851.00 frames. ], tot_loss[loss=0.2287, ctc_loss=0.1534, cr_loss=0.3768, over 4084766.53 frames. ], batch size: 65, lr: 2.97e-03, grad_scale: 32.0 2024-09-16 19:59:40,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=493997.3333333333, ans=0.0 2024-09-16 20:00:03,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=494025.6666666667, ans=0.125 2024-09-16 20:00:15,191 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.35 vs. limit=12.0 2024-09-16 20:00:17,664 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.896e+02 2.135e+02 2.255e+02 2.429e+02 3.215e+02, threshold=4.511e+02, percent-clipped=0.0 2024-09-16 20:00:42,806 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.99 vs. limit=15.0 2024-09-16 20:00:54,193 INFO [train.py:1198] (1/2) Epoch 28, batch 1850, loss[loss=0.198, ctc_loss=0.1308, cr_loss=0.3358, over 20979.00 frames. ], tot_loss[loss=0.2276, ctc_loss=0.1524, cr_loss=0.376, over 4093320.14 frames. ], batch size: 51, lr: 2.97e-03, grad_scale: 32.0 2024-09-16 20:00:57,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=494139.0, ans=0.2 2024-09-16 20:01:26,385 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.30 vs. limit=22.5 2024-09-16 20:01:33,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=494195.6666666667, ans=0.2 2024-09-16 20:01:56,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=494252.3333333333, ans=0.125 2024-09-16 20:02:04,142 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.21 vs. limit=15.0 2024-09-16 20:02:09,258 INFO [train.py:1198] (1/2) Epoch 28, batch 1900, loss[loss=0.2292, ctc_loss=0.152, cr_loss=0.386, over 20969.00 frames. 
], tot_loss[loss=0.2284, ctc_loss=0.1529, cr_loss=0.3774, over 4087516.52 frames. ], batch size: 58, lr: 2.97e-03, grad_scale: 32.0 2024-09-16 20:02:44,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=494337.3333333333, ans=0.125 2024-09-16 20:02:48,716 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.927e+02 2.108e+02 2.247e+02 2.390e+02 3.525e+02, threshold=4.494e+02, percent-clipped=0.0 2024-09-16 20:03:25,338 INFO [train.py:1198] (1/2) Epoch 28, batch 1950, loss[loss=0.2462, ctc_loss=0.1636, cr_loss=0.4129, over 20873.00 frames. ], tot_loss[loss=0.2287, ctc_loss=0.1532, cr_loss=0.3777, over 4090765.28 frames. ], batch size: 57, lr: 2.97e-03, grad_scale: 32.0 2024-09-16 20:03:36,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=494422.3333333333, ans=0.125 2024-09-16 20:03:50,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=494450.6666666667, ans=0.025 2024-09-16 20:03:50,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=494450.6666666667, ans=0.125 2024-09-16 20:04:08,936 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.20 vs. limit=6.0 2024-09-16 20:04:44,454 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.16 vs. limit=6.0 2024-09-16 20:04:46,761 INFO [train.py:1198] (1/2) Epoch 28, batch 2000, loss[loss=0.2259, ctc_loss=0.1481, cr_loss=0.3889, over 20899.00 frames. ], tot_loss[loss=0.2277, ctc_loss=0.1524, cr_loss=0.3766, over 4095264.19 frames. ], batch size: 54, lr: 2.97e-03, grad_scale: 32.0 2024-09-16 20:05:07,630 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.17 vs. limit=6.0 2024-09-16 20:05:16,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=494620.6666666667, ans=0.125 2024-09-16 20:05:22,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=494620.6666666667, ans=0.125 2024-09-16 20:05:26,486 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.935e+02 2.133e+02 2.302e+02 2.529e+02 5.901e+02, threshold=4.605e+02, percent-clipped=3.0 2024-09-16 20:05:34,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=494649.0, ans=0.125 2024-09-16 20:05:34,875 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0 2024-09-16 20:05:54,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=494677.3333333333, ans=0.1 2024-09-16 20:06:03,204 INFO [train.py:1198] (1/2) Epoch 28, batch 2050, loss[loss=0.1882, ctc_loss=0.122, cr_loss=0.3314, over 20364.00 frames. ], tot_loss[loss=0.227, ctc_loss=0.1518, cr_loss=0.3758, over 4099341.85 frames. 
], batch size: 45, lr: 2.97e-03, grad_scale: 32.0 2024-09-16 20:06:27,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=494734.0, ans=0.125 2024-09-16 20:06:36,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=494762.3333333333, ans=0.125 2024-09-16 20:06:44,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=494762.3333333333, ans=0.035 2024-09-16 20:06:59,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=494790.6666666667, ans=0.0 2024-09-16 20:07:04,124 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.20 vs. limit=15.0 2024-09-16 20:07:06,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=494819.0, ans=0.2 2024-09-16 20:07:14,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=494819.0, ans=0.0 2024-09-16 20:07:18,431 INFO [train.py:1198] (1/2) Epoch 28, batch 2100, loss[loss=0.2207, ctc_loss=0.1476, cr_loss=0.3653, over 20781.00 frames. ], tot_loss[loss=0.2262, ctc_loss=0.1513, cr_loss=0.3743, over 4106447.74 frames. ], batch size: 53, lr: 2.97e-03, grad_scale: 32.0 2024-09-16 20:07:30,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=494847.3333333333, ans=0.2 2024-09-16 20:07:38,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=494875.6666666667, ans=0.0 2024-09-16 20:07:53,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=494904.0, ans=0.2 2024-09-16 20:07:57,531 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.898e+02 2.122e+02 2.272e+02 2.437e+02 5.450e+02, threshold=4.543e+02, percent-clipped=1.0 2024-09-16 20:07:57,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=494904.0, ans=0.125 2024-09-16 20:08:02,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=494932.3333333333, ans=0.125 2024-09-16 20:08:12,862 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=494932.3333333333, ans=0.1 2024-09-16 20:08:21,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=494960.6666666667, ans=0.125 2024-09-16 20:08:26,863 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.68 vs. limit=12.0 2024-09-16 20:08:33,321 INFO [train.py:1198] (1/2) Epoch 28, batch 2150, loss[loss=0.2411, ctc_loss=0.1616, cr_loss=0.3979, over 20988.00 frames. ], tot_loss[loss=0.2282, ctc_loss=0.1529, cr_loss=0.3764, over 4092165.60 frames. 
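The optim.py warnings report a five-number summary (min, 25%, median, 75%, max) of recent gradient norms; in every entry here the printed threshold matches Clipping_scale times the median up to display rounding (e.g. 2.0 * 2.272e+02 ≈ 4.543e+02), and percent-clipped is the share of batches whose norm exceeded it. A sketch of that bookkeeping under those assumptions (not the actual optimizer code):

    import torch

    def clipping_stats(grad_norms: list, clipping_scale: float = 2.0):
        # grad_norms: recent per-batch gradient norms.
        t = torch.tensor(grad_norms)
        q = torch.quantile(t, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]  # Clipping_scale times the median
        percent_clipped = 100.0 * (t > threshold).float().mean().item()
        return q.tolist(), threshold.item(), percent_clipped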
], batch size: 64, lr: 2.97e-03, grad_scale: 32.0 2024-09-16 20:08:41,548 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 20:08:47,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=495017.3333333333, ans=0.125 2024-09-16 20:09:07,376 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.34 vs. limit=22.5 2024-09-16 20:09:36,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=495102.3333333333, ans=0.125 2024-09-16 20:09:39,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=495102.3333333333, ans=0.125 2024-09-16 20:09:52,277 INFO [train.py:1198] (1/2) Epoch 28, batch 2200, loss[loss=0.2275, ctc_loss=0.1522, cr_loss=0.3767, over 21050.00 frames. ], tot_loss[loss=0.2284, ctc_loss=0.1531, cr_loss=0.3765, over 4093528.39 frames. ], batch size: 62, lr: 2.97e-03, grad_scale: 32.0 2024-09-16 20:10:03,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=495130.6666666667, ans=0.0 2024-09-16 20:10:20,455 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0 2024-09-16 20:10:29,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=495187.3333333333, ans=0.025 2024-09-16 20:10:34,729 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.952e+02 2.141e+02 2.278e+02 2.440e+02 4.602e+02, threshold=4.557e+02, percent-clipped=1.0 2024-09-16 20:11:03,267 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0 2024-09-16 20:11:08,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=495244.0, ans=0.1 2024-09-16 20:11:11,562 INFO [train.py:1198] (1/2) Epoch 28, batch 2250, loss[loss=0.2336, ctc_loss=0.1559, cr_loss=0.3886, over 20659.00 frames. ], tot_loss[loss=0.229, ctc_loss=0.1535, cr_loss=0.3776, over 4084238.70 frames. ], batch size: 66, lr: 2.97e-03, grad_scale: 32.0 2024-09-16 20:11:29,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=495300.6666666667, ans=0.2 2024-09-16 20:11:38,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=495300.6666666667, ans=0.125 2024-09-16 20:11:54,157 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.42 vs. 
limit=22.5 2024-09-16 20:12:11,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=495385.6666666667, ans=0.125 2024-09-16 20:12:17,691 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=495385.6666666667, ans=0.1 2024-09-16 20:12:26,591 INFO [train.py:1198] (1/2) Epoch 28, batch 2300, loss[loss=0.1934, ctc_loss=0.129, cr_loss=0.3221, over 20349.00 frames. ], tot_loss[loss=0.2286, ctc_loss=0.1532, cr_loss=0.3768, over 4078552.30 frames. ], batch size: 45, lr: 2.97e-03, grad_scale: 32.0 2024-09-16 20:12:33,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=495414.0, ans=0.125 2024-09-16 20:12:34,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=495414.0, ans=0.0 2024-09-16 20:12:39,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=495414.0, ans=0.2 2024-09-16 20:12:46,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=495442.3333333333, ans=0.1 2024-09-16 20:13:06,132 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.792e+02 2.112e+02 2.326e+02 2.444e+02 2.935e+02, threshold=4.652e+02, percent-clipped=0.0 2024-09-16 20:13:10,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=495499.0, ans=0.0 2024-09-16 20:13:42,050 INFO [train.py:1198] (1/2) Epoch 28, batch 2350, loss[loss=0.2293, ctc_loss=0.1528, cr_loss=0.3826, over 21049.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1546, cr_loss=0.3791, over 4074834.94 frames. ], batch size: 62, lr: 2.97e-03, grad_scale: 32.0 2024-09-16 20:13:42,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=495555.6666666667, ans=0.125 2024-09-16 20:14:55,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=495669.0, ans=0.2 2024-09-16 20:14:57,526 INFO [train.py:1198] (1/2) Epoch 28, batch 2400, loss[loss=0.2558, ctc_loss=0.1715, cr_loss=0.4211, over 19958.00 frames. ], tot_loss[loss=0.2291, ctc_loss=0.1536, cr_loss=0.3773, over 4069137.40 frames. ], batch size: 80, lr: 2.97e-03, grad_scale: 32.0 2024-09-16 20:15:41,847 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.846e+02 2.113e+02 2.271e+02 2.494e+02 3.632e+02, threshold=4.543e+02, percent-clipped=0.0 2024-09-16 20:16:07,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=495810.6666666667, ans=0.025 2024-09-16 20:16:18,516 INFO [train.py:1198] (1/2) Epoch 28, batch 2450, loss[loss=0.231, ctc_loss=0.1515, cr_loss=0.3972, over 20881.00 frames. ], tot_loss[loss=0.2285, ctc_loss=0.1531, cr_loss=0.3767, over 4082616.18 frames. 
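The ScheduledFloat entries record hyperparameters (skip rates, scale_min, dropout_p, const_attention_rate, and so on) whose value ("ans") is a function of batch_count; the real implementation lives in icefall's scaling.py. A simplified stand-in, assuming piecewise-linear interpolation between (batch_count, value) breakpoints, with hypothetical breakpoints:

    class ScheduledFloatSketch:
        """A float hyperparameter interpolated on batch_count."""

        def __init__(self, *points):
            # points: (batch_count, value) pairs, e.g. (0.0, 0.5), (20000.0, 0.02)
            self.points = sorted(points)

        def value(self, batch_count: float) -> float:
            if batch_count <= self.points[0][0]:
                return self.points[0][1]
            for (x0, y0), (x1, y1) in zip(self.points, self.points[1:]):
                if batch_count <= x1:
                    return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
            return self.points[-1][1]

At batch_count ≈ 494k the repeated "ans" values (0.125, 0.2, 0.1, 0.0, ...) are consistent with these schedules having long since reached their final breakpoints.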
], batch size: 57, lr: 2.97e-03, grad_scale: 32.0 2024-09-16 20:16:35,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=495867.3333333333, ans=0.0 2024-09-16 20:16:52,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=495895.6666666667, ans=0.1 2024-09-16 20:17:23,964 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 20:17:34,059 INFO [train.py:1198] (1/2) Epoch 28, batch 2500, loss[loss=0.221, ctc_loss=0.148, cr_loss=0.3654, over 20791.00 frames. ], tot_loss[loss=0.2293, ctc_loss=0.1538, cr_loss=0.3777, over 4082717.88 frames. ], batch size: 53, lr: 2.97e-03, grad_scale: 32.0 2024-09-16 20:18:08,095 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.00 vs. limit=15.0 2024-09-16 20:18:12,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=496037.3333333333, ans=0.2 2024-09-16 20:18:13,358 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.887e+02 2.187e+02 2.313e+02 2.428e+02 3.220e+02, threshold=4.626e+02, percent-clipped=0.0 2024-09-16 20:18:19,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=496065.6666666667, ans=0.02 2024-09-16 20:18:47,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=496122.3333333333, ans=0.0 2024-09-16 20:18:47,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=496122.3333333333, ans=0.0 2024-09-16 20:18:47,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=496122.3333333333, ans=0.125 2024-09-16 20:18:48,958 INFO [train.py:1198] (1/2) Epoch 28, batch 2550, loss[loss=0.2502, ctc_loss=0.1665, cr_loss=0.4184, over 20712.00 frames. ], tot_loss[loss=0.2283, ctc_loss=0.1531, cr_loss=0.3762, over 4089596.32 frames. ], batch size: 71, lr: 2.97e-03, grad_scale: 32.0 2024-09-16 20:18:50,004 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.12 vs. limit=15.0 2024-09-16 20:19:13,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=496150.6666666667, ans=0.1 2024-09-16 20:19:49,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=496235.6666666667, ans=0.1 2024-09-16 20:19:50,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=496235.6666666667, ans=0.1 2024-09-16 20:20:05,432 INFO [train.py:1198] (1/2) Epoch 28, batch 2600, loss[loss=0.2076, ctc_loss=0.1392, cr_loss=0.342, over 20773.00 frames. ], tot_loss[loss=0.2274, ctc_loss=0.1523, cr_loss=0.3753, over 4096470.64 frames. ], batch size: 53, lr: 2.97e-03, grad_scale: 32.0 2024-09-16 20:20:22,498 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.04 vs. 
limit=22.5 2024-09-16 20:20:35,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=496320.6666666667, ans=0.0 2024-09-16 20:20:42,185 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.82 vs. limit=15.0 2024-09-16 20:20:43,606 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.40 vs. limit=12.0 2024-09-16 20:20:44,473 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.843e+02 2.141e+02 2.289e+02 2.459e+02 4.311e+02, threshold=4.578e+02, percent-clipped=0.0 2024-09-16 20:21:01,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=496349.0, ans=0.125 2024-09-16 20:21:24,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=496377.3333333333, ans=0.0 2024-09-16 20:21:26,805 INFO [train.py:1198] (1/2) Epoch 28, batch 2650, loss[loss=0.1748, ctc_loss=0.113, cr_loss=0.309, over 20957.00 frames. ], tot_loss[loss=0.2263, ctc_loss=0.1515, cr_loss=0.3737, over 4090961.45 frames. ], batch size: 49, lr: 2.97e-03, grad_scale: 32.0 2024-09-16 20:21:49,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=496434.0, ans=0.0 2024-09-16 20:22:42,063 INFO [train.py:1198] (1/2) Epoch 28, batch 2700, loss[loss=0.1859, ctc_loss=0.1227, cr_loss=0.3158, over 20976.00 frames. ], tot_loss[loss=0.2265, ctc_loss=0.1517, cr_loss=0.3738, over 4081728.65 frames. ], batch size: 49, lr: 2.97e-03, grad_scale: 32.0 2024-09-16 20:23:21,631 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.788e+02 2.128e+02 2.256e+02 2.425e+02 3.621e+02, threshold=4.512e+02, percent-clipped=0.0 2024-09-16 20:23:21,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=496604.0, ans=0.2 2024-09-16 20:23:37,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=496632.3333333333, ans=0.09899494936611666 2024-09-16 20:23:50,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=496660.6666666667, ans=0.125 2024-09-16 20:23:55,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=496660.6666666667, ans=0.125 2024-09-16 20:23:57,861 INFO [train.py:1198] (1/2) Epoch 28, batch 2750, loss[loss=0.2191, ctc_loss=0.1454, cr_loss=0.3682, over 21063.00 frames. ], tot_loss[loss=0.2273, ctc_loss=0.1522, cr_loss=0.3755, over 4091752.56 frames. ], batch size: 56, lr: 2.97e-03, grad_scale: 32.0 2024-09-16 20:24:02,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=496689.0, ans=0.2 2024-09-16 20:24:16,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=496717.3333333333, ans=0.0 2024-09-16 20:24:40,816 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.21 vs. 
limit=6.0 2024-09-16 20:24:58,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=496802.3333333333, ans=0.2 2024-09-16 20:25:13,432 INFO [train.py:1198] (1/2) Epoch 28, batch 2800, loss[loss=0.2508, ctc_loss=0.1672, cr_loss=0.418, over 20988.00 frames. ], tot_loss[loss=0.2274, ctc_loss=0.1522, cr_loss=0.376, over 4096962.55 frames. ], batch size: 64, lr: 2.96e-03, grad_scale: 32.0 2024-09-16 20:25:37,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=496859.0, ans=0.125 2024-09-16 20:25:52,568 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.866e+02 2.109e+02 2.232e+02 2.388e+02 2.954e+02, threshold=4.464e+02, percent-clipped=0.0 2024-09-16 20:25:59,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=496915.6666666667, ans=0.125 2024-09-16 20:26:23,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=496944.0, ans=10.0 2024-09-16 20:26:29,524 INFO [train.py:1198] (1/2) Epoch 28, batch 2850, loss[loss=0.2467, ctc_loss=0.1622, cr_loss=0.4229, over 19877.00 frames. ], tot_loss[loss=0.2263, ctc_loss=0.1513, cr_loss=0.375, over 4108070.73 frames. ], batch size: 80, lr: 2.96e-03, grad_scale: 32.0 2024-09-16 20:26:30,358 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.75 vs. limit=15.0 2024-09-16 20:26:51,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=497000.6666666667, ans=0.2 2024-09-16 20:27:22,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=497057.3333333333, ans=0.0 2024-09-16 20:27:51,527 INFO [train.py:1198] (1/2) Epoch 28, batch 2900, loss[loss=0.2264, ctc_loss=0.1499, cr_loss=0.3825, over 20794.00 frames. ], tot_loss[loss=0.2255, ctc_loss=0.1507, cr_loss=0.3744, over 4120610.68 frames. ], batch size: 53, lr: 2.96e-03, grad_scale: 32.0 2024-09-16 20:27:59,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=497114.0, ans=0.125 2024-09-16 20:28:04,699 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.39 vs. limit=22.5 2024-09-16 20:28:22,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=497170.6666666667, ans=0.125 2024-09-16 20:28:28,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=497170.6666666667, ans=0.0 2024-09-16 20:28:31,407 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.881e+02 2.125e+02 2.271e+02 2.398e+02 2.995e+02, threshold=4.542e+02, percent-clipped=0.0 2024-09-16 20:29:07,767 INFO [train.py:1198] (1/2) Epoch 28, batch 2950, loss[loss=0.2654, ctc_loss=0.1782, cr_loss=0.436, over 19989.00 frames. ], tot_loss[loss=0.2256, ctc_loss=0.1507, cr_loss=0.3745, over 4122143.63 frames. 
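A full validation pass is interleaved with training at regular batch intervals; the batch 3000 entry below reports loss=0.04093 over 944034 frames of dev data, and its cr_loss is numerically zero (1.238e-14), consistent with the consistency-regularization term only being active under the random masking applied in training. A minimal sketch of that loop, with compute_loss as a hypothetical helper:

    import torch

    def run_validation(model, valid_loader, device) -> float:
        # Mirrors the "Computing validation loss" / "validation: ..." entries.
        model.eval()
        tot_loss, tot_frames = 0.0, 0.0
        with torch.no_grad():
            for batch in valid_loader:
                loss, num_frames = compute_loss(model, batch, device)  # hypothetical
                tot_loss += loss.item() * num_frames
                tot_frames += num_frames
        model.train()
        return tot_loss / tot_frames  # e.g. 0.04093 over 944034 frames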
], batch size: 80, lr: 2.96e-03, grad_scale: 32.0 2024-09-16 20:29:24,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=497284.0, ans=0.125 2024-09-16 20:29:45,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=497312.3333333333, ans=0.0 2024-09-16 20:30:13,862 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.09 vs. limit=22.5 2024-09-16 20:30:23,428 INFO [train.py:1198] (1/2) Epoch 28, batch 3000, loss[loss=0.215, ctc_loss=0.143, cr_loss=0.36, over 19023.00 frames. ], tot_loss[loss=0.2264, ctc_loss=0.1515, cr_loss=0.3744, over 4101422.27 frames. ], batch size: 42, lr: 2.96e-03, grad_scale: 32.0 2024-09-16 20:30:23,429 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-16 20:30:47,054 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.5401, 5.5864, 5.4921, 4.9768], device='cuda:1') 2024-09-16 20:30:48,321 INFO [train.py:1230] (1/2) Epoch 28, validation: loss=0.04093, ctc_loss=0.04093, cr_loss=1.238e-14, over 944034.00 frames. 2024-09-16 20:30:48,322 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-16 20:31:19,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=497454.0, ans=0.125 2024-09-16 20:31:27,833 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.844e+02 2.138e+02 2.291e+02 2.531e+02 4.310e+02, threshold=4.583e+02, percent-clipped=0.0 2024-09-16 20:31:38,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=497482.3333333333, ans=0.125 2024-09-16 20:31:41,965 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.95 vs. limit=15.0 2024-09-16 20:32:02,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=497539.0, ans=0.2 2024-09-16 20:32:04,175 INFO [train.py:1198] (1/2) Epoch 28, batch 3050, loss[loss=0.2082, ctc_loss=0.1381, cr_loss=0.3503, over 19502.00 frames. ], tot_loss[loss=0.2267, ctc_loss=0.1517, cr_loss=0.3749, over 4107027.07 frames. ], batch size: 43, lr: 2.96e-03, grad_scale: 32.0 2024-09-16 20:33:25,901 INFO [train.py:1198] (1/2) Epoch 28, batch 3100, loss[loss=0.2008, ctc_loss=0.1348, cr_loss=0.3301, over 20784.00 frames. ], tot_loss[loss=0.2254, ctc_loss=0.1507, cr_loss=0.3733, over 4098869.81 frames. ], batch size: 56, lr: 2.96e-03, grad_scale: 32.0 2024-09-16 20:33:40,213 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.75 vs. limit=15.0 2024-09-16 20:33:49,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=497709.0, ans=0.1 2024-09-16 20:34:02,044 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.45 vs. 
limit=15.0 2024-09-16 20:34:04,466 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.788e+02 2.109e+02 2.242e+02 2.400e+02 5.396e+02, threshold=4.483e+02, percent-clipped=1.0 2024-09-16 20:34:18,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=497765.6666666667, ans=0.2 2024-09-16 20:34:24,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=497794.0, ans=0.125 2024-09-16 20:34:24,485 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0 2024-09-16 20:34:40,640 INFO [train.py:1198] (1/2) Epoch 28, batch 3150, loss[loss=0.2081, ctc_loss=0.1382, cr_loss=0.3494, over 20979.00 frames. ], tot_loss[loss=0.2269, ctc_loss=0.152, cr_loss=0.3749, over 4079146.82 frames. ], batch size: 51, lr: 2.96e-03, grad_scale: 32.0 2024-09-16 20:35:48,455 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.15 vs. limit=15.0 2024-09-16 20:35:56,663 INFO [train.py:1198] (1/2) Epoch 28, batch 3200, loss[loss=0.2506, ctc_loss=0.1699, cr_loss=0.4039, over 21029.00 frames. ], tot_loss[loss=0.2271, ctc_loss=0.1521, cr_loss=0.375, over 4079373.92 frames. ], batch size: 61, lr: 2.96e-03, grad_scale: 32.0 2024-09-16 20:36:07,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=497964.0, ans=0.0 2024-09-16 20:36:08,283 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.89 vs. limit=15.0 2024-09-16 20:36:12,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=497992.3333333333, ans=0.125 2024-09-16 20:36:37,060 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.847e+02 2.199e+02 2.317e+02 2.478e+02 3.979e+02, threshold=4.634e+02, percent-clipped=0.0 2024-09-16 20:37:11,797 INFO [train.py:1198] (1/2) Epoch 28, batch 3250, loss[loss=0.2358, ctc_loss=0.1605, cr_loss=0.3765, over 20853.00 frames. ], tot_loss[loss=0.2275, ctc_loss=0.1524, cr_loss=0.3753, over 4080646.99 frames. ], batch size: 65, lr: 2.96e-03, grad_scale: 32.0 2024-09-16 20:37:25,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=498134.0, ans=0.125 2024-09-16 20:37:45,556 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.61 vs. 
limit=22.5 2024-09-16 20:37:55,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=498190.6666666667, ans=0.0 2024-09-16 20:37:55,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=498190.6666666667, ans=0.125 2024-09-16 20:38:06,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=498190.6666666667, ans=0.125 2024-09-16 20:38:24,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=498219.0, ans=0.0 2024-09-16 20:38:29,955 INFO [train.py:1198] (1/2) Epoch 28, batch 3300, loss[loss=0.2054, ctc_loss=0.1345, cr_loss=0.3543, over 19943.00 frames. ], tot_loss[loss=0.2276, ctc_loss=0.1525, cr_loss=0.3755, over 4092815.79 frames. ], batch size: 44, lr: 2.96e-03, grad_scale: 32.0 2024-09-16 20:38:33,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=498247.3333333333, ans=0.125 2024-09-16 20:38:48,498 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.33 vs. limit=15.0 2024-09-16 20:39:10,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=498304.0, ans=0.0 2024-09-16 20:39:13,732 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.911e+02 2.130e+02 2.280e+02 2.472e+02 3.865e+02, threshold=4.559e+02, percent-clipped=0.0 2024-09-16 20:39:35,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=498360.6666666667, ans=0.025 2024-09-16 20:39:48,682 INFO [train.py:1198] (1/2) Epoch 28, batch 3350, loss[loss=0.2807, ctc_loss=0.2, cr_loss=0.404, over 14797.00 frames. ], tot_loss[loss=0.2279, ctc_loss=0.1526, cr_loss=0.3762, over 4084367.63 frames. ], batch size: 150, lr: 2.96e-03, grad_scale: 32.0 2024-09-16 20:39:59,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=498389.0, ans=0.125 2024-09-16 20:40:11,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=498417.3333333333, ans=0.125 2024-09-16 20:40:11,676 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=12.09 vs. limit=12.0 2024-09-16 20:40:25,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=498445.6666666667, ans=0.2 2024-09-16 20:41:04,434 INFO [train.py:1198] (1/2) Epoch 28, batch 3400, loss[loss=0.2275, ctc_loss=0.1512, cr_loss=0.3816, over 21027.00 frames. ], tot_loss[loss=0.2271, ctc_loss=0.152, cr_loss=0.3755, over 4098955.22 frames. 
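The attn_weights_entropy diagnostic printed during the validation pass above gives one value per attention head (entropy in nats; higher means the head attends more uniformly over keys). A plausible computation, assuming attn_weights of shape (num_heads, num_queries, num_keys) that already sum to 1 over the key axis:

    import torch

    def attn_weights_entropy(attn_weights: torch.Tensor,
                             eps: float = 1.0e-20) -> torch.Tensor:
        # Mean per-head entropy of the attention distribution over keys.
        p = attn_weights.clamp(min=eps)
        entropy = -(p * p.log()).sum(dim=-1)  # (num_heads, num_queries)
        return entropy.mean(dim=-1)           # one value per head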
], batch size: 61, lr: 2.96e-03, grad_scale: 32.0 2024-09-16 20:41:09,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=498530.6666666667, ans=0.2 2024-09-16 20:41:30,776 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 20:41:45,211 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.900e+02 2.146e+02 2.277e+02 2.460e+02 4.514e+02, threshold=4.555e+02, percent-clipped=0.0 2024-09-16 20:42:21,019 INFO [train.py:1198] (1/2) Epoch 28, batch 3450, loss[loss=0.2219, ctc_loss=0.1465, cr_loss=0.377, over 21065.00 frames. ], tot_loss[loss=0.228, ctc_loss=0.1527, cr_loss=0.3764, over 4091043.29 frames. ], batch size: 56, lr: 2.96e-03, grad_scale: 32.0 2024-09-16 20:42:35,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=498700.6666666667, ans=0.125 2024-09-16 20:42:52,153 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.13 vs. limit=22.5 2024-09-16 20:43:09,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=498757.3333333333, ans=0.0 2024-09-16 20:43:36,473 INFO [train.py:1198] (1/2) Epoch 28, batch 3500, loss[loss=0.2998, ctc_loss=0.2113, cr_loss=0.4421, over 14912.00 frames. ], tot_loss[loss=0.2277, ctc_loss=0.1525, cr_loss=0.3763, over 4088064.11 frames. ], batch size: 149, lr: 2.96e-03, grad_scale: 32.0 2024-09-16 20:44:15,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=498870.6666666667, ans=0.125 2024-09-16 20:44:22,558 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.868e+02 2.183e+02 2.312e+02 2.447e+02 5.682e+02, threshold=4.625e+02, percent-clipped=1.0 2024-09-16 20:44:41,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=498927.3333333333, ans=0.2 2024-09-16 20:44:57,447 INFO [train.py:1198] (1/2) Epoch 28, batch 3550, loss[loss=0.1934, ctc_loss=0.1257, cr_loss=0.3386, over 21051.00 frames. ], tot_loss[loss=0.2277, ctc_loss=0.1525, cr_loss=0.3762, over 4075592.20 frames. ], batch size: 53, lr: 2.96e-03, grad_scale: 32.0 2024-09-16 20:45:03,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=498955.6666666667, ans=0.0 2024-09-16 20:45:05,862 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.10 vs. limit=12.0 2024-09-16 20:45:21,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=498984.0, ans=0.125 2024-09-16 20:45:41,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=499040.6666666667, ans=0.125 2024-09-16 20:45:55,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=499040.6666666667, ans=0.125 2024-09-16 20:46:12,797 INFO [train.py:1198] (1/2) Epoch 28, batch 3600, loss[loss=0.2334, ctc_loss=0.1572, cr_loss=0.3811, over 20642.00 frames. ], tot_loss[loss=0.2285, ctc_loss=0.153, cr_loss=0.3773, over 4090373.29 frames. 
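The "Maximum memory allocated so far" lines track the high-water mark of the CUDA caching allocator, which torch exposes directly. A short sketch producing a line in the same format as those entries:

    import logging
    import torch

    def log_peak_memory(device: torch.device) -> None:
        # High-water mark of allocator usage on this device, in MB,
        # e.g. "Maximum memory allocated so far is 20869MB".
        peak_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        logging.info(f"Maximum memory allocated so far is {peak_mb}MB")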
], batch size: 68, lr: 2.96e-03, grad_scale: 32.0 2024-09-16 20:46:53,207 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.775e+02 2.146e+02 2.288e+02 2.475e+02 4.845e+02, threshold=4.576e+02, percent-clipped=1.0 2024-09-16 20:47:11,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=499210.6666666667, ans=0.2 2024-09-16 20:47:23,449 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.44 vs. limit=15.0 2024-09-16 20:47:28,420 INFO [train.py:1198] (1/2) Epoch 28, batch 3650, loss[loss=0.2181, ctc_loss=0.1456, cr_loss=0.3622, over 20768.00 frames. ], tot_loss[loss=0.2294, ctc_loss=0.1538, cr_loss=0.378, over 4085035.56 frames. ], batch size: 56, lr: 2.96e-03, grad_scale: 32.0 2024-09-16 20:47:28,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=499239.0, ans=0.04949747468305833 2024-09-16 20:47:29,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=499239.0, ans=22.5 2024-09-16 20:48:09,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=499295.6666666667, ans=0.2 2024-09-16 20:48:15,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=499324.0, ans=0.125 2024-09-16 20:48:27,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=499352.3333333333, ans=0.09899494936611666 2024-09-16 20:48:44,197 INFO [train.py:1198] (1/2) Epoch 28, batch 3700, loss[loss=0.2384, ctc_loss=0.1567, cr_loss=0.4084, over 20971.00 frames. ], tot_loss[loss=0.2293, ctc_loss=0.1538, cr_loss=0.3776, over 4087127.80 frames. ], batch size: 64, lr: 2.96e-03, grad_scale: 32.0 2024-09-16 20:49:10,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=499409.0, ans=0.125 2024-09-16 20:49:25,202 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.876e+02 2.098e+02 2.247e+02 2.386e+02 3.394e+02, threshold=4.494e+02, percent-clipped=0.0 2024-09-16 20:49:58,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=499494.0, ans=0.1 2024-09-16 20:50:06,017 INFO [train.py:1198] (1/2) Epoch 28, batch 3750, loss[loss=0.2456, ctc_loss=0.1656, cr_loss=0.4003, over 20761.00 frames. ], tot_loss[loss=0.2284, ctc_loss=0.1531, cr_loss=0.3767, over 4092049.33 frames. ], batch size: 71, lr: 2.96e-03, grad_scale: 32.0 2024-09-16 20:50:13,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=499522.3333333333, ans=0.035 2024-09-16 20:50:25,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=499550.6666666667, ans=0.125 2024-09-16 20:50:30,955 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.92 vs. limit=15.0 2024-09-16 20:51:21,710 INFO [train.py:1198] (1/2) Epoch 28, batch 3800, loss[loss=0.2027, ctc_loss=0.1336, cr_loss=0.3454, over 20948.00 frames. ], tot_loss[loss=0.2271, ctc_loss=0.152, cr_loss=0.3757, over 4093513.62 frames. 
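The Whitening entries compare a statistic of a module's output covariance against a limit ("metric=... vs. limit=..."); a value of 1.0 would mean perfectly white (isotropic) features, and the hook only intervenes when the limit is exceeded. A rough proxy under the assumption that the metric measures eigenvalue spread of the per-group covariance (mean squared eigenvalue over squared mean eigenvalue); the exact computation is in scaling.py:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        # x: (num_frames, num_channels), channels split into num_groups
        # as in the "num_groups=4, num_channels=128" whiten_keys entries.
        num_frames, num_channels = x.shape
        xg = x.reshape(num_frames, num_groups, num_channels // num_groups)
        metric = 0.0
        for g in range(num_groups):
            f = xg[:, g, :]
            cov = (f.t() @ f) / num_frames
            eigs = torch.linalg.eigvalsh(cov)
            metric += (eigs.pow(2).mean() / eigs.mean().pow(2)).item()
        return metric / num_groups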
], batch size: 50, lr: 2.96e-03, grad_scale: 32.0 2024-09-16 20:52:02,248 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.883e+02 2.126e+02 2.237e+02 2.383e+02 2.888e+02, threshold=4.474e+02, percent-clipped=0.0 2024-09-16 20:52:16,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=499749.0, ans=0.05 2024-09-16 20:52:34,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=499777.3333333333, ans=0.1 2024-09-16 20:52:37,244 INFO [train.py:1198] (1/2) Epoch 28, batch 3850, loss[loss=0.2405, ctc_loss=0.1621, cr_loss=0.3921, over 20652.00 frames. ], tot_loss[loss=0.2269, ctc_loss=0.1518, cr_loss=0.3754, over 4098422.26 frames. ], batch size: 66, lr: 2.96e-03, grad_scale: 32.0 2024-09-16 20:52:44,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=499805.6666666667, ans=0.125 2024-09-16 20:53:03,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=499834.0, ans=0.125 2024-09-16 20:53:06,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=499862.3333333333, ans=0.125 2024-09-16 20:53:18,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=499862.3333333333, ans=0.125 2024-09-16 20:53:27,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=499890.6666666667, ans=0.1 2024-09-16 20:53:41,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=499919.0, ans=0.0 2024-09-16 20:53:52,744 INFO [train.py:1198] (1/2) Epoch 28, batch 3900, loss[loss=0.2686, ctc_loss=0.1824, cr_loss=0.4306, over 20051.00 frames. ], tot_loss[loss=0.2272, ctc_loss=0.1521, cr_loss=0.3753, over 4092322.02 frames. ], batch size: 80, lr: 2.96e-03, grad_scale: 32.0 2024-09-16 20:54:15,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=499975.6666666667, ans=0.0 2024-09-16 20:54:17,409 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.46 vs. limit=15.0 2024-09-16 20:54:33,135 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.886e+02 2.115e+02 2.236e+02 2.375e+02 3.783e+02, threshold=4.472e+02, percent-clipped=0.0 2024-09-16 20:55:08,581 INFO [train.py:1198] (1/2) Epoch 28, batch 3950, loss[loss=0.2376, ctc_loss=0.1618, cr_loss=0.379, over 20702.00 frames. ], tot_loss[loss=0.2286, ctc_loss=0.1532, cr_loss=0.3766, over 4072146.99 frames. ], batch size: 71, lr: 2.96e-03, grad_scale: 32.0 2024-09-16 20:55:50,106 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.70 vs. limit=15.0 2024-09-16 20:55:58,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=500174.0, ans=0.5 2024-09-16 20:56:02,205 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.60 vs. 
limit=10.0 2024-09-16 20:56:30,377 INFO [train.py:1198] (1/2) Epoch 28, batch 4000, loss[loss=0.251, ctc_loss=0.1656, cr_loss=0.4273, over 20687.00 frames. ], tot_loss[loss=0.2296, ctc_loss=0.1541, cr_loss=0.3775, over 4058660.25 frames. ], batch size: 68, lr: 2.95e-03, grad_scale: 32.0 2024-09-16 20:57:10,362 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.15 vs. limit=15.0 2024-09-16 20:57:11,147 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.790e+02 2.144e+02 2.276e+02 2.484e+02 3.087e+02, threshold=4.552e+02, percent-clipped=0.0 2024-09-16 20:57:46,279 INFO [train.py:1198] (1/2) Epoch 28, batch 4050, loss[loss=0.2255, ctc_loss=0.1516, cr_loss=0.3694, over 20830.00 frames. ], tot_loss[loss=0.2276, ctc_loss=0.1525, cr_loss=0.3755, over 4071394.98 frames. ], batch size: 59, lr: 2.95e-03, grad_scale: 32.0 2024-09-16 20:58:13,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=500400.6666666667, ans=0.025 2024-09-16 20:58:45,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=500485.6666666667, ans=0.0 2024-09-16 20:58:56,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=500485.6666666667, ans=10.0 2024-09-16 20:59:01,995 INFO [train.py:1198] (1/2) Epoch 28, batch 4100, loss[loss=0.2371, ctc_loss=0.1599, cr_loss=0.3861, over 20998.00 frames. ], tot_loss[loss=0.2268, ctc_loss=0.1518, cr_loss=0.3747, over 4090645.52 frames. ], batch size: 61, lr: 2.95e-03, grad_scale: 16.0 2024-09-16 20:59:40,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=500570.6666666667, ans=0.1 2024-09-16 20:59:44,570 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.908e+02 2.085e+02 2.201e+02 2.340e+02 3.271e+02, threshold=4.401e+02, percent-clipped=0.0 2024-09-16 20:59:47,992 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 21:00:01,857 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.29 vs. limit=15.0 2024-09-16 21:00:07,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=500627.3333333333, ans=0.125 2024-09-16 21:00:11,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=500627.3333333333, ans=0.0 2024-09-16 21:00:17,568 INFO [train.py:1198] (1/2) Epoch 28, batch 4150, loss[loss=0.2548, ctc_loss=0.1746, cr_loss=0.4009, over 18374.00 frames. ], tot_loss[loss=0.227, ctc_loss=0.152, cr_loss=0.3752, over 4096040.20 frames. ], batch size: 108, lr: 2.95e-03, grad_scale: 16.0 2024-09-16 21:01:35,879 INFO [train.py:1198] (1/2) Epoch 28, batch 4200, loss[loss=0.2256, ctc_loss=0.1519, cr_loss=0.3688, over 20987.00 frames. ], tot_loss[loss=0.2282, ctc_loss=0.1529, cr_loss=0.3765, over 4085636.60 frames. 
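The grad_scale field in the loss lines is the AMP dynamic loss scale; it steps down from 32.0 to 16.0 around batch 4100 and to 8.0 later in the epoch, consistent with a dynamic scaler that halves its scale whenever inf/nan gradients appear. Standard torch.cuda.amp usage illustrating the mechanism (not the recipe's exact training loop; the model call is hypothetical):

    import torch
    from torch.cuda.amp import GradScaler, autocast

    scaler = GradScaler()  # maintains the dynamic grad_scale seen in the log

    def train_step(model, batch, optimizer):
        optimizer.zero_grad()
        with autocast(dtype=torch.float16):
            loss = model(batch)  # hypothetical: returns the scalar total loss
        scaler.scale(loss).backward()
        scaler.step(optimizer)  # skipped internally if inf/nan gradients are found
        scaler.update()         # halves the scale after an overflow, else slowly grows it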
], batch size: 55, lr: 2.95e-03, grad_scale: 16.0 2024-09-16 21:01:43,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=500797.3333333333, ans=0.125 2024-09-16 21:02:16,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=500854.0, ans=0.2 2024-09-16 21:02:20,476 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.838e+02 2.165e+02 2.284e+02 2.410e+02 6.079e+02, threshold=4.568e+02, percent-clipped=1.0 2024-09-16 21:02:54,294 INFO [train.py:1198] (1/2) Epoch 28, batch 4250, loss[loss=0.2585, ctc_loss=0.1776, cr_loss=0.4046, over 19956.00 frames. ], tot_loss[loss=0.2287, ctc_loss=0.1533, cr_loss=0.3772, over 4086780.30 frames. ], batch size: 80, lr: 2.95e-03, grad_scale: 16.0 2024-09-16 21:02:59,355 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 21:03:23,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=500995.6666666667, ans=0.2 2024-09-16 21:03:43,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=501024.0, ans=0.1 2024-09-16 21:03:43,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=501024.0, ans=0.2 2024-09-16 21:04:10,323 INFO [train.py:1198] (1/2) Epoch 28, batch 4300, loss[loss=0.2318, ctc_loss=0.1537, cr_loss=0.3905, over 20667.00 frames. ], tot_loss[loss=0.2282, ctc_loss=0.1529, cr_loss=0.3764, over 4087107.59 frames. ], batch size: 66, lr: 2.95e-03, grad_scale: 16.0 2024-09-16 21:04:13,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=501080.6666666667, ans=0.125 2024-09-16 21:04:45,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=501137.3333333333, ans=0.2 2024-09-16 21:04:48,770 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.80 vs. limit=15.0 2024-09-16 21:04:52,542 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.878e+02 2.142e+02 2.228e+02 2.405e+02 4.426e+02, threshold=4.457e+02, percent-clipped=0.0 2024-09-16 21:05:04,995 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 21:05:25,695 INFO [train.py:1198] (1/2) Epoch 28, batch 4350, loss[loss=0.2177, ctc_loss=0.1458, cr_loss=0.3596, over 20963.00 frames. ], tot_loss[loss=0.228, ctc_loss=0.1527, cr_loss=0.3762, over 4087367.45 frames. ], batch size: 58, lr: 2.95e-03, grad_scale: 16.0 2024-09-16 21:05:35,606 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.19 vs. 
limit=15.0 2024-09-16 21:05:49,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=501250.6666666667, ans=0.125 2024-09-16 21:05:56,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=501279.0, ans=0.125 2024-09-16 21:06:01,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=501279.0, ans=0.125 2024-09-16 21:06:01,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=501279.0, ans=0.125 2024-09-16 21:06:07,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=501279.0, ans=10.0 2024-09-16 21:06:27,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=501335.6666666667, ans=0.2 2024-09-16 21:06:39,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=501335.6666666667, ans=15.0 2024-09-16 21:06:44,319 INFO [train.py:1198] (1/2) Epoch 28, batch 4400, loss[loss=0.2675, ctc_loss=0.1867, cr_loss=0.4039, over 13759.00 frames. ], tot_loss[loss=0.2294, ctc_loss=0.1538, cr_loss=0.3779, over 4083524.97 frames. ], batch size: 150, lr: 2.95e-03, grad_scale: 32.0 2024-09-16 21:06:49,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=501364.0, ans=0.04949747468305833 2024-09-16 21:06:56,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=501364.0, ans=0.025 2024-09-16 21:07:10,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=501392.3333333333, ans=0.125 2024-09-16 21:07:30,714 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.918e+02 2.147e+02 2.251e+02 2.378e+02 5.998e+02, threshold=4.502e+02, percent-clipped=1.0 2024-09-16 21:07:35,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=501449.0, ans=15.0 2024-09-16 21:08:04,211 INFO [train.py:1198] (1/2) Epoch 28, batch 4450, loss[loss=0.217, ctc_loss=0.1448, cr_loss=0.3612, over 20840.00 frames. ], tot_loss[loss=0.2287, ctc_loss=0.1533, cr_loss=0.3768, over 4092345.01 frames. ], batch size: 59, lr: 2.95e-03, grad_scale: 32.0 2024-09-16 21:08:50,289 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.83 vs. limit=22.5 2024-09-16 21:08:56,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=501590.6666666667, ans=0.07 2024-09-16 21:08:59,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=501590.6666666667, ans=15.0 2024-09-16 21:09:16,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=501619.0, ans=0.07 2024-09-16 21:09:19,615 INFO [train.py:1198] (1/2) Epoch 28, batch 4500, loss[loss=0.2139, ctc_loss=0.1425, cr_loss=0.3569, over 21058.00 frames. ], tot_loss[loss=0.2289, ctc_loss=0.1535, cr_loss=0.3769, over 4078276.07 frames. 
], batch size: 56, lr: 2.95e-03, grad_scale: 32.0 2024-09-16 21:09:27,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=501647.3333333333, ans=0.1 2024-09-16 21:10:01,878 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.828e+02 2.133e+02 2.291e+02 2.453e+02 3.158e+02, threshold=4.581e+02, percent-clipped=0.0 2024-09-16 21:10:17,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=501732.3333333333, ans=0.125 2024-09-16 21:10:35,603 INFO [train.py:1198] (1/2) Epoch 28, batch 4550, loss[loss=0.2342, ctc_loss=0.1575, cr_loss=0.3837, over 21075.00 frames. ], tot_loss[loss=0.2286, ctc_loss=0.1533, cr_loss=0.3763, over 4079171.17 frames. ], batch size: 53, lr: 2.95e-03, grad_scale: 32.0 2024-09-16 21:11:14,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=501845.6666666667, ans=0.125 2024-09-16 21:11:29,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=501874.0, ans=0.0 2024-09-16 21:11:44,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=501902.3333333333, ans=0.0 2024-09-16 21:11:52,227 INFO [train.py:1198] (1/2) Epoch 28, batch 4600, loss[loss=0.2375, ctc_loss=0.1613, cr_loss=0.3808, over 18092.00 frames. ], tot_loss[loss=0.2285, ctc_loss=0.1533, cr_loss=0.376, over 4073271.74 frames. ], batch size: 108, lr: 2.95e-03, grad_scale: 16.0 2024-09-16 21:12:08,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=501959.0, ans=0.025 2024-09-16 21:12:20,208 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.27 vs. limit=22.5 2024-09-16 21:12:37,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=501987.3333333333, ans=0.0 2024-09-16 21:12:39,085 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.890e+02 2.146e+02 2.237e+02 2.412e+02 3.153e+02, threshold=4.474e+02, percent-clipped=0.0 2024-09-16 21:12:41,365 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.46 vs. limit=15.0 2024-09-16 21:12:51,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=502015.6666666667, ans=0.125 2024-09-16 21:13:13,557 INFO [train.py:1198] (1/2) Epoch 28, batch 4650, loss[loss=0.1875, ctc_loss=0.1234, cr_loss=0.3205, over 21052.00 frames. ], tot_loss[loss=0.2284, ctc_loss=0.1531, cr_loss=0.3761, over 4087617.77 frames. 
], batch size: 53, lr: 2.95e-03, grad_scale: 16.0 2024-09-16 21:13:48,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=502129.0, ans=0.125 2024-09-16 21:14:04,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=502157.3333333333, ans=0.1 2024-09-16 21:14:07,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=502157.3333333333, ans=0.09899494936611666 2024-09-16 21:14:09,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=502157.3333333333, ans=0.09899494936611666 2024-09-16 21:14:28,656 INFO [train.py:1198] (1/2) Epoch 28, batch 4700, loss[loss=0.2252, ctc_loss=0.1511, cr_loss=0.3705, over 20884.00 frames. ], tot_loss[loss=0.2278, ctc_loss=0.1526, cr_loss=0.3758, over 4095150.67 frames. ], batch size: 54, lr: 2.95e-03, grad_scale: 16.0 2024-09-16 21:14:38,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=502214.0, ans=0.125 2024-09-16 21:14:50,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=502242.3333333333, ans=0.1 2024-09-16 21:14:57,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=502270.6666666667, ans=0.2 2024-09-16 21:15:12,600 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.809e+02 2.129e+02 2.268e+02 2.494e+02 5.015e+02, threshold=4.537e+02, percent-clipped=1.0 2024-09-16 21:15:19,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=502299.0, ans=0.07 2024-09-16 21:15:34,722 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.99 vs. limit=22.5 2024-09-16 21:15:44,772 INFO [train.py:1198] (1/2) Epoch 28, batch 4750, loss[loss=0.238, ctc_loss=0.16, cr_loss=0.3901, over 21043.00 frames. ], tot_loss[loss=0.2275, ctc_loss=0.1524, cr_loss=0.3754, over 4089065.62 frames. ], batch size: 62, lr: 2.95e-03, grad_scale: 16.0 2024-09-16 21:17:01,161 INFO [train.py:1198] (1/2) Epoch 28, batch 4800, loss[loss=0.2417, ctc_loss=0.1647, cr_loss=0.3849, over 20341.00 frames. ], tot_loss[loss=0.2277, ctc_loss=0.1525, cr_loss=0.3758, over 4104224.47 frames. ], batch size: 74, lr: 2.95e-03, grad_scale: 32.0 2024-09-16 21:17:08,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=502497.3333333333, ans=0.1 2024-09-16 21:17:13,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=502497.3333333333, ans=0.125 2024-09-16 21:17:44,656 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.858e+02 2.224e+02 2.306e+02 2.486e+02 3.165e+02, threshold=4.611e+02, percent-clipped=0.0 2024-09-16 21:18:19,046 INFO [train.py:1198] (1/2) Epoch 28, batch 4850, loss[loss=0.2364, ctc_loss=0.1608, cr_loss=0.378, over 20951.00 frames. ], tot_loss[loss=0.2284, ctc_loss=0.1531, cr_loss=0.3765, over 4097278.68 frames. 
], batch size: 64, lr: 2.95e-03, grad_scale: 32.0 2024-09-16 21:18:28,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=502639.0, ans=0.1 2024-09-16 21:18:35,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=502667.3333333333, ans=0.2 2024-09-16 21:19:03,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=502695.6666666667, ans=0.125 2024-09-16 21:19:30,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=502752.3333333333, ans=0.2 2024-09-16 21:19:37,146 INFO [train.py:1198] (1/2) Epoch 28, batch 4900, loss[loss=0.2596, ctc_loss=0.1772, cr_loss=0.4122, over 20281.00 frames. ], tot_loss[loss=0.2267, ctc_loss=0.1517, cr_loss=0.3749, over 4112151.73 frames. ], batch size: 74, lr: 2.95e-03, grad_scale: 32.0 2024-09-16 21:20:16,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=502837.3333333333, ans=0.0 2024-09-16 21:20:19,141 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.48 vs. limit=15.0 2024-09-16 21:20:20,046 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.820e+02 2.110e+02 2.257e+02 2.460e+02 3.997e+02, threshold=4.514e+02, percent-clipped=0.0 2024-09-16 21:20:23,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=502865.6666666667, ans=0.025 2024-09-16 21:20:27,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=502865.6666666667, ans=0.125 2024-09-16 21:20:35,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=502894.0, ans=0.0 2024-09-16 21:20:51,010 INFO [train.py:1198] (1/2) Epoch 28, batch 4950, loss[loss=0.241, ctc_loss=0.1633, cr_loss=0.3888, over 20750.00 frames. ], tot_loss[loss=0.2254, ctc_loss=0.1507, cr_loss=0.3733, over 4113629.27 frames. ], batch size: 71, lr: 2.95e-03, grad_scale: 16.0 2024-09-16 21:20:57,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=502922.3333333333, ans=0.5 2024-09-16 21:20:58,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=502922.3333333333, ans=0.1 2024-09-16 21:21:04,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=502950.6666666667, ans=0.1 2024-09-16 21:21:10,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=502950.6666666667, ans=0.0 2024-09-16 21:21:15,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=502950.6666666667, ans=0.125 2024-09-16 21:22:04,922 INFO [train.py:1198] (1/2) Epoch 28, batch 5000, loss[loss=0.2241, ctc_loss=0.1487, cr_loss=0.3766, over 20767.00 frames. ], tot_loss[loss=0.2264, ctc_loss=0.1515, cr_loss=0.3745, over 4113808.74 frames. 
], batch size: 56, lr: 2.95e-03, grad_scale: 16.0 2024-09-16 21:22:30,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=503092.3333333333, ans=0.0 2024-09-16 21:22:35,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=503120.6666666667, ans=0.0 2024-09-16 21:22:38,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=503120.6666666667, ans=0.1 2024-09-16 21:22:45,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=503120.6666666667, ans=0.025 2024-09-16 21:22:49,543 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.902e+02 2.130e+02 2.261e+02 2.428e+02 2.897e+02, threshold=4.523e+02, percent-clipped=0.0 2024-09-16 21:23:19,579 INFO [train.py:1198] (1/2) Epoch 28, batch 5050, loss[loss=0.2693, ctc_loss=0.1846, cr_loss=0.4236, over 19463.00 frames. ], tot_loss[loss=0.2261, ctc_loss=0.1513, cr_loss=0.3745, over 4109697.66 frames. ], batch size: 90, lr: 2.95e-03, grad_scale: 16.0 2024-09-16 21:23:19,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=503205.6666666667, ans=0.2 2024-09-16 21:23:22,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=503205.6666666667, ans=0.125 2024-09-16 21:23:37,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=503234.0, ans=0.125 2024-09-16 21:24:05,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=503290.6666666667, ans=0.125 2024-09-16 21:24:12,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=503290.6666666667, ans=0.125 2024-09-16 21:24:21,678 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.04 vs. limit=15.0 2024-09-16 21:24:33,128 INFO [train.py:1198] (1/2) Epoch 28, batch 5100, loss[loss=0.2015, ctc_loss=0.1348, cr_loss=0.3337, over 20979.00 frames. ], tot_loss[loss=0.2271, ctc_loss=0.152, cr_loss=0.3752, over 4108316.14 frames. ], batch size: 55, lr: 2.95e-03, grad_scale: 8.0 2024-09-16 21:24:59,388 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.12 vs. limit=15.0 2024-09-16 21:25:18,474 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.889e+02 2.141e+02 2.268e+02 2.427e+02 3.703e+02, threshold=4.535e+02, percent-clipped=0.0 2024-09-16 21:25:42,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=503460.6666666667, ans=0.2 2024-09-16 21:25:46,628 INFO [train.py:1198] (1/2) Epoch 28, batch 5150, loss[loss=0.2607, ctc_loss=0.1781, cr_loss=0.4132, over 20149.00 frames. ], tot_loss[loss=0.2278, ctc_loss=0.1526, cr_loss=0.3764, over 4104734.48 frames. 
], batch size: 80, lr: 2.95e-03, grad_scale: 8.0 2024-09-16 21:25:55,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=503489.0, ans=0.2 2024-09-16 21:26:10,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=503517.3333333333, ans=0.1 2024-09-16 21:26:25,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=503545.6666666667, ans=0.95 2024-09-16 21:27:03,388 INFO [train.py:1198] (1/2) Epoch 28, batch 5200, loss[loss=0.2311, ctc_loss=0.1545, cr_loss=0.3825, over 21015.00 frames. ], tot_loss[loss=0.2287, ctc_loss=0.1533, cr_loss=0.3773, over 4107315.46 frames. ], batch size: 63, lr: 2.94e-03, grad_scale: 16.0 2024-09-16 21:27:20,274 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.90 vs. limit=12.0 2024-09-16 21:27:21,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=503659.0, ans=0.1 2024-09-16 21:27:34,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=503687.3333333333, ans=0.95 2024-09-16 21:27:49,157 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.854e+02 2.200e+02 2.332e+02 2.586e+02 6.274e+02, threshold=4.663e+02, percent-clipped=2.0 2024-09-16 21:28:09,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=503744.0, ans=0.125 2024-09-16 21:28:15,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=503744.0, ans=0.125 2024-09-16 21:28:17,910 INFO [train.py:1198] (1/2) Epoch 28, batch 5250, loss[loss=0.2181, ctc_loss=0.1434, cr_loss=0.3737, over 20950.00 frames. ], tot_loss[loss=0.2292, ctc_loss=0.1536, cr_loss=0.3781, over 4105270.97 frames. ], batch size: 58, lr: 2.94e-03, grad_scale: 16.0 2024-09-16 21:28:18,505 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0 2024-09-16 21:29:05,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=503857.3333333333, ans=0.125 2024-09-16 21:29:19,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=503885.6666666667, ans=0.1 2024-09-16 21:29:35,071 INFO [train.py:1198] (1/2) Epoch 28, batch 5300, loss[loss=0.1972, ctc_loss=0.1322, cr_loss=0.3245, over 20993.00 frames. ], tot_loss[loss=0.2293, ctc_loss=0.1537, cr_loss=0.3782, over 4111517.41 frames. ], batch size: 52, lr: 2.94e-03, grad_scale: 16.0 2024-09-16 21:29:51,036 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. limit=6.0 2024-09-16 21:29:51,071 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.21 vs. 
limit=15.0 2024-09-16 21:30:21,513 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.811e+02 2.155e+02 2.302e+02 2.472e+02 3.709e+02, threshold=4.603e+02, percent-clipped=0.0 2024-09-16 21:30:41,629 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.90 vs. limit=10.0 2024-09-16 21:30:44,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=504027.3333333333, ans=0.125 2024-09-16 21:30:48,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=504055.6666666667, ans=0.125 2024-09-16 21:30:49,530 INFO [train.py:1198] (1/2) Epoch 28, batch 5350, loss[loss=0.1859, ctc_loss=0.122, cr_loss=0.3196, over 20975.00 frames. ], tot_loss[loss=0.2289, ctc_loss=0.1534, cr_loss=0.3774, over 4106290.69 frames. ], batch size: 48, lr: 2.94e-03, grad_scale: 16.0 2024-09-16 21:31:07,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=504084.0, ans=0.2 2024-09-16 21:31:15,771 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.38 vs. limit=22.5 2024-09-16 21:32:04,145 INFO [train.py:1198] (1/2) Epoch 28, batch 5400, loss[loss=0.1907, ctc_loss=0.1257, cr_loss=0.3252, over 20989.00 frames. ], tot_loss[loss=0.2287, ctc_loss=0.1533, cr_loss=0.3769, over 4102938.97 frames. ], batch size: 52, lr: 2.94e-03, grad_scale: 16.0 2024-09-16 21:32:14,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=504197.3333333333, ans=0.1 2024-09-16 21:32:47,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=504282.3333333333, ans=0.025 2024-09-16 21:32:47,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=504282.3333333333, ans=0.025 2024-09-16 21:32:49,830 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.879e+02 2.092e+02 2.222e+02 2.408e+02 7.406e+02, threshold=4.444e+02, percent-clipped=2.0 2024-09-16 21:33:16,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=504339.0, ans=0.125 2024-09-16 21:33:18,124 INFO [train.py:1198] (1/2) Epoch 28, batch 5450, loss[loss=0.2285, ctc_loss=0.1526, cr_loss=0.3798, over 20771.00 frames. ], tot_loss[loss=0.2293, ctc_loss=0.1537, cr_loss=0.378, over 4095458.71 frames. 
], batch size: 53, lr: 2.94e-03, grad_scale: 16.0 2024-09-16 21:33:27,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=504339.0, ans=0.0 2024-09-16 21:33:49,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=504395.6666666667, ans=0.0 2024-09-16 21:33:58,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=504395.6666666667, ans=0.1 2024-09-16 21:34:01,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=504424.0, ans=0.025 2024-09-16 21:34:32,140 INFO [train.py:1198] (1/2) Epoch 28, batch 5500, loss[loss=0.2131, ctc_loss=0.1396, cr_loss=0.3677, over 20957.00 frames. ], tot_loss[loss=0.2278, ctc_loss=0.1525, cr_loss=0.3764, over 4097861.50 frames. ], batch size: 58, lr: 2.94e-03, grad_scale: 16.0 2024-09-16 21:34:53,678 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.73 vs. limit=15.0 2024-09-16 21:34:54,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=504509.0, ans=0.2 2024-09-16 21:35:18,221 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.868e+02 2.105e+02 2.280e+02 2.480e+02 5.090e+02, threshold=4.561e+02, percent-clipped=1.0 2024-09-16 21:35:21,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=504565.6666666667, ans=0.035 2024-09-16 21:35:32,705 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.92 vs. limit=15.0 2024-09-16 21:35:41,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=504594.0, ans=0.125 2024-09-16 21:35:47,199 INFO [train.py:1198] (1/2) Epoch 28, batch 5550, loss[loss=0.2177, ctc_loss=0.1441, cr_loss=0.3682, over 20793.00 frames. ], tot_loss[loss=0.2282, ctc_loss=0.1528, cr_loss=0.3771, over 4101551.93 frames. ], batch size: 53, lr: 2.94e-03, grad_scale: 16.0 2024-09-16 21:36:09,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=504650.6666666667, ans=0.0 2024-09-16 21:36:36,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=504707.3333333333, ans=0.05 2024-09-16 21:36:54,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=504735.6666666667, ans=0.05 2024-09-16 21:37:04,278 INFO [train.py:1198] (1/2) Epoch 28, batch 5600, loss[loss=0.2501, ctc_loss=0.1696, cr_loss=0.4024, over 20677.00 frames. ], tot_loss[loss=0.2287, ctc_loss=0.1533, cr_loss=0.3769, over 4081304.23 frames. 
], batch size: 71, lr: 2.94e-03, grad_scale: 32.0 2024-09-16 21:37:13,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=504764.0, ans=0.0 2024-09-16 21:37:31,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=504792.3333333333, ans=0.125 2024-09-16 21:37:43,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=504820.6666666667, ans=0.1 2024-09-16 21:37:53,020 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.848e+02 2.146e+02 2.252e+02 2.387e+02 5.267e+02, threshold=4.505e+02, percent-clipped=1.0 2024-09-16 21:37:59,843 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.81 vs. limit=15.0 2024-09-16 21:38:02,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=504849.0, ans=0.0 2024-09-16 21:38:04,248 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.41 vs. limit=22.5 2024-09-16 21:38:06,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=504877.3333333333, ans=0.125 2024-09-16 21:38:21,319 INFO [train.py:1198] (1/2) Epoch 28, batch 5650, loss[loss=0.1956, ctc_loss=0.1294, cr_loss=0.3307, over 21005.00 frames. ], tot_loss[loss=0.2278, ctc_loss=0.1526, cr_loss=0.3757, over 4082443.96 frames. ], batch size: 52, lr: 2.94e-03, grad_scale: 32.0 2024-09-16 21:38:34,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=504934.0, ans=0.1 2024-09-16 21:38:37,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=504934.0, ans=0.95 2024-09-16 21:38:52,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=504962.3333333333, ans=0.125 2024-09-16 21:38:52,930 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.25 vs. limit=6.0 2024-09-16 21:38:56,989 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 21:39:02,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=504962.3333333333, ans=0.125 2024-09-16 21:39:14,308 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 21:39:14,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=504990.6666666667, ans=0.1 2024-09-16 21:39:33,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=505047.3333333333, ans=0.125 2024-09-16 21:39:34,638 INFO [train.py:1198] (1/2) Epoch 28, batch 5700, loss[loss=0.2156, ctc_loss=0.1439, cr_loss=0.3588, over 21055.00 frames. ], tot_loss[loss=0.2271, ctc_loss=0.1521, cr_loss=0.3748, over 4081748.55 frames. 
], batch size: 59, lr: 2.94e-03, grad_scale: 32.0 2024-09-16 21:39:48,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=505075.6666666667, ans=0.2 2024-09-16 21:39:51,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=505075.6666666667, ans=0.1 2024-09-16 21:39:55,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=505075.6666666667, ans=0.5 2024-09-16 21:39:57,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=505075.6666666667, ans=0.0 2024-09-16 21:40:20,584 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.931e+02 2.140e+02 2.238e+02 2.342e+02 2.999e+02, threshold=4.475e+02, percent-clipped=0.0 2024-09-16 21:40:29,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=505132.3333333333, ans=0.1 2024-09-16 21:40:48,548 INFO [train.py:1198] (1/2) Epoch 28, batch 5750, loss[loss=0.2321, ctc_loss=0.157, cr_loss=0.3756, over 20884.00 frames. ], tot_loss[loss=0.2261, ctc_loss=0.1513, cr_loss=0.3739, over 4089406.94 frames. ], batch size: 54, lr: 2.94e-03, grad_scale: 32.0 2024-09-16 21:40:54,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=505189.0, ans=0.125 2024-09-16 21:41:00,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=505189.0, ans=0.125 2024-09-16 21:41:15,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=505217.3333333333, ans=0.04949747468305833 2024-09-16 21:41:15,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=505217.3333333333, ans=0.07 2024-09-16 21:41:20,355 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=505245.6666666667, ans=0.0 2024-09-16 21:42:03,456 INFO [train.py:1198] (1/2) Epoch 28, batch 5800, loss[loss=0.2391, ctc_loss=0.1614, cr_loss=0.3883, over 20959.00 frames. ], tot_loss[loss=0.2265, ctc_loss=0.1517, cr_loss=0.3743, over 4092577.64 frames. ], batch size: 58, lr: 2.94e-03, grad_scale: 16.0 2024-09-16 21:42:22,105 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.60 vs. limit=15.0 2024-09-16 21:42:26,465 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.18 vs. limit=15.0 2024-09-16 21:42:45,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=505387.3333333333, ans=0.125 2024-09-16 21:42:51,183 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.819e+02 2.134e+02 2.272e+02 2.544e+02 5.505e+02, threshold=4.543e+02, percent-clipped=1.0 2024-09-16 21:43:18,144 INFO [train.py:1198] (1/2) Epoch 28, batch 5850, loss[loss=0.246, ctc_loss=0.1648, cr_loss=0.4061, over 20681.00 frames. ], tot_loss[loss=0.2255, ctc_loss=0.1509, cr_loss=0.3733, over 4101217.52 frames. 
], batch size: 66, lr: 2.94e-03, grad_scale: 16.0 2024-09-16 21:43:46,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=505529.0, ans=0.2 2024-09-16 21:44:10,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=505557.3333333333, ans=0.025 2024-09-16 21:44:29,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=505585.6666666667, ans=0.125 2024-09-16 21:44:32,286 INFO [train.py:1198] (1/2) Epoch 28, batch 5900, loss[loss=0.187, ctc_loss=0.1219, cr_loss=0.3254, over 20952.00 frames. ], tot_loss[loss=0.2266, ctc_loss=0.1517, cr_loss=0.3743, over 4094423.51 frames. ], batch size: 49, lr: 2.94e-03, grad_scale: 16.0 2024-09-16 21:44:43,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=505614.0, ans=0.1 2024-09-16 21:45:02,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=505642.3333333333, ans=0.125 2024-09-16 21:45:06,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=505670.6666666667, ans=0.125 2024-09-16 21:45:22,687 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.750e+02 2.117e+02 2.265e+02 2.412e+02 3.728e+02, threshold=4.529e+02, percent-clipped=0.0 2024-09-16 21:45:49,403 INFO [train.py:1198] (1/2) Epoch 28, batch 5950, loss[loss=0.2058, ctc_loss=0.1333, cr_loss=0.3628, over 20993.00 frames. ], tot_loss[loss=0.2263, ctc_loss=0.1515, cr_loss=0.3741, over 4097033.98 frames. ], batch size: 55, lr: 2.94e-03, grad_scale: 16.0 2024-09-16 21:46:07,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=505784.0, ans=0.2 2024-09-16 21:46:07,494 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.25 vs. limit=6.0 2024-09-16 21:46:50,095 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=505869.0, ans=0.2 2024-09-16 21:46:57,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=505869.0, ans=0.025 2024-09-16 21:47:06,094 INFO [train.py:1198] (1/2) Epoch 28, batch 6000, loss[loss=0.2163, ctc_loss=0.1435, cr_loss=0.3642, over 20987.00 frames. ], tot_loss[loss=0.2264, ctc_loss=0.1516, cr_loss=0.374, over 4093707.49 frames. ], batch size: 55, lr: 2.94e-03, grad_scale: 32.0 2024-09-16 21:47:06,094 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-16 21:47:29,412 INFO [train.py:1230] (1/2) Epoch 28, validation: loss=0.04197, ctc_loss=0.04197, cr_loss=1.216e-14, over 944034.00 frames. 
2024-09-16 21:47:29,413 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-16 21:47:44,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=505925.6666666667, ans=0.1 2024-09-16 21:48:17,108 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.773e+02 2.116e+02 2.217e+02 2.360e+02 3.028e+02, threshold=4.434e+02, percent-clipped=0.0 2024-09-16 21:48:39,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=506010.6666666667, ans=0.125 2024-09-16 21:48:39,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=506010.6666666667, ans=0.125 2024-09-16 21:48:44,085 INFO [train.py:1198] (1/2) Epoch 28, batch 6050, loss[loss=0.2299, ctc_loss=0.1542, cr_loss=0.3788, over 20976.00 frames. ], tot_loss[loss=0.2269, ctc_loss=0.152, cr_loss=0.3748, over 4093433.28 frames. ], batch size: 58, lr: 2.94e-03, grad_scale: 32.0 2024-09-16 21:49:01,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=506067.3333333333, ans=0.125 2024-09-16 21:49:15,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=506095.6666666667, ans=0.2 2024-09-16 21:49:22,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=506095.6666666667, ans=0.125 2024-09-16 21:49:25,737 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 21:49:46,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=506152.3333333333, ans=0.125 2024-09-16 21:49:59,972 INFO [train.py:1198] (1/2) Epoch 28, batch 6100, loss[loss=0.2412, ctc_loss=0.16, cr_loss=0.4058, over 20969.00 frames. ], tot_loss[loss=0.2278, ctc_loss=0.1526, cr_loss=0.3762, over 4093147.44 frames. ], batch size: 64, lr: 2.94e-03, grad_scale: 32.0 2024-09-16 21:50:28,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=506237.3333333333, ans=0.95 2024-09-16 21:50:47,881 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.729e+02 2.156e+02 2.328e+02 2.488e+02 3.502e+02, threshold=4.657e+02, percent-clipped=0.0 2024-09-16 21:51:07,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=506294.0, ans=0.125 2024-09-16 21:51:14,449 INFO [train.py:1198] (1/2) Epoch 28, batch 6150, loss[loss=0.2423, ctc_loss=0.1659, cr_loss=0.3818, over 20351.00 frames. ], tot_loss[loss=0.2279, ctc_loss=0.1527, cr_loss=0.3759, over 4083422.29 frames. ], batch size: 74, lr: 2.94e-03, grad_scale: 32.0 2024-09-16 21:52:09,779 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.79 vs. 
limit=10.0 2024-09-16 21:52:23,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=506435.6666666667, ans=0.125 2024-09-16 21:52:26,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=506435.6666666667, ans=0.125 2024-09-16 21:52:29,176 INFO [train.py:1198] (1/2) Epoch 28, batch 6200, loss[loss=0.245, ctc_loss=0.1669, cr_loss=0.3904, over 20242.00 frames. ], tot_loss[loss=0.229, ctc_loss=0.1536, cr_loss=0.3769, over 4046651.09 frames. ], batch size: 74, lr: 2.94e-03, grad_scale: 32.0 2024-09-16 21:52:33,058 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.13 vs. limit=22.5 2024-09-16 21:52:54,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=506492.3333333333, ans=0.125 2024-09-16 21:53:16,147 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.868e+02 2.160e+02 2.342e+02 2.500e+02 5.895e+02, threshold=4.684e+02, percent-clipped=1.0 2024-09-16 21:53:19,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=506549.0, ans=0.125 2024-09-16 21:53:26,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=506577.3333333333, ans=0.1 2024-09-16 21:53:28,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=506577.3333333333, ans=0.125 2024-09-16 21:53:38,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=506577.3333333333, ans=0.2 2024-09-16 21:53:43,201 INFO [train.py:1198] (1/2) Epoch 28, batch 6250, loss[loss=0.2278, ctc_loss=0.1518, cr_loss=0.3803, over 21039.00 frames. ], tot_loss[loss=0.2302, ctc_loss=0.1548, cr_loss=0.3773, over 4003448.70 frames. ], batch size: 62, lr: 2.94e-03, grad_scale: 32.0 2024-09-16 21:54:04,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=506634.0, ans=0.125 2024-09-16 21:54:17,455 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.69 vs. limit=12.0 2024-09-16 21:54:43,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=506719.0, ans=0.0 2024-09-16 21:54:52,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=506719.0, ans=0.0 2024-09-16 21:54:57,927 INFO [train.py:1198] (1/2) Epoch 28, batch 6300, loss[loss=0.2516, ctc_loss=0.1713, cr_loss=0.4015, over 18151.00 frames. ], tot_loss[loss=0.2299, ctc_loss=0.1547, cr_loss=0.3764, over 3980733.65 frames. ], batch size: 108, lr: 2.94e-03, grad_scale: 32.0 2024-09-16 21:55:02,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=506747.3333333333, ans=0.0 2024-09-16 21:55:03,097 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.09 vs. 
limit=10.0 2024-09-16 21:55:08,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=506747.3333333333, ans=0.035 2024-09-16 21:55:36,783 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.32 vs. limit=12.0 2024-09-16 21:55:44,319 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.773e+02 2.280e+02 2.444e+02 2.673e+02 4.130e+02, threshold=4.889e+02, percent-clipped=0.0 2024-09-16 21:55:56,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=506860.6666666667, ans=0.1 2024-09-16 21:56:03,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=506860.6666666667, ans=0.2 2024-09-16 21:56:08,857 INFO [train.py:1198] (1/2) Epoch 28, batch 6350, loss[loss=0.2821, ctc_loss=0.2003, cr_loss=0.4088, over 14101.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1601, cr_loss=0.3819, over 3872318.92 frames. ], batch size: 149, lr: 2.94e-03, grad_scale: 32.0 2024-09-16 21:56:56,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=506974.0, ans=0.0 2024-09-16 21:57:56,070 INFO [train.py:1198] (1/2) Epoch 29, batch 0, loss[loss=0.2179, ctc_loss=0.1426, cr_loss=0.3767, over 21048.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.1426, cr_loss=0.3767, over 21048.00 frames. ], batch size: 53, lr: 2.88e-03, grad_scale: 32.0 2024-09-16 21:57:56,070 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-16 21:58:07,763 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.7771, 4.0015, 4.5064, 4.5522, 4.0014, 4.4319, 3.7538, 3.8325], device='cuda:1') 2024-09-16 21:58:14,220 INFO [train.py:1230] (1/2) Epoch 29, validation: loss=0.04134, ctc_loss=0.04134, cr_loss=1.227e-14, over 944034.00 frames. 2024-09-16 21:58:14,221 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-16 21:58:15,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=507005.1666666667, ans=0.2 2024-09-16 21:58:49,923 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.59 vs. limit=15.0 2024-09-16 21:59:18,789 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.795e+02 2.224e+02 2.498e+02 2.716e+02 3.407e+02, threshold=4.995e+02, percent-clipped=0.0 2024-09-16 21:59:32,279 INFO [train.py:1198] (1/2) Epoch 29, batch 50, loss[loss=0.2538, ctc_loss=0.1764, cr_loss=0.3869, over 13980.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1561, cr_loss=0.38, over 917580.41 frames. 
], batch size: 149, lr: 2.88e-03, grad_scale: 32.0 2024-09-16 22:00:08,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=507203.5, ans=0.2 2024-09-16 22:00:35,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=507260.1666666667, ans=0.0 2024-09-16 22:00:41,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=507260.1666666667, ans=0.125 2024-09-16 22:00:47,534 INFO [train.py:1198] (1/2) Epoch 29, batch 100, loss[loss=0.2616, ctc_loss=0.18, cr_loss=0.4078, over 19328.00 frames. ], tot_loss[loss=0.2289, ctc_loss=0.1537, cr_loss=0.3757, over 1621674.11 frames. ], batch size: 90, lr: 2.88e-03, grad_scale: 32.0 2024-09-16 22:01:40,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=507373.5, ans=0.0 2024-09-16 22:01:52,039 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.909e+02 2.132e+02 2.252e+02 2.426e+02 2.986e+02, threshold=4.505e+02, percent-clipped=0.0 2024-09-16 22:02:01,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=507401.8333333333, ans=0.1 2024-09-16 22:02:05,616 INFO [train.py:1198] (1/2) Epoch 29, batch 150, loss[loss=0.2198, ctc_loss=0.1451, cr_loss=0.3739, over 20972.00 frames. ], tot_loss[loss=0.2278, ctc_loss=0.1528, cr_loss=0.3749, over 2173924.16 frames. ], batch size: 58, lr: 2.88e-03, grad_scale: 32.0 2024-09-16 22:02:08,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=507430.1666666667, ans=0.0 2024-09-16 22:02:57,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=507515.1666666667, ans=15.0 2024-09-16 22:03:12,193 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.82 vs. limit=6.0 2024-09-16 22:03:21,560 INFO [train.py:1198] (1/2) Epoch 29, batch 200, loss[loss=0.2325, ctc_loss=0.1572, cr_loss=0.3763, over 21006.00 frames. ], tot_loss[loss=0.2276, ctc_loss=0.1525, cr_loss=0.3756, over 2605933.19 frames. ], batch size: 61, lr: 2.88e-03, grad_scale: 32.0 2024-09-16 22:03:23,962 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.42 vs. 
limit=22.5 2024-09-16 22:03:30,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=507571.8333333333, ans=0.07 2024-09-16 22:03:30,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=507571.8333333333, ans=0.125 2024-09-16 22:03:51,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=507628.5, ans=0.125 2024-09-16 22:04:08,941 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 22:04:16,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=507656.8333333333, ans=0.125 2024-09-16 22:04:20,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=507685.1666666667, ans=0.125 2024-09-16 22:04:23,488 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.809e+02 2.105e+02 2.221e+02 2.379e+02 4.827e+02, threshold=4.441e+02, percent-clipped=1.0 2024-09-16 22:04:28,779 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 22:04:40,408 INFO [train.py:1198] (1/2) Epoch 29, batch 250, loss[loss=0.1894, ctc_loss=0.1214, cr_loss=0.3403, over 20956.00 frames. ], tot_loss[loss=0.2276, ctc_loss=0.1523, cr_loss=0.3766, over 2940053.14 frames. ], batch size: 50, lr: 2.88e-03, grad_scale: 32.0 2024-09-16 22:04:47,215 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.67 vs. limit=12.0 2024-09-16 22:04:49,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=507713.5, ans=0.125 2024-09-16 22:05:18,831 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 22:05:32,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=507798.5, ans=0.125 2024-09-16 22:05:56,758 INFO [train.py:1198] (1/2) Epoch 29, batch 300, loss[loss=0.183, ctc_loss=0.1178, cr_loss=0.3263, over 20978.00 frames. ], tot_loss[loss=0.2283, ctc_loss=0.1529, cr_loss=0.3771, over 3176277.98 frames. ], batch size: 51, lr: 2.88e-03, grad_scale: 32.0 2024-09-16 22:06:15,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=507883.5, ans=0.0 2024-09-16 22:06:16,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=507883.5, ans=10.0 2024-09-16 22:06:19,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=507883.5, ans=0.0 2024-09-16 22:06:37,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=507911.8333333333, ans=0.0 2024-09-16 22:06:58,731 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.877e+02 2.127e+02 2.263e+02 2.443e+02 4.033e+02, threshold=4.527e+02, percent-clipped=0.0 2024-09-16 22:07:15,511 INFO [train.py:1198] (1/2) Epoch 29, batch 350, loss[loss=0.2217, ctc_loss=0.1471, cr_loss=0.3727, over 21053.00 frames. 
], tot_loss[loss=0.2264, ctc_loss=0.1515, cr_loss=0.3749, over 3385944.88 frames. ], batch size: 56, lr: 2.88e-03, grad_scale: 32.0 2024-09-16 22:07:27,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=507996.8333333333, ans=0.035 2024-09-16 22:07:27,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=507996.8333333333, ans=0.125 2024-09-16 22:07:38,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=508025.1666666667, ans=0.0 2024-09-16 22:08:02,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=508081.8333333333, ans=0.1 2024-09-16 22:08:04,380 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2024-09-16 22:08:09,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=508081.8333333333, ans=0.125 2024-09-16 22:08:09,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=508081.8333333333, ans=0.025 2024-09-16 22:08:20,810 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.83 vs. limit=15.0 2024-09-16 22:08:29,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=508138.5, ans=0.125 2024-09-16 22:08:30,472 INFO [train.py:1198] (1/2) Epoch 29, batch 400, loss[loss=0.2309, ctc_loss=0.1586, cr_loss=0.3618, over 20266.00 frames. ], tot_loss[loss=0.228, ctc_loss=0.1527, cr_loss=0.3763, over 3535380.17 frames. ], batch size: 74, lr: 2.88e-03, grad_scale: 32.0 2024-09-16 22:08:30,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=508138.5, ans=0.05 2024-09-16 22:08:39,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=508138.5, ans=0.1 2024-09-16 22:09:31,819 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.804e+02 2.124e+02 2.239e+02 2.409e+02 3.301e+02, threshold=4.479e+02, percent-clipped=0.0 2024-09-16 22:09:45,121 INFO [train.py:1198] (1/2) Epoch 29, batch 450, loss[loss=0.1791, ctc_loss=0.117, cr_loss=0.3106, over 20982.00 frames. ], tot_loss[loss=0.2283, ctc_loss=0.1529, cr_loss=0.377, over 3663008.76 frames. ], batch size: 51, lr: 2.88e-03, grad_scale: 32.0 2024-09-16 22:09:46,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=508280.1666666667, ans=0.125 2024-09-16 22:10:06,955 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.48 vs. limit=15.0 2024-09-16 22:11:04,186 INFO [train.py:1198] (1/2) Epoch 29, batch 500, loss[loss=0.2071, ctc_loss=0.136, cr_loss=0.3555, over 21047.00 frames. ], tot_loss[loss=0.229, ctc_loss=0.1534, cr_loss=0.3779, over 3769924.02 frames. 
], batch size: 56, lr: 2.88e-03, grad_scale: 16.0 2024-09-16 22:11:13,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=508421.8333333333, ans=0.125 2024-09-16 22:11:18,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=508450.1666666667, ans=0.125 2024-09-16 22:11:22,792 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=508450.1666666667, ans=0.125 2024-09-16 22:12:07,617 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.795e+02 2.095e+02 2.179e+02 2.359e+02 3.125e+02, threshold=4.358e+02, percent-clipped=0.0 2024-09-16 22:12:19,827 INFO [train.py:1198] (1/2) Epoch 29, batch 550, loss[loss=0.2628, ctc_loss=0.1787, cr_loss=0.4203, over 20828.00 frames. ], tot_loss[loss=0.2278, ctc_loss=0.1525, cr_loss=0.3767, over 3851473.55 frames. ], batch size: 65, lr: 2.88e-03, grad_scale: 16.0 2024-09-16 22:12:44,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=508591.8333333333, ans=0.125 2024-09-16 22:13:06,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=508620.1666666667, ans=0.035 2024-09-16 22:13:19,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=508648.5, ans=0.2 2024-09-16 22:13:38,281 INFO [train.py:1198] (1/2) Epoch 29, batch 600, loss[loss=0.2868, ctc_loss=0.2071, cr_loss=0.3984, over 14426.00 frames. ], tot_loss[loss=0.228, ctc_loss=0.1527, cr_loss=0.3764, over 3898075.98 frames. ], batch size: 152, lr: 2.88e-03, grad_scale: 8.0 2024-09-16 22:14:43,766 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.836e+02 2.148e+02 2.274e+02 2.476e+02 4.327e+02, threshold=4.547e+02, percent-clipped=0.0 2024-09-16 22:14:54,337 INFO [train.py:1198] (1/2) Epoch 29, batch 650, loss[loss=0.2022, ctc_loss=0.1308, cr_loss=0.3568, over 19872.00 frames. ], tot_loss[loss=0.2267, ctc_loss=0.1517, cr_loss=0.3751, over 3936238.25 frames. ], batch size: 44, lr: 2.88e-03, grad_scale: 8.0 2024-09-16 22:15:42,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=508931.8333333333, ans=0.125 2024-09-16 22:16:03,487 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.34 vs. limit=15.0 2024-09-16 22:16:05,221 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.85 vs. limit=15.0 2024-09-16 22:16:13,544 INFO [train.py:1198] (1/2) Epoch 29, batch 700, loss[loss=0.2235, ctc_loss=0.1501, cr_loss=0.367, over 20877.00 frames. ], tot_loss[loss=0.2262, ctc_loss=0.1513, cr_loss=0.3745, over 3970742.39 frames. 
], batch size: 57, lr: 2.88e-03, grad_scale: 8.0 2024-09-16 22:16:13,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=508988.5, ans=0.07 2024-09-16 22:17:18,179 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.903e+02 2.151e+02 2.230e+02 2.410e+02 6.479e+02, threshold=4.459e+02, percent-clipped=1.0 2024-09-16 22:17:28,597 INFO [train.py:1198] (1/2) Epoch 29, batch 750, loss[loss=0.2363, ctc_loss=0.1588, cr_loss=0.3874, over 20662.00 frames. ], tot_loss[loss=0.2265, ctc_loss=0.1515, cr_loss=0.3748, over 4005766.82 frames. ], batch size: 66, lr: 2.88e-03, grad_scale: 8.0 2024-09-16 22:17:39,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=509130.1666666667, ans=0.0 2024-09-16 22:18:03,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=509186.8333333333, ans=0.2 2024-09-16 22:18:15,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=509215.1666666667, ans=0.125 2024-09-16 22:18:41,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=509243.5, ans=0.2 2024-09-16 22:18:46,624 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.48 vs. limit=15.0 2024-09-16 22:18:47,013 INFO [train.py:1198] (1/2) Epoch 29, batch 800, loss[loss=0.2059, ctc_loss=0.1389, cr_loss=0.3349, over 21064.00 frames. ], tot_loss[loss=0.2271, ctc_loss=0.152, cr_loss=0.3759, over 4026395.76 frames. ], batch size: 53, lr: 2.88e-03, grad_scale: 16.0 2024-09-16 22:19:06,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=509300.1666666667, ans=0.0 2024-09-16 22:19:51,636 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.876e+02 2.118e+02 2.231e+02 2.446e+02 6.172e+02, threshold=4.462e+02, percent-clipped=1.0 2024-09-16 22:20:02,335 INFO [train.py:1198] (1/2) Epoch 29, batch 850, loss[loss=0.1888, ctc_loss=0.1238, cr_loss=0.325, over 20948.00 frames. ], tot_loss[loss=0.2262, ctc_loss=0.1514, cr_loss=0.3744, over 4049438.74 frames. ], batch size: 49, lr: 2.88e-03, grad_scale: 16.0 2024-09-16 22:20:07,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=509413.5, ans=0.0 2024-09-16 22:20:25,512 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 22:20:53,355 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.32 vs. limit=15.0 2024-09-16 22:21:01,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=509526.8333333333, ans=0.125 2024-09-16 22:21:18,318 INFO [train.py:1198] (1/2) Epoch 29, batch 900, loss[loss=0.2088, ctc_loss=0.1379, cr_loss=0.3546, over 20868.00 frames. ], tot_loss[loss=0.2265, ctc_loss=0.1516, cr_loss=0.3743, over 4052632.35 frames. 
], batch size: 57, lr: 2.88e-03, grad_scale: 16.0 2024-09-16 22:21:44,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=509583.5, ans=0.1 2024-09-16 22:22:10,102 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.88 vs. limit=15.0 2024-09-16 22:22:25,846 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.887e+02 2.150e+02 2.291e+02 2.446e+02 6.186e+02, threshold=4.582e+02, percent-clipped=1.0 2024-09-16 22:22:29,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=509668.5, ans=0.025 2024-09-16 22:22:36,653 INFO [train.py:1198] (1/2) Epoch 29, batch 950, loss[loss=0.2677, ctc_loss=0.1816, cr_loss=0.4304, over 20118.00 frames. ], tot_loss[loss=0.227, ctc_loss=0.1519, cr_loss=0.3755, over 4068159.52 frames. ], batch size: 80, lr: 2.88e-03, grad_scale: 16.0 2024-09-16 22:22:59,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=509725.1666666667, ans=0.2 2024-09-16 22:23:02,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=509725.1666666667, ans=0.125 2024-09-16 22:23:09,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=509753.5, ans=0.1 2024-09-16 22:23:09,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=509753.5, ans=0.125 2024-09-16 22:23:19,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=509753.5, ans=0.125 2024-09-16 22:23:34,260 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.71 vs. limit=6.0 2024-09-16 22:23:51,745 INFO [train.py:1198] (1/2) Epoch 29, batch 1000, loss[loss=0.214, ctc_loss=0.1434, cr_loss=0.353, over 20993.00 frames. ], tot_loss[loss=0.2268, ctc_loss=0.1516, cr_loss=0.3759, over 4088357.20 frames. ], batch size: 48, lr: 2.87e-03, grad_scale: 16.0 2024-09-16 22:23:59,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=509838.5, ans=0.0 2024-09-16 22:24:59,242 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.851e+02 2.126e+02 2.274e+02 2.456e+02 3.874e+02, threshold=4.547e+02, percent-clipped=0.0 2024-09-16 22:25:08,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=509980.1666666667, ans=0.0 2024-09-16 22:25:09,905 INFO [train.py:1198] (1/2) Epoch 29, batch 1050, loss[loss=0.2509, ctc_loss=0.1689, cr_loss=0.4103, over 20679.00 frames. ], tot_loss[loss=0.2267, ctc_loss=0.1516, cr_loss=0.3753, over 4089981.63 frames. 
], batch size: 71, lr: 2.87e-03, grad_scale: 16.0 2024-09-16 22:25:10,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=509980.1666666667, ans=0.0 2024-09-16 22:25:11,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=509980.1666666667, ans=0.1 2024-09-16 22:25:17,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=509980.1666666667, ans=0.1 2024-09-16 22:25:21,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=509980.1666666667, ans=0.2 2024-09-16 22:25:23,753 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=2.99 vs. limit=15.0 2024-09-16 22:25:26,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=510008.5, ans=0.2 2024-09-16 22:25:45,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=510036.8333333333, ans=0.025 2024-09-16 22:25:53,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=510036.8333333333, ans=0.0 2024-09-16 22:25:59,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=510065.1666666667, ans=0.0 2024-09-16 22:26:06,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=510065.1666666667, ans=0.025 2024-09-16 22:26:19,826 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 22:26:23,323 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.05 vs. limit=6.0 2024-09-16 22:26:25,406 INFO [train.py:1198] (1/2) Epoch 29, batch 1100, loss[loss=0.2071, ctc_loss=0.1372, cr_loss=0.3492, over 21069.00 frames. ], tot_loss[loss=0.227, ctc_loss=0.1519, cr_loss=0.3755, over 4082791.55 frames. ], batch size: 56, lr: 2.87e-03, grad_scale: 16.0 2024-09-16 22:26:27,946 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.73 vs. limit=15.0 2024-09-16 22:26:35,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=510121.8333333333, ans=0.125 2024-09-16 22:26:39,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=510150.1666666667, ans=0.125 2024-09-16 22:26:39,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=510150.1666666667, ans=0.125 2024-09-16 22:26:51,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=510150.1666666667, ans=0.125 2024-09-16 22:27:16,825 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.17 vs. 
limit=15.0 2024-09-16 22:27:33,479 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.804e+02 2.109e+02 2.245e+02 2.404e+02 3.046e+02, threshold=4.489e+02, percent-clipped=0.0 2024-09-16 22:27:43,740 INFO [train.py:1198] (1/2) Epoch 29, batch 1150, loss[loss=0.2452, ctc_loss=0.1627, cr_loss=0.4123, over 20932.00 frames. ], tot_loss[loss=0.2271, ctc_loss=0.1519, cr_loss=0.3756, over 4090222.89 frames. ], batch size: 60, lr: 2.87e-03, grad_scale: 16.0 2024-09-16 22:28:59,928 INFO [train.py:1198] (1/2) Epoch 29, batch 1200, loss[loss=0.2017, ctc_loss=0.134, cr_loss=0.3385, over 20936.00 frames. ], tot_loss[loss=0.2268, ctc_loss=0.1518, cr_loss=0.3749, over 4082490.12 frames. ], batch size: 50, lr: 2.87e-03, grad_scale: 32.0 2024-09-16 22:29:09,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=510405.1666666667, ans=0.0 2024-09-16 22:29:30,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=510461.8333333333, ans=0.2 2024-09-16 22:30:05,127 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.866e+02 2.088e+02 2.280e+02 2.493e+02 4.353e+02, threshold=4.560e+02, percent-clipped=0.0 2024-09-16 22:30:18,912 INFO [train.py:1198] (1/2) Epoch 29, batch 1250, loss[loss=0.2346, ctc_loss=0.1558, cr_loss=0.394, over 20672.00 frames. ], tot_loss[loss=0.2278, ctc_loss=0.1527, cr_loss=0.3755, over 4065903.70 frames. ], batch size: 68, lr: 2.87e-03, grad_scale: 32.0 2024-09-16 22:30:34,910 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.68 vs. limit=15.0 2024-09-16 22:30:40,961 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.26 vs. limit=22.5 2024-09-16 22:31:18,611 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.01 vs. limit=15.0 2024-09-16 22:31:24,391 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 22:31:34,813 INFO [train.py:1198] (1/2) Epoch 29, batch 1300, loss[loss=0.2008, ctc_loss=0.1315, cr_loss=0.3463, over 20968.00 frames. ], tot_loss[loss=0.2282, ctc_loss=0.1531, cr_loss=0.3757, over 4058825.63 frames. ], batch size: 48, lr: 2.87e-03, grad_scale: 32.0 2024-09-16 22:31:44,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=510688.5, ans=0.1 2024-09-16 22:32:04,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=510745.1666666667, ans=0.125 2024-09-16 22:32:04,727 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.41 vs. 
limit=15.0 2024-09-16 22:32:26,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=510773.5, ans=0.125 2024-09-16 22:32:30,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=510773.5, ans=0.1 2024-09-16 22:32:40,108 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.918e+02 2.148e+02 2.225e+02 2.364e+02 3.598e+02, threshold=4.449e+02, percent-clipped=0.0 2024-09-16 22:32:40,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=510801.8333333333, ans=0.125 2024-09-16 22:32:48,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=510801.8333333333, ans=0.0 2024-09-16 22:32:50,751 INFO [train.py:1198] (1/2) Epoch 29, batch 1350, loss[loss=0.2228, ctc_loss=0.1493, cr_loss=0.3674, over 20939.00 frames. ], tot_loss[loss=0.2266, ctc_loss=0.1517, cr_loss=0.3742, over 4078794.39 frames. ], batch size: 51, lr: 2.87e-03, grad_scale: 32.0 2024-09-16 22:32:55,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=510830.1666666667, ans=0.025 2024-09-16 22:33:22,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=510886.8333333333, ans=0.2 2024-09-16 22:33:26,296 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.11 vs. limit=15.0 2024-09-16 22:33:31,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=510886.8333333333, ans=0.0 2024-09-16 22:33:33,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=510886.8333333333, ans=0.0 2024-09-16 22:33:37,368 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.75 vs. limit=10.0 2024-09-16 22:33:41,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=510915.1666666667, ans=0.125 2024-09-16 22:34:07,843 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.61 vs. limit=15.0 2024-09-16 22:34:10,121 INFO [train.py:1198] (1/2) Epoch 29, batch 1400, loss[loss=0.2527, ctc_loss=0.1676, cr_loss=0.4256, over 20105.00 frames. ], tot_loss[loss=0.2271, ctc_loss=0.152, cr_loss=0.3754, over 4086894.19 frames. ], batch size: 80, lr: 2.87e-03, grad_scale: 32.0 2024-09-16 22:34:46,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=511028.5, ans=0.125 2024-09-16 22:35:14,920 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.824e+02 2.124e+02 2.250e+02 2.355e+02 2.864e+02, threshold=4.501e+02, percent-clipped=0.0 2024-09-16 22:35:25,599 INFO [train.py:1198] (1/2) Epoch 29, batch 1450, loss[loss=0.2248, ctc_loss=0.1487, cr_loss=0.3801, over 20285.00 frames. ], tot_loss[loss=0.226, ctc_loss=0.1513, cr_loss=0.3733, over 4081488.03 frames. 
], batch size: 74, lr: 2.87e-03, grad_scale: 32.0 2024-09-16 22:35:42,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=511141.8333333333, ans=0.2 2024-09-16 22:36:07,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=511170.1666666667, ans=0.2 2024-09-16 22:36:37,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=511226.8333333333, ans=0.125 2024-09-16 22:36:40,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=511226.8333333333, ans=0.125 2024-09-16 22:36:44,967 INFO [train.py:1198] (1/2) Epoch 29, batch 1500, loss[loss=0.2044, ctc_loss=0.136, cr_loss=0.3418, over 20982.00 frames. ], tot_loss[loss=0.2251, ctc_loss=0.1507, cr_loss=0.3723, over 4080775.03 frames. ], batch size: 51, lr: 2.87e-03, grad_scale: 32.0 2024-09-16 22:37:01,233 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.30 vs. limit=15.0 2024-09-16 22:37:26,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=511311.8333333333, ans=0.015 2024-09-16 22:37:33,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=511340.1666666667, ans=0.125 2024-09-16 22:37:35,022 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.37 vs. limit=12.0 2024-09-16 22:37:37,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=511340.1666666667, ans=0.125 2024-09-16 22:37:50,283 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.790e+02 2.140e+02 2.286e+02 2.435e+02 3.535e+02, threshold=4.573e+02, percent-clipped=0.0 2024-09-16 22:37:53,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=511368.5, ans=0.0 2024-09-16 22:38:00,892 INFO [train.py:1198] (1/2) Epoch 29, batch 1550, loss[loss=0.2268, ctc_loss=0.1505, cr_loss=0.3815, over 20673.00 frames. ], tot_loss[loss=0.2245, ctc_loss=0.1501, cr_loss=0.3718, over 4091148.12 frames. ], batch size: 71, lr: 2.87e-03, grad_scale: 32.0 2024-09-16 22:38:51,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=511481.8333333333, ans=0.1 2024-09-16 22:39:00,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=511481.8333333333, ans=0.05 2024-09-16 22:39:03,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=511510.1666666667, ans=0.025 2024-09-16 22:39:19,500 INFO [train.py:1198] (1/2) Epoch 29, batch 1600, loss[loss=0.2928, ctc_loss=0.2031, cr_loss=0.4482, over 18185.00 frames. ], tot_loss[loss=0.2249, ctc_loss=0.1504, cr_loss=0.3725, over 4097115.57 frames. 
], batch size: 108, lr: 2.87e-03, grad_scale: 32.0 2024-09-16 22:40:25,052 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.831e+02 2.124e+02 2.264e+02 2.371e+02 3.682e+02, threshold=4.527e+02, percent-clipped=0.0 2024-09-16 22:40:35,605 INFO [train.py:1198] (1/2) Epoch 29, batch 1650, loss[loss=0.1999, ctc_loss=0.1362, cr_loss=0.3186, over 20987.00 frames. ], tot_loss[loss=0.2252, ctc_loss=0.1507, cr_loss=0.3726, over 4099173.42 frames. ], batch size: 50, lr: 2.87e-03, grad_scale: 32.0 2024-09-16 22:40:40,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=511680.1666666667, ans=0.1 2024-09-16 22:41:36,379 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.71 vs. limit=15.0 2024-09-16 22:41:53,689 INFO [train.py:1198] (1/2) Epoch 29, batch 1700, loss[loss=0.2425, ctc_loss=0.1628, cr_loss=0.3985, over 21085.00 frames. ], tot_loss[loss=0.2262, ctc_loss=0.1514, cr_loss=0.374, over 4078146.22 frames. ], batch size: 59, lr: 2.87e-03, grad_scale: 32.0 2024-09-16 22:41:56,035 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.13 vs. limit=15.0 2024-09-16 22:42:33,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=511878.5, ans=0.2 2024-09-16 22:42:38,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=511906.8333333333, ans=0.125 2024-09-16 22:43:00,650 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.843e+02 2.153e+02 2.296e+02 2.463e+02 4.622e+02, threshold=4.592e+02, percent-clipped=1.0 2024-09-16 22:43:09,784 INFO [train.py:1198] (1/2) Epoch 29, batch 1750, loss[loss=0.1779, ctc_loss=0.1165, cr_loss=0.3068, over 20981.00 frames. ], tot_loss[loss=0.2272, ctc_loss=0.1522, cr_loss=0.3753, over 4072884.42 frames. ], batch size: 49, lr: 2.87e-03, grad_scale: 16.0 2024-09-16 22:43:20,819 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 22:43:25,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=511991.8333333333, ans=0.125 2024-09-16 22:43:25,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=511991.8333333333, ans=0.2 2024-09-16 22:43:31,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=511991.8333333333, ans=0.1 2024-09-16 22:43:37,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=511991.8333333333, ans=0.125 2024-09-16 22:43:54,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=512048.5, ans=0.1 2024-09-16 22:44:13,200 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.24 vs. limit=15.0 2024-09-16 22:44:28,574 INFO [train.py:1198] (1/2) Epoch 29, batch 1800, loss[loss=0.1925, ctc_loss=0.1265, cr_loss=0.3301, over 20939.00 frames. 
], tot_loss[loss=0.2263, ctc_loss=0.1514, cr_loss=0.3744, over 4086094.17 frames. ], batch size: 50, lr: 2.87e-03, grad_scale: 16.0 2024-09-16 22:45:08,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=512161.8333333333, ans=0.025 2024-09-16 22:45:12,779 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 22:45:14,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=512190.1666666667, ans=0.125 2024-09-16 22:45:35,702 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.931e+02 2.152e+02 2.333e+02 2.463e+02 3.061e+02, threshold=4.666e+02, percent-clipped=0.0 2024-09-16 22:45:44,761 INFO [train.py:1198] (1/2) Epoch 29, batch 1850, loss[loss=0.222, ctc_loss=0.148, cr_loss=0.3699, over 21012.00 frames. ], tot_loss[loss=0.2271, ctc_loss=0.1519, cr_loss=0.376, over 4090416.41 frames. ], batch size: 61, lr: 2.87e-03, grad_scale: 16.0 2024-09-16 22:45:48,302 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.88 vs. limit=15.0 2024-09-16 22:46:01,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=512275.1666666667, ans=0.125 2024-09-16 22:46:30,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=512331.8333333333, ans=0.125 2024-09-16 22:46:37,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=512331.8333333333, ans=0.125 2024-09-16 22:46:44,704 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.11 vs. limit=15.0 2024-09-16 22:47:00,270 INFO [train.py:1198] (1/2) Epoch 29, batch 1900, loss[loss=0.286, ctc_loss=0.2014, cr_loss=0.4232, over 14423.00 frames. ], tot_loss[loss=0.2269, ctc_loss=0.1517, cr_loss=0.3758, over 4097129.41 frames. ], batch size: 149, lr: 2.87e-03, grad_scale: 16.0 2024-09-16 22:47:05,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=512388.5, ans=0.125 2024-09-16 22:47:38,826 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.36 vs. limit=22.5 2024-09-16 22:47:43,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=512445.1666666667, ans=0.025 2024-09-16 22:48:09,954 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.863e+02 2.091e+02 2.187e+02 2.363e+02 3.962e+02, threshold=4.374e+02, percent-clipped=0.0 2024-09-16 22:48:19,401 INFO [train.py:1198] (1/2) Epoch 29, batch 1950, loss[loss=0.2387, ctc_loss=0.1641, cr_loss=0.3729, over 20648.00 frames. ], tot_loss[loss=0.2271, ctc_loss=0.1519, cr_loss=0.3762, over 4097068.31 frames. ], batch size: 68, lr: 2.87e-03, grad_scale: 16.0 2024-09-16 22:48:40,851 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.55 vs. 
limit=6.0 2024-09-16 22:48:41,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=512558.5, ans=0.1 2024-09-16 22:48:54,387 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.01 vs. limit=10.0 2024-09-16 22:48:56,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=512586.8333333333, ans=0.04949747468305833 2024-09-16 22:49:10,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=512615.1666666667, ans=0.0 2024-09-16 22:49:33,588 INFO [train.py:1198] (1/2) Epoch 29, batch 2000, loss[loss=0.2199, ctc_loss=0.1456, cr_loss=0.3716, over 20987.00 frames. ], tot_loss[loss=0.2274, ctc_loss=0.1521, cr_loss=0.3764, over 4101217.19 frames. ], batch size: 58, lr: 2.87e-03, grad_scale: 32.0 2024-09-16 22:50:04,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=512728.5, ans=0.125 2024-09-16 22:50:11,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=512728.5, ans=0.2 2024-09-16 22:50:32,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=512756.8333333333, ans=0.035 2024-09-16 22:50:37,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=512785.1666666667, ans=0.125 2024-09-16 22:50:43,120 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.819e+02 2.165e+02 2.291e+02 2.433e+02 4.983e+02, threshold=4.582e+02, percent-clipped=1.0 2024-09-16 22:50:52,257 INFO [train.py:1198] (1/2) Epoch 29, batch 2050, loss[loss=0.2362, ctc_loss=0.1557, cr_loss=0.4023, over 21029.00 frames. ], tot_loss[loss=0.227, ctc_loss=0.1517, cr_loss=0.3765, over 4109129.54 frames. ], batch size: 62, lr: 2.87e-03, grad_scale: 32.0 2024-09-16 22:51:00,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=512813.5, ans=0.2 2024-09-16 22:51:15,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=512841.8333333333, ans=0.125 2024-09-16 22:51:24,971 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.93 vs. limit=12.0 2024-09-16 22:51:33,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=512870.1666666667, ans=0.035 2024-09-16 22:51:54,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=512926.8333333333, ans=0.0 2024-09-16 22:52:01,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=512926.8333333333, ans=0.125 2024-09-16 22:52:03,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=512926.8333333333, ans=0.02 2024-09-16 22:52:07,576 INFO [train.py:1198] (1/2) Epoch 29, batch 2100, loss[loss=0.2023, ctc_loss=0.1328, cr_loss=0.3471, over 20952.00 frames. ], tot_loss[loss=0.2277, ctc_loss=0.1523, cr_loss=0.3772, over 4104942.24 frames. 
], batch size: 48, lr: 2.87e-03, grad_scale: 32.0 2024-09-16 22:52:33,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=512983.5, ans=0.125 2024-09-16 22:53:00,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=513040.1666666667, ans=0.0 2024-09-16 22:53:14,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=513068.5, ans=0.0 2024-09-16 22:53:18,494 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.940e+02 2.144e+02 2.259e+02 2.385e+02 3.448e+02, threshold=4.518e+02, percent-clipped=0.0 2024-09-16 22:53:26,127 INFO [train.py:1198] (1/2) Epoch 29, batch 2150, loss[loss=0.2391, ctc_loss=0.1628, cr_loss=0.3815, over 20768.00 frames. ], tot_loss[loss=0.2276, ctc_loss=0.1522, cr_loss=0.377, over 4115822.89 frames. ], batch size: 71, lr: 2.87e-03, grad_scale: 16.0 2024-09-16 22:53:27,294 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.57 vs. limit=15.0 2024-09-16 22:53:43,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=513125.1666666667, ans=0.125 2024-09-16 22:54:04,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=513153.5, ans=0.125 2024-09-16 22:54:41,708 INFO [train.py:1198] (1/2) Epoch 29, batch 2200, loss[loss=0.1889, ctc_loss=0.1229, cr_loss=0.3299, over 19906.00 frames. ], tot_loss[loss=0.2273, ctc_loss=0.152, cr_loss=0.3761, over 4114118.87 frames. ], batch size: 44, lr: 2.87e-03, grad_scale: 16.0 2024-09-16 22:54:45,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=513238.5, ans=0.1 2024-09-16 22:54:54,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=513238.5, ans=0.125 2024-09-16 22:55:04,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=513266.8333333333, ans=0.125 2024-09-16 22:55:10,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=513295.1666666667, ans=0.125 2024-09-16 22:55:19,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=513295.1666666667, ans=0.125 2024-09-16 22:55:49,651 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.878e+02 2.100e+02 2.240e+02 2.383e+02 5.972e+02, threshold=4.480e+02, percent-clipped=1.0 2024-09-16 22:55:53,409 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.19 vs. limit=15.0 2024-09-16 22:55:57,136 INFO [train.py:1198] (1/2) Epoch 29, batch 2250, loss[loss=0.1861, ctc_loss=0.1216, cr_loss=0.3227, over 20983.00 frames. ], tot_loss[loss=0.227, ctc_loss=0.1519, cr_loss=0.3758, over 4113642.81 frames. ], batch size: 52, lr: 2.86e-03, grad_scale: 16.0 2024-09-16 22:56:05,718 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.98 vs. 
limit=15.0 2024-09-16 22:56:35,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=513436.8333333333, ans=0.1 2024-09-16 22:56:38,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=513436.8333333333, ans=0.2 2024-09-16 22:56:50,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=513465.1666666667, ans=0.0 2024-09-16 22:56:54,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=513465.1666666667, ans=0.0 2024-09-16 22:57:14,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=513521.8333333333, ans=0.125 2024-09-16 22:57:15,360 INFO [train.py:1198] (1/2) Epoch 29, batch 2300, loss[loss=0.2245, ctc_loss=0.1482, cr_loss=0.3816, over 21064.00 frames. ], tot_loss[loss=0.2269, ctc_loss=0.1517, cr_loss=0.3757, over 4110556.24 frames. ], batch size: 53, lr: 2.86e-03, grad_scale: 16.0 2024-09-16 22:57:15,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=513521.8333333333, ans=0.1 2024-09-16 22:57:17,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=513521.8333333333, ans=0.1 2024-09-16 22:57:24,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=513521.8333333333, ans=0.125 2024-09-16 22:57:33,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=513550.1666666667, ans=0.125 2024-09-16 22:57:46,149 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.26 vs. limit=15.0 2024-09-16 22:57:47,970 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.87 vs. limit=22.5 2024-09-16 22:57:48,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=513578.5, ans=0.0 2024-09-16 22:57:49,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=513578.5, ans=0.95 2024-09-16 22:58:08,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=513606.8333333333, ans=0.1 2024-09-16 22:58:22,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=513635.1666666667, ans=0.2 2024-09-16 22:58:23,296 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.890e+02 2.103e+02 2.248e+02 2.400e+02 4.352e+02, threshold=4.496e+02, percent-clipped=0.0 2024-09-16 22:58:30,656 INFO [train.py:1198] (1/2) Epoch 29, batch 2350, loss[loss=0.1999, ctc_loss=0.1296, cr_loss=0.3517, over 20958.00 frames. ], tot_loss[loss=0.2263, ctc_loss=0.1514, cr_loss=0.3748, over 4099098.08 frames. 
], batch size: 50, lr: 2.86e-03, grad_scale: 16.0 2024-09-16 22:59:15,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=513720.1666666667, ans=0.125 2024-09-16 22:59:43,586 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.22 vs. limit=22.5 2024-09-16 22:59:46,555 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=7.41 vs. limit=22.5 2024-09-16 22:59:50,426 INFO [train.py:1198] (1/2) Epoch 29, batch 2400, loss[loss=0.23, ctc_loss=0.1571, cr_loss=0.3648, over 20875.00 frames. ], tot_loss[loss=0.2245, ctc_loss=0.15, cr_loss=0.3724, over 4110594.16 frames. ], batch size: 57, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 22:59:58,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=513805.1666666667, ans=0.125 2024-09-16 23:00:04,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=513833.5, ans=0.0 2024-09-16 23:00:10,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=513833.5, ans=0.2 2024-09-16 23:00:12,314 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0 2024-09-16 23:00:43,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=513890.1666666667, ans=0.125 2024-09-16 23:00:58,548 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.851e+02 2.122e+02 2.266e+02 2.430e+02 2.916e+02, threshold=4.532e+02, percent-clipped=0.0 2024-09-16 23:01:00,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=513918.5, ans=0.125 2024-09-16 23:01:06,250 INFO [train.py:1198] (1/2) Epoch 29, batch 2450, loss[loss=0.2231, ctc_loss=0.1501, cr_loss=0.3653, over 20880.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.1502, cr_loss=0.3726, over 4116396.51 frames. ], batch size: 57, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 23:01:07,078 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.33 vs. limit=15.0 2024-09-16 23:01:19,279 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.40 vs. limit=15.0 2024-09-16 23:01:21,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=513975.1666666667, ans=0.1 2024-09-16 23:02:25,179 INFO [train.py:1198] (1/2) Epoch 29, batch 2500, loss[loss=0.237, ctc_loss=0.1577, cr_loss=0.3964, over 20324.00 frames. ], tot_loss[loss=0.2257, ctc_loss=0.1509, cr_loss=0.3736, over 4096439.37 frames. ], batch size: 74, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 23:03:33,825 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.838e+02 2.136e+02 2.264e+02 2.407e+02 3.369e+02, threshold=4.528e+02, percent-clipped=0.0 2024-09-16 23:03:41,236 INFO [train.py:1198] (1/2) Epoch 29, batch 2550, loss[loss=0.2141, ctc_loss=0.1417, cr_loss=0.3621, over 20719.00 frames. 
], tot_loss[loss=0.2255, ctc_loss=0.1507, cr_loss=0.3737, over 4103968.63 frames. ], batch size: 71, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 23:04:15,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=514286.8333333333, ans=0.125 2024-09-16 23:04:25,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=514315.1666666667, ans=0.125 2024-09-16 23:05:00,624 INFO [train.py:1198] (1/2) Epoch 29, batch 2600, loss[loss=0.2323, ctc_loss=0.1571, cr_loss=0.3761, over 20954.00 frames. ], tot_loss[loss=0.2256, ctc_loss=0.1509, cr_loss=0.3737, over 4110742.08 frames. ], batch size: 49, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 23:05:09,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=514371.8333333333, ans=0.0 2024-09-16 23:06:08,636 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.879e+02 2.142e+02 2.284e+02 2.431e+02 3.588e+02, threshold=4.569e+02, percent-clipped=0.0 2024-09-16 23:06:16,110 INFO [train.py:1198] (1/2) Epoch 29, batch 2650, loss[loss=0.196, ctc_loss=0.1303, cr_loss=0.3284, over 20947.00 frames. ], tot_loss[loss=0.2262, ctc_loss=0.1513, cr_loss=0.3744, over 4114257.38 frames. ], batch size: 48, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 23:07:08,387 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.01 vs. limit=22.5 2024-09-16 23:07:15,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=514626.8333333333, ans=0.035 2024-09-16 23:07:29,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=514626.8333333333, ans=0.125 2024-09-16 23:07:31,843 INFO [train.py:1198] (1/2) Epoch 29, batch 2700, loss[loss=0.2453, ctc_loss=0.1656, cr_loss=0.3986, over 20838.00 frames. ], tot_loss[loss=0.2265, ctc_loss=0.1515, cr_loss=0.375, over 4108234.10 frames. ], batch size: 59, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 23:07:47,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=514655.1666666667, ans=0.0 2024-09-16 23:08:09,350 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.26 vs. limit=10.0 2024-09-16 23:08:43,124 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.904e+02 2.099e+02 2.261e+02 2.472e+02 3.844e+02, threshold=4.522e+02, percent-clipped=0.0 2024-09-16 23:08:50,579 INFO [train.py:1198] (1/2) Epoch 29, batch 2750, loss[loss=0.2296, ctc_loss=0.1526, cr_loss=0.3848, over 21032.00 frames. ], tot_loss[loss=0.2275, ctc_loss=0.1523, cr_loss=0.3758, over 4100391.77 frames. 
], batch size: 62, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 23:08:50,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=514796.8333333333, ans=0.125 2024-09-16 23:09:07,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=514825.1666666667, ans=0.125 2024-09-16 23:09:33,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=514853.5, ans=0.2 2024-09-16 23:10:06,420 INFO [train.py:1198] (1/2) Epoch 29, batch 2800, loss[loss=0.2524, ctc_loss=0.1723, cr_loss=0.4006, over 20941.00 frames. ], tot_loss[loss=0.2273, ctc_loss=0.1522, cr_loss=0.3757, over 4092358.58 frames. ], batch size: 64, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 23:10:08,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=514938.5, ans=0.2 2024-09-16 23:10:53,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=515023.5, ans=0.125 2024-09-16 23:11:08,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=515051.8333333333, ans=0.125 2024-09-16 23:11:12,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=515051.8333333333, ans=0.04949747468305833 2024-09-16 23:11:16,748 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.79 vs. limit=6.0 2024-09-16 23:11:17,582 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.800e+02 2.106e+02 2.230e+02 2.380e+02 2.857e+02, threshold=4.459e+02, percent-clipped=0.0 2024-09-16 23:11:22,702 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.82 vs. limit=22.5 2024-09-16 23:11:23,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=515080.1666666667, ans=0.125 2024-09-16 23:11:25,029 INFO [train.py:1198] (1/2) Epoch 29, batch 2850, loss[loss=0.2651, ctc_loss=0.1899, cr_loss=0.3761, over 14173.00 frames. ], tot_loss[loss=0.2258, ctc_loss=0.151, cr_loss=0.3738, over 4093346.90 frames. ], batch size: 150, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 23:12:00,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=515136.8333333333, ans=0.125 2024-09-16 23:12:06,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=515136.8333333333, ans=0.125 2024-09-16 23:12:08,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=515136.8333333333, ans=0.2 2024-09-16 23:12:27,894 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 23:12:40,965 INFO [train.py:1198] (1/2) Epoch 29, batch 2900, loss[loss=0.2593, ctc_loss=0.1749, cr_loss=0.422, over 19967.00 frames. ], tot_loss[loss=0.2258, ctc_loss=0.1511, cr_loss=0.3735, over 4086028.46 frames. 
], batch size: 80, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 23:12:51,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=515221.8333333333, ans=0.125 2024-09-16 23:12:56,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=515250.1666666667, ans=0.05 2024-09-16 23:12:56,688 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.48 vs. limit=15.0 2024-09-16 23:13:06,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=515250.1666666667, ans=0.5 2024-09-16 23:13:07,522 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.52 vs. limit=5.0 2024-09-16 23:13:51,775 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.924e+02 2.125e+02 2.234e+02 2.447e+02 6.615e+02, threshold=4.467e+02, percent-clipped=1.0 2024-09-16 23:13:59,372 INFO [train.py:1198] (1/2) Epoch 29, batch 2950, loss[loss=0.2843, ctc_loss=0.2054, cr_loss=0.3946, over 14415.00 frames. ], tot_loss[loss=0.2265, ctc_loss=0.1517, cr_loss=0.3736, over 4080612.84 frames. ], batch size: 150, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 23:14:19,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=515391.8333333333, ans=0.1 2024-09-16 23:14:52,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=515448.5, ans=0.2 2024-09-16 23:15:15,110 INFO [train.py:1198] (1/2) Epoch 29, batch 3000, loss[loss=0.2319, ctc_loss=0.1538, cr_loss=0.3907, over 20974.00 frames. ], tot_loss[loss=0.2258, ctc_loss=0.1513, cr_loss=0.3727, over 4078927.42 frames. ], batch size: 58, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 23:15:15,110 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-16 23:15:38,424 INFO [train.py:1230] (1/2) Epoch 29, validation: loss=0.04172, ctc_loss=0.04172, cr_loss=1.228e-14, over 944034.00 frames. 2024-09-16 23:15:38,424 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-16 23:15:55,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=515533.5, ans=0.125 2024-09-16 23:16:33,764 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.07 vs. limit=22.5 2024-09-16 23:16:49,650 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.857e+02 2.142e+02 2.276e+02 2.439e+02 4.356e+02, threshold=4.552e+02, percent-clipped=0.0 2024-09-16 23:16:54,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=515618.5, ans=0.125 2024-09-16 23:16:57,214 INFO [train.py:1198] (1/2) Epoch 29, batch 3050, loss[loss=0.2677, ctc_loss=0.1797, cr_loss=0.4402, over 20680.00 frames. ], tot_loss[loss=0.2264, ctc_loss=0.1517, cr_loss=0.3738, over 4081477.19 frames. 
], batch size: 71, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 23:16:59,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=515646.8333333333, ans=10.0 2024-09-16 23:17:16,803 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.95 vs. limit=22.5 2024-09-16 23:17:32,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=515703.5, ans=0.0 2024-09-16 23:17:41,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=515731.8333333333, ans=0.0 2024-09-16 23:17:43,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=515731.8333333333, ans=0.2 2024-09-16 23:18:07,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=515760.1666666667, ans=0.0 2024-09-16 23:18:12,672 INFO [train.py:1198] (1/2) Epoch 29, batch 3100, loss[loss=0.1886, ctc_loss=0.1234, cr_loss=0.3262, over 19804.00 frames. ], tot_loss[loss=0.2273, ctc_loss=0.1524, cr_loss=0.3749, over 4075766.51 frames. ], batch size: 44, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 23:18:23,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=515788.5, ans=0.125 2024-09-16 23:18:56,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=515845.1666666667, ans=0.0 2024-09-16 23:19:23,841 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.840e+02 2.105e+02 2.305e+02 2.411e+02 4.093e+02, threshold=4.611e+02, percent-clipped=0.0 2024-09-16 23:19:27,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=515901.8333333333, ans=0.0 2024-09-16 23:19:31,336 INFO [train.py:1198] (1/2) Epoch 29, batch 3150, loss[loss=0.2139, ctc_loss=0.1432, cr_loss=0.3536, over 20906.00 frames. ], tot_loss[loss=0.2266, ctc_loss=0.1517, cr_loss=0.3744, over 4091932.06 frames. ], batch size: 54, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 23:19:42,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=515930.1666666667, ans=0.125 2024-09-16 23:20:21,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=516015.1666666667, ans=0.125 2024-09-16 23:20:24,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=516015.1666666667, ans=0.2 2024-09-16 23:20:33,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=516043.5, ans=0.1 2024-09-16 23:20:35,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=516043.5, ans=0.1 2024-09-16 23:20:46,901 INFO [train.py:1198] (1/2) Epoch 29, batch 3200, loss[loss=0.2468, ctc_loss=0.1694, cr_loss=0.3873, over 17990.00 frames. ], tot_loss[loss=0.2266, ctc_loss=0.1518, cr_loss=0.3741, over 4084242.55 frames. 
], batch size: 108, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 23:21:44,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=516156.8333333333, ans=0.125 2024-09-16 23:21:55,238 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.863e+02 2.106e+02 2.233e+02 2.457e+02 3.336e+02, threshold=4.466e+02, percent-clipped=0.0 2024-09-16 23:22:00,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=516185.1666666667, ans=0.0 2024-09-16 23:22:05,457 INFO [train.py:1198] (1/2) Epoch 29, batch 3250, loss[loss=0.1952, ctc_loss=0.13, cr_loss=0.3259, over 20922.00 frames. ], tot_loss[loss=0.2261, ctc_loss=0.1514, cr_loss=0.3736, over 4092521.22 frames. ], batch size: 50, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 23:22:27,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=516241.8333333333, ans=0.1 2024-09-16 23:22:36,531 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.97 vs. limit=6.0 2024-09-16 23:23:12,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=516326.8333333333, ans=0.1 2024-09-16 23:23:21,197 INFO [train.py:1198] (1/2) Epoch 29, batch 3300, loss[loss=0.2295, ctc_loss=0.1534, cr_loss=0.3808, over 20890.00 frames. ], tot_loss[loss=0.2261, ctc_loss=0.1512, cr_loss=0.3741, over 4100929.77 frames. ], batch size: 54, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 23:23:28,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=516355.1666666667, ans=0.1 2024-09-16 23:23:29,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=516355.1666666667, ans=0.125 2024-09-16 23:23:33,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=516355.1666666667, ans=0.125 2024-09-16 23:23:38,095 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=516383.5, ans=0.125 2024-09-16 23:23:41,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=516383.5, ans=0.125 2024-09-16 23:23:59,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=516411.8333333333, ans=0.125 2024-09-16 23:24:31,854 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.849e+02 2.126e+02 2.291e+02 2.446e+02 4.467e+02, threshold=4.582e+02, percent-clipped=1.0 2024-09-16 23:24:33,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=516468.5, ans=0.0 2024-09-16 23:24:35,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=516468.5, ans=0.125 2024-09-16 23:24:38,660 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.98 vs. limit=15.0 2024-09-16 23:24:39,312 INFO [train.py:1198] (1/2) Epoch 29, batch 3350, loss[loss=0.1965, ctc_loss=0.1318, cr_loss=0.3236, over 20948.00 frames. 
], tot_loss[loss=0.2261, ctc_loss=0.1513, cr_loss=0.3739, over 4089001.65 frames. ], batch size: 48, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 23:25:32,595 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.94 vs. limit=15.0 2024-09-16 23:25:50,755 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.08 vs. limit=15.0 2024-09-16 23:25:54,418 INFO [train.py:1198] (1/2) Epoch 29, batch 3400, loss[loss=0.2007, ctc_loss=0.1326, cr_loss=0.3407, over 20935.00 frames. ], tot_loss[loss=0.2262, ctc_loss=0.1512, cr_loss=0.3745, over 4079552.18 frames. ], batch size: 60, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 23:26:03,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=516638.5, ans=0.2 2024-09-16 23:26:16,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=516666.8333333333, ans=0.09899494936611666 2024-09-16 23:26:56,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=516751.8333333333, ans=0.1 2024-09-16 23:27:02,691 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.883e+02 2.118e+02 2.242e+02 2.428e+02 3.600e+02, threshold=4.484e+02, percent-clipped=0.0 2024-09-16 23:27:10,396 INFO [train.py:1198] (1/2) Epoch 29, batch 3450, loss[loss=0.1992, ctc_loss=0.1299, cr_loss=0.3462, over 20930.00 frames. ], tot_loss[loss=0.2264, ctc_loss=0.1515, cr_loss=0.3744, over 4079297.57 frames. ], batch size: 48, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 23:27:38,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=516808.5, ans=0.1 2024-09-16 23:27:42,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=516836.8333333333, ans=0.125 2024-09-16 23:27:45,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=516836.8333333333, ans=0.0 2024-09-16 23:28:02,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=516865.1666666667, ans=0.5 2024-09-16 23:28:05,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=516865.1666666667, ans=0.125 2024-09-16 23:28:05,558 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.88 vs. limit=15.0 2024-09-16 23:28:28,958 INFO [train.py:1198] (1/2) Epoch 29, batch 3500, loss[loss=0.2443, ctc_loss=0.1615, cr_loss=0.414, over 20289.00 frames. ], tot_loss[loss=0.2271, ctc_loss=0.152, cr_loss=0.3756, over 4086120.11 frames. 
], batch size: 74, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 23:28:33,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=516921.8333333333, ans=0.125 2024-09-16 23:28:43,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=516950.1666666667, ans=0.0 2024-09-16 23:28:47,961 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.58 vs. limit=15.0 2024-09-16 23:29:13,305 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.83 vs. limit=15.0 2024-09-16 23:29:37,192 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.935e+02 2.182e+02 2.322e+02 2.506e+02 4.001e+02, threshold=4.644e+02, percent-clipped=0.0 2024-09-16 23:29:44,942 INFO [train.py:1198] (1/2) Epoch 29, batch 3550, loss[loss=0.2461, ctc_loss=0.1651, cr_loss=0.405, over 21036.00 frames. ], tot_loss[loss=0.227, ctc_loss=0.1519, cr_loss=0.3754, over 4092785.27 frames. ], batch size: 62, lr: 2.85e-03, grad_scale: 32.0 2024-09-16 23:30:06,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=517091.8333333333, ans=0.02 2024-09-16 23:30:25,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=517120.1666666667, ans=0.125 2024-09-16 23:30:30,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=517120.1666666667, ans=0.125 2024-09-16 23:30:47,243 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.23 vs. limit=12.0 2024-09-16 23:31:03,712 INFO [train.py:1198] (1/2) Epoch 29, batch 3600, loss[loss=0.2752, ctc_loss=0.1901, cr_loss=0.4259, over 20637.00 frames. ], tot_loss[loss=0.226, ctc_loss=0.1512, cr_loss=0.3743, over 4098405.04 frames. ], batch size: 66, lr: 2.85e-03, grad_scale: 32.0 2024-09-16 23:31:06,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=517205.1666666667, ans=0.1 2024-09-16 23:31:20,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=517233.5, ans=0.0 2024-09-16 23:31:24,176 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.79 vs. limit=15.0 2024-09-16 23:31:39,459 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.15 vs. 
limit=15.0 2024-09-16 23:31:52,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=517290.1666666667, ans=0.0 2024-09-16 23:32:08,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=517318.5, ans=0.0 2024-09-16 23:32:11,474 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.897e+02 2.134e+02 2.235e+02 2.407e+02 2.856e+02, threshold=4.470e+02, percent-clipped=0.0 2024-09-16 23:32:16,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=517318.5, ans=0.0 2024-09-16 23:32:18,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=517346.8333333333, ans=0.1 2024-09-16 23:32:19,280 INFO [train.py:1198] (1/2) Epoch 29, batch 3650, loss[loss=0.2473, ctc_loss=0.167, cr_loss=0.4013, over 20888.00 frames. ], tot_loss[loss=0.2266, ctc_loss=0.1516, cr_loss=0.3751, over 4098168.51 frames. ], batch size: 57, lr: 2.85e-03, grad_scale: 32.0 2024-09-16 23:32:26,069 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.89 vs. limit=10.0 2024-09-16 23:33:37,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=517488.5, ans=15.0 2024-09-16 23:33:37,826 INFO [train.py:1198] (1/2) Epoch 29, batch 3700, loss[loss=0.2571, ctc_loss=0.1753, cr_loss=0.4092, over 18319.00 frames. ], tot_loss[loss=0.227, ctc_loss=0.1519, cr_loss=0.3754, over 4089170.46 frames. ], batch size: 108, lr: 2.85e-03, grad_scale: 32.0 2024-09-16 23:34:11,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=517545.1666666667, ans=0.0 2024-09-16 23:34:17,493 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.86 vs. limit=15.0 2024-09-16 23:34:21,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=517573.5, ans=0.125 2024-09-16 23:34:39,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=517601.8333333333, ans=0.1 2024-09-16 23:34:45,235 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.943e+02 2.137e+02 2.276e+02 2.424e+02 2.892e+02, threshold=4.552e+02, percent-clipped=0.0 2024-09-16 23:34:52,996 INFO [train.py:1198] (1/2) Epoch 29, batch 3750, loss[loss=0.1922, ctc_loss=0.125, cr_loss=0.3358, over 20953.00 frames. ], tot_loss[loss=0.2265, ctc_loss=0.1515, cr_loss=0.3749, over 4092993.38 frames. 
], batch size: 48, lr: 2.85e-03, grad_scale: 32.0 2024-09-16 23:35:05,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=517630.1666666667, ans=0.5 2024-09-16 23:35:26,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=517686.8333333333, ans=0.1 2024-09-16 23:35:40,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=517715.1666666667, ans=0.125 2024-09-16 23:36:12,601 INFO [train.py:1198] (1/2) Epoch 29, batch 3800, loss[loss=0.2421, ctc_loss=0.1619, cr_loss=0.4012, over 20669.00 frames. ], tot_loss[loss=0.2272, ctc_loss=0.1521, cr_loss=0.3755, over 4076295.66 frames. ], batch size: 71, lr: 2.85e-03, grad_scale: 32.0 2024-09-16 23:36:41,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=517828.5, ans=0.0 2024-09-16 23:36:59,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=517856.8333333333, ans=0.2 2024-09-16 23:37:19,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=517885.1666666667, ans=0.0 2024-09-16 23:37:19,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=517885.1666666667, ans=0.0 2024-09-16 23:37:20,677 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.925e+02 2.153e+02 2.283e+02 2.431e+02 3.165e+02, threshold=4.566e+02, percent-clipped=0.0 2024-09-16 23:37:25,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=517885.1666666667, ans=0.2 2024-09-16 23:37:28,103 INFO [train.py:1198] (1/2) Epoch 29, batch 3850, loss[loss=0.2186, ctc_loss=0.1479, cr_loss=0.354, over 21065.00 frames. ], tot_loss[loss=0.2278, ctc_loss=0.1526, cr_loss=0.3763, over 4079051.17 frames. ], batch size: 59, lr: 2.85e-03, grad_scale: 32.0 2024-09-16 23:37:46,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=517941.8333333333, ans=0.0 2024-09-16 23:38:04,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=517970.1666666667, ans=0.125 2024-09-16 23:38:07,834 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 23:38:33,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=518026.8333333333, ans=0.0 2024-09-16 23:38:33,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=518026.8333333333, ans=0.2 2024-09-16 23:38:33,892 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.99 vs. limit=15.0 2024-09-16 23:38:43,640 INFO [train.py:1198] (1/2) Epoch 29, batch 3900, loss[loss=0.2722, ctc_loss=0.1845, cr_loss=0.4386, over 20968.00 frames. ], tot_loss[loss=0.2277, ctc_loss=0.1525, cr_loss=0.376, over 4078793.46 frames. 
], batch size: 67, lr: 2.85e-03, grad_scale: 32.0 2024-09-16 23:39:12,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=518083.5, ans=0.125 2024-09-16 23:39:23,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=518111.8333333333, ans=0.2 2024-09-16 23:39:27,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=518111.8333333333, ans=0.0 2024-09-16 23:39:41,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=518140.1666666667, ans=0.95 2024-09-16 23:39:44,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=518140.1666666667, ans=0.0 2024-09-16 23:39:54,598 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.759e+02 2.123e+02 2.282e+02 2.483e+02 3.210e+02, threshold=4.565e+02, percent-clipped=0.0 2024-09-16 23:40:02,118 INFO [train.py:1198] (1/2) Epoch 29, batch 3950, loss[loss=0.2499, ctc_loss=0.1715, cr_loss=0.392, over 20961.00 frames. ], tot_loss[loss=0.2271, ctc_loss=0.1521, cr_loss=0.3751, over 4079074.96 frames. ], batch size: 64, lr: 2.85e-03, grad_scale: 32.0 2024-09-16 23:40:18,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=518225.1666666667, ans=0.125 2024-09-16 23:40:23,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=518225.1666666667, ans=0.125 2024-09-16 23:40:45,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=518281.8333333333, ans=0.2 2024-09-16 23:41:08,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=518310.1666666667, ans=0.0 2024-09-16 23:41:16,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=518338.5, ans=0.125 2024-09-16 23:41:17,429 INFO [train.py:1198] (1/2) Epoch 29, batch 4000, loss[loss=0.2187, ctc_loss=0.1443, cr_loss=0.372, over 20992.00 frames. ], tot_loss[loss=0.2262, ctc_loss=0.1513, cr_loss=0.3744, over 4094095.18 frames. 
2024-09-16 23:41:17,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=518338.5, ans=0.0 2024-09-16 23:41:33,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=518338.5, ans=0.1 2024-09-16 23:41:53,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=518395.1666666667, ans=0.125 2024-09-16 23:42:10,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=518423.5, ans=0.125 2024-09-16 23:42:24,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=518451.8333333333, ans=0.2 2024-09-16 23:42:28,721 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.846e+02 2.114e+02 2.276e+02 2.427e+02 3.959e+02, threshold=4.553e+02, percent-clipped=0.0 2024-09-16 23:42:36,504 INFO [train.py:1198] (1/2) Epoch 29, batch 4050, loss[loss=0.2244, ctc_loss=0.1496, cr_loss=0.3739, over 21077.00 frames. ], tot_loss[loss=0.226, ctc_loss=0.1513, cr_loss=0.3739, over 4094519.02 frames. ], batch size: 59, lr: 2.85e-03, grad_scale: 32.0 2024-09-16 23:43:05,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=518536.8333333333, ans=0.025 2024-09-16 23:43:37,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=518593.5, ans=0.2 2024-09-16 23:43:46,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=518593.5, ans=0.0 2024-09-16 23:43:50,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=518621.8333333333, ans=0.125 2024-09-16 23:43:51,989 INFO [train.py:1198] (1/2) Epoch 29, batch 4100, loss[loss=0.1898, ctc_loss=0.1255, cr_loss=0.3215, over 20361.00 frames. ], tot_loss[loss=0.2258, ctc_loss=0.1511, cr_loss=0.3737, over 4096531.32 frames. ], batch size: 45, lr: 2.85e-03, grad_scale: 32.0 2024-09-16 23:43:53,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=518621.8333333333, ans=0.125 2024-09-16 23:43:58,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=518621.8333333333, ans=0.125 2024-09-16 23:43:58,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=518621.8333333333, ans=0.2 2024-09-16 23:44:27,797 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.80 vs.
limit=15.0 2024-09-16 23:44:32,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=518678.5, ans=0.125 2024-09-16 23:44:45,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=518706.8333333333, ans=0.125 2024-09-16 23:45:03,423 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.810e+02 2.153e+02 2.279e+02 2.457e+02 3.517e+02, threshold=4.558e+02, percent-clipped=0.0 2024-09-16 23:45:11,027 INFO [train.py:1198] (1/2) Epoch 29, batch 4150, loss[loss=0.2345, ctc_loss=0.1583, cr_loss=0.381, over 20684.00 frames. ], tot_loss[loss=0.2265, ctc_loss=0.1516, cr_loss=0.3744, over 4095024.43 frames. ], batch size: 71, lr: 2.85e-03, grad_scale: 64.0 2024-09-16 23:45:14,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=518763.5, ans=0.0 2024-09-16 23:45:26,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=518791.8333333333, ans=0.125 2024-09-16 23:45:36,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=518791.8333333333, ans=0.1 2024-09-16 23:45:49,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=518820.1666666667, ans=0.2 2024-09-16 23:46:07,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=518848.5, ans=0.0 2024-09-16 23:46:14,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=518876.8333333333, ans=0.0 2024-09-16 23:46:26,147 INFO [train.py:1198] (1/2) Epoch 29, batch 4200, loss[loss=0.2417, ctc_loss=0.1643, cr_loss=0.3869, over 19468.00 frames. ], tot_loss[loss=0.2264, ctc_loss=0.1517, cr_loss=0.3737, over 4070816.30 frames. ], batch size: 90, lr: 2.85e-03, grad_scale: 64.0 2024-09-16 23:47:18,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=518990.1666666667, ans=0.125 2024-09-16 23:47:32,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=519018.5, ans=0.125 2024-09-16 23:47:32,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=519018.5, ans=0.125 2024-09-16 23:47:35,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=519018.5, ans=0.0 2024-09-16 23:47:35,766 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.75 vs. limit=15.0 2024-09-16 23:47:38,065 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.868e+02 2.098e+02 2.247e+02 2.454e+02 3.348e+02, threshold=4.494e+02, percent-clipped=0.0 2024-09-16 23:47:41,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=519018.5, ans=0.025 2024-09-16 23:47:44,086 INFO [train.py:1198] (1/2) Epoch 29, batch 4250, loss[loss=0.1988, ctc_loss=0.1299, cr_loss=0.3446, over 21006.00 frames. ], tot_loss[loss=0.2268, ctc_loss=0.1519, cr_loss=0.3745, over 4076474.39 frames. 
], batch size: 61, lr: 2.85e-03, grad_scale: 32.0 2024-09-16 23:48:15,172 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.59 vs. limit=15.0 2024-09-16 23:48:16,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=519103.5, ans=0.125 2024-09-16 23:48:33,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=519131.8333333333, ans=0.125 2024-09-16 23:48:52,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=519160.1666666667, ans=0.1 2024-09-16 23:48:58,397 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.53 vs. limit=5.0 2024-09-16 23:49:00,247 INFO [train.py:1198] (1/2) Epoch 29, batch 4300, loss[loss=0.2038, ctc_loss=0.1363, cr_loss=0.3372, over 21073.00 frames. ], tot_loss[loss=0.2261, ctc_loss=0.1513, cr_loss=0.374, over 4095404.94 frames. ], batch size: 53, lr: 2.85e-03, grad_scale: 32.0 2024-09-16 23:49:00,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=519188.5, ans=0.125 2024-09-16 23:49:47,755 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.20 vs. limit=12.0 2024-09-16 23:50:04,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=519301.8333333333, ans=0.125 2024-09-16 23:50:09,931 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.858e+02 2.142e+02 2.329e+02 2.478e+02 4.020e+02, threshold=4.659e+02, percent-clipped=0.0 2024-09-16 23:50:10,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=519301.8333333333, ans=0.2 2024-09-16 23:50:14,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=519330.1666666667, ans=0.0 2024-09-16 23:50:16,037 INFO [train.py:1198] (1/2) Epoch 29, batch 4350, loss[loss=0.2002, ctc_loss=0.1336, cr_loss=0.3333, over 21052.00 frames. ], tot_loss[loss=0.2256, ctc_loss=0.1509, cr_loss=0.3733, over 4090725.78 frames. ], batch size: 62, lr: 2.85e-03, grad_scale: 32.0 2024-09-16 23:50:37,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=519358.5, ans=0.1 2024-09-16 23:51:09,796 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0 2024-09-16 23:51:34,122 INFO [train.py:1198] (1/2) Epoch 29, batch 4400, loss[loss=0.2322, ctc_loss=0.1573, cr_loss=0.3744, over 20784.00 frames. ], tot_loss[loss=0.2258, ctc_loss=0.151, cr_loss=0.3739, over 4104794.46 frames. ], batch size: 56, lr: 2.85e-03, grad_scale: 32.0 2024-09-16 23:52:13,022 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.40 vs. limit=15.0 2024-09-16 23:52:13,513 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.92 vs. 
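limit=5.0

Each scaling.py:214 line prints a ScheduledFloat: a module hyperparameter (skip rates such as conv_skip_rate, balancer probabilities, minimum bypass scales, and so on) whose value ans is looked up from a schedule at the current batch_count; by this stage of training most of them have settled at their final values (0.0, 0.125, 0.2, ...). A minimal stand-in for such a piecewise-linear schedule (the function name and breakpoints below are illustrative, not the library's):

def scheduled_float(batch_count, points):
    """Piecewise-linear schedule; points is a sorted list of (batch_count, value)."""
    if batch_count <= points[0][0]:
        return points[0][1]
    if batch_count >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= batch_count <= x1:
            t = (batch_count - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)

# e.g. a skip rate annealed from 0.5 to 0.0 over the first 20k batches (made-up breakpoints):
print(scheduled_float(519868.5, [(0.0, 0.5), (20000.0, 0.0)]))  # -> 0.0, as in the ans=0.0 lines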
2024-09-16 23:52:43,676 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.831e+02 2.151e+02 2.295e+02 2.423e+02 3.130e+02, threshold=4.591e+02, percent-clipped=0.0 2024-09-16 23:52:49,909 INFO [train.py:1198] (1/2) Epoch 29, batch 4450, loss[loss=0.1851, ctc_loss=0.1191, cr_loss=0.3299, over 20971.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.1502, cr_loss=0.3726, over 4103253.50 frames. ], batch size: 49, lr: 2.85e-03, grad_scale: 32.0 2024-09-16 23:54:08,503 INFO [train.py:1198] (1/2) Epoch 29, batch 4500, loss[loss=0.1922, ctc_loss=0.1262, cr_loss=0.3298, over 20370.00 frames. ], tot_loss[loss=0.2248, ctc_loss=0.1502, cr_loss=0.373, over 4106624.83 frames. ], batch size: 45, lr: 2.85e-03, grad_scale: 32.0 2024-09-16 23:55:01,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=519840.1666666667, ans=0.1 2024-09-16 23:55:04,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=519840.1666666667, ans=0.1 2024-09-16 23:55:10,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=519868.5, ans=0.125 2024-09-16 23:55:13,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=519868.5, ans=0.2 2024-09-16 23:55:17,802 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.821e+02 2.140e+02 2.237e+02 2.446e+02 4.193e+02, threshold=4.475e+02, percent-clipped=0.0 2024-09-16 23:55:23,797 INFO [train.py:1198] (1/2) Epoch 29, batch 4550, loss[loss=0.1826, ctc_loss=0.1207, cr_loss=0.3094, over 20894.00 frames. ], tot_loss[loss=0.2246, ctc_loss=0.1501, cr_loss=0.3724, over 4103886.87 frames. ], batch size: 54, lr: 2.85e-03, grad_scale: 32.0 2024-09-16 23:55:36,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=519896.8333333333, ans=0.0 2024-09-16 23:55:39,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=519925.1666666667, ans=0.2 2024-09-16 23:55:40,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=519925.1666666667, ans=0.0 2024-09-16 23:55:45,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=519925.1666666667, ans=0.0 2024-09-16 23:55:48,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=519925.1666666667, ans=10.0 2024-09-16 23:56:13,317 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.40 vs.
limit=12.0 2024-09-16 23:56:14,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=519981.8333333333, ans=0.5 2024-09-16 23:56:19,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=519981.8333333333, ans=0.0 2024-09-16 23:56:32,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=520010.1666666667, ans=0.0 2024-09-16 23:56:34,259 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 23:56:38,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=520010.1666666667, ans=15.0 2024-09-16 23:56:39,042 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.77 vs. limit=15.0 2024-09-16 23:56:42,767 INFO [train.py:1198] (1/2) Epoch 29, batch 4600, loss[loss=0.1942, ctc_loss=0.1265, cr_loss=0.3386, over 21054.00 frames. ], tot_loss[loss=0.2241, ctc_loss=0.1498, cr_loss=0.3714, over 4109465.79 frames. ], batch size: 53, lr: 2.85e-03, grad_scale: 32.0 2024-09-16 23:56:46,822 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.25 vs. limit=15.0 2024-09-16 23:56:52,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=520038.5, ans=0.07 2024-09-16 23:57:27,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=520123.5, ans=0.125 2024-09-16 23:57:52,114 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.841e+02 2.114e+02 2.234e+02 2.448e+02 7.926e+02, threshold=4.469e+02, percent-clipped=3.0 2024-09-16 23:57:58,255 INFO [train.py:1198] (1/2) Epoch 29, batch 4650, loss[loss=0.2385, ctc_loss=0.1619, cr_loss=0.3826, over 21019.00 frames. ], tot_loss[loss=0.224, ctc_loss=0.1498, cr_loss=0.3709, over 4096698.84 frames. ], batch size: 61, lr: 2.85e-03, grad_scale: 32.0 2024-09-16 23:58:17,334 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.99 vs. limit=6.0 2024-09-16 23:59:02,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=520293.5, ans=0.0 2024-09-16 23:59:17,123 INFO [train.py:1198] (1/2) Epoch 29, batch 4700, loss[loss=0.2322, ctc_loss=0.155, cr_loss=0.3863, over 20941.00 frames. ], tot_loss[loss=0.2244, ctc_loss=0.15, cr_loss=0.3722, over 4099002.70 frames. ], batch size: 60, lr: 2.85e-03, grad_scale: 32.0 2024-09-16 23:59:29,805 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.02 vs. 
limit=22.5 2024-09-16 23:59:31,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=520350.1666666667, ans=0.0 2024-09-16 23:59:33,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=520350.1666666667, ans=0.125 2024-09-17 00:00:02,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=520406.8333333333, ans=0.05 2024-09-17 00:00:26,131 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.930e+02 2.182e+02 2.312e+02 2.519e+02 3.478e+02, threshold=4.623e+02, percent-clipped=0.0 2024-09-17 00:00:32,317 INFO [train.py:1198] (1/2) Epoch 29, batch 4750, loss[loss=0.2645, ctc_loss=0.1776, cr_loss=0.4341, over 20019.00 frames. ], tot_loss[loss=0.2238, ctc_loss=0.1496, cr_loss=0.3713, over 4089619.36 frames. ], batch size: 80, lr: 2.85e-03, grad_scale: 32.0 2024-09-17 00:00:33,117 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.64 vs. limit=12.0 2024-09-17 00:01:25,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=520548.5, ans=0.1 2024-09-17 00:01:29,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=520548.5, ans=0.125 2024-09-17 00:01:43,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=520576.8333333333, ans=0.125 2024-09-17 00:01:47,729 INFO [train.py:1198] (1/2) Epoch 29, batch 4800, loss[loss=0.255, ctc_loss=0.174, cr_loss=0.405, over 19448.00 frames. ], tot_loss[loss=0.2236, ctc_loss=0.1493, cr_loss=0.3713, over 4093861.55 frames. ], batch size: 90, lr: 2.84e-03, grad_scale: 32.0 2024-09-17 00:02:12,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=520633.5, ans=0.125 2024-09-17 00:02:21,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=520661.8333333333, ans=0.0 2024-09-17 00:03:00,847 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.875e+02 2.174e+02 2.313e+02 2.580e+02 4.357e+02, threshold=4.627e+02, percent-clipped=0.0 2024-09-17 00:03:07,006 INFO [train.py:1198] (1/2) Epoch 29, batch 4850, loss[loss=0.2125, ctc_loss=0.1421, cr_loss=0.3519, over 19900.00 frames. ], tot_loss[loss=0.2256, ctc_loss=0.1508, cr_loss=0.3739, over 4093675.43 frames. 
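], batch size: 44, lr: 2.84e-03, grad_scale: 32.0

grad_scale in these records is AMP's dynamic loss scale rather than a tuned hyperparameter: it doubled to 64.0 around batch 4150, fell back to 32.0 by batch 4250, and later in the epoch sits at 16.0, which is the usual grow-on-stable-steps, halve-on-overflow behaviour of a float16 GradScaler. The standard PyTorch pattern looks like the following (the constructor values are the library's illustrative defaults, not values read from this run):

import torch

scaler = torch.cuda.amp.GradScaler(
    growth_factor=2.0,      # double the scale...
    backoff_factor=0.5,     # ...or halve it when inf/NaN gradients appear
    growth_interval=2000,   # after this many consecutive finite-gradient steps
)

# typical step; compute_loss, model, batch and optimizer are placeholders, not names from this repo:
# with torch.cuda.amp.autocast(dtype=torch.float16):
#     loss = compute_loss(model, batch)
# scaler.scale(loss).backward()
# scaler.step(optimizer)
# scaler.update()   # this is where grad_scale moves: 32 -> 64 -> 32 -> 16 as logged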
2024-09-17 00:03:11,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=520746.8333333333, ans=0.0 2024-09-17 00:03:22,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=520775.1666666667, ans=0.05 2024-09-17 00:04:00,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=520831.8333333333, ans=0.1 2024-09-17 00:04:02,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=520831.8333333333, ans=0.0 2024-09-17 00:04:13,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=520860.1666666667, ans=0.0 2024-09-17 00:04:26,494 INFO [train.py:1198] (1/2) Epoch 29, batch 4900, loss[loss=0.2489, ctc_loss=0.1687, cr_loss=0.4006, over 20538.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.1502, cr_loss=0.3725, over 4091403.58 frames. ], batch size: 75, lr: 2.84e-03, grad_scale: 32.0 2024-09-17 00:04:30,214 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. limit=6.0 2024-09-17 00:04:31,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=520888.5, ans=0.125 2024-09-17 00:04:43,684 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.39 vs. limit=15.0 2024-09-17 00:04:58,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=520945.1666666667, ans=0.125 2024-09-17 00:05:35,347 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.798e+02 2.144e+02 2.276e+02 2.506e+02 4.473e+02, threshold=4.551e+02, percent-clipped=0.0 2024-09-17 00:05:40,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=521030.1666666667, ans=0.125 2024-09-17 00:05:41,342 INFO [train.py:1198] (1/2) Epoch 29, batch 4950, loss[loss=0.2351, ctc_loss=0.1574, cr_loss=0.3881, over 21071.00 frames. ], tot_loss[loss=0.2266, ctc_loss=0.1516, cr_loss=0.3748, over 4083237.19 frames. ], batch size: 59, lr: 2.84e-03, grad_scale: 32.0 2024-09-17 00:05:44,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=521030.1666666667, ans=0.125 2024-09-17 00:06:03,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=521058.5, ans=0.125 2024-09-17 00:06:15,027 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.55 vs. limit=10.0 2024-09-17 00:06:50,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=521143.5, ans=0.0 2024-09-17 00:06:56,289 INFO [train.py:1198] (1/2) Epoch 29, batch 5000, loss[loss=0.2411, ctc_loss=0.1639, cr_loss=0.3861, over 20934.00 frames. ], tot_loss[loss=0.2267, ctc_loss=0.1518, cr_loss=0.3743, over 4076624.91 frames.
], batch size: 60, lr: 2.84e-03, grad_scale: 32.0 2024-09-17 00:07:05,646 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 00:07:06,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=521171.8333333333, ans=0.125 2024-09-17 00:08:04,572 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.874e+02 2.104e+02 2.201e+02 2.382e+02 2.853e+02, threshold=4.401e+02, percent-clipped=0.0 2024-09-17 00:08:10,560 INFO [train.py:1198] (1/2) Epoch 29, batch 5050, loss[loss=0.2233, ctc_loss=0.1491, cr_loss=0.3712, over 21054.00 frames. ], tot_loss[loss=0.2267, ctc_loss=0.1519, cr_loss=0.3741, over 4066468.82 frames. ], batch size: 56, lr: 2.84e-03, grad_scale: 32.0 2024-09-17 00:08:25,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=521341.8333333333, ans=0.025 2024-09-17 00:08:43,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=521370.1666666667, ans=0.125 2024-09-17 00:09:02,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=521398.5, ans=0.0 2024-09-17 00:09:08,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=521398.5, ans=0.07 2024-09-17 00:09:15,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=521426.8333333333, ans=0.125 2024-09-17 00:09:25,727 INFO [train.py:1198] (1/2) Epoch 29, batch 5100, loss[loss=0.2387, ctc_loss=0.1591, cr_loss=0.3982, over 20882.00 frames. ], tot_loss[loss=0.2272, ctc_loss=0.1523, cr_loss=0.3747, over 4066265.72 frames. ], batch size: 57, lr: 2.84e-03, grad_scale: 32.0 2024-09-17 00:09:30,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=521455.1666666667, ans=0.125 2024-09-17 00:09:36,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=521455.1666666667, ans=0.125 2024-09-17 00:09:50,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=521483.5, ans=0.125 2024-09-17 00:10:05,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=521511.8333333333, ans=22.5 2024-09-17 00:10:31,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=521568.5, ans=0.2 2024-09-17 00:10:37,322 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.884e+02 2.158e+02 2.245e+02 2.459e+02 4.128e+02, threshold=4.490e+02, percent-clipped=0.0 2024-09-17 00:10:43,240 INFO [train.py:1198] (1/2) Epoch 29, batch 5150, loss[loss=0.222, ctc_loss=0.1463, cr_loss=0.3784, over 20991.00 frames. ], tot_loss[loss=0.226, ctc_loss=0.1512, cr_loss=0.3737, over 4076074.52 frames. 
], batch size: 55, lr: 2.84e-03, grad_scale: 32.0 2024-09-17 00:10:46,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=521596.8333333333, ans=0.2 2024-09-17 00:10:47,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=521596.8333333333, ans=0.125 2024-09-17 00:11:03,112 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.30 vs. limit=15.0 2024-09-17 00:11:38,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=521681.8333333333, ans=0.95 2024-09-17 00:11:43,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=521710.1666666667, ans=0.1 2024-09-17 00:11:44,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=521710.1666666667, ans=0.125 2024-09-17 00:11:53,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=521710.1666666667, ans=0.125 2024-09-17 00:11:57,463 INFO [train.py:1198] (1/2) Epoch 29, batch 5200, loss[loss=0.2148, ctc_loss=0.1438, cr_loss=0.355, over 20986.00 frames. ], tot_loss[loss=0.2264, ctc_loss=0.1515, cr_loss=0.3744, over 4077149.56 frames. ], batch size: 58, lr: 2.84e-03, grad_scale: 32.0 2024-09-17 00:12:02,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=521738.5, ans=0.0 2024-09-17 00:12:15,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=521766.8333333333, ans=0.0 2024-09-17 00:12:27,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=521795.1666666667, ans=0.04949747468305833 2024-09-17 00:13:05,922 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.914e+02 2.155e+02 2.285e+02 2.429e+02 3.077e+02, threshold=4.569e+02, percent-clipped=0.0 2024-09-17 00:13:11,961 INFO [train.py:1198] (1/2) Epoch 29, batch 5250, loss[loss=0.2347, ctc_loss=0.1584, cr_loss=0.3815, over 20833.00 frames. ], tot_loss[loss=0.2285, ctc_loss=0.1531, cr_loss=0.3769, over 4060834.28 frames. ], batch size: 59, lr: 2.84e-03, grad_scale: 32.0 2024-09-17 00:13:25,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=521908.5, ans=0.2 2024-09-17 00:13:45,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=521936.8333333333, ans=10.0 2024-09-17 00:13:59,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=521965.1666666667, ans=0.1 2024-09-17 00:14:00,131 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.01 vs. limit=22.5 2024-09-17 00:14:20,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=521993.5, ans=0.125 2024-09-17 00:14:20,707 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.01 vs. 
limit=15.0 2024-09-17 00:14:25,161 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 00:14:28,967 INFO [train.py:1198] (1/2) Epoch 29, batch 5300, loss[loss=0.1953, ctc_loss=0.1277, cr_loss=0.338, over 19857.00 frames. ], tot_loss[loss=0.2279, ctc_loss=0.1525, cr_loss=0.3767, over 4072550.97 frames. ], batch size: 44, lr: 2.84e-03, grad_scale: 32.0 2024-09-17 00:14:35,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=522021.8333333333, ans=0.1 2024-09-17 00:15:37,337 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.886e+02 2.107e+02 2.241e+02 2.435e+02 4.969e+02, threshold=4.483e+02, percent-clipped=1.0 2024-09-17 00:15:43,115 INFO [train.py:1198] (1/2) Epoch 29, batch 5350, loss[loss=0.254, ctc_loss=0.172, cr_loss=0.4104, over 20089.00 frames. ], tot_loss[loss=0.2268, ctc_loss=0.1517, cr_loss=0.3755, over 4076354.88 frames. ], batch size: 80, lr: 2.84e-03, grad_scale: 32.0 2024-09-17 00:15:53,155 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.04 vs. limit=12.0 2024-09-17 00:15:59,159 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.08 vs. limit=15.0 2024-09-17 00:16:06,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=522191.8333333333, ans=0.1 2024-09-17 00:16:07,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=522191.8333333333, ans=0.125 2024-09-17 00:16:42,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=522276.8333333333, ans=0.125 2024-09-17 00:16:58,064 INFO [train.py:1198] (1/2) Epoch 29, batch 5400, loss[loss=0.2319, ctc_loss=0.1559, cr_loss=0.3801, over 20854.00 frames. ], tot_loss[loss=0.2261, ctc_loss=0.1512, cr_loss=0.3748, over 4093187.14 frames. ], batch size: 65, lr: 2.84e-03, grad_scale: 16.0 2024-09-17 00:17:22,918 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.63 vs. limit=6.0 2024-09-17 00:17:23,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=522333.5, ans=0.0 2024-09-17 00:17:23,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=522333.5, ans=0.125 2024-09-17 00:17:25,927 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.74 vs. 
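limit=15.0

The tot_loss[...] figures are not plain epoch averages: their "over N frames" counts (e.g. 4093187.14 at batch 5400) are fractional and hover around four million rather than growing without bound, which points to an exponentially decayed running sum of frame-weighted losses rather than a cumulative total. One way such a tracker could work (a guess at the mechanism; the actual bookkeeping in train.py may differ in detail, and the decay value is assumed):

class RunningLoss:
    """Decayed running average of frame-weighted losses."""

    def __init__(self, decay=0.999):   # decay constant is an assumption
        self.decay = decay
        self.loss_sum = 0.0            # decayed sum of (loss * frames)
        self.frames = 0.0              # decayed frame count; becomes fractional, as in the log

    def update(self, batch_loss, batch_frames):
        self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames

    @property
    def value(self):
        return self.loss_sum / max(self.frames, 1.0)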
2024-09-17 00:17:34,655 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 00:17:47,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=522390.1666666667, ans=0.125 2024-09-17 00:18:04,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=522418.5, ans=0.2 2024-09-17 00:18:08,460 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.783e+02 2.138e+02 2.259e+02 2.409e+02 3.323e+02, threshold=4.517e+02, percent-clipped=0.0 2024-09-17 00:18:13,179 INFO [train.py:1198] (1/2) Epoch 29, batch 5450, loss[loss=0.1776, ctc_loss=0.1146, cr_loss=0.3153, over 20967.00 frames. ], tot_loss[loss=0.2257, ctc_loss=0.1509, cr_loss=0.3743, over 4094838.11 frames. ], batch size: 49, lr: 2.84e-03, grad_scale: 16.0 2024-09-17 00:18:28,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=522475.1666666667, ans=0.1 2024-09-17 00:18:31,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=522475.1666666667, ans=0.125 2024-09-17 00:18:53,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=522503.5, ans=0.0 2024-09-17 00:19:00,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=522531.8333333333, ans=0.125 2024-09-17 00:19:05,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=522531.8333333333, ans=0.125 2024-09-17 00:19:30,231 INFO [train.py:1198] (1/2) Epoch 29, batch 5500, loss[loss=0.2001, ctc_loss=0.1314, cr_loss=0.3434, over 20974.00 frames. ], tot_loss[loss=0.2256, ctc_loss=0.1508, cr_loss=0.3742, over 4095580.75 frames. ], batch size: 48, lr: 2.84e-03, grad_scale: 16.0 2024-09-17 00:19:32,274 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.62 vs. limit=15.0 2024-09-17 00:19:58,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=522645.1666666667, ans=0.125 2024-09-17 00:20:07,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=522645.1666666667, ans=0.0 2024-09-17 00:20:16,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=522673.5, ans=0.125 2024-09-17 00:20:18,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=522673.5, ans=0.125 2024-09-17 00:20:37,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=522701.8333333333, ans=0.0 2024-09-17 00:20:40,011 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.821e+02 2.163e+02 2.287e+02 2.452e+02 3.728e+02, threshold=4.574e+02, percent-clipped=0.0 2024-09-17 00:20:44,591 INFO [train.py:1198] (1/2) Epoch 29, batch 5550, loss[loss=0.2396, ctc_loss=0.1601, cr_loss=0.3974, over 20978.00 frames. ], tot_loss[loss=0.2262, ctc_loss=0.1511, cr_loss=0.3751, over 4089833.54 frames.
], batch size: 58, lr: 2.84e-03, grad_scale: 16.0 2024-09-17 00:20:50,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=522730.1666666667, ans=0.125 2024-09-17 00:20:53,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=522730.1666666667, ans=0.1 2024-09-17 00:21:04,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=522758.5, ans=0.0 2024-09-17 00:21:11,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=522758.5, ans=0.125 2024-09-17 00:21:57,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=522871.8333333333, ans=0.025 2024-09-17 00:21:58,886 INFO [train.py:1198] (1/2) Epoch 29, batch 5600, loss[loss=0.2649, ctc_loss=0.1861, cr_loss=0.394, over 14489.00 frames. ], tot_loss[loss=0.2267, ctc_loss=0.1515, cr_loss=0.3756, over 4078122.80 frames. ], batch size: 149, lr: 2.84e-03, grad_scale: 32.0 2024-09-17 00:21:59,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=522871.8333333333, ans=0.025 2024-09-17 00:22:11,698 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.73 vs. limit=15.0 2024-09-17 00:22:27,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=522928.5, ans=0.125 2024-09-17 00:22:36,694 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.49 vs. limit=22.5 2024-09-17 00:22:42,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=522956.8333333333, ans=0.125 2024-09-17 00:23:08,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=522985.1666666667, ans=0.0 2024-09-17 00:23:11,354 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.769e+02 2.139e+02 2.265e+02 2.452e+02 4.630e+02, threshold=4.530e+02, percent-clipped=1.0 2024-09-17 00:23:15,872 INFO [train.py:1198] (1/2) Epoch 29, batch 5650, loss[loss=0.2626, ctc_loss=0.1762, cr_loss=0.4319, over 20962.00 frames. ], tot_loss[loss=0.2254, ctc_loss=0.1506, cr_loss=0.3742, over 4087880.08 frames. ], batch size: 64, lr: 2.84e-03, grad_scale: 32.0 2024-09-17 00:23:58,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=523070.1666666667, ans=0.025 2024-09-17 00:24:16,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=523126.8333333333, ans=0.0 2024-09-17 00:24:30,979 INFO [train.py:1198] (1/2) Epoch 29, batch 5700, loss[loss=0.2236, ctc_loss=0.1491, cr_loss=0.3725, over 20698.00 frames. ], tot_loss[loss=0.2249, ctc_loss=0.1502, cr_loss=0.3733, over 4098255.80 frames. 
], batch size: 71, lr: 2.84e-03, grad_scale: 32.0 2024-09-17 00:25:35,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=523268.5, ans=0.2 2024-09-17 00:25:37,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=523268.5, ans=0.125 2024-09-17 00:25:40,988 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.824e+02 2.159e+02 2.287e+02 2.512e+02 3.342e+02, threshold=4.574e+02, percent-clipped=0.0 2024-09-17 00:25:42,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=523268.5, ans=0.0 2024-09-17 00:25:44,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=523296.8333333333, ans=0.125 2024-09-17 00:25:45,565 INFO [train.py:1198] (1/2) Epoch 29, batch 5750, loss[loss=0.2232, ctc_loss=0.1488, cr_loss=0.3719, over 20974.00 frames. ], tot_loss[loss=0.2243, ctc_loss=0.1496, cr_loss=0.3732, over 4111354.15 frames. ], batch size: 58, lr: 2.84e-03, grad_scale: 32.0 2024-09-17 00:26:01,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=523325.1666666667, ans=0.125 2024-09-17 00:26:04,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=523325.1666666667, ans=0.0 2024-09-17 00:26:30,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=523381.8333333333, ans=0.0 2024-09-17 00:26:59,702 INFO [train.py:1198] (1/2) Epoch 29, batch 5800, loss[loss=0.2314, ctc_loss=0.1587, cr_loss=0.3636, over 19515.00 frames. ], tot_loss[loss=0.2243, ctc_loss=0.1496, cr_loss=0.3734, over 4104158.61 frames. ], batch size: 90, lr: 2.84e-03, grad_scale: 32.0 2024-09-17 00:27:36,737 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.74 vs. limit=15.0 2024-09-17 00:27:54,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=523523.5, ans=0.125 2024-09-17 00:27:59,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=523551.8333333333, ans=0.025 2024-09-17 00:28:02,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=523551.8333333333, ans=0.125 2024-09-17 00:28:06,768 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=523551.8333333333, ans=0.125 2024-09-17 00:28:10,912 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.904e+02 2.135e+02 2.265e+02 2.409e+02 4.759e+02, threshold=4.529e+02, percent-clipped=1.0 2024-09-17 00:28:15,336 INFO [train.py:1198] (1/2) Epoch 29, batch 5850, loss[loss=0.2518, ctc_loss=0.1704, cr_loss=0.4069, over 21011.00 frames. ], tot_loss[loss=0.2253, ctc_loss=0.1504, cr_loss=0.3743, over 4103866.80 frames. ], batch size: 63, lr: 2.84e-03, grad_scale: 32.0 2024-09-17 00:28:17,708 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.37 vs. 
limit=22.5 2024-09-17 00:28:21,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=523580.1666666667, ans=0.0 2024-09-17 00:28:30,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=523608.5, ans=0.125 2024-09-17 00:28:35,548 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 00:28:51,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=523636.8333333333, ans=0.1 2024-09-17 00:29:29,799 INFO [train.py:1198] (1/2) Epoch 29, batch 5900, loss[loss=0.2319, ctc_loss=0.1529, cr_loss=0.395, over 20867.00 frames. ], tot_loss[loss=0.2259, ctc_loss=0.1508, cr_loss=0.3756, over 4112452.68 frames. ], batch size: 57, lr: 2.84e-03, grad_scale: 32.0 2024-09-17 00:30:15,650 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.12 vs. limit=8.0 2024-09-17 00:30:19,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=523806.8333333333, ans=0.125 2024-09-17 00:30:21,254 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.47 vs. limit=22.5 2024-09-17 00:30:36,091 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.29 vs. limit=15.0 2024-09-17 00:30:39,759 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.933e+02 2.133e+02 2.255e+02 2.458e+02 4.200e+02, threshold=4.509e+02, percent-clipped=0.0 2024-09-17 00:30:44,214 INFO [train.py:1198] (1/2) Epoch 29, batch 5950, loss[loss=0.2375, ctc_loss=0.1563, cr_loss=0.4062, over 20875.00 frames. ], tot_loss[loss=0.2265, ctc_loss=0.1513, cr_loss=0.376, over 4103129.98 frames. ], batch size: 57, lr: 2.84e-03, grad_scale: 32.0 2024-09-17 00:31:54,314 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.53 vs. limit=15.0 2024-09-17 00:32:01,153 INFO [train.py:1198] (1/2) Epoch 29, batch 6000, loss[loss=0.1915, ctc_loss=0.1239, cr_loss=0.338, over 21005.00 frames. ], tot_loss[loss=0.2273, ctc_loss=0.1519, cr_loss=0.3769, over 4111758.91 frames. ], batch size: 52, lr: 2.84e-03, grad_scale: 32.0 2024-09-17 00:32:01,154 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 00:32:22,680 INFO [train.py:1230] (1/2) Epoch 29, validation: loss=0.04135, ctc_loss=0.04135, cr_loss=1.22e-14, over 944034.00 frames. 2024-09-17 00:32:22,681 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-17 00:32:23,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=524005.1666666667, ans=0.1 2024-09-17 00:32:26,351 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=2.98 vs. limit=15.0 2024-09-17 00:32:26,453 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.67 vs. 
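limit=15.0

The batch 6000 validation pass just above reports cr_loss=1.22e-14, i.e. numerically zero, while the training cr_loss sits around 0.37. That is the expected behaviour of a consistency-regularization term on clean data: the CR loss presumably compares CTC posteriors from two differently masked views of each utterance, and when no masking is applied the two views coincide and their divergence collapses to float rounding. A toy illustration, assuming a symmetric KL-style consistency term (an assumption about the form of the loss, not a quote of it):

import torch
import torch.nn.functional as F

log_p1 = F.log_softmax(torch.randn(100, 500), dim=-1)  # frame-level CTC log-posteriors, view 1
log_p2 = log_p1.clone()                                # no time-masking at validation: views identical

# symmetric KL between the two views (one plausible form of a CR term)
cr = 0.5 * (
    F.kl_div(log_p1, log_p2.exp(), reduction="batchmean")
    + F.kl_div(log_p2, log_p1.exp(), reduction="batchmean")
)
print(cr.item())  # at rounding-error level, like the validation cr_loss above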
2024-09-17 00:32:37,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=524033.5, ans=0.1 2024-09-17 00:32:54,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=524061.8333333333, ans=0.1 2024-09-17 00:32:58,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=524061.8333333333, ans=0.2 2024-09-17 00:33:28,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=524118.5, ans=0.125 2024-09-17 00:33:32,545 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.929e+02 2.225e+02 2.385e+02 2.574e+02 4.481e+02, threshold=4.771e+02, percent-clipped=0.0 2024-09-17 00:33:32,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=524118.5, ans=0.0 2024-09-17 00:33:36,796 INFO [train.py:1198] (1/2) Epoch 29, batch 6050, loss[loss=0.2345, ctc_loss=0.1555, cr_loss=0.3953, over 20951.00 frames. ], tot_loss[loss=0.2267, ctc_loss=0.1515, cr_loss=0.3762, over 4115989.24 frames. ], batch size: 60, lr: 2.84e-03, grad_scale: 32.0 2024-09-17 00:33:59,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=524175.1666666667, ans=0.125 2024-09-17 00:34:06,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=524203.5, ans=0.125 2024-09-17 00:34:07,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=524203.5, ans=0.0 2024-09-17 00:34:15,568 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.76 vs. limit=22.5 2024-09-17 00:34:51,709 INFO [train.py:1198] (1/2) Epoch 29, batch 6100, loss[loss=0.2242, ctc_loss=0.1513, cr_loss=0.3646, over 21026.00 frames. ], tot_loss[loss=0.2268, ctc_loss=0.1516, cr_loss=0.3759, over 4109995.29 frames. ], batch size: 61, lr: 2.83e-03, grad_scale: 32.0 2024-09-17 00:34:55,328 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.29 vs. limit=6.0 2024-09-17 00:35:51,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=524401.8333333334, ans=0.0 2024-09-17 00:36:01,668 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.961e+02 2.182e+02 2.347e+02 2.516e+02 3.483e+02, threshold=4.693e+02, percent-clipped=0.0 2024-09-17 00:36:06,253 INFO [train.py:1198] (1/2) Epoch 29, batch 6150, loss[loss=0.2217, ctc_loss=0.1503, cr_loss=0.357, over 21015.00 frames. ], tot_loss[loss=0.2267, ctc_loss=0.1517, cr_loss=0.3753, over 4099147.65 frames. ], batch size: 63, lr: 2.83e-03, grad_scale: 32.0 2024-09-17 00:36:58,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=524515.1666666666, ans=0.125 2024-09-17 00:37:20,241 INFO [train.py:1198] (1/2) Epoch 29, batch 6200, loss[loss=0.2301, ctc_loss=0.1527, cr_loss=0.387, over 21021.00 frames. ], tot_loss[loss=0.2277, ctc_loss=0.1524, cr_loss=0.3761, over 4081816.84 frames.
], batch size: 63, lr: 2.83e-03, grad_scale: 32.0 2024-09-17 00:37:21,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=524571.8333333334, ans=0.125 2024-09-17 00:37:30,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=524571.8333333334, ans=0.0 2024-09-17 00:38:05,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=524656.8333333334, ans=0.0 2024-09-17 00:38:16,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=524656.8333333334, ans=0.2 2024-09-17 00:38:17,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=524685.1666666666, ans=0.125 2024-09-17 00:38:28,618 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.881e+02 2.176e+02 2.354e+02 2.588e+02 3.496e+02, threshold=4.707e+02, percent-clipped=0.0 2024-09-17 00:38:28,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=524685.1666666666, ans=0.125 2024-09-17 00:38:32,840 INFO [train.py:1198] (1/2) Epoch 29, batch 6250, loss[loss=0.214, ctc_loss=0.1408, cr_loss=0.3657, over 20970.00 frames. ], tot_loss[loss=0.2287, ctc_loss=0.1535, cr_loss=0.3763, over 4031786.17 frames. ], batch size: 52, lr: 2.83e-03, grad_scale: 32.0 2024-09-17 00:38:42,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=524713.5, ans=0.0 2024-09-17 00:39:46,105 INFO [train.py:1198] (1/2) Epoch 29, batch 6300, loss[loss=0.2521, ctc_loss=0.1707, cr_loss=0.4072, over 18278.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1567, cr_loss=0.3804, over 3983890.30 frames. ], batch size: 108, lr: 2.83e-03, grad_scale: 32.0 2024-09-17 00:39:49,501 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.08 vs. limit=15.0 2024-09-17 00:40:23,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=524911.8333333334, ans=0.2 2024-09-17 00:40:35,292 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.88 vs. limit=22.5 2024-09-17 00:40:38,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=524940.1666666666, ans=0.1 2024-09-17 00:40:45,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=524968.5, ans=0.0 2024-09-17 00:40:52,534 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.913e+02 2.211e+02 2.422e+02 2.609e+02 4.840e+02, threshold=4.844e+02, percent-clipped=1.0 2024-09-17 00:40:57,102 INFO [train.py:1198] (1/2) Epoch 29, batch 6350, loss[loss=0.2778, ctc_loss=0.1982, cr_loss=0.398, over 14646.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1596, cr_loss=0.3816, over 3855037.25 frames. ], batch size: 149, lr: 2.83e-03, grad_scale: 32.0 2024-09-17 00:41:48,075 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.01 vs. 
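limit=10.0

The scaling.py:1024 Whitening lines compare a per-module statistic against a limit (here metric=4.01 vs. limit=10.0). The metric is 1.0 when a module's activations are already "white" (decorrelated channels with equal variance) and grows as the feature covariance becomes anisotropic; a corrective penalty is presumably only applied once the limit is exceeded. One way such a metric can be computed (an assumed reconstruction, not scaling.py itself):

import torch

def whitening_metric(x, num_groups=1):
    """x: (num_frames, num_channels). Returns 1.0 for white features, >1 otherwise."""
    metrics = []
    for g in x.chunk(num_groups, dim=-1):
        g = g - g.mean(dim=0, keepdim=True)
        cov = (g.T @ g) / g.shape[0]
        d = cov.shape[0]
        # mean squared eigenvalue over squared mean eigenvalue:
        # (sum(lambda_i^2) / d) / (sum(lambda_i) / d)^2, using ||cov||_F^2 = sum(lambda_i^2)
        metrics.append((cov.pow(2).sum() / d) / (torch.diagonal(cov).mean() ** 2))
    return torch.stack(metrics).mean()

x = torch.randn(2000, 256)            # nearly white features -> metric close to 1
print(whitening_metric(x).item())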
2024-09-17 00:42:47,318 INFO [train.py:1198] (1/2) Epoch 30, batch 0, loss[loss=0.2605, ctc_loss=0.1755, cr_loss=0.4248, over 19378.00 frames. ], tot_loss[loss=0.2605, ctc_loss=0.1755, cr_loss=0.4248, over 19378.00 frames. ], batch size: 90, lr: 2.78e-03, grad_scale: 32.0 2024-09-17 00:42:47,319 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 00:42:59,780 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.7192, 3.8148, 3.6984, 3.7013], device='cuda:1') 2024-09-17 00:43:05,651 INFO [train.py:1230] (1/2) Epoch 30, validation: loss=0.04125, ctc_loss=0.04125, cr_loss=1.237e-14, over 944034.00 frames. 2024-09-17 00:43:05,652 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-17 00:43:33,528 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.14 vs. limit=22.5 2024-09-17 00:43:33,739 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.18 vs. limit=12.0 2024-09-17 00:43:59,534 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.72 vs. limit=15.0 2024-09-17 00:44:21,518 INFO [train.py:1198] (1/2) Epoch 30, batch 50, loss[loss=0.2025, ctc_loss=0.1336, cr_loss=0.3446, over 20771.00 frames. ], tot_loss[loss=0.2235, ctc_loss=0.1492, cr_loss=0.3711, over 929592.04 frames. ], batch size: 53, lr: 2.78e-03, grad_scale: 32.0 2024-09-17 00:44:21,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=525254.6666666666, ans=0.125 2024-09-17 00:44:30,542 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.827e+02 2.150e+02 2.396e+02 2.756e+02 3.152e+02, threshold=4.793e+02, percent-clipped=0.0 2024-09-17 00:44:46,480 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.12 vs. limit=22.5 2024-09-17 00:44:53,215 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=525311.3333333334, ans=0.0 2024-09-17 00:45:17,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=525339.6666666666, ans=0.1 2024-09-17 00:45:18,942 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.30 vs. limit=15.0 2024-09-17 00:45:36,153 INFO [train.py:1198] (1/2) Epoch 30, batch 100, loss[loss=0.2262, ctc_loss=0.1535, cr_loss=0.3638, over 20155.00 frames. ], tot_loss[loss=0.2231, ctc_loss=0.149, cr_loss=0.3705, over 1620309.27 frames.
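], batch size: 80, lr: 2.78e-03, grad_scale: 32.0

The learning rate decays smoothly inside epoch 29 (2.85e-03 down to 2.83e-03) and then steps to 2.78e-03 as epoch 30 begins, which matches an Eden-style schedule that discounts jointly in optimizer steps and epochs. A sketch of that rule (the shape of the formula is Eden's; the constants and step count below are assumptions, not values read out of this log):

def eden_lr(base_lr, step, epoch, lr_batches=7500.0, lr_epochs=3.5):
    # smooth decay in the step index, plus an epoch-by-epoch step down
    batch_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# with an assumed base_lr of 0.04 and roughly 186k optimizer steps by this point,
# this lands in the neighbourhood of the logged values for epochs 29 and 30:
print(eden_lr(0.04, 186000, 29), eden_lr(0.04, 186000, 30))  # ~2.8e-03, slightly lower at epoch 30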
2024-09-17 00:45:41,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=525396.3333333334, ans=0.0 2024-09-17 00:45:45,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=525396.3333333334, ans=0.1 2024-09-17 00:45:56,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=525424.6666666666, ans=0.125 2024-09-17 00:46:43,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=525509.6666666666, ans=0.125 2024-09-17 00:46:54,426 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.67 vs. limit=15.0 2024-09-17 00:46:54,830 INFO [train.py:1198] (1/2) Epoch 30, batch 150, loss[loss=0.2294, ctc_loss=0.1535, cr_loss=0.3795, over 20810.00 frames. ], tot_loss[loss=0.2254, ctc_loss=0.1505, cr_loss=0.3744, over 2169797.95 frames. ], batch size: 56, lr: 2.78e-03, grad_scale: 32.0 2024-09-17 00:47:03,660 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.930e+02 2.145e+02 2.323e+02 2.514e+02 3.152e+02, threshold=4.646e+02, percent-clipped=0.0 2024-09-17 00:47:24,131 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0 2024-09-17 00:47:52,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=525623.0, ans=0.1 2024-09-17 00:47:56,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=525651.3333333334, ans=0.125 2024-09-17 00:48:02,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=525651.3333333334, ans=0.125 2024-09-17 00:48:07,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=525651.3333333334, ans=0.0 2024-09-17 00:48:11,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=525679.6666666666, ans=0.0 2024-09-17 00:48:12,912 INFO [train.py:1198] (1/2) Epoch 30, batch 200, loss[loss=0.225, ctc_loss=0.1512, cr_loss=0.369, over 20663.00 frames. ], tot_loss[loss=0.2243, ctc_loss=0.1497, cr_loss=0.373, over 2602866.36 frames. ], batch size: 66, lr: 2.78e-03, grad_scale: 32.0 2024-09-17 00:48:19,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=525679.6666666666, ans=0.1 2024-09-17 00:48:32,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=525708.0, ans=0.025 2024-09-17 00:49:23,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=525793.0, ans=0.0 2024-09-17 00:49:27,998 INFO [train.py:1198] (1/2) Epoch 30, batch 250, loss[loss=0.2073, ctc_loss=0.1373, cr_loss=0.3499, over 20944.00 frames. ], tot_loss[loss=0.2254, ctc_loss=0.1507, cr_loss=0.3738, over 2920596.91 frames.
], batch size: 49, lr: 2.78e-03, grad_scale: 32.0 2024-09-17 00:49:37,150 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.860e+02 2.151e+02 2.228e+02 2.428e+02 4.713e+02, threshold=4.455e+02, percent-clipped=1.0 2024-09-17 00:49:50,109 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.18 vs. limit=15.0 2024-09-17 00:49:57,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=525878.0, ans=0.2 2024-09-17 00:50:24,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=525906.3333333334, ans=0.0 2024-09-17 00:50:43,803 INFO [train.py:1198] (1/2) Epoch 30, batch 300, loss[loss=0.2366, ctc_loss=0.1566, cr_loss=0.4001, over 20656.00 frames. ], tot_loss[loss=0.2261, ctc_loss=0.1512, cr_loss=0.3742, over 3166578.64 frames. ], batch size: 66, lr: 2.78e-03, grad_scale: 32.0 2024-09-17 00:50:44,632 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.66 vs. limit=22.5 2024-09-17 00:50:53,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=525963.0, ans=0.0 2024-09-17 00:50:56,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=525963.0, ans=0.0 2024-09-17 00:51:26,785 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.89 vs. limit=22.5 2024-09-17 00:51:40,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=526048.0, ans=0.2 2024-09-17 00:51:44,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=526076.3333333334, ans=0.1 2024-09-17 00:51:46,820 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.83 vs. limit=15.0 2024-09-17 00:51:53,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=526076.3333333334, ans=0.2 2024-09-17 00:51:59,787 INFO [train.py:1198] (1/2) Epoch 30, batch 350, loss[loss=0.2613, ctc_loss=0.1768, cr_loss=0.4224, over 21018.00 frames. ], tot_loss[loss=0.226, ctc_loss=0.1512, cr_loss=0.3741, over 3363357.52 frames. ], batch size: 61, lr: 2.78e-03, grad_scale: 32.0 2024-09-17 00:52:12,157 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.876e+02 2.167e+02 2.278e+02 2.443e+02 3.101e+02, threshold=4.556e+02, percent-clipped=0.0 2024-09-17 00:52:25,561 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.71 vs. limit=6.0 2024-09-17 00:52:28,271 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. 
limit=6.0 2024-09-17 00:52:29,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=526133.0, ans=0.0 2024-09-17 00:52:40,777 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.30 vs. limit=5.0 2024-09-17 00:52:56,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=526189.6666666666, ans=0.125 2024-09-17 00:53:02,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=526218.0, ans=0.0 2024-09-17 00:53:17,759 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.02 vs. limit=15.0 2024-09-17 00:53:20,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=526246.3333333334, ans=0.125 2024-09-17 00:53:21,569 INFO [train.py:1198] (1/2) Epoch 30, batch 400, loss[loss=0.2181, ctc_loss=0.146, cr_loss=0.3607, over 20956.00 frames. ], tot_loss[loss=0.2251, ctc_loss=0.1504, cr_loss=0.3732, over 3517392.84 frames. ], batch size: 60, lr: 2.78e-03, grad_scale: 32.0 2024-09-17 00:53:41,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=526274.6666666666, ans=0.125 2024-09-17 00:54:05,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=526331.3333333334, ans=0.125 2024-09-17 00:54:09,393 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.37 vs. limit=15.0 2024-09-17 00:54:37,519 INFO [train.py:1198] (1/2) Epoch 30, batch 450, loss[loss=0.2301, ctc_loss=0.1562, cr_loss=0.3694, over 20980.00 frames. ], tot_loss[loss=0.2258, ctc_loss=0.151, cr_loss=0.374, over 3636476.28 frames. ], batch size: 55, lr: 2.78e-03, grad_scale: 32.0 2024-09-17 00:54:46,490 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.686e+02 2.124e+02 2.232e+02 2.429e+02 3.758e+02, threshold=4.463e+02, percent-clipped=0.0 2024-09-17 00:54:52,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=526416.3333333334, ans=0.125 2024-09-17 00:54:57,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=526416.3333333334, ans=0.0 2024-09-17 00:55:12,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=526444.6666666666, ans=0.0 2024-09-17 00:55:34,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=526473.0, ans=0.0 2024-09-17 00:55:52,555 INFO [train.py:1198] (1/2) Epoch 30, batch 500, loss[loss=0.2401, ctc_loss=0.1602, cr_loss=0.3996, over 20600.00 frames. ], tot_loss[loss=0.2279, ctc_loss=0.1525, cr_loss=0.3772, over 3733496.07 frames. 
], batch size: 75, lr: 2.78e-03, grad_scale: 32.0 2024-09-17 00:55:52,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=526529.6666666666, ans=0.125 2024-09-17 00:56:15,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=526558.0, ans=15.0 2024-09-17 00:56:16,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=526558.0, ans=0.125 2024-09-17 00:56:41,628 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.00 vs. limit=12.0 2024-09-17 00:57:07,831 INFO [train.py:1198] (1/2) Epoch 30, batch 550, loss[loss=0.2007, ctc_loss=0.1309, cr_loss=0.3487, over 20940.00 frames. ], tot_loss[loss=0.2293, ctc_loss=0.1535, cr_loss=0.3791, over 3821325.46 frames. ], batch size: 60, lr: 2.78e-03, grad_scale: 32.0 2024-09-17 00:57:14,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=526671.3333333334, ans=0.0 2024-09-17 00:57:16,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=526671.3333333334, ans=0.125 2024-09-17 00:57:16,986 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.965e+02 2.211e+02 2.385e+02 2.602e+02 5.099e+02, threshold=4.770e+02, percent-clipped=1.0 2024-09-17 00:57:20,590 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.10 vs. limit=15.0 2024-09-17 00:58:17,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=526784.6666666666, ans=0.0 2024-09-17 00:58:25,983 INFO [train.py:1198] (1/2) Epoch 30, batch 600, loss[loss=0.1974, ctc_loss=0.1322, cr_loss=0.3257, over 20333.00 frames. ], tot_loss[loss=0.2285, ctc_loss=0.1529, cr_loss=0.3782, over 3883869.22 frames. ], batch size: 45, lr: 2.78e-03, grad_scale: 32.0 2024-09-17 00:58:30,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=526813.0, ans=0.07 2024-09-17 00:59:45,003 INFO [train.py:1198] (1/2) Epoch 30, batch 650, loss[loss=0.2428, ctc_loss=0.1628, cr_loss=0.3998, over 20677.00 frames. ], tot_loss[loss=0.2282, ctc_loss=0.1527, cr_loss=0.3775, over 3921665.25 frames. 
], batch size: 71, lr: 2.78e-03, grad_scale: 32.0 2024-09-17 00:59:53,946 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.830e+02 2.141e+02 2.266e+02 2.459e+02 4.953e+02, threshold=4.533e+02, percent-clipped=1.0 2024-09-17 01:00:07,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=526983.0, ans=0.0 2024-09-17 01:00:20,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=527011.3333333334, ans=0.0 2024-09-17 01:00:23,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=527011.3333333334, ans=0.025 2024-09-17 01:00:26,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=527011.3333333334, ans=0.1 2024-09-17 01:00:32,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=527039.6666666666, ans=0.2 2024-09-17 01:00:38,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=527039.6666666666, ans=0.1 2024-09-17 01:01:01,154 INFO [train.py:1198] (1/2) Epoch 30, batch 700, loss[loss=0.2225, ctc_loss=0.1469, cr_loss=0.3782, over 20931.00 frames. ], tot_loss[loss=0.2279, ctc_loss=0.1525, cr_loss=0.3769, over 3960040.63 frames. ], batch size: 60, lr: 2.78e-03, grad_scale: 32.0 2024-09-17 01:01:25,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=527124.6666666666, ans=0.2 2024-09-17 01:01:52,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=527181.3333333334, ans=0.0 2024-09-17 01:02:13,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=527209.6666666666, ans=0.07 2024-09-17 01:02:16,577 INFO [train.py:1198] (1/2) Epoch 30, batch 750, loss[loss=0.2515, ctc_loss=0.1697, cr_loss=0.4094, over 20775.00 frames. ], tot_loss[loss=0.2295, ctc_loss=0.1537, cr_loss=0.3793, over 3978466.80 frames. ], batch size: 56, lr: 2.78e-03, grad_scale: 32.0 2024-09-17 01:02:21,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=527238.0, ans=0.2 2024-09-17 01:02:22,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=527238.0, ans=0.125 2024-09-17 01:02:25,561 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.878e+02 2.139e+02 2.297e+02 2.430e+02 2.989e+02, threshold=4.593e+02, percent-clipped=0.0 2024-09-17 01:02:27,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=527238.0, ans=0.1 2024-09-17 01:02:56,299 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.73 vs. limit=15.0 2024-09-17 01:03:35,341 INFO [train.py:1198] (1/2) Epoch 30, batch 800, loss[loss=0.2547, ctc_loss=0.1726, cr_loss=0.4105, over 20774.00 frames. ], tot_loss[loss=0.2295, ctc_loss=0.1537, cr_loss=0.3786, over 4010551.25 frames. 
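[Editor's sketch] On the optim.py clipping warnings: in each one the logged threshold equals Clipping_scale times the median of the grad-norm quartiles (here 2.0 × 2.266e+02 = 4.532e+02 ≈ 4.533e+02), and percent-clipped reports how many norms exceeded it. A minimal sketch of that bookkeeping under this reading; it is not the actual optimizer code.

```python
import torch

# Sketch only: summarize a window of gradient norms the way the warnings
# above print them, with the clip threshold at clipping_scale * median.
def grad_norm_summary(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]  # 2x the median norm
    percent_clipped = 100.0 * (grad_norms > threshold).float().mean()
    return q, threshold, percent_clipped
```

Under this reading, percent-clipped=0.0 in most warnings here simply means no gradient norm in the window reached twice the median.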
], batch size: 56, lr: 2.78e-03, grad_scale: 32.0 2024-09-17 01:04:46,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=527493.0, ans=0.125 2024-09-17 01:04:53,957 INFO [train.py:1198] (1/2) Epoch 30, batch 850, loss[loss=0.1664, ctc_loss=0.1084, cr_loss=0.29, over 20980.00 frames. ], tot_loss[loss=0.2278, ctc_loss=0.1524, cr_loss=0.3769, over 4044151.61 frames. ], batch size: 51, lr: 2.78e-03, grad_scale: 32.0 2024-09-17 01:04:58,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=527521.3333333334, ans=0.5 2024-09-17 01:05:02,914 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.916e+02 2.146e+02 2.289e+02 2.431e+02 3.556e+02, threshold=4.578e+02, percent-clipped=0.0 2024-09-17 01:05:23,394 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.52 vs. limit=22.5 2024-09-17 01:05:44,339 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.72 vs. limit=22.5 2024-09-17 01:05:51,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=527606.3333333334, ans=0.125 2024-09-17 01:05:51,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=527606.3333333334, ans=0.09899494936611666 2024-09-17 01:06:01,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=527634.6666666666, ans=0.125 2024-09-17 01:06:08,616 INFO [train.py:1198] (1/2) Epoch 30, batch 900, loss[loss=0.2632, ctc_loss=0.18, cr_loss=0.416, over 20206.00 frames. ], tot_loss[loss=0.227, ctc_loss=0.1518, cr_loss=0.3761, over 4049172.54 frames. ], batch size: 80, lr: 2.78e-03, grad_scale: 32.0 2024-09-17 01:06:20,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=527663.0, ans=0.025 2024-09-17 01:07:16,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=527776.3333333334, ans=0.0 2024-09-17 01:07:21,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=527776.3333333334, ans=0.125 2024-09-17 01:07:23,856 INFO [train.py:1198] (1/2) Epoch 30, batch 950, loss[loss=0.2344, ctc_loss=0.1585, cr_loss=0.3793, over 20738.00 frames. ], tot_loss[loss=0.2273, ctc_loss=0.152, cr_loss=0.3765, over 4066423.47 frames. ], batch size: 71, lr: 2.78e-03, grad_scale: 32.0 2024-09-17 01:07:32,725 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.851e+02 2.120e+02 2.234e+02 2.375e+02 2.946e+02, threshold=4.468e+02, percent-clipped=0.0 2024-09-17 01:07:36,243 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.85 vs. 
limit=10.0 2024-09-17 01:07:58,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=527861.3333333334, ans=0.0 2024-09-17 01:08:03,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=527861.3333333334, ans=0.1 2024-09-17 01:08:09,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=527889.6666666666, ans=0.125 2024-09-17 01:08:24,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=527918.0, ans=0.125 2024-09-17 01:08:39,136 INFO [train.py:1198] (1/2) Epoch 30, batch 1000, loss[loss=0.2333, ctc_loss=0.1582, cr_loss=0.3757, over 20618.00 frames. ], tot_loss[loss=0.2267, ctc_loss=0.1514, cr_loss=0.3762, over 4076516.96 frames. ], batch size: 68, lr: 2.78e-03, grad_scale: 64.0 2024-09-17 01:08:45,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=527946.3333333334, ans=0.1 2024-09-17 01:09:24,796 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.18 vs. limit=22.5 2024-09-17 01:09:25,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=528031.3333333334, ans=0.015 2024-09-17 01:09:57,080 INFO [train.py:1198] (1/2) Epoch 30, batch 1050, loss[loss=0.2134, ctc_loss=0.1426, cr_loss=0.3539, over 20250.00 frames. ], tot_loss[loss=0.2267, ctc_loss=0.1515, cr_loss=0.3759, over 4086516.65 frames. ], batch size: 74, lr: 2.78e-03, grad_scale: 64.0 2024-09-17 01:10:06,223 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.935e+02 2.120e+02 2.221e+02 2.349e+02 3.101e+02, threshold=4.442e+02, percent-clipped=0.0 2024-09-17 01:10:15,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=528116.3333333334, ans=0.015 2024-09-17 01:11:09,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=528201.3333333334, ans=0.025 2024-09-17 01:11:16,827 INFO [train.py:1198] (1/2) Epoch 30, batch 1100, loss[loss=0.2619, ctc_loss=0.1785, cr_loss=0.4174, over 20826.00 frames. ], tot_loss[loss=0.2255, ctc_loss=0.1506, cr_loss=0.3745, over 4093292.38 frames. ], batch size: 65, lr: 2.78e-03, grad_scale: 64.0 2024-09-17 01:11:55,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=528286.3333333334, ans=0.0 2024-09-17 01:12:02,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=528314.6666666666, ans=0.125 2024-09-17 01:12:32,393 INFO [train.py:1198] (1/2) Epoch 30, batch 1150, loss[loss=0.2022, ctc_loss=0.1351, cr_loss=0.3355, over 19452.00 frames. ], tot_loss[loss=0.2266, ctc_loss=0.1515, cr_loss=0.3756, over 4090737.42 frames. 
], batch size: 43, lr: 2.78e-03, grad_scale: 64.0 2024-09-17 01:12:41,269 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.837e+02 2.162e+02 2.278e+02 2.515e+02 3.617e+02, threshold=4.556e+02, percent-clipped=0.0 2024-09-17 01:12:46,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=528399.6666666666, ans=0.125 2024-09-17 01:12:49,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=528399.6666666666, ans=0.0 2024-09-17 01:13:08,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=528428.0, ans=0.1 2024-09-17 01:13:28,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=528456.3333333334, ans=0.2 2024-09-17 01:13:48,036 INFO [train.py:1198] (1/2) Epoch 30, batch 1200, loss[loss=0.2432, ctc_loss=0.1624, cr_loss=0.4042, over 21015.00 frames. ], tot_loss[loss=0.2274, ctc_loss=0.1521, cr_loss=0.3762, over 4086405.82 frames. ], batch size: 61, lr: 2.78e-03, grad_scale: 32.0 2024-09-17 01:14:10,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=528541.3333333334, ans=0.0 2024-09-17 01:14:20,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=528569.6666666666, ans=0.125 2024-09-17 01:15:07,161 INFO [train.py:1198] (1/2) Epoch 30, batch 1250, loss[loss=0.204, ctc_loss=0.1356, cr_loss=0.3419, over 21004.00 frames. ], tot_loss[loss=0.2262, ctc_loss=0.1511, cr_loss=0.3753, over 4103967.79 frames. ], batch size: 52, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:15:07,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=528654.6666666666, ans=0.125 2024-09-17 01:15:07,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=528654.6666666666, ans=0.125 2024-09-17 01:15:17,475 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.793e+02 2.152e+02 2.256e+02 2.360e+02 3.023e+02, threshold=4.513e+02, percent-clipped=0.0 2024-09-17 01:15:28,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=528683.0, ans=0.2 2024-09-17 01:15:45,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=528711.3333333334, ans=0.125 2024-09-17 01:16:25,779 INFO [train.py:1198] (1/2) Epoch 30, batch 1300, loss[loss=0.2368, ctc_loss=0.1573, cr_loss=0.3975, over 20286.00 frames. ], tot_loss[loss=0.2256, ctc_loss=0.1507, cr_loss=0.3745, over 4098165.84 frames. ], batch size: 74, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:16:30,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=528796.3333333334, ans=0.1 2024-09-17 01:17:07,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=528853.0, ans=0.0 2024-09-17 01:17:25,396 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.20 vs. 
limit=15.0 2024-09-17 01:17:32,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=528909.6666666666, ans=0.1 2024-09-17 01:17:41,458 INFO [train.py:1198] (1/2) Epoch 30, batch 1350, loss[loss=0.2274, ctc_loss=0.1514, cr_loss=0.3801, over 21075.00 frames. ], tot_loss[loss=0.2265, ctc_loss=0.1513, cr_loss=0.3758, over 4100227.78 frames. ], batch size: 56, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:17:52,190 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.818e+02 2.140e+02 2.245e+02 2.378e+02 4.804e+02, threshold=4.491e+02, percent-clipped=1.0 2024-09-17 01:17:56,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=528966.3333333334, ans=0.04949747468305833 2024-09-17 01:18:38,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=529023.0, ans=0.2 2024-09-17 01:18:38,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=529023.0, ans=0.2 2024-09-17 01:18:56,300 INFO [train.py:1198] (1/2) Epoch 30, batch 1400, loss[loss=0.2409, ctc_loss=0.1608, cr_loss=0.4002, over 20952.00 frames. ], tot_loss[loss=0.2263, ctc_loss=0.1513, cr_loss=0.3751, over 4092903.77 frames. ], batch size: 64, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:19:37,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=529136.3333333334, ans=0.125 2024-09-17 01:19:41,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=529164.6666666666, ans=0.05 2024-09-17 01:20:01,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=529193.0, ans=0.125 2024-09-17 01:20:07,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=529193.0, ans=0.1 2024-09-17 01:20:11,901 INFO [train.py:1198] (1/2) Epoch 30, batch 1450, loss[loss=0.1957, ctc_loss=0.1258, cr_loss=0.3494, over 20990.00 frames. ], tot_loss[loss=0.2254, ctc_loss=0.1505, cr_loss=0.3742, over 4099264.21 frames. ], batch size: 52, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:20:18,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=529221.3333333334, ans=0.025 2024-09-17 01:20:22,593 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.857e+02 2.128e+02 2.266e+02 2.464e+02 3.893e+02, threshold=4.533e+02, percent-clipped=0.0 2024-09-17 01:20:35,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=529249.6666666666, ans=0.025 2024-09-17 01:21:14,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=529334.6666666666, ans=15.0 2024-09-17 01:21:30,937 INFO [train.py:1198] (1/2) Epoch 30, batch 1500, loss[loss=0.2458, ctc_loss=0.1662, cr_loss=0.3981, over 20828.00 frames. ], tot_loss[loss=0.2251, ctc_loss=0.1503, cr_loss=0.3739, over 4098601.80 frames. 
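[Editor's sketch] The ScheduledFloat lines print a value (ans=…) that depends only on batch_count, i.e. hyperparameters such as skip rates and dropout probabilities follow a schedule over training steps. Below is a rough sketch of a piecewise-linear schedule in that spirit; the breakpoints are invented for illustration, and the real scaling.py class carries more machinery.

```python
import bisect

# Sketch of a piecewise-linear float schedule keyed on batch_count.
class ScheduledFloatSketch:
    def __init__(self, *points):
        # points: (batch_count, value) pairs in increasing batch_count order
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def value(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# Invented breakpoints, for illustration only:
skip_rate = ScheduledFloatSketch((0.0, 0.5), (4000.0, 0.05), (16000.0, 0.0))
print(skip_rate.value(528909.67))  # far past the last breakpoint -> 0.0
```

This would explain why many skip rates in these entries sit at floor values (0.0, 0.025, …) half a million batches into training.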
], batch size: 59, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:21:46,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=529391.3333333334, ans=0.0 2024-09-17 01:21:52,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=529391.3333333334, ans=0.2 2024-09-17 01:22:02,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=529419.6666666666, ans=0.125 2024-09-17 01:22:26,761 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 01:22:49,772 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.21 vs. limit=15.0 2024-09-17 01:22:50,335 INFO [train.py:1198] (1/2) Epoch 30, batch 1550, loss[loss=0.2549, ctc_loss=0.1759, cr_loss=0.3954, over 18366.00 frames. ], tot_loss[loss=0.2253, ctc_loss=0.1505, cr_loss=0.3739, over 4103557.89 frames. ], batch size: 108, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:22:53,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=529504.6666666666, ans=0.125 2024-09-17 01:23:00,943 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.932e+02 2.158e+02 2.296e+02 2.497e+02 3.223e+02, threshold=4.592e+02, percent-clipped=0.0 2024-09-17 01:23:24,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=529561.3333333334, ans=0.2 2024-09-17 01:23:38,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=529589.6666666666, ans=0.125 2024-09-17 01:23:48,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=529589.6666666666, ans=0.1 2024-09-17 01:23:52,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=529618.0, ans=0.0 2024-09-17 01:24:06,402 INFO [train.py:1198] (1/2) Epoch 30, batch 1600, loss[loss=0.2359, ctc_loss=0.1593, cr_loss=0.3834, over 20660.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.15, cr_loss=0.3735, over 4109105.05 frames. ], batch size: 66, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:24:11,297 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 01:24:21,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=529674.6666666666, ans=0.2 2024-09-17 01:24:32,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=529674.6666666666, ans=0.125 2024-09-17 01:25:13,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=529759.6666666666, ans=0.2 2024-09-17 01:25:19,757 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.00 vs. limit=15.0 2024-09-17 01:25:21,987 INFO [train.py:1198] (1/2) Epoch 30, batch 1650, loss[loss=0.2667, ctc_loss=0.1784, cr_loss=0.4418, over 20647.00 frames. ], tot_loss[loss=0.2269, ctc_loss=0.1517, cr_loss=0.3761, over 4091172.76 frames. 
], batch size: 66, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:25:25,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=529788.0, ans=10.0 2024-09-17 01:25:26,872 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=529788.0, ans=0.125 2024-09-17 01:25:28,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=529788.0, ans=0.125 2024-09-17 01:25:32,625 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.833e+02 2.129e+02 2.229e+02 2.370e+02 3.077e+02, threshold=4.458e+02, percent-clipped=0.0 2024-09-17 01:25:57,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=529844.6666666666, ans=0.0 2024-09-17 01:26:04,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=529844.6666666666, ans=0.125 2024-09-17 01:26:30,526 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.39 vs. limit=15.0 2024-09-17 01:26:40,443 INFO [train.py:1198] (1/2) Epoch 30, batch 1700, loss[loss=0.2505, ctc_loss=0.1675, cr_loss=0.4148, over 20695.00 frames. ], tot_loss[loss=0.2261, ctc_loss=0.151, cr_loss=0.3753, over 4097266.79 frames. ], batch size: 68, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:26:49,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=529929.6666666666, ans=0.0 2024-09-17 01:27:56,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=530043.0, ans=0.125 2024-09-17 01:27:59,092 INFO [train.py:1198] (1/2) Epoch 30, batch 1750, loss[loss=0.2263, ctc_loss=0.151, cr_loss=0.3763, over 20725.00 frames. ], tot_loss[loss=0.2259, ctc_loss=0.1509, cr_loss=0.3748, over 4097097.15 frames. ], batch size: 71, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:28:09,796 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.811e+02 2.130e+02 2.256e+02 2.425e+02 4.114e+02, threshold=4.511e+02, percent-clipped=0.0 2024-09-17 01:28:12,062 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=5.77 vs. limit=22.5 2024-09-17 01:28:43,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=530156.3333333334, ans=0.0 2024-09-17 01:28:46,539 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.82 vs. limit=15.0 2024-09-17 01:29:14,772 INFO [train.py:1198] (1/2) Epoch 30, batch 1800, loss[loss=0.1872, ctc_loss=0.122, cr_loss=0.3259, over 21004.00 frames. ], tot_loss[loss=0.2253, ctc_loss=0.1505, cr_loss=0.3743, over 4093386.33 frames. 
], batch size: 52, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:29:50,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=530269.6666666666, ans=0.025 2024-09-17 01:29:57,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=530269.6666666666, ans=0.125 2024-09-17 01:30:23,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=530326.3333333334, ans=0.125 2024-09-17 01:30:30,934 INFO [train.py:1198] (1/2) Epoch 30, batch 1850, loss[loss=0.2055, ctc_loss=0.1321, cr_loss=0.3668, over 19896.00 frames. ], tot_loss[loss=0.2251, ctc_loss=0.1503, cr_loss=0.374, over 4098446.08 frames. ], batch size: 44, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:30:41,503 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.656e+02 2.167e+02 2.258e+02 2.458e+02 5.473e+02, threshold=4.517e+02, percent-clipped=1.0 2024-09-17 01:31:13,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=530411.3333333334, ans=0.125 2024-09-17 01:31:33,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=530468.0, ans=0.0 2024-09-17 01:31:43,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=530468.0, ans=0.025 2024-09-17 01:31:46,355 INFO [train.py:1198] (1/2) Epoch 30, batch 1900, loss[loss=0.2001, ctc_loss=0.1309, cr_loss=0.3457, over 20961.00 frames. ], tot_loss[loss=0.2269, ctc_loss=0.1516, cr_loss=0.3762, over 4093698.51 frames. ], batch size: 51, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:32:38,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=530581.3333333334, ans=0.125 2024-09-17 01:32:39,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=530581.3333333334, ans=0.125 2024-09-17 01:32:56,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=530609.6666666666, ans=0.125 2024-09-17 01:33:04,675 INFO [train.py:1198] (1/2) Epoch 30, batch 1950, loss[loss=0.2419, ctc_loss=0.1622, cr_loss=0.3986, over 20343.00 frames. ], tot_loss[loss=0.2269, ctc_loss=0.1517, cr_loss=0.3759, over 4103740.86 frames. ], batch size: 74, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:33:15,438 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.936e+02 2.152e+02 2.322e+02 2.436e+02 3.414e+02, threshold=4.644e+02, percent-clipped=0.0 2024-09-17 01:33:29,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=530666.3333333334, ans=0.125 2024-09-17 01:33:48,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=530694.6666666666, ans=0.1 2024-09-17 01:34:11,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=530751.3333333334, ans=0.125 2024-09-17 01:34:23,734 INFO [train.py:1198] (1/2) Epoch 30, batch 2000, loss[loss=0.1828, ctc_loss=0.1183, cr_loss=0.3226, over 20945.00 frames. ], tot_loss[loss=0.2269, ctc_loss=0.1517, cr_loss=0.376, over 4096823.38 frames. 
], batch size: 49, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:34:50,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=530808.0, ans=0.125 2024-09-17 01:34:57,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=530836.3333333334, ans=0.125 2024-09-17 01:35:16,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=530864.6666666666, ans=0.1 2024-09-17 01:35:20,835 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.10 vs. limit=10.0 2024-09-17 01:35:35,682 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.10 vs. limit=10.0 2024-09-17 01:35:39,233 INFO [train.py:1198] (1/2) Epoch 30, batch 2050, loss[loss=0.2236, ctc_loss=0.1454, cr_loss=0.391, over 20944.00 frames. ], tot_loss[loss=0.2263, ctc_loss=0.1512, cr_loss=0.3753, over 4091207.19 frames. ], batch size: 67, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:35:49,865 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.771e+02 2.120e+02 2.239e+02 2.413e+02 5.839e+02, threshold=4.478e+02, percent-clipped=1.0 2024-09-17 01:36:14,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=530978.0, ans=0.0 2024-09-17 01:36:48,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=531034.6666666666, ans=0.0 2024-09-17 01:36:55,842 INFO [train.py:1198] (1/2) Epoch 30, batch 2100, loss[loss=0.2112, ctc_loss=0.1385, cr_loss=0.3632, over 20979.00 frames. ], tot_loss[loss=0.2258, ctc_loss=0.1507, cr_loss=0.3752, over 4101391.10 frames. ], batch size: 55, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:36:57,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=531063.0, ans=0.125 2024-09-17 01:37:17,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=531091.3333333334, ans=0.125 2024-09-17 01:37:53,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=531148.0, ans=0.0 2024-09-17 01:38:00,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=531176.3333333334, ans=0.5 2024-09-17 01:38:05,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=531176.3333333334, ans=0.125 2024-09-17 01:38:13,910 INFO [train.py:1198] (1/2) Epoch 30, batch 2150, loss[loss=0.2133, ctc_loss=0.1424, cr_loss=0.3543, over 21040.00 frames. ], tot_loss[loss=0.2268, ctc_loss=0.1515, cr_loss=0.3763, over 4102901.49 frames. ], batch size: 53, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:38:24,205 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.930e+02 2.139e+02 2.270e+02 2.532e+02 3.032e+02, threshold=4.540e+02, percent-clipped=0.0 2024-09-17 01:38:32,370 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.04 vs. 
limit=15.0 2024-09-17 01:39:17,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=531318.0, ans=0.125 2024-09-17 01:39:32,486 INFO [train.py:1198] (1/2) Epoch 30, batch 2200, loss[loss=0.1736, ctc_loss=0.1136, cr_loss=0.2998, over 20981.00 frames. ], tot_loss[loss=0.226, ctc_loss=0.151, cr_loss=0.375, over 4109207.86 frames. ], batch size: 50, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:39:34,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=531346.3333333334, ans=0.025 2024-09-17 01:40:33,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=531459.6666666666, ans=0.1 2024-09-17 01:40:48,038 INFO [train.py:1198] (1/2) Epoch 30, batch 2250, loss[loss=0.1895, ctc_loss=0.1247, cr_loss=0.324, over 20963.00 frames. ], tot_loss[loss=0.2263, ctc_loss=0.1513, cr_loss=0.3751, over 4104356.02 frames. ], batch size: 50, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:40:58,882 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.886e+02 2.156e+02 2.262e+02 2.516e+02 3.058e+02, threshold=4.525e+02, percent-clipped=0.0 2024-09-17 01:41:00,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=531488.0, ans=0.0 2024-09-17 01:41:16,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=531516.3333333334, ans=0.1 2024-09-17 01:41:43,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=531573.0, ans=0.025 2024-09-17 01:41:50,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=531601.3333333334, ans=15.0 2024-09-17 01:41:52,314 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.53 vs. limit=15.0 2024-09-17 01:42:05,087 INFO [train.py:1198] (1/2) Epoch 30, batch 2300, loss[loss=0.226, ctc_loss=0.1497, cr_loss=0.3815, over 20985.00 frames. ], tot_loss[loss=0.2255, ctc_loss=0.1507, cr_loss=0.3738, over 4110731.54 frames. ], batch size: 55, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:42:17,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=531629.6666666666, ans=0.0 2024-09-17 01:42:25,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=531658.0, ans=6.0 2024-09-17 01:43:10,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=531743.0, ans=0.1 2024-09-17 01:43:20,467 INFO [train.py:1198] (1/2) Epoch 30, batch 2350, loss[loss=0.2469, ctc_loss=0.1653, cr_loss=0.4082, over 20093.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.1501, cr_loss=0.373, over 4116823.59 frames. 
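[Editor's sketch] The Whitening lines compare a per-module metric against a limit (e.g. metric=9.53 vs. limit=15.0). My assumption, flagged as such: the metric measures how unevenly variance is spread across feature directions, equal to 1.0 for perfectly white features and growing as variance concentrates, with the module intervening only when the metric exceeds its limit. A sketch of one such spread statistic follows; this is not the scaling.py implementation.

```python
import torch

# Sketch (assumed semantics, not scaling.py): eigenvalue-spread statistic of
# the per-group feature covariance; 1.0 when all eigenvalues are equal.
def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    n, c = x.shape                       # (num_frames, num_channels)
    x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
    x = x - x.mean(dim=1, keepdim=True)  # center each group
    cov = x.transpose(1, 2) @ x / n      # per-group covariance
    eigs = torch.linalg.eigvalsh(cov)    # real eigenvalues, ascending
    # mean squared eigenvalue over squared mean eigenvalue, averaged on groups
    return ((eigs ** 2).mean(dim=1) / eigs.mean(dim=1) ** 2).mean().item()

print(whitening_metric(torch.randn(1000, 256)))  # near 1.0: ~white features
print(whitening_metric(torch.randn(1000, 1) * torch.ones(1, 256)))  # rank-1
# -> the rank-1 case gives a metric near num_channels (256 here)
```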
], batch size: 80, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:43:31,074 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.889e+02 2.138e+02 2.313e+02 2.458e+02 5.191e+02, threshold=4.627e+02, percent-clipped=2.0 2024-09-17 01:43:57,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=531828.0, ans=0.0 2024-09-17 01:44:00,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=531828.0, ans=0.0 2024-09-17 01:44:08,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=531856.3333333334, ans=0.125 2024-09-17 01:44:09,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=531856.3333333334, ans=0.125 2024-09-17 01:44:21,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=531856.3333333334, ans=0.125 2024-09-17 01:44:27,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=531884.6666666666, ans=0.125 2024-09-17 01:44:34,014 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.65 vs. limit=15.0 2024-09-17 01:44:39,261 INFO [train.py:1198] (1/2) Epoch 30, batch 2400, loss[loss=0.241, ctc_loss=0.1607, cr_loss=0.4013, over 21022.00 frames. ], tot_loss[loss=0.2258, ctc_loss=0.1509, cr_loss=0.3747, over 4119919.04 frames. ], batch size: 61, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:44:53,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=531941.3333333334, ans=0.2 2024-09-17 01:45:28,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=531998.0, ans=0.125 2024-09-17 01:45:57,923 INFO [train.py:1198] (1/2) Epoch 30, batch 2450, loss[loss=0.2188, ctc_loss=0.1459, cr_loss=0.3647, over 20926.00 frames. ], tot_loss[loss=0.2252, ctc_loss=0.1505, cr_loss=0.3735, over 4109759.22 frames. ], batch size: 60, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:46:08,586 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.903e+02 2.168e+02 2.254e+02 2.444e+02 3.250e+02, threshold=4.508e+02, percent-clipped=0.0 2024-09-17 01:46:10,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=532054.6666666666, ans=0.0 2024-09-17 01:46:41,297 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.48 vs. limit=15.0 2024-09-17 01:47:06,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=532168.0, ans=0.04949747468305833 2024-09-17 01:47:09,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=532168.0, ans=0.125 2024-09-17 01:47:14,046 INFO [train.py:1198] (1/2) Epoch 30, batch 2500, loss[loss=0.2439, ctc_loss=0.1645, cr_loss=0.3968, over 19528.00 frames. ], tot_loss[loss=0.2259, ctc_loss=0.1511, cr_loss=0.3741, over 4103121.72 frames. 
], batch size: 90, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:47:53,595 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.82 vs. limit=15.0 2024-09-17 01:48:08,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=532281.3333333334, ans=0.1 2024-09-17 01:48:09,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=532281.3333333334, ans=0.125 2024-09-17 01:48:26,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=532309.6666666666, ans=0.025 2024-09-17 01:48:28,973 INFO [train.py:1198] (1/2) Epoch 30, batch 2550, loss[loss=0.2187, ctc_loss=0.1431, cr_loss=0.3778, over 20974.00 frames. ], tot_loss[loss=0.2263, ctc_loss=0.1514, cr_loss=0.3748, over 4092508.19 frames. ], batch size: 55, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:48:39,727 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.875e+02 2.156e+02 2.232e+02 2.452e+02 5.570e+02, threshold=4.464e+02, percent-clipped=1.0 2024-09-17 01:48:41,995 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.60 vs. limit=22.5 2024-09-17 01:48:46,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=532366.3333333334, ans=0.125 2024-09-17 01:49:40,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=532451.3333333334, ans=0.125 2024-09-17 01:49:47,741 INFO [train.py:1198] (1/2) Epoch 30, batch 2600, loss[loss=0.2116, ctc_loss=0.1385, cr_loss=0.3657, over 21075.00 frames. ], tot_loss[loss=0.2261, ctc_loss=0.1512, cr_loss=0.3747, over 4094784.65 frames. ], batch size: 53, lr: 2.76e-03, grad_scale: 32.0 2024-09-17 01:50:26,208 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0 2024-09-17 01:50:27,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=532536.3333333334, ans=0.0 2024-09-17 01:50:36,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=532564.6666666666, ans=0.0 2024-09-17 01:51:02,176 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.97 vs. limit=15.0 2024-09-17 01:51:05,900 INFO [train.py:1198] (1/2) Epoch 30, batch 2650, loss[loss=0.2555, ctc_loss=0.1694, cr_loss=0.4301, over 21089.00 frames. ], tot_loss[loss=0.2284, ctc_loss=0.1528, cr_loss=0.3781, over 4088724.47 frames. 
], batch size: 59, lr: 2.76e-03, grad_scale: 32.0 2024-09-17 01:51:16,684 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.849e+02 2.122e+02 2.251e+02 2.469e+02 4.413e+02, threshold=4.502e+02, percent-clipped=0.0 2024-09-17 01:51:21,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=532649.6666666666, ans=0.05 2024-09-17 01:51:36,605 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.53 vs. limit=15.0 2024-09-17 01:51:50,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=532678.0, ans=0.125 2024-09-17 01:51:54,345 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=532706.3333333334, ans=0.0 2024-09-17 01:51:54,813 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.60 vs. limit=22.5 2024-09-17 01:52:01,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=532706.3333333334, ans=0.2 2024-09-17 01:52:23,348 INFO [train.py:1198] (1/2) Epoch 30, batch 2700, loss[loss=0.2257, ctc_loss=0.1512, cr_loss=0.3721, over 21022.00 frames. ], tot_loss[loss=0.2281, ctc_loss=0.1526, cr_loss=0.3778, over 4095871.60 frames. ], batch size: 62, lr: 2.76e-03, grad_scale: 32.0 2024-09-17 01:52:25,756 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.16 vs. limit=15.0 2024-09-17 01:53:11,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=532848.0, ans=0.5 2024-09-17 01:53:32,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=532876.3333333334, ans=0.1 2024-09-17 01:53:38,683 INFO [train.py:1198] (1/2) Epoch 30, batch 2750, loss[loss=0.2411, ctc_loss=0.1635, cr_loss=0.3881, over 20926.00 frames. ], tot_loss[loss=0.2286, ctc_loss=0.153, cr_loss=0.378, over 4085638.60 frames. ], batch size: 60, lr: 2.76e-03, grad_scale: 32.0 2024-09-17 01:53:49,277 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.931e+02 2.183e+02 2.309e+02 2.496e+02 2.932e+02, threshold=4.619e+02, percent-clipped=0.0 2024-09-17 01:54:12,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=532961.3333333334, ans=0.125 2024-09-17 01:54:26,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=532989.6666666666, ans=0.0 2024-09-17 01:54:54,469 INFO [train.py:1198] (1/2) Epoch 30, batch 2800, loss[loss=0.2414, ctc_loss=0.1643, cr_loss=0.3852, over 19926.00 frames. ], tot_loss[loss=0.2287, ctc_loss=0.1531, cr_loss=0.3779, over 4077755.38 frames. 
], batch size: 80, lr: 2.76e-03, grad_scale: 32.0 2024-09-17 01:55:08,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=533046.3333333334, ans=0.0 2024-09-17 01:55:36,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=533103.0, ans=0.1 2024-09-17 01:56:13,074 INFO [train.py:1198] (1/2) Epoch 30, batch 2850, loss[loss=0.2548, ctc_loss=0.1737, cr_loss=0.4056, over 18272.00 frames. ], tot_loss[loss=0.2286, ctc_loss=0.153, cr_loss=0.3779, over 4089019.07 frames. ], batch size: 108, lr: 2.76e-03, grad_scale: 32.0 2024-09-17 01:56:23,553 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.889e+02 2.178e+02 2.325e+02 2.456e+02 4.976e+02, threshold=4.650e+02, percent-clipped=1.0 2024-09-17 01:56:54,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=533244.6666666666, ans=0.1 2024-09-17 01:57:31,987 INFO [train.py:1198] (1/2) Epoch 30, batch 2900, loss[loss=0.1928, ctc_loss=0.1271, cr_loss=0.3283, over 20965.00 frames. ], tot_loss[loss=0.2295, ctc_loss=0.1538, cr_loss=0.3786, over 4085199.66 frames. ], batch size: 51, lr: 2.76e-03, grad_scale: 32.0 2024-09-17 01:58:08,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=533386.3333333334, ans=0.1 2024-09-17 01:58:16,005 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 01:58:37,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=533443.0, ans=0.1 2024-09-17 01:58:38,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=533443.0, ans=0.0 2024-09-17 01:58:47,289 INFO [train.py:1198] (1/2) Epoch 30, batch 2950, loss[loss=0.2438, ctc_loss=0.1616, cr_loss=0.4112, over 20958.00 frames. ], tot_loss[loss=0.229, ctc_loss=0.1533, cr_loss=0.3784, over 4091283.42 frames. ], batch size: 58, lr: 2.76e-03, grad_scale: 32.0 2024-09-17 01:58:51,616 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.96 vs. 
limit=6.0 2024-09-17 01:58:57,871 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.895e+02 2.174e+02 2.288e+02 2.456e+02 3.838e+02, threshold=4.575e+02, percent-clipped=0.0 2024-09-17 01:59:05,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=533499.6666666666, ans=0.125 2024-09-17 01:59:10,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=533499.6666666666, ans=0.1 2024-09-17 01:59:13,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=533499.6666666666, ans=0.125 2024-09-17 01:59:22,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=533528.0, ans=0.125 2024-09-17 01:59:24,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=533528.0, ans=0.125 2024-09-17 01:59:43,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=533556.3333333334, ans=0.125 2024-09-17 01:59:49,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=533584.6666666666, ans=0.0 2024-09-17 02:00:03,350 INFO [train.py:1198] (1/2) Epoch 30, batch 3000, loss[loss=0.2214, ctc_loss=0.1489, cr_loss=0.3629, over 20929.00 frames. ], tot_loss[loss=0.2279, ctc_loss=0.1524, cr_loss=0.3774, over 4103430.58 frames. ], batch size: 60, lr: 2.76e-03, grad_scale: 32.0 2024-09-17 02:00:03,351 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 02:00:24,468 INFO [train.py:1230] (1/2) Epoch 30, validation: loss=0.04164, ctc_loss=0.04164, cr_loss=1.25e-14, over 944034.00 frames. 2024-09-17 02:00:24,469 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-17 02:00:27,009 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.30 vs. limit=12.0 2024-09-17 02:00:52,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=533641.3333333334, ans=0.125 2024-09-17 02:01:16,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=533698.0, ans=0.125 2024-09-17 02:01:29,533 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.48 vs. limit=15.0 2024-09-17 02:01:35,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=533726.3333333334, ans=0.125 2024-09-17 02:01:43,703 INFO [train.py:1198] (1/2) Epoch 30, batch 3050, loss[loss=0.2428, ctc_loss=0.1595, cr_loss=0.4163, over 21078.00 frames. ], tot_loss[loss=0.2272, ctc_loss=0.152, cr_loss=0.3763, over 4091439.17 frames. 
], batch size: 59, lr: 2.76e-03, grad_scale: 32.0 2024-09-17 02:01:54,274 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.943e+02 2.114e+02 2.234e+02 2.399e+02 3.160e+02, threshold=4.467e+02, percent-clipped=0.0 2024-09-17 02:01:57,826 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 02:02:08,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=533783.0, ans=0.125 2024-09-17 02:02:17,902 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.96 vs. limit=15.0 2024-09-17 02:02:38,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=533839.6666666666, ans=0.0 2024-09-17 02:02:51,938 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0 2024-09-17 02:03:01,870 INFO [train.py:1198] (1/2) Epoch 30, batch 3100, loss[loss=0.2245, ctc_loss=0.1517, cr_loss=0.3639, over 20836.00 frames. ], tot_loss[loss=0.228, ctc_loss=0.1526, cr_loss=0.3771, over 4091640.05 frames. ], batch size: 59, lr: 2.76e-03, grad_scale: 32.0 2024-09-17 02:03:20,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=533924.6666666666, ans=0.0 2024-09-17 02:03:29,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=533924.6666666666, ans=0.125 2024-09-17 02:03:31,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=533953.0, ans=0.125 2024-09-17 02:03:58,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=533981.3333333334, ans=0.125 2024-09-17 02:04:06,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=534009.6666666666, ans=0.125 2024-09-17 02:04:17,647 INFO [train.py:1198] (1/2) Epoch 30, batch 3150, loss[loss=0.1915, ctc_loss=0.1212, cr_loss=0.3514, over 20955.00 frames. ], tot_loss[loss=0.2271, ctc_loss=0.1518, cr_loss=0.3767, over 4097981.41 frames. ], batch size: 49, lr: 2.76e-03, grad_scale: 32.0 2024-09-17 02:04:28,290 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.901e+02 2.145e+02 2.284e+02 2.407e+02 3.555e+02, threshold=4.568e+02, percent-clipped=0.0 2024-09-17 02:05:33,776 INFO [train.py:1198] (1/2) Epoch 30, batch 3200, loss[loss=0.2205, ctc_loss=0.1493, cr_loss=0.356, over 20795.00 frames. ], tot_loss[loss=0.2259, ctc_loss=0.1509, cr_loss=0.3751, over 4102771.92 frames. 
], batch size: 53, lr: 2.76e-03, grad_scale: 64.0 2024-09-17 02:05:40,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=534179.6666666666, ans=0.0 2024-09-17 02:05:44,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=534179.6666666666, ans=0.125 2024-09-17 02:06:24,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=534264.6666666666, ans=0.125 2024-09-17 02:06:53,511 INFO [train.py:1198] (1/2) Epoch 30, batch 3250, loss[loss=0.2396, ctc_loss=0.1618, cr_loss=0.3892, over 21016.00 frames. ], tot_loss[loss=0.2264, ctc_loss=0.1512, cr_loss=0.3756, over 4105676.54 frames. ], batch size: 63, lr: 2.76e-03, grad_scale: 64.0 2024-09-17 02:07:04,282 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.887e+02 2.178e+02 2.333e+02 2.477e+02 3.944e+02, threshold=4.666e+02, percent-clipped=0.0 2024-09-17 02:07:23,483 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.35 vs. limit=15.0 2024-09-17 02:07:49,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=534406.3333333334, ans=0.125 2024-09-17 02:07:52,148 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=534406.3333333334, ans=0.0 2024-09-17 02:08:13,325 INFO [train.py:1198] (1/2) Epoch 30, batch 3300, loss[loss=0.2355, ctc_loss=0.1557, cr_loss=0.3988, over 21007.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.15, cr_loss=0.3738, over 4115028.06 frames. ], batch size: 63, lr: 2.76e-03, grad_scale: 64.0 2024-09-17 02:08:13,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=534463.0, ans=0.0 2024-09-17 02:09:13,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=534576.3333333334, ans=0.1 2024-09-17 02:09:16,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=534576.3333333334, ans=0.0 2024-09-17 02:09:29,343 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.19 vs. limit=15.0 2024-09-17 02:09:29,690 INFO [train.py:1198] (1/2) Epoch 30, batch 3350, loss[loss=0.2132, ctc_loss=0.1418, cr_loss=0.357, over 20976.00 frames. ], tot_loss[loss=0.2248, ctc_loss=0.15, cr_loss=0.3739, over 4114656.46 frames. 
], batch size: 55, lr: 2.76e-03, grad_scale: 64.0 2024-09-17 02:09:40,572 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.804e+02 2.130e+02 2.265e+02 2.418e+02 3.586e+02, threshold=4.530e+02, percent-clipped=0.0 2024-09-17 02:09:42,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=534604.6666666666, ans=0.0 2024-09-17 02:09:45,498 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 02:09:55,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=534633.0, ans=0.125 2024-09-17 02:10:06,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=534661.3333333334, ans=0.1 2024-09-17 02:10:13,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=534689.6666666666, ans=0.125 2024-09-17 02:10:17,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=534689.6666666666, ans=0.125 2024-09-17 02:10:45,065 INFO [train.py:1198] (1/2) Epoch 30, batch 3400, loss[loss=0.2388, ctc_loss=0.1572, cr_loss=0.4076, over 20703.00 frames. ], tot_loss[loss=0.2254, ctc_loss=0.1504, cr_loss=0.3749, over 4117198.19 frames. ], batch size: 68, lr: 2.76e-03, grad_scale: 32.0 2024-09-17 02:10:55,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=534746.3333333334, ans=0.125 2024-09-17 02:10:57,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=534746.3333333334, ans=0.2 2024-09-17 02:11:06,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=534774.6666666666, ans=0.0 2024-09-17 02:11:15,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=534803.0, ans=0.125 2024-09-17 02:11:35,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=534831.3333333334, ans=0.0 2024-09-17 02:11:39,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=534831.3333333334, ans=0.125 2024-09-17 02:11:53,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=534859.6666666666, ans=0.1 2024-09-17 02:12:00,668 INFO [train.py:1198] (1/2) Epoch 30, batch 3450, loss[loss=0.2462, ctc_loss=0.1734, cr_loss=0.3643, over 14479.00 frames. ], tot_loss[loss=0.2261, ctc_loss=0.151, cr_loss=0.3757, over 4099254.82 frames. 
], batch size: 149, lr: 2.76e-03, grad_scale: 16.0 2024-09-17 02:12:17,395 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.862e+02 2.142e+02 2.255e+02 2.437e+02 3.136e+02, threshold=4.509e+02, percent-clipped=0.0 2024-09-17 02:12:22,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=534916.3333333334, ans=0.1 2024-09-17 02:12:34,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=534944.6666666666, ans=0.125 2024-09-17 02:12:37,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=534944.6666666666, ans=0.0 2024-09-17 02:12:45,958 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.55 vs. limit=10.0 2024-09-17 02:13:14,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=535001.3333333334, ans=0.0 2024-09-17 02:13:19,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=535029.6666666666, ans=0.2 2024-09-17 02:13:20,538 INFO [train.py:1198] (1/2) Epoch 30, batch 3500, loss[loss=0.2376, ctc_loss=0.161, cr_loss=0.3831, over 20946.00 frames. ], tot_loss[loss=0.2264, ctc_loss=0.1512, cr_loss=0.3762, over 4101173.58 frames. ], batch size: 60, lr: 2.76e-03, grad_scale: 16.0 2024-09-17 02:13:23,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=535029.6666666666, ans=0.125 2024-09-17 02:13:23,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=535029.6666666666, ans=0.125 2024-09-17 02:13:26,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=535029.6666666666, ans=0.025 2024-09-17 02:13:29,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=535029.6666666666, ans=0.1 2024-09-17 02:14:05,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=535086.3333333334, ans=0.125 2024-09-17 02:14:39,210 INFO [train.py:1198] (1/2) Epoch 30, batch 3550, loss[loss=0.1905, ctc_loss=0.1275, cr_loss=0.3147, over 20902.00 frames. ], tot_loss[loss=0.2253, ctc_loss=0.1504, cr_loss=0.3741, over 4096836.35 frames. 
], batch size: 57, lr: 2.76e-03, grad_scale: 16.0 2024-09-17 02:14:44,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=535171.3333333334, ans=0.05 2024-09-17 02:14:52,559 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.888e+02 2.161e+02 2.313e+02 2.481e+02 4.633e+02, threshold=4.626e+02, percent-clipped=2.0 2024-09-17 02:15:01,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=535199.6666666666, ans=0.0 2024-09-17 02:15:30,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=535256.3333333334, ans=0.0 2024-09-17 02:15:41,719 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.81 vs. limit=15.0 2024-09-17 02:15:54,826 INFO [train.py:1198] (1/2) Epoch 30, batch 3600, loss[loss=0.2575, ctc_loss=0.1744, cr_loss=0.4155, over 20024.00 frames. ], tot_loss[loss=0.2253, ctc_loss=0.1505, cr_loss=0.3739, over 4084420.30 frames. ], batch size: 80, lr: 2.76e-03, grad_scale: 32.0 2024-09-17 02:16:14,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=535341.3333333334, ans=0.125 2024-09-17 02:16:20,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=535341.3333333334, ans=0.125 2024-09-17 02:16:24,094 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.06 vs. limit=15.0 2024-09-17 02:16:55,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=535426.3333333334, ans=0.2 2024-09-17 02:17:10,289 INFO [train.py:1198] (1/2) Epoch 30, batch 3650, loss[loss=0.2054, ctc_loss=0.136, cr_loss=0.3471, over 20931.00 frames. ], tot_loss[loss=0.2239, ctc_loss=0.1495, cr_loss=0.3722, over 4096664.83 frames. 
], batch size: 60, lr: 2.76e-03, grad_scale: 32.0 2024-09-17 02:17:23,744 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.772e+02 2.121e+02 2.237e+02 2.396e+02 3.207e+02, threshold=4.473e+02, percent-clipped=0.0 2024-09-17 02:17:24,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=535483.0, ans=0.0 2024-09-17 02:17:28,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=535483.0, ans=0.1 2024-09-17 02:17:43,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=535511.3333333334, ans=0.2 2024-09-17 02:17:48,243 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=535511.3333333334, ans=0.125 2024-09-17 02:18:07,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=535539.6666666666, ans=0.2 2024-09-17 02:18:10,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=535539.6666666666, ans=0.2 2024-09-17 02:18:20,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=535568.0, ans=0.125 2024-09-17 02:18:28,790 INFO [train.py:1198] (1/2) Epoch 30, batch 3700, loss[loss=0.2401, ctc_loss=0.1611, cr_loss=0.3948, over 20825.00 frames. ], tot_loss[loss=0.2232, ctc_loss=0.149, cr_loss=0.3711, over 4096217.62 frames. ], batch size: 59, lr: 2.76e-03, grad_scale: 32.0 2024-09-17 02:18:40,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=535596.3333333334, ans=0.125 2024-09-17 02:18:54,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=535624.6666666666, ans=0.125 2024-09-17 02:19:47,701 INFO [train.py:1198] (1/2) Epoch 30, batch 3750, loss[loss=0.2059, ctc_loss=0.1378, cr_loss=0.3404, over 20924.00 frames. ], tot_loss[loss=0.2243, ctc_loss=0.1497, cr_loss=0.373, over 4105406.66 frames. ], batch size: 50, lr: 2.76e-03, grad_scale: 32.0 2024-09-17 02:20:01,430 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.875e+02 2.114e+02 2.254e+02 2.437e+02 3.443e+02, threshold=4.509e+02, percent-clipped=0.0 2024-09-17 02:20:21,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=535794.6666666666, ans=0.0 2024-09-17 02:20:22,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=535794.6666666666, ans=0.125 2024-09-17 02:21:03,366 INFO [train.py:1198] (1/2) Epoch 30, batch 3800, loss[loss=0.2111, ctc_loss=0.1403, cr_loss=0.3537, over 20775.00 frames. ], tot_loss[loss=0.2245, ctc_loss=0.1499, cr_loss=0.3728, over 4094940.59 frames. 
], batch size: 56, lr: 2.76e-03, grad_scale: 16.0 2024-09-17 02:21:21,847 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 02:21:21,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=535908.0, ans=0.2 2024-09-17 02:21:29,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=535908.0, ans=0.125 2024-09-17 02:21:49,403 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 02:22:01,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=535964.6666666666, ans=0.125 2024-09-17 02:22:05,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=535993.0, ans=0.125 2024-09-17 02:22:05,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=535993.0, ans=0.05 2024-09-17 02:22:10,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=535993.0, ans=0.125 2024-09-17 02:22:15,973 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.71 vs. limit=15.0 2024-09-17 02:22:19,343 INFO [train.py:1198] (1/2) Epoch 30, batch 3850, loss[loss=0.2028, ctc_loss=0.1346, cr_loss=0.3409, over 20867.00 frames. ], tot_loss[loss=0.2242, ctc_loss=0.1497, cr_loss=0.3723, over 4092806.62 frames. ], batch size: 57, lr: 2.76e-03, grad_scale: 16.0 2024-09-17 02:22:19,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=536021.3333333334, ans=0.1 2024-09-17 02:22:34,611 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.843e+02 2.116e+02 2.273e+02 2.465e+02 5.111e+02, threshold=4.545e+02, percent-clipped=1.0 2024-09-17 02:22:57,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=536078.0, ans=0.0 2024-09-17 02:23:10,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=536106.3333333334, ans=0.2 2024-09-17 02:23:36,062 INFO [train.py:1198] (1/2) Epoch 30, batch 3900, loss[loss=0.2155, ctc_loss=0.1429, cr_loss=0.363, over 20782.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1487, cr_loss=0.3706, over 4094940.85 frames. 
], batch size: 56, lr: 2.76e-03, grad_scale: 16.0 2024-09-17 02:23:58,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=536191.3333333334, ans=0.0 2024-09-17 02:24:02,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=536191.3333333334, ans=0.125 2024-09-17 02:24:06,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=536191.3333333334, ans=0.0 2024-09-17 02:24:14,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=536219.6666666666, ans=0.05 2024-09-17 02:24:38,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=536276.3333333334, ans=0.125 2024-09-17 02:24:49,379 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. limit=6.0 2024-09-17 02:24:50,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=536276.3333333334, ans=0.125 2024-09-17 02:24:54,544 INFO [train.py:1198] (1/2) Epoch 30, batch 3950, loss[loss=0.265, ctc_loss=0.1794, cr_loss=0.4282, over 20971.00 frames. ], tot_loss[loss=0.2235, ctc_loss=0.1493, cr_loss=0.3713, over 4086993.64 frames. ], batch size: 64, lr: 2.76e-03, grad_scale: 16.0 2024-09-17 02:25:02,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=536304.6666666666, ans=0.0 2024-09-17 02:25:04,399 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.16 vs. limit=10.0 2024-09-17 02:25:09,571 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.851e+02 2.134e+02 2.247e+02 2.389e+02 4.042e+02, threshold=4.494e+02, percent-clipped=0.0 2024-09-17 02:25:51,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=536389.6666666666, ans=0.05 2024-09-17 02:26:13,950 INFO [train.py:1198] (1/2) Epoch 30, batch 4000, loss[loss=0.2506, ctc_loss=0.1653, cr_loss=0.4267, over 20670.00 frames. ], tot_loss[loss=0.224, ctc_loss=0.1495, cr_loss=0.3726, over 4100107.36 frames. ], batch size: 71, lr: 2.75e-03, grad_scale: 32.0 2024-09-17 02:26:48,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=536503.0, ans=0.1 2024-09-17 02:27:16,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=536559.6666666666, ans=0.125 2024-09-17 02:27:30,171 INFO [train.py:1198] (1/2) Epoch 30, batch 4050, loss[loss=0.2214, ctc_loss=0.1465, cr_loss=0.3744, over 20894.00 frames. ], tot_loss[loss=0.224, ctc_loss=0.1495, cr_loss=0.3724, over 4094979.08 frames. 
], batch size: 57, lr: 2.75e-03, grad_scale: 32.0 2024-09-17 02:27:35,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=536588.0, ans=0.0 2024-09-17 02:27:45,272 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.894e+02 2.180e+02 2.307e+02 2.416e+02 4.153e+02, threshold=4.613e+02, percent-clipped=0.0 2024-09-17 02:27:46,108 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.03 vs. limit=12.0 2024-09-17 02:27:55,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=536616.3333333334, ans=0.1 2024-09-17 02:28:04,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=536644.6666666666, ans=0.125 2024-09-17 02:28:18,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=536673.0, ans=0.125 2024-09-17 02:28:22,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=536673.0, ans=0.125 2024-09-17 02:28:24,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=536673.0, ans=0.2 2024-09-17 02:28:29,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=536701.3333333334, ans=0.1 2024-09-17 02:28:45,817 INFO [train.py:1198] (1/2) Epoch 30, batch 4100, loss[loss=0.2309, ctc_loss=0.1533, cr_loss=0.3878, over 20699.00 frames. ], tot_loss[loss=0.2233, ctc_loss=0.149, cr_loss=0.3712, over 4085881.86 frames. ], batch size: 68, lr: 2.75e-03, grad_scale: 32.0 2024-09-17 02:28:52,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=536729.6666666666, ans=0.125 2024-09-17 02:29:00,115 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.15 vs. limit=22.5 2024-09-17 02:29:11,304 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.12 vs. limit=10.0 2024-09-17 02:29:28,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=536786.3333333334, ans=0.125 2024-09-17 02:29:40,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=536814.6666666666, ans=0.2 2024-09-17 02:29:44,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=536814.6666666666, ans=0.1 2024-09-17 02:29:53,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=536843.0, ans=0.1 2024-09-17 02:30:01,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=536843.0, ans=0.2 2024-09-17 02:30:03,838 INFO [train.py:1198] (1/2) Epoch 30, batch 4150, loss[loss=0.2042, ctc_loss=0.1337, cr_loss=0.3527, over 21062.00 frames. ], tot_loss[loss=0.2235, ctc_loss=0.1491, cr_loss=0.3718, over 4101828.24 frames. 
], batch size: 53, lr: 2.75e-03, grad_scale: 32.0 2024-09-17 02:30:18,963 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.882e+02 2.090e+02 2.210e+02 2.360e+02 3.416e+02, threshold=4.421e+02, percent-clipped=0.0 2024-09-17 02:30:28,480 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 02:30:37,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=536928.0, ans=0.125 2024-09-17 02:30:37,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=536928.0, ans=0.1 2024-09-17 02:31:22,532 INFO [train.py:1198] (1/2) Epoch 30, batch 4200, loss[loss=0.2646, ctc_loss=0.1828, cr_loss=0.4088, over 18370.00 frames. ], tot_loss[loss=0.2246, ctc_loss=0.15, cr_loss=0.3733, over 4097285.09 frames. ], batch size: 108, lr: 2.75e-03, grad_scale: 32.0 2024-09-17 02:31:44,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=537041.3333333334, ans=0.0 2024-09-17 02:32:38,263 INFO [train.py:1198] (1/2) Epoch 30, batch 4250, loss[loss=0.2177, ctc_loss=0.1471, cr_loss=0.353, over 20785.00 frames. ], tot_loss[loss=0.2248, ctc_loss=0.1501, cr_loss=0.3733, over 4099144.82 frames. ], batch size: 53, lr: 2.75e-03, grad_scale: 32.0 2024-09-17 02:32:40,733 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.27 vs. limit=22.5 2024-09-17 02:32:53,544 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.935e+02 2.166e+02 2.280e+02 2.499e+02 3.236e+02, threshold=4.560e+02, percent-clipped=0.0 2024-09-17 02:33:39,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=537268.0, ans=0.125 2024-09-17 02:33:53,812 INFO [train.py:1198] (1/2) Epoch 30, batch 4300, loss[loss=0.2578, ctc_loss=0.1738, cr_loss=0.4198, over 19378.00 frames. ], tot_loss[loss=0.2257, ctc_loss=0.1508, cr_loss=0.3747, over 4095049.77 frames. ], batch size: 90, lr: 2.75e-03, grad_scale: 32.0 2024-09-17 02:34:15,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=537324.6666666666, ans=0.1 2024-09-17 02:34:21,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=537324.6666666666, ans=0.025 2024-09-17 02:34:27,979 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.27 vs. limit=15.0 2024-09-17 02:35:13,348 INFO [train.py:1198] (1/2) Epoch 30, batch 4350, loss[loss=0.2441, ctc_loss=0.1639, cr_loss=0.4013, over 20967.00 frames. ], tot_loss[loss=0.2257, ctc_loss=0.1508, cr_loss=0.3749, over 4091750.92 frames. ], batch size: 64, lr: 2.75e-03, grad_scale: 32.0 2024-09-17 02:35:26,124 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.05 vs. 
limit=15.0 2024-09-17 02:35:27,243 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 02:35:28,307 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.942e+02 2.215e+02 2.336e+02 2.568e+02 4.986e+02, threshold=4.672e+02, percent-clipped=1.0 2024-09-17 02:35:33,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=537466.3333333334, ans=0.025 2024-09-17 02:36:00,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=537523.0, ans=0.1 2024-09-17 02:36:20,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=537551.3333333334, ans=0.125 2024-09-17 02:36:26,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=537551.3333333334, ans=0.125 2024-09-17 02:36:29,265 INFO [train.py:1198] (1/2) Epoch 30, batch 4400, loss[loss=0.2138, ctc_loss=0.1418, cr_loss=0.36, over 20947.00 frames. ], tot_loss[loss=0.2257, ctc_loss=0.1507, cr_loss=0.3749, over 4089745.63 frames. ], batch size: 58, lr: 2.75e-03, grad_scale: 32.0 2024-09-17 02:36:56,976 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 02:37:05,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=537636.3333333334, ans=0.125 2024-09-17 02:37:12,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=537636.3333333334, ans=0.1 2024-09-17 02:37:35,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=537693.0, ans=0.125 2024-09-17 02:37:48,851 INFO [train.py:1198] (1/2) Epoch 30, batch 4450, loss[loss=0.2151, ctc_loss=0.1409, cr_loss=0.371, over 21063.00 frames. ], tot_loss[loss=0.2249, ctc_loss=0.1501, cr_loss=0.374, over 4100602.48 frames. ], batch size: 53, lr: 2.75e-03, grad_scale: 32.0 2024-09-17 02:37:50,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=537721.3333333334, ans=0.5 2024-09-17 02:38:04,102 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.843e+02 2.157e+02 2.310e+02 2.428e+02 3.096e+02, threshold=4.619e+02, percent-clipped=0.0 2024-09-17 02:38:50,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=537834.6666666666, ans=0.04949747468305833 2024-09-17 02:38:55,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=537834.6666666666, ans=0.0 2024-09-17 02:39:01,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=537834.6666666666, ans=0.125 2024-09-17 02:39:05,432 INFO [train.py:1198] (1/2) Epoch 30, batch 4500, loss[loss=0.2274, ctc_loss=0.1513, cr_loss=0.3808, over 20313.00 frames. ], tot_loss[loss=0.2243, ctc_loss=0.1497, cr_loss=0.3731, over 4096297.90 frames. 
], batch size: 74, lr: 2.75e-03, grad_scale: 32.0 2024-09-17 02:39:16,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=537863.0, ans=0.04949747468305833 2024-09-17 02:39:36,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=537919.6666666666, ans=0.1 2024-09-17 02:39:36,556 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.88 vs. limit=15.0 2024-09-17 02:39:45,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=537919.6666666666, ans=10.0 2024-09-17 02:39:51,522 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.56 vs. limit=10.0 2024-09-17 02:40:19,775 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.60 vs. limit=15.0 2024-09-17 02:40:21,809 INFO [train.py:1198] (1/2) Epoch 30, batch 4550, loss[loss=0.2088, ctc_loss=0.1381, cr_loss=0.3537, over 21010.00 frames. ], tot_loss[loss=0.2242, ctc_loss=0.1496, cr_loss=0.3727, over 4096961.91 frames. ], batch size: 61, lr: 2.75e-03, grad_scale: 16.0 2024-09-17 02:40:22,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=538004.6666666666, ans=0.0 2024-09-17 02:40:28,310 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.74 vs. limit=22.5 2024-09-17 02:40:38,262 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.794e+02 2.115e+02 2.239e+02 2.437e+02 3.730e+02, threshold=4.479e+02, percent-clipped=0.0 2024-09-17 02:41:16,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=538089.6666666666, ans=0.125 2024-09-17 02:41:40,230 INFO [train.py:1198] (1/2) Epoch 30, batch 4600, loss[loss=0.211, ctc_loss=0.1391, cr_loss=0.3593, over 20786.00 frames. ], tot_loss[loss=0.2255, ctc_loss=0.1507, cr_loss=0.3741, over 4089873.52 frames. ], batch size: 53, lr: 2.75e-03, grad_scale: 16.0 2024-09-17 02:42:05,782 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.20 vs. limit=10.0 2024-09-17 02:42:33,256 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.73 vs. limit=10.0 2024-09-17 02:42:36,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=538231.3333333334, ans=0.05 2024-09-17 02:42:59,196 INFO [train.py:1198] (1/2) Epoch 30, batch 4650, loss[loss=0.2443, ctc_loss=0.1632, cr_loss=0.4059, over 20261.00 frames. ], tot_loss[loss=0.2255, ctc_loss=0.1506, cr_loss=0.3745, over 4094411.54 frames. ], batch size: 74, lr: 2.75e-03, grad_scale: 16.0 2024-09-17 02:43:13,412 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.52 vs. 
limit=15.0 2024-09-17 02:43:15,556 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.783e+02 2.100e+02 2.325e+02 2.528e+02 3.398e+02, threshold=4.650e+02, percent-clipped=0.0 2024-09-17 02:43:38,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=538344.6666666666, ans=0.125 2024-09-17 02:43:49,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=538373.0, ans=0.125 2024-09-17 02:43:50,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=538373.0, ans=0.125 2024-09-17 02:43:57,330 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.68 vs. limit=15.0 2024-09-17 02:44:08,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=538401.3333333334, ans=0.0 2024-09-17 02:44:14,743 INFO [train.py:1198] (1/2) Epoch 30, batch 4700, loss[loss=0.2308, ctc_loss=0.1546, cr_loss=0.381, over 20815.00 frames. ], tot_loss[loss=0.2259, ctc_loss=0.1509, cr_loss=0.3751, over 4084112.28 frames. ], batch size: 59, lr: 2.75e-03, grad_scale: 16.0 2024-09-17 02:44:27,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=538429.6666666666, ans=0.035 2024-09-17 02:44:28,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=538458.0, ans=0.125 2024-09-17 02:44:31,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=538458.0, ans=0.1 2024-09-17 02:44:32,308 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. limit=6.0 2024-09-17 02:44:42,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=538458.0, ans=0.0 2024-09-17 02:44:54,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=538486.3333333334, ans=0.125 2024-09-17 02:44:57,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=538486.3333333334, ans=0.125 2024-09-17 02:45:30,722 INFO [train.py:1198] (1/2) Epoch 30, batch 4750, loss[loss=0.2172, ctc_loss=0.1446, cr_loss=0.363, over 21062.00 frames. ], tot_loss[loss=0.2262, ctc_loss=0.1512, cr_loss=0.3747, over 4076571.45 frames. 
], batch size: 56, lr: 2.75e-03, grad_scale: 16.0 2024-09-17 02:45:47,236 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.818e+02 2.139e+02 2.224e+02 2.394e+02 3.412e+02, threshold=4.447e+02, percent-clipped=0.0 2024-09-17 02:45:47,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=538599.6666666666, ans=0.0 2024-09-17 02:46:11,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=538628.0, ans=0.125 2024-09-17 02:46:21,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=538656.3333333334, ans=15.0 2024-09-17 02:46:21,175 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.22 vs. limit=15.0 2024-09-17 02:46:49,144 INFO [train.py:1198] (1/2) Epoch 30, batch 4800, loss[loss=0.2364, ctc_loss=0.1593, cr_loss=0.3854, over 21085.00 frames. ], tot_loss[loss=0.2266, ctc_loss=0.1514, cr_loss=0.3757, over 4090189.65 frames. ], batch size: 59, lr: 2.75e-03, grad_scale: 32.0 2024-09-17 02:47:01,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=538713.0, ans=0.125 2024-09-17 02:47:04,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=538741.3333333334, ans=0.0 2024-09-17 02:47:04,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=538741.3333333334, ans=0.0 2024-09-17 02:47:20,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=538769.6666666666, ans=10.0 2024-09-17 02:48:07,438 INFO [train.py:1198] (1/2) Epoch 30, batch 4850, loss[loss=0.225, ctc_loss=0.1494, cr_loss=0.3782, over 20969.00 frames. ], tot_loss[loss=0.2273, ctc_loss=0.152, cr_loss=0.3768, over 4087129.86 frames. ], batch size: 58, lr: 2.75e-03, grad_scale: 16.0 2024-09-17 02:48:25,650 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.811e+02 2.164e+02 2.334e+02 2.583e+02 4.515e+02, threshold=4.668e+02, percent-clipped=2.0 2024-09-17 02:48:55,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=538939.6666666666, ans=0.05 2024-09-17 02:49:23,181 INFO [train.py:1198] (1/2) Epoch 30, batch 4900, loss[loss=0.2338, ctc_loss=0.1572, cr_loss=0.3828, over 21025.00 frames. ], tot_loss[loss=0.2261, ctc_loss=0.1511, cr_loss=0.3748, over 4085712.49 frames. ], batch size: 63, lr: 2.75e-03, grad_scale: 8.0 2024-09-17 02:50:37,793 INFO [train.py:1198] (1/2) Epoch 30, batch 4950, loss[loss=0.2363, ctc_loss=0.1565, cr_loss=0.399, over 21084.00 frames. ], tot_loss[loss=0.2271, ctc_loss=0.1518, cr_loss=0.3767, over 4092840.87 frames. 
], batch size: 59, lr: 2.75e-03, grad_scale: 8.0 2024-09-17 02:50:57,062 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.844e+02 2.145e+02 2.240e+02 2.413e+02 3.481e+02, threshold=4.480e+02, percent-clipped=0.0 2024-09-17 02:50:57,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=539166.3333333334, ans=0.125 2024-09-17 02:51:06,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=539194.6666666666, ans=0.2 2024-09-17 02:51:28,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=539223.0, ans=0.125 2024-09-17 02:51:33,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=539223.0, ans=0.125 2024-09-17 02:51:35,385 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.80 vs. limit=22.5 2024-09-17 02:51:52,506 INFO [train.py:1198] (1/2) Epoch 30, batch 5000, loss[loss=0.2253, ctc_loss=0.1502, cr_loss=0.3755, over 20278.00 frames. ], tot_loss[loss=0.2269, ctc_loss=0.1516, cr_loss=0.3764, over 4088249.30 frames. ], batch size: 74, lr: 2.75e-03, grad_scale: 8.0 2024-09-17 02:52:37,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=539364.6666666666, ans=0.0 2024-09-17 02:53:02,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=539393.0, ans=0.125 2024-09-17 02:53:07,310 INFO [train.py:1198] (1/2) Epoch 30, batch 5050, loss[loss=0.2245, ctc_loss=0.1505, cr_loss=0.3704, over 20955.00 frames. ], tot_loss[loss=0.2262, ctc_loss=0.151, cr_loss=0.3758, over 4094701.27 frames. ], batch size: 67, lr: 2.75e-03, grad_scale: 8.0 2024-09-17 02:53:26,666 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.853e+02 2.163e+02 2.269e+02 2.419e+02 4.295e+02, threshold=4.538e+02, percent-clipped=0.0 2024-09-17 02:54:23,721 INFO [train.py:1198] (1/2) Epoch 30, batch 5100, loss[loss=0.2323, ctc_loss=0.1572, cr_loss=0.376, over 20786.00 frames. ], tot_loss[loss=0.2266, ctc_loss=0.1514, cr_loss=0.3761, over 4096242.37 frames. ], batch size: 53, lr: 2.75e-03, grad_scale: 8.0 2024-09-17 02:54:46,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=539591.3333333334, ans=0.125 2024-09-17 02:54:55,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=539619.6666666666, ans=0.0 2024-09-17 02:55:37,750 INFO [train.py:1198] (1/2) Epoch 30, batch 5150, loss[loss=0.2537, ctc_loss=0.1704, cr_loss=0.4165, over 21013.00 frames. ], tot_loss[loss=0.2264, ctc_loss=0.1512, cr_loss=0.3758, over 4096188.36 frames. ], batch size: 63, lr: 2.75e-03, grad_scale: 8.0 2024-09-17 02:55:38,364 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.34 vs. 
limit=22.5 2024-09-17 02:55:54,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=539733.0, ans=0.025 2024-09-17 02:55:57,073 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.849e+02 2.156e+02 2.270e+02 2.462e+02 5.193e+02, threshold=4.539e+02, percent-clipped=1.0 2024-09-17 02:56:03,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=539733.0, ans=0.0 2024-09-17 02:56:04,042 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.54 vs. limit=15.0 2024-09-17 02:56:04,914 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 02:56:28,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=539789.6666666666, ans=0.125 2024-09-17 02:56:46,674 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=3.87 vs. limit=15.0 2024-09-17 02:56:51,830 INFO [train.py:1198] (1/2) Epoch 30, batch 5200, loss[loss=0.2068, ctc_loss=0.1352, cr_loss=0.358, over 20938.00 frames. ], tot_loss[loss=0.228, ctc_loss=0.1524, cr_loss=0.3784, over 4105099.42 frames. ], batch size: 51, lr: 2.75e-03, grad_scale: 16.0 2024-09-17 02:56:52,598 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.34 vs. limit=15.0 2024-09-17 02:57:08,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=539874.6666666666, ans=0.0 2024-09-17 02:57:42,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=539931.3333333334, ans=0.125 2024-09-17 02:58:00,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=539959.6666666666, ans=0.2 2024-09-17 02:58:09,115 INFO [train.py:1198] (1/2) Epoch 30, batch 5250, loss[loss=0.1925, ctc_loss=0.1276, cr_loss=0.3242, over 21006.00 frames. ], tot_loss[loss=0.228, ctc_loss=0.1524, cr_loss=0.3782, over 4108322.68 frames. 
], batch size: 52, lr: 2.75e-03, grad_scale: 16.0 2024-09-17 02:58:16,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=539988.0, ans=0.125 2024-09-17 02:58:28,721 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.911e+02 2.145e+02 2.254e+02 2.424e+02 4.153e+02, threshold=4.507e+02, percent-clipped=0.0 2024-09-17 02:58:33,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=540016.3333333334, ans=0.04949747468305833 2024-09-17 02:58:42,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=540044.6666666666, ans=0.0 2024-09-17 02:58:45,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=540044.6666666666, ans=0.025 2024-09-17 02:59:00,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=540073.0, ans=0.2 2024-09-17 02:59:01,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=540073.0, ans=0.1 2024-09-17 02:59:04,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=540073.0, ans=0.025 2024-09-17 02:59:23,864 INFO [train.py:1198] (1/2) Epoch 30, batch 5300, loss[loss=0.1825, ctc_loss=0.1158, cr_loss=0.3337, over 20935.00 frames. ], tot_loss[loss=0.2266, ctc_loss=0.1515, cr_loss=0.3758, over 4108969.83 frames. ], batch size: 49, lr: 2.75e-03, grad_scale: 16.0 2024-09-17 02:59:36,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=540129.6666666666, ans=0.0 2024-09-17 02:59:43,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=540158.0, ans=0.125 2024-09-17 02:59:56,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=540186.3333333334, ans=0.2 2024-09-17 03:00:15,863 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.65 vs. limit=15.0 2024-09-17 03:00:25,765 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 03:00:38,823 INFO [train.py:1198] (1/2) Epoch 30, batch 5350, loss[loss=0.2329, ctc_loss=0.1556, cr_loss=0.3869, over 20976.00 frames. ], tot_loss[loss=0.2258, ctc_loss=0.1508, cr_loss=0.375, over 4105106.54 frames. ], batch size: 58, lr: 2.74e-03, grad_scale: 16.0 2024-09-17 03:00:55,974 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.69 vs. 
limit=15.0 2024-09-17 03:00:58,057 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.844e+02 2.139e+02 2.237e+02 2.366e+02 2.975e+02, threshold=4.475e+02, percent-clipped=0.0 2024-09-17 03:01:23,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=540356.3333333334, ans=0.2 2024-09-17 03:01:40,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=540384.6666666666, ans=0.025 2024-09-17 03:01:53,448 INFO [train.py:1198] (1/2) Epoch 30, batch 5400, loss[loss=0.1907, ctc_loss=0.126, cr_loss=0.3231, over 19875.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.1499, cr_loss=0.3739, over 4109145.68 frames. ], batch size: 44, lr: 2.74e-03, grad_scale: 16.0 2024-09-17 03:02:00,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=540413.0, ans=0.0 2024-09-17 03:02:13,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=540441.3333333334, ans=0.125 2024-09-17 03:02:24,111 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.13 vs. limit=15.0 2024-09-17 03:02:43,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=540498.0, ans=0.125 2024-09-17 03:03:08,404 INFO [train.py:1198] (1/2) Epoch 30, batch 5450, loss[loss=0.2084, ctc_loss=0.1383, cr_loss=0.3505, over 20995.00 frames. ], tot_loss[loss=0.2253, ctc_loss=0.1503, cr_loss=0.3746, over 4097566.22 frames. ], batch size: 55, lr: 2.74e-03, grad_scale: 16.0 2024-09-17 03:03:27,843 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.889e+02 2.169e+02 2.265e+02 2.380e+02 3.417e+02, threshold=4.530e+02, percent-clipped=0.0 2024-09-17 03:03:38,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=540611.3333333334, ans=0.125 2024-09-17 03:04:22,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=540668.0, ans=0.125 2024-09-17 03:04:25,197 INFO [train.py:1198] (1/2) Epoch 30, batch 5500, loss[loss=0.2418, ctc_loss=0.1639, cr_loss=0.3894, over 19274.00 frames. ], tot_loss[loss=0.2252, ctc_loss=0.1503, cr_loss=0.3745, over 4099595.45 frames. ], batch size: 90, lr: 2.74e-03, grad_scale: 16.0 2024-09-17 03:04:29,070 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.01 vs. limit=15.0 2024-09-17 03:04:43,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=540724.6666666666, ans=0.125 2024-09-17 03:04:51,275 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.97 vs. limit=15.0 2024-09-17 03:05:14,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=540781.3333333334, ans=0.2 2024-09-17 03:05:39,862 INFO [train.py:1198] (1/2) Epoch 30, batch 5550, loss[loss=0.2215, ctc_loss=0.1479, cr_loss=0.3678, over 20839.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.15, cr_loss=0.3737, over 4099278.02 frames. 
], batch size: 65, lr: 2.74e-03, grad_scale: 16.0 2024-09-17 03:05:43,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=540838.0, ans=0.125 2024-09-17 03:05:58,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=540866.3333333334, ans=0.125 2024-09-17 03:05:59,440 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.823e+02 2.136e+02 2.246e+02 2.482e+02 5.208e+02, threshold=4.491e+02, percent-clipped=1.0 2024-09-17 03:06:02,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=540866.3333333334, ans=0.2 2024-09-17 03:06:32,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=540923.0, ans=0.125 2024-09-17 03:06:57,182 INFO [train.py:1198] (1/2) Epoch 30, batch 5600, loss[loss=0.2297, ctc_loss=0.1492, cr_loss=0.4025, over 20978.00 frames. ], tot_loss[loss=0.2249, ctc_loss=0.1501, cr_loss=0.3741, over 4092999.15 frames. ], batch size: 55, lr: 2.74e-03, grad_scale: 32.0 2024-09-17 03:07:03,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=540979.6666666666, ans=0.125 2024-09-17 03:07:46,845 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.22 vs. limit=15.0 2024-09-17 03:07:59,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=541093.0, ans=0.125 2024-09-17 03:08:03,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=541093.0, ans=0.125 2024-09-17 03:08:05,073 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.11 vs. limit=15.0 2024-09-17 03:08:11,549 INFO [train.py:1198] (1/2) Epoch 30, batch 5650, loss[loss=0.2785, ctc_loss=0.1971, cr_loss=0.407, over 14295.00 frames. ], tot_loss[loss=0.2238, ctc_loss=0.1493, cr_loss=0.3725, over 4081899.00 frames. ], batch size: 149, lr: 2.74e-03, grad_scale: 32.0 2024-09-17 03:08:20,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=541121.3333333334, ans=0.0 2024-09-17 03:08:30,727 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.959e+02 2.219e+02 2.344e+02 2.549e+02 4.118e+02, threshold=4.688e+02, percent-clipped=0.0 2024-09-17 03:08:34,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=541149.6666666666, ans=0.0 2024-09-17 03:09:26,216 INFO [train.py:1198] (1/2) Epoch 30, batch 5700, loss[loss=0.2708, ctc_loss=0.1858, cr_loss=0.425, over 19310.00 frames. ], tot_loss[loss=0.2263, ctc_loss=0.1513, cr_loss=0.375, over 4065498.34 frames. 
], batch size: 90, lr: 2.74e-03, grad_scale: 32.0 2024-09-17 03:09:55,215 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=541319.6666666666, ans=0.0 2024-09-17 03:10:07,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=541319.6666666666, ans=0.2 2024-09-17 03:10:10,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=541348.0, ans=0.125 2024-09-17 03:10:14,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=541348.0, ans=0.0 2024-09-17 03:10:19,353 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.93 vs. limit=22.5 2024-09-17 03:10:41,357 INFO [train.py:1198] (1/2) Epoch 30, batch 5750, loss[loss=0.2364, ctc_loss=0.1578, cr_loss=0.393, over 20580.00 frames. ], tot_loss[loss=0.2261, ctc_loss=0.1511, cr_loss=0.3749, over 4081939.39 frames. ], batch size: 75, lr: 2.74e-03, grad_scale: 16.0 2024-09-17 03:10:53,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=541404.6666666666, ans=0.07 2024-09-17 03:11:02,332 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.951e+02 2.112e+02 2.229e+02 2.445e+02 4.655e+02, threshold=4.458e+02, percent-clipped=0.0 2024-09-17 03:11:37,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=541489.6666666666, ans=0.125 2024-09-17 03:11:56,086 INFO [train.py:1198] (1/2) Epoch 30, batch 5800, loss[loss=0.2242, ctc_loss=0.1487, cr_loss=0.3775, over 21094.00 frames. ], tot_loss[loss=0.2256, ctc_loss=0.1507, cr_loss=0.3747, over 4097612.76 frames. ], batch size: 59, lr: 2.74e-03, grad_scale: 16.0 2024-09-17 03:12:03,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=541546.3333333334, ans=0.1 2024-09-17 03:12:08,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=541546.3333333334, ans=0.125 2024-09-17 03:12:18,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=541574.6666666666, ans=0.1 2024-09-17 03:12:23,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=541574.6666666666, ans=0.1 2024-09-17 03:13:13,493 INFO [train.py:1198] (1/2) Epoch 30, batch 5850, loss[loss=0.2244, ctc_loss=0.149, cr_loss=0.377, over 20852.00 frames. ], tot_loss[loss=0.2249, ctc_loss=0.1501, cr_loss=0.3738, over 4103318.92 frames. ], batch size: 57, lr: 2.74e-03, grad_scale: 16.0 2024-09-17 03:13:16,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=541688.0, ans=0.125 2024-09-17 03:13:22,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=541688.0, ans=0.2 2024-09-17 03:13:32,208 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.63 vs. 
limit=15.0 2024-09-17 03:13:34,357 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.871e+02 2.173e+02 2.328e+02 2.470e+02 4.926e+02, threshold=4.656e+02, percent-clipped=1.0 2024-09-17 03:13:48,976 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.49 vs. limit=15.0 2024-09-17 03:14:02,964 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.48 vs. limit=5.0 2024-09-17 03:14:06,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=541773.0, ans=0.2 2024-09-17 03:14:28,778 INFO [train.py:1198] (1/2) Epoch 30, batch 5900, loss[loss=0.276, ctc_loss=0.1907, cr_loss=0.4266, over 14165.00 frames. ], tot_loss[loss=0.2248, ctc_loss=0.1501, cr_loss=0.3735, over 4098556.88 frames. ], batch size: 149, lr: 2.74e-03, grad_scale: 16.0 2024-09-17 03:14:30,802 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.89 vs. limit=15.0 2024-09-17 03:14:47,263 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.83 vs. limit=15.0 2024-09-17 03:14:49,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=541858.0, ans=0.0 2024-09-17 03:14:57,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=541886.3333333334, ans=0.125 2024-09-17 03:15:04,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=541886.3333333334, ans=0.1 2024-09-17 03:15:33,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=541943.0, ans=0.2 2024-09-17 03:15:36,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=541943.0, ans=0.0 2024-09-17 03:15:45,428 INFO [train.py:1198] (1/2) Epoch 30, batch 5950, loss[loss=0.2371, ctc_loss=0.1588, cr_loss=0.3915, over 20778.00 frames. ], tot_loss[loss=0.2249, ctc_loss=0.1501, cr_loss=0.3739, over 4099900.95 frames. ], batch size: 56, lr: 2.74e-03, grad_scale: 16.0 2024-09-17 03:15:55,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=541971.3333333334, ans=0.0 2024-09-17 03:16:06,455 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.907e+02 2.125e+02 2.235e+02 2.360e+02 2.949e+02, threshold=4.470e+02, percent-clipped=0.0 2024-09-17 03:16:19,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=542028.0, ans=0.1 2024-09-17 03:16:28,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=542056.3333333334, ans=0.025 2024-09-17 03:16:41,180 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.01 vs. 
limit=10.0 2024-09-17 03:16:55,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=542084.6666666666, ans=0.125 2024-09-17 03:16:59,603 INFO [train.py:1198] (1/2) Epoch 30, batch 6000, loss[loss=0.2675, ctc_loss=0.1905, cr_loss=0.3849, over 14095.00 frames. ], tot_loss[loss=0.2256, ctc_loss=0.1507, cr_loss=0.3743, over 4093281.30 frames. ], batch size: 149, lr: 2.74e-03, grad_scale: 32.0 2024-09-17 03:16:59,603 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 03:17:22,468 INFO [train.py:1230] (1/2) Epoch 30, validation: loss=0.04091, ctc_loss=0.04091, cr_loss=1.274e-14, over 944034.00 frames. 2024-09-17 03:17:22,469 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-17 03:17:27,730 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.75 vs. limit=15.0 2024-09-17 03:18:36,980 INFO [train.py:1198] (1/2) Epoch 30, batch 6050, loss[loss=0.2101, ctc_loss=0.1382, cr_loss=0.3597, over 20974.00 frames. ], tot_loss[loss=0.2259, ctc_loss=0.1509, cr_loss=0.3751, over 4100109.03 frames. ], batch size: 55, lr: 2.74e-03, grad_scale: 32.0 2024-09-17 03:18:57,650 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.892e+02 2.198e+02 2.318e+02 2.450e+02 3.539e+02, threshold=4.636e+02, percent-clipped=0.0 2024-09-17 03:19:17,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=542311.3333333334, ans=0.1 2024-09-17 03:19:52,774 INFO [train.py:1198] (1/2) Epoch 30, batch 6100, loss[loss=0.2198, ctc_loss=0.148, cr_loss=0.3594, over 20788.00 frames. ], tot_loss[loss=0.227, ctc_loss=0.1517, cr_loss=0.3767, over 4104549.20 frames. ], batch size: 56, lr: 2.74e-03, grad_scale: 32.0 2024-09-17 03:20:04,140 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.21 vs. limit=22.5 2024-09-17 03:20:36,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=542453.0, ans=0.125 2024-09-17 03:20:39,687 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.76 vs. limit=15.0 2024-09-17 03:21:09,161 INFO [train.py:1198] (1/2) Epoch 30, batch 6150, loss[loss=0.2057, ctc_loss=0.1339, cr_loss=0.3587, over 21044.00 frames. ], tot_loss[loss=0.2263, ctc_loss=0.1513, cr_loss=0.375, over 4100570.70 frames. ], batch size: 52, lr: 2.74e-03, grad_scale: 32.0 2024-09-17 03:21:16,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=542538.0, ans=0.0 2024-09-17 03:21:29,723 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.951e+02 2.183e+02 2.320e+02 2.515e+02 4.172e+02, threshold=4.640e+02, percent-clipped=0.0 2024-09-17 03:21:46,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=542594.6666666666, ans=0.0 2024-09-17 03:21:50,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=542594.6666666666, ans=0.0 2024-09-17 03:22:04,747 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.18 vs. 
limit=10.0 2024-09-17 03:22:18,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=542651.3333333334, ans=0.125 2024-09-17 03:22:22,945 INFO [train.py:1198] (1/2) Epoch 30, batch 6200, loss[loss=0.2708, ctc_loss=0.1815, cr_loss=0.4463, over 20855.00 frames. ], tot_loss[loss=0.2274, ctc_loss=0.1522, cr_loss=0.376, over 4081662.71 frames. ], batch size: 65, lr: 2.74e-03, grad_scale: 32.0 2024-09-17 03:22:39,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=542708.0, ans=0.125 2024-09-17 03:23:34,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=542793.0, ans=0.2 2024-09-17 03:23:34,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=542793.0, ans=0.0 2024-09-17 03:23:37,121 INFO [train.py:1198] (1/2) Epoch 30, batch 6250, loss[loss=0.2668, ctc_loss=0.1819, cr_loss=0.4244, over 18123.00 frames. ], tot_loss[loss=0.2282, ctc_loss=0.1529, cr_loss=0.3766, over 4041266.52 frames. ], batch size: 108, lr: 2.74e-03, grad_scale: 32.0 2024-09-17 03:23:52,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=542849.6666666666, ans=0.2 2024-09-17 03:23:57,753 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.026e+02 2.236e+02 2.421e+02 2.583e+02 4.278e+02, threshold=4.841e+02, percent-clipped=0.0 2024-09-17 03:24:50,049 INFO [train.py:1198] (1/2) Epoch 30, batch 6300, loss[loss=0.248, ctc_loss=0.1681, cr_loss=0.3995, over 18255.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1548, cr_loss=0.3783, over 3986598.06 frames. ], batch size: 108, lr: 2.74e-03, grad_scale: 32.0 2024-09-17 03:25:02,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=542991.3333333334, ans=0.125 2024-09-17 03:25:50,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=543076.3333333334, ans=0.0 2024-09-17 03:26:00,269 INFO [train.py:1198] (1/2) Epoch 30, batch 6350, loss[loss=0.2799, ctc_loss=0.1951, cr_loss=0.4244, over 14342.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1602, cr_loss=0.3822, over 3800053.32 frames. ], batch size: 149, lr: 2.74e-03, grad_scale: 32.0 2024-09-17 03:26:03,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=543104.6666666666, ans=0.04949747468305833 2024-09-17 03:26:17,360 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.10 vs. limit=15.0 2024-09-17 03:26:18,450 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.80 vs. limit=15.0 2024-09-17 03:26:20,556 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.106e+02 2.478e+02 2.628e+02 2.838e+02 3.418e+02, threshold=5.255e+02, percent-clipped=0.0 2024-09-17 03:26:51,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=543189.6666666666, ans=0.0 2024-09-17 03:27:50,477 INFO [train.py:1198] (1/2) Epoch 31, batch 0, loss[loss=0.1948, ctc_loss=0.1252, cr_loss=0.3482, over 20971.00 frames. 
], tot_loss[loss=0.1948, ctc_loss=0.1252, cr_loss=0.3482, over 20971.00 frames. ], batch size: 51, lr: 2.69e-03, grad_scale: 32.0 2024-09-17 03:27:50,477 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 03:28:09,056 INFO [train.py:1230] (1/2) Epoch 31, validation: loss=0.04055, ctc_loss=0.04055, cr_loss=1.258e-14, over 944034.00 frames. 2024-09-17 03:28:09,057 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-17 03:28:15,399 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=543220.8333333334, ans=0.125 2024-09-17 03:28:40,184 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.94 vs. limit=22.5 2024-09-17 03:28:58,484 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.68 vs. limit=5.0 2024-09-17 03:29:01,061 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.31 vs. limit=15.0 2024-09-17 03:29:03,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=543305.8333333334, ans=0.125 2024-09-17 03:29:09,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=543334.1666666666, ans=0.035 2024-09-17 03:29:15,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=543334.1666666666, ans=0.02 2024-09-17 03:29:24,171 INFO [train.py:1198] (1/2) Epoch 31, batch 50, loss[loss=0.1956, ctc_loss=0.1294, cr_loss=0.331, over 19923.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1483, cr_loss=0.3693, over 921682.06 frames. ], batch size: 44, lr: 2.69e-03, grad_scale: 32.0 2024-09-17 03:29:26,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=543362.5, ans=0.04949747468305833 2024-09-17 03:30:00,774 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.868e+02 2.128e+02 2.271e+02 2.598e+02 3.725e+02, threshold=4.543e+02, percent-clipped=0.0 2024-09-17 03:30:07,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=543419.1666666666, ans=0.0 2024-09-17 03:30:08,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=543419.1666666666, ans=0.125 2024-09-17 03:30:13,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=543447.5, ans=0.125 2024-09-17 03:30:16,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=543447.5, ans=0.125 2024-09-17 03:30:33,947 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.83 vs. limit=15.0 2024-09-17 03:30:41,917 INFO [train.py:1198] (1/2) Epoch 31, batch 100, loss[loss=0.2409, ctc_loss=0.1638, cr_loss=0.3853, over 20665.00 frames. ], tot_loss[loss=0.2236, ctc_loss=0.1492, cr_loss=0.3718, over 1628330.84 frames. 
], batch size: 68, lr: 2.69e-03, grad_scale: 32.0 2024-09-17 03:31:03,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=543532.5, ans=0.125 2024-09-17 03:31:13,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=543560.8333333334, ans=0.0 2024-09-17 03:31:58,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=543645.8333333334, ans=0.2 2024-09-17 03:31:59,816 INFO [train.py:1198] (1/2) Epoch 31, batch 150, loss[loss=0.1711, ctc_loss=0.1109, cr_loss=0.3009, over 21077.00 frames. ], tot_loss[loss=0.2239, ctc_loss=0.1494, cr_loss=0.3726, over 2178597.04 frames. ], batch size: 53, lr: 2.69e-03, grad_scale: 32.0 2024-09-17 03:32:21,720 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.27 vs. limit=15.0 2024-09-17 03:32:34,330 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.866e+02 2.119e+02 2.302e+02 2.446e+02 8.776e+02, threshold=4.604e+02, percent-clipped=1.0 2024-09-17 03:32:57,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=543730.8333333334, ans=0.0 2024-09-17 03:33:15,075 INFO [train.py:1198] (1/2) Epoch 31, batch 200, loss[loss=0.2609, ctc_loss=0.1763, cr_loss=0.4233, over 20774.00 frames. ], tot_loss[loss=0.2249, ctc_loss=0.1502, cr_loss=0.3735, over 2588289.80 frames. ], batch size: 71, lr: 2.69e-03, grad_scale: 32.0 2024-09-17 03:33:20,417 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.11 vs. limit=15.0 2024-09-17 03:33:30,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=543815.8333333334, ans=10.0 2024-09-17 03:33:33,816 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.75 vs. limit=6.0 2024-09-17 03:33:37,098 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.77 vs. limit=15.0 2024-09-17 03:33:54,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=543844.1666666666, ans=0.04949747468305833 2024-09-17 03:34:30,230 INFO [train.py:1198] (1/2) Epoch 31, batch 250, loss[loss=0.2058, ctc_loss=0.1349, cr_loss=0.3545, over 20931.00 frames. ], tot_loss[loss=0.2234, ctc_loss=0.149, cr_loss=0.3719, over 2921820.40 frames. 
], batch size: 48, lr: 2.69e-03, grad_scale: 32.0 2024-09-17 03:34:35,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=543929.1666666666, ans=0.0 2024-09-17 03:34:48,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=543957.5, ans=0.025 2024-09-17 03:35:00,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=543985.8333333334, ans=0.125 2024-09-17 03:35:04,943 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.767e+02 2.139e+02 2.233e+02 2.438e+02 5.737e+02, threshold=4.465e+02, percent-clipped=1.0 2024-09-17 03:35:46,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=544042.5, ans=0.2 2024-09-17 03:35:49,371 INFO [train.py:1198] (1/2) Epoch 31, batch 300, loss[loss=0.2239, ctc_loss=0.1487, cr_loss=0.3762, over 20960.00 frames. ], tot_loss[loss=0.2262, ctc_loss=0.1511, cr_loss=0.3753, over 3174003.63 frames. ], batch size: 60, lr: 2.69e-03, grad_scale: 32.0 2024-09-17 03:36:45,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=544155.8333333334, ans=0.0 2024-09-17 03:37:01,078 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.70 vs. limit=10.0 2024-09-17 03:37:08,426 INFO [train.py:1198] (1/2) Epoch 31, batch 350, loss[loss=0.234, ctc_loss=0.1546, cr_loss=0.3969, over 21078.00 frames. ], tot_loss[loss=0.2257, ctc_loss=0.1508, cr_loss=0.3747, over 3380219.39 frames. ], batch size: 59, lr: 2.69e-03, grad_scale: 32.0 2024-09-17 03:37:17,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=544212.5, ans=0.1 2024-09-17 03:37:17,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=544212.5, ans=0.0 2024-09-17 03:37:28,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=544240.8333333334, ans=0.125 2024-09-17 03:37:42,904 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.828e+02 2.146e+02 2.269e+02 2.420e+02 4.863e+02, threshold=4.538e+02, percent-clipped=1.0 2024-09-17 03:37:55,784 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.04 vs. limit=15.0 2024-09-17 03:38:10,974 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.63 vs. limit=15.0 2024-09-17 03:38:23,585 INFO [train.py:1198] (1/2) Epoch 31, batch 400, loss[loss=0.2198, ctc_loss=0.145, cr_loss=0.3739, over 21059.00 frames. ], tot_loss[loss=0.2253, ctc_loss=0.1506, cr_loss=0.3736, over 3533593.86 frames. 
], batch size: 56, lr: 2.69e-03, grad_scale: 32.0 2024-09-17 03:39:01,460 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 03:39:12,277 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 03:39:23,095 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=544467.5, ans=0.125 2024-09-17 03:39:26,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=544467.5, ans=0.125 2024-09-17 03:39:28,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=544467.5, ans=0.0 2024-09-17 03:39:39,514 INFO [train.py:1198] (1/2) Epoch 31, batch 450, loss[loss=0.1976, ctc_loss=0.1317, cr_loss=0.3295, over 20976.00 frames. ], tot_loss[loss=0.2241, ctc_loss=0.1497, cr_loss=0.372, over 3653840.52 frames. ], batch size: 48, lr: 2.69e-03, grad_scale: 32.0 2024-09-17 03:39:40,524 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.03 vs. limit=22.5 2024-09-17 03:40:15,272 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.769e+02 2.124e+02 2.223e+02 2.373e+02 3.148e+02, threshold=4.446e+02, percent-clipped=0.0 2024-09-17 03:40:16,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=544552.5, ans=0.0 2024-09-17 03:40:41,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=544609.1666666666, ans=0.025 2024-09-17 03:40:44,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=544609.1666666666, ans=0.2 2024-09-17 03:40:53,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=544609.1666666666, ans=0.0 2024-09-17 03:40:53,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=544609.1666666666, ans=0.2 2024-09-17 03:40:56,159 INFO [train.py:1198] (1/2) Epoch 31, batch 500, loss[loss=0.2006, ctc_loss=0.1315, cr_loss=0.3456, over 20795.00 frames. ], tot_loss[loss=0.2246, ctc_loss=0.1501, cr_loss=0.3724, over 3748481.55 frames. ], batch size: 56, lr: 2.69e-03, grad_scale: 32.0 2024-09-17 03:41:03,096 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.64 vs. limit=15.0 2024-09-17 03:41:06,041 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.14 vs. limit=15.0 2024-09-17 03:41:25,683 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.75 vs. 
limit=15.0 2024-09-17 03:41:59,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=544750.8333333334, ans=0.0 2024-09-17 03:42:04,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=544750.8333333334, ans=0.0 2024-09-17 03:42:12,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=544779.1666666666, ans=0.125 2024-09-17 03:42:14,247 INFO [train.py:1198] (1/2) Epoch 31, batch 550, loss[loss=0.2004, ctc_loss=0.1318, cr_loss=0.3427, over 19889.00 frames. ], tot_loss[loss=0.2257, ctc_loss=0.1508, cr_loss=0.3743, over 3827685.23 frames. ], batch size: 44, lr: 2.69e-03, grad_scale: 32.0 2024-09-17 03:42:26,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=544779.1666666666, ans=0.09899494936611666 2024-09-17 03:42:52,269 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.901e+02 2.124e+02 2.240e+02 2.429e+02 3.419e+02, threshold=4.481e+02, percent-clipped=0.0 2024-09-17 03:42:57,352 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 03:43:16,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=544892.5, ans=0.125 2024-09-17 03:43:30,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=544892.5, ans=0.2 2024-09-17 03:43:33,215 INFO [train.py:1198] (1/2) Epoch 31, batch 600, loss[loss=0.202, ctc_loss=0.1324, cr_loss=0.3479, over 20911.00 frames. ], tot_loss[loss=0.2242, ctc_loss=0.1495, cr_loss=0.3731, over 3899071.89 frames. ], batch size: 54, lr: 2.69e-03, grad_scale: 32.0 2024-09-17 03:43:39,968 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.91 vs. limit=12.0 2024-09-17 03:44:42,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=545034.1666666666, ans=0.1 2024-09-17 03:44:48,534 INFO [train.py:1198] (1/2) Epoch 31, batch 650, loss[loss=0.1886, ctc_loss=0.1195, cr_loss=0.3452, over 20977.00 frames. ], tot_loss[loss=0.2239, ctc_loss=0.1493, cr_loss=0.373, over 3935809.26 frames. ], batch size: 49, lr: 2.69e-03, grad_scale: 32.0 2024-09-17 03:45:03,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=545090.8333333334, ans=0.0 2024-09-17 03:45:22,994 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.754e+02 2.106e+02 2.256e+02 2.442e+02 2.933e+02, threshold=4.512e+02, percent-clipped=0.0 2024-09-17 03:46:03,462 INFO [train.py:1198] (1/2) Epoch 31, batch 700, loss[loss=0.2236, ctc_loss=0.1512, cr_loss=0.3624, over 20697.00 frames. ], tot_loss[loss=0.2246, ctc_loss=0.1498, cr_loss=0.3741, over 3972158.95 frames. 
], batch size: 71, lr: 2.69e-03, grad_scale: 32.0 2024-09-17 03:46:11,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=545204.1666666666, ans=0.1 2024-09-17 03:46:11,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=545204.1666666666, ans=0.0 2024-09-17 03:46:39,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=545260.8333333334, ans=0.0 2024-09-17 03:47:07,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=545317.5, ans=0.125 2024-09-17 03:47:22,003 INFO [train.py:1198] (1/2) Epoch 31, batch 750, loss[loss=0.1992, ctc_loss=0.1299, cr_loss=0.3468, over 20989.00 frames. ], tot_loss[loss=0.2245, ctc_loss=0.1496, cr_loss=0.3744, over 4008572.98 frames. ], batch size: 50, lr: 2.69e-03, grad_scale: 32.0 2024-09-17 03:47:34,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=545345.8333333334, ans=0.0 2024-09-17 03:47:47,938 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=545374.1666666666, ans=0.0 2024-09-17 03:47:56,736 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.894e+02 2.156e+02 2.302e+02 2.451e+02 2.970e+02, threshold=4.605e+02, percent-clipped=0.0 2024-09-17 03:48:11,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=545430.8333333334, ans=0.0 2024-09-17 03:48:22,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=545459.1666666666, ans=0.0 2024-09-17 03:48:39,988 INFO [train.py:1198] (1/2) Epoch 31, batch 800, loss[loss=0.2009, ctc_loss=0.133, cr_loss=0.3394, over 21030.00 frames. ], tot_loss[loss=0.2228, ctc_loss=0.1484, cr_loss=0.3721, over 4039817.20 frames. ], batch size: 63, lr: 2.69e-03, grad_scale: 32.0 2024-09-17 03:48:56,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=545515.8333333334, ans=0.0 2024-09-17 03:49:11,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=545544.1666666666, ans=0.05 2024-09-17 03:49:20,529 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=545544.1666666666, ans=0.0 2024-09-17 03:49:22,538 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.85 vs. limit=15.0 2024-09-17 03:49:48,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=545600.8333333334, ans=0.125 2024-09-17 03:49:55,530 INFO [train.py:1198] (1/2) Epoch 31, batch 850, loss[loss=0.2291, ctc_loss=0.1532, cr_loss=0.3796, over 20322.00 frames. ], tot_loss[loss=0.2225, ctc_loss=0.1482, cr_loss=0.3716, over 4061376.33 frames. 
], batch size: 74, lr: 2.69e-03, grad_scale: 32.0 2024-09-17 03:50:22,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=545657.5, ans=0.0 2024-09-17 03:50:30,058 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.866e+02 2.121e+02 2.292e+02 2.437e+02 3.125e+02, threshold=4.583e+02, percent-clipped=0.0 2024-09-17 03:51:10,863 INFO [train.py:1198] (1/2) Epoch 31, batch 900, loss[loss=0.2462, ctc_loss=0.1656, cr_loss=0.4034, over 20929.00 frames. ], tot_loss[loss=0.2238, ctc_loss=0.1492, cr_loss=0.3728, over 4057404.41 frames. ], batch size: 60, lr: 2.69e-03, grad_scale: 32.0 2024-09-17 03:51:33,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=545799.1666666666, ans=0.0 2024-09-17 03:51:53,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=545827.5, ans=0.125 2024-09-17 03:52:07,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=545855.8333333334, ans=0.125 2024-09-17 03:52:29,754 INFO [train.py:1198] (1/2) Epoch 31, batch 950, loss[loss=0.185, ctc_loss=0.121, cr_loss=0.3201, over 20944.00 frames. ], tot_loss[loss=0.225, ctc_loss=0.1501, cr_loss=0.3747, over 4067438.21 frames. ], batch size: 48, lr: 2.69e-03, grad_scale: 32.0 2024-09-17 03:52:43,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=545940.8333333334, ans=0.025 2024-09-17 03:53:04,065 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.863e+02 2.183e+02 2.329e+02 2.428e+02 3.182e+02, threshold=4.657e+02, percent-clipped=0.0 2024-09-17 03:53:10,582 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 03:53:44,833 INFO [train.py:1198] (1/2) Epoch 31, batch 1000, loss[loss=0.2192, ctc_loss=0.1439, cr_loss=0.3763, over 20980.00 frames. ], tot_loss[loss=0.2246, ctc_loss=0.1498, cr_loss=0.374, over 4077210.05 frames. ], batch size: 55, lr: 2.69e-03, grad_scale: 32.0 2024-09-17 03:54:25,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=546110.8333333334, ans=0.125 2024-09-17 03:54:49,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=546167.5, ans=0.0 2024-09-17 03:55:03,073 INFO [train.py:1198] (1/2) Epoch 31, batch 1050, loss[loss=0.2007, ctc_loss=0.1303, cr_loss=0.3518, over 21005.00 frames. ], tot_loss[loss=0.2242, ctc_loss=0.1496, cr_loss=0.3731, over 4078331.67 frames. ], batch size: 51, lr: 2.68e-03, grad_scale: 32.0 2024-09-17 03:55:14,659 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.42 vs. limit=22.5 2024-09-17 03:55:21,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=546224.1666666666, ans=0.125 2024-09-17 03:55:22,015 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.20 vs. 
limit=15.0 2024-09-17 03:55:24,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=546224.1666666666, ans=0.125 2024-09-17 03:55:28,385 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.93 vs. limit=15.0 2024-09-17 03:55:38,062 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.872e+02 2.166e+02 2.237e+02 2.389e+02 5.782e+02, threshold=4.475e+02, percent-clipped=2.0 2024-09-17 03:55:59,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=546280.8333333334, ans=0.07 2024-09-17 03:56:13,203 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=546309.1666666666, ans=0.0 2024-09-17 03:56:18,989 INFO [train.py:1198] (1/2) Epoch 31, batch 1100, loss[loss=0.2182, ctc_loss=0.1447, cr_loss=0.3675, over 20840.00 frames. ], tot_loss[loss=0.2238, ctc_loss=0.1492, cr_loss=0.3728, over 4093427.96 frames. ], batch size: 59, lr: 2.68e-03, grad_scale: 32.0 2024-09-17 03:56:55,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=546394.1666666666, ans=0.025 2024-09-17 03:57:33,743 INFO [train.py:1198] (1/2) Epoch 31, batch 1150, loss[loss=0.2052, ctc_loss=0.1369, cr_loss=0.3417, over 20704.00 frames. ], tot_loss[loss=0.2246, ctc_loss=0.1498, cr_loss=0.3743, over 4088042.07 frames. ], batch size: 71, lr: 2.68e-03, grad_scale: 32.0 2024-09-17 03:57:34,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=546479.1666666666, ans=0.0 2024-09-17 03:57:38,975 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.98 vs. limit=15.0 2024-09-17 03:57:46,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=546479.1666666666, ans=0.09899494936611666 2024-09-17 03:57:49,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=546507.5, ans=0.125 2024-09-17 03:58:11,823 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.921e+02 2.137e+02 2.309e+02 2.471e+02 6.425e+02, threshold=4.618e+02, percent-clipped=1.0 2024-09-17 03:58:18,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=546535.8333333334, ans=0.95 2024-09-17 03:58:41,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=546592.5, ans=0.125 2024-09-17 03:58:52,850 INFO [train.py:1198] (1/2) Epoch 31, batch 1200, loss[loss=0.2116, ctc_loss=0.1392, cr_loss=0.3622, over 20970.00 frames. ], tot_loss[loss=0.2253, ctc_loss=0.1503, cr_loss=0.375, over 4088069.17 frames. 
], batch size: 58, lr: 2.68e-03, grad_scale: 32.0 2024-09-17 03:59:14,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=546649.1666666666, ans=0.035 2024-09-17 03:59:46,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=546705.8333333334, ans=0.125 2024-09-17 04:00:11,546 INFO [train.py:1198] (1/2) Epoch 31, batch 1250, loss[loss=0.2444, ctc_loss=0.1646, cr_loss=0.3988, over 20662.00 frames. ], tot_loss[loss=0.2259, ctc_loss=0.1508, cr_loss=0.3753, over 4084116.42 frames. ], batch size: 66, lr: 2.68e-03, grad_scale: 32.0 2024-09-17 04:00:34,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=546790.8333333334, ans=0.125 2024-09-17 04:00:46,530 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.793e+02 2.151e+02 2.284e+02 2.422e+02 3.136e+02, threshold=4.569e+02, percent-clipped=0.0 2024-09-17 04:01:27,017 INFO [train.py:1198] (1/2) Epoch 31, batch 1300, loss[loss=0.2277, ctc_loss=0.1555, cr_loss=0.3611, over 20822.00 frames. ], tot_loss[loss=0.2261, ctc_loss=0.1511, cr_loss=0.3749, over 4086907.33 frames. ], batch size: 59, lr: 2.68e-03, grad_scale: 32.0 2024-09-17 04:01:48,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=546932.5, ans=0.0 2024-09-17 04:01:58,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=546960.8333333334, ans=0.04949747468305833 2024-09-17 04:02:19,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=546989.1666666666, ans=0.025 2024-09-17 04:02:41,928 INFO [train.py:1198] (1/2) Epoch 31, batch 1350, loss[loss=0.2964, ctc_loss=0.209, cr_loss=0.437, over 14211.00 frames. ], tot_loss[loss=0.2272, ctc_loss=0.152, cr_loss=0.376, over 4065936.62 frames. ], batch size: 149, lr: 2.68e-03, grad_scale: 64.0 2024-09-17 04:02:43,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=547045.8333333334, ans=0.125 2024-09-17 04:02:56,050 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.53 vs. limit=15.0 2024-09-17 04:03:16,705 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.890e+02 2.183e+02 2.335e+02 2.523e+02 3.587e+02, threshold=4.670e+02, percent-clipped=0.0 2024-09-17 04:03:44,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=547159.1666666666, ans=0.2 2024-09-17 04:04:00,906 INFO [train.py:1198] (1/2) Epoch 31, batch 1400, loss[loss=0.2521, ctc_loss=0.1718, cr_loss=0.4017, over 18241.00 frames. ], tot_loss[loss=0.2262, ctc_loss=0.1512, cr_loss=0.3749, over 4081585.50 frames. ], batch size: 108, lr: 2.68e-03, grad_scale: 64.0 2024-09-17 04:05:10,563 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.54 vs. limit=6.0 2024-09-17 04:05:15,796 INFO [train.py:1198] (1/2) Epoch 31, batch 1450, loss[loss=0.2345, ctc_loss=0.1585, cr_loss=0.3802, over 19456.00 frames. ], tot_loss[loss=0.2252, ctc_loss=0.1504, cr_loss=0.3737, over 4084176.72 frames. 
], batch size: 90, lr: 2.68e-03, grad_scale: 64.0 2024-09-17 04:05:22,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=547329.1666666666, ans=0.0 2024-09-17 04:05:35,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=547357.5, ans=0.0 2024-09-17 04:05:40,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=547357.5, ans=0.125 2024-09-17 04:05:43,611 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.24 vs. limit=22.5 2024-09-17 04:05:47,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=547385.8333333334, ans=0.0 2024-09-17 04:05:53,744 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.894e+02 2.124e+02 2.269e+02 2.456e+02 3.778e+02, threshold=4.538e+02, percent-clipped=0.0 2024-09-17 04:06:20,324 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.01 vs. limit=12.0 2024-09-17 04:06:24,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=547442.5, ans=0.025 2024-09-17 04:06:34,674 INFO [train.py:1198] (1/2) Epoch 31, batch 1500, loss[loss=0.2392, ctc_loss=0.1616, cr_loss=0.3879, over 20781.00 frames. ], tot_loss[loss=0.2245, ctc_loss=0.1499, cr_loss=0.3729, over 4084292.70 frames. ], batch size: 56, lr: 2.68e-03, grad_scale: 64.0 2024-09-17 04:06:57,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=547499.1666666666, ans=0.125 2024-09-17 04:07:44,885 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.07 vs. limit=12.0 2024-09-17 04:07:50,376 INFO [train.py:1198] (1/2) Epoch 31, batch 1550, loss[loss=0.2058, ctc_loss=0.1385, cr_loss=0.3365, over 20955.00 frames. ], tot_loss[loss=0.224, ctc_loss=0.1495, cr_loss=0.3724, over 4098384.15 frames. ], batch size: 51, lr: 2.68e-03, grad_scale: 64.0 2024-09-17 04:08:06,118 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.95 vs. 
limit=6.0 2024-09-17 04:08:20,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=547669.1666666666, ans=0.0 2024-09-17 04:08:24,961 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.863e+02 2.137e+02 2.218e+02 2.342e+02 3.108e+02, threshold=4.436e+02, percent-clipped=0.0 2024-09-17 04:08:32,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=547669.1666666666, ans=0.125 2024-09-17 04:08:37,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=547697.5, ans=0.1 2024-09-17 04:08:41,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=547697.5, ans=0.125 2024-09-17 04:08:51,046 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.09 vs. limit=15.0 2024-09-17 04:09:05,527 INFO [train.py:1198] (1/2) Epoch 31, batch 1600, loss[loss=0.2822, ctc_loss=0.2015, cr_loss=0.4036, over 14174.00 frames. ], tot_loss[loss=0.2252, ctc_loss=0.1504, cr_loss=0.3738, over 4079302.22 frames. ], batch size: 149, lr: 2.68e-03, grad_scale: 64.0 2024-09-17 04:10:12,177 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.13 vs. limit=22.5 2024-09-17 04:10:16,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=547867.5, ans=0.0 2024-09-17 04:10:23,629 INFO [train.py:1198] (1/2) Epoch 31, batch 1650, loss[loss=0.214, ctc_loss=0.1401, cr_loss=0.3696, over 20959.00 frames. ], tot_loss[loss=0.2242, ctc_loss=0.1496, cr_loss=0.373, over 4088833.74 frames. ], batch size: 58, lr: 2.68e-03, grad_scale: 64.0 2024-09-17 04:10:25,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=547895.8333333334, ans=0.2 2024-09-17 04:10:31,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=547895.8333333334, ans=0.125 2024-09-17 04:10:42,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=547924.1666666666, ans=0.125 2024-09-17 04:10:58,653 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.913e+02 2.128e+02 2.245e+02 2.351e+02 2.893e+02, threshold=4.491e+02, percent-clipped=0.0 2024-09-17 04:11:42,084 INFO [train.py:1198] (1/2) Epoch 31, batch 1700, loss[loss=0.2159, ctc_loss=0.143, cr_loss=0.3645, over 20879.00 frames. ], tot_loss[loss=0.2233, ctc_loss=0.1489, cr_loss=0.3717, over 4095044.19 frames. 
], batch size: 54, lr: 2.68e-03, grad_scale: 64.0 2024-09-17 04:11:42,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=548037.5, ans=0.125 2024-09-17 04:12:09,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=548065.8333333334, ans=0.0 2024-09-17 04:12:18,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=548094.1666666666, ans=0.5 2024-09-17 04:12:20,347 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.52 vs. limit=15.0 2024-09-17 04:12:22,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=548094.1666666666, ans=0.125 2024-09-17 04:12:30,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=548122.5, ans=0.2 2024-09-17 04:12:57,564 INFO [train.py:1198] (1/2) Epoch 31, batch 1750, loss[loss=0.2412, ctc_loss=0.1612, cr_loss=0.4002, over 20666.00 frames. ], tot_loss[loss=0.2239, ctc_loss=0.1493, cr_loss=0.3728, over 4099400.32 frames. ], batch size: 66, lr: 2.68e-03, grad_scale: 32.0 2024-09-17 04:13:02,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=548179.1666666666, ans=0.125 2024-09-17 04:13:17,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=548207.5, ans=0.0 2024-09-17 04:13:25,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=548207.5, ans=0.1 2024-09-17 04:13:31,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=548235.8333333334, ans=0.0 2024-09-17 04:13:33,967 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.856e+02 2.140e+02 2.226e+02 2.346e+02 2.927e+02, threshold=4.452e+02, percent-clipped=0.0 2024-09-17 04:13:34,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=548235.8333333334, ans=0.0 2024-09-17 04:13:45,457 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.27 vs. limit=22.5 2024-09-17 04:13:47,651 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=548264.1666666666, ans=0.125 2024-09-17 04:14:12,610 INFO [train.py:1198] (1/2) Epoch 31, batch 1800, loss[loss=0.2261, ctc_loss=0.1504, cr_loss=0.3782, over 20650.00 frames. ], tot_loss[loss=0.2245, ctc_loss=0.1498, cr_loss=0.3737, over 4091071.94 frames. ], batch size: 66, lr: 2.68e-03, grad_scale: 32.0 2024-09-17 04:14:25,508 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.57 vs. 
limit=10.0 2024-09-17 04:14:41,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=548377.5, ans=0.025 2024-09-17 04:14:41,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=548377.5, ans=0.05 2024-09-17 04:14:49,614 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.06 vs. limit=15.0 2024-09-17 04:15:31,160 INFO [train.py:1198] (1/2) Epoch 31, batch 1850, loss[loss=0.2497, ctc_loss=0.1681, cr_loss=0.4078, over 20012.00 frames. ], tot_loss[loss=0.2241, ctc_loss=0.1495, cr_loss=0.3731, over 4098149.60 frames. ], batch size: 80, lr: 2.68e-03, grad_scale: 32.0 2024-09-17 04:16:07,659 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.865e+02 2.123e+02 2.279e+02 2.476e+02 3.054e+02, threshold=4.557e+02, percent-clipped=0.0 2024-09-17 04:16:18,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=548547.5, ans=0.125 2024-09-17 04:16:21,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=548547.5, ans=0.125 2024-09-17 04:16:47,357 INFO [train.py:1198] (1/2) Epoch 31, batch 1900, loss[loss=0.1921, ctc_loss=0.1275, cr_loss=0.3232, over 21047.00 frames. ], tot_loss[loss=0.2245, ctc_loss=0.1499, cr_loss=0.3733, over 4084315.33 frames. ], batch size: 53, lr: 2.68e-03, grad_scale: 32.0 2024-09-17 04:17:16,375 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.08 vs. limit=22.5 2024-09-17 04:17:17,785 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.75 vs. limit=6.0 2024-09-17 04:17:29,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=548660.8333333334, ans=0.0 2024-09-17 04:17:29,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=548660.8333333334, ans=0.0 2024-09-17 04:17:56,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=548717.5, ans=0.125 2024-09-17 04:18:05,346 INFO [train.py:1198] (1/2) Epoch 31, batch 1950, loss[loss=0.2375, ctc_loss=0.1567, cr_loss=0.4042, over 20976.00 frames. ], tot_loss[loss=0.2241, ctc_loss=0.1495, cr_loss=0.3728, over 4086741.94 frames. ], batch size: 64, lr: 2.68e-03, grad_scale: 32.0 2024-09-17 04:18:20,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=548774.1666666666, ans=0.0 2024-09-17 04:18:27,673 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.53 vs. limit=15.0 2024-09-17 04:18:41,775 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.923e+02 2.132e+02 2.239e+02 2.428e+02 5.677e+02, threshold=4.478e+02, percent-clipped=1.0 2024-09-17 04:19:21,237 INFO [train.py:1198] (1/2) Epoch 31, batch 2000, loss[loss=0.2447, ctc_loss=0.1668, cr_loss=0.3898, over 19950.00 frames. ], tot_loss[loss=0.2245, ctc_loss=0.1498, cr_loss=0.3735, over 4094732.10 frames. 
], batch size: 80, lr: 2.68e-03, grad_scale: 32.0 2024-09-17 04:20:08,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=548972.5, ans=0.2 2024-09-17 04:20:14,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=548972.5, ans=0.125 2024-09-17 04:20:36,396 INFO [train.py:1198] (1/2) Epoch 31, batch 2050, loss[loss=0.2097, ctc_loss=0.1375, cr_loss=0.3609, over 21053.00 frames. ], tot_loss[loss=0.2263, ctc_loss=0.1512, cr_loss=0.3757, over 4090313.59 frames. ], batch size: 56, lr: 2.68e-03, grad_scale: 32.0 2024-09-17 04:21:13,324 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.39 vs. limit=15.0 2024-09-17 04:21:17,330 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.880e+02 2.145e+02 2.250e+02 2.460e+02 3.586e+02, threshold=4.500e+02, percent-clipped=0.0 2024-09-17 04:21:25,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=549114.1666666666, ans=0.1 2024-09-17 04:21:54,946 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.38 vs. limit=22.5 2024-09-17 04:21:55,369 INFO [train.py:1198] (1/2) Epoch 31, batch 2100, loss[loss=0.2059, ctc_loss=0.1352, cr_loss=0.3536, over 20932.00 frames. ], tot_loss[loss=0.2264, ctc_loss=0.1512, cr_loss=0.3762, over 4096563.81 frames. ], batch size: 48, lr: 2.68e-03, grad_scale: 32.0 2024-09-17 04:22:00,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=549170.8333333334, ans=0.125 2024-09-17 04:22:17,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=549199.1666666666, ans=0.2 2024-09-17 04:22:33,937 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.70 vs. limit=15.0 2024-09-17 04:23:13,891 INFO [train.py:1198] (1/2) Epoch 31, batch 2150, loss[loss=0.2562, ctc_loss=0.1717, cr_loss=0.4224, over 20931.00 frames. ], tot_loss[loss=0.2262, ctc_loss=0.151, cr_loss=0.3759, over 4105249.30 frames. ], batch size: 60, lr: 2.68e-03, grad_scale: 32.0 2024-09-17 04:23:23,647 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.28 vs. 
limit=15.0 2024-09-17 04:23:24,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=549312.5, ans=0.1 2024-09-17 04:23:30,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=549340.8333333334, ans=0.025 2024-09-17 04:23:47,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=549369.1666666666, ans=0.2 2024-09-17 04:23:50,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=549369.1666666666, ans=0.07 2024-09-17 04:23:52,016 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.848e+02 2.149e+02 2.255e+02 2.448e+02 6.767e+02, threshold=4.511e+02, percent-clipped=1.0 2024-09-17 04:24:01,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=549397.5, ans=0.125 2024-09-17 04:24:29,758 INFO [train.py:1198] (1/2) Epoch 31, batch 2200, loss[loss=0.1986, ctc_loss=0.1305, cr_loss=0.3408, over 20799.00 frames. ], tot_loss[loss=0.2245, ctc_loss=0.1497, cr_loss=0.3741, over 4119451.07 frames. ], batch size: 53, lr: 2.68e-03, grad_scale: 32.0 2024-09-17 04:24:37,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=549454.1666666666, ans=0.125 2024-09-17 04:25:16,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=549539.1666666666, ans=15.0 2024-09-17 04:25:26,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=549539.1666666666, ans=0.09899494936611666 2024-09-17 04:25:44,318 INFO [train.py:1198] (1/2) Epoch 31, batch 2250, loss[loss=0.1958, ctc_loss=0.1276, cr_loss=0.3407, over 20966.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.1499, cr_loss=0.3741, over 4108161.58 frames. ], batch size: 49, lr: 2.68e-03, grad_scale: 32.0 2024-09-17 04:25:55,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=549595.8333333334, ans=10.0 2024-09-17 04:26:22,276 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.940e+02 2.195e+02 2.305e+02 2.510e+02 4.431e+02, threshold=4.611e+02, percent-clipped=0.0 2024-09-17 04:26:24,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=549652.5, ans=0.125 2024-09-17 04:26:24,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=549652.5, ans=0.0 2024-09-17 04:26:42,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=549680.8333333334, ans=0.0 2024-09-17 04:26:46,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=549709.1666666666, ans=0.0 2024-09-17 04:27:02,946 INFO [train.py:1198] (1/2) Epoch 31, batch 2300, loss[loss=0.2095, ctc_loss=0.1386, cr_loss=0.3548, over 21003.00 frames. ], tot_loss[loss=0.2253, ctc_loss=0.1502, cr_loss=0.3753, over 4106512.86 frames. 
], batch size: 55, lr: 2.68e-03, grad_scale: 16.0 2024-09-17 04:27:13,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=549737.5, ans=0.0 2024-09-17 04:27:22,714 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 04:27:31,075 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.54 vs. limit=10.0 2024-09-17 04:28:09,019 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.37 vs. limit=6.0 2024-09-17 04:28:16,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=549879.1666666666, ans=0.1 2024-09-17 04:28:18,267 INFO [train.py:1198] (1/2) Epoch 31, batch 2350, loss[loss=0.236, ctc_loss=0.1581, cr_loss=0.3893, over 20860.00 frames. ], tot_loss[loss=0.2257, ctc_loss=0.1505, cr_loss=0.3758, over 4100530.28 frames. ], batch size: 57, lr: 2.68e-03, grad_scale: 16.0 2024-09-17 04:28:21,095 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.32 vs. limit=10.0 2024-09-17 04:28:26,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=549879.1666666666, ans=0.125 2024-09-17 04:28:27,173 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.28 vs. limit=10.0 2024-09-17 04:29:00,652 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.872e+02 2.124e+02 2.282e+02 2.449e+02 3.157e+02, threshold=4.565e+02, percent-clipped=0.0 2024-09-17 04:29:11,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=549964.1666666666, ans=0.0 2024-09-17 04:29:20,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=549992.5, ans=0.0 2024-09-17 04:29:36,883 INFO [train.py:1198] (1/2) Epoch 31, batch 2400, loss[loss=0.2217, ctc_loss=0.1443, cr_loss=0.3872, over 20799.00 frames. ], tot_loss[loss=0.2261, ctc_loss=0.1509, cr_loss=0.376, over 4102837.27 frames. ], batch size: 56, lr: 2.68e-03, grad_scale: 32.0 2024-09-17 04:29:37,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=550020.8333333334, ans=0.125 2024-09-17 04:29:49,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=550020.8333333334, ans=0.025 2024-09-17 04:29:52,355 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.68 vs. 
limit=15.0 2024-09-17 04:30:34,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=550105.8333333334, ans=0.125 2024-09-17 04:30:37,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=550134.1666666666, ans=0.0 2024-09-17 04:30:49,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=550134.1666666666, ans=0.2 2024-09-17 04:30:49,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=550134.1666666666, ans=0.0 2024-09-17 04:30:52,178 INFO [train.py:1198] (1/2) Epoch 31, batch 2450, loss[loss=0.2274, ctc_loss=0.1491, cr_loss=0.3912, over 21077.00 frames. ], tot_loss[loss=0.2277, ctc_loss=0.1522, cr_loss=0.3777, over 4089684.72 frames. ], batch size: 56, lr: 2.68e-03, grad_scale: 32.0 2024-09-17 04:31:19,928 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.11 vs. limit=15.0 2024-09-17 04:31:20,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=550219.1666666666, ans=0.125 2024-09-17 04:31:24,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=550219.1666666666, ans=0.0 2024-09-17 04:31:31,139 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.785e+02 2.172e+02 2.281e+02 2.421e+02 5.125e+02, threshold=4.561e+02, percent-clipped=1.0 2024-09-17 04:31:31,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=550219.1666666666, ans=0.125 2024-09-17 04:32:07,197 INFO [train.py:1198] (1/2) Epoch 31, batch 2500, loss[loss=0.1957, ctc_loss=0.129, cr_loss=0.3339, over 20994.00 frames. ], tot_loss[loss=0.2263, ctc_loss=0.1511, cr_loss=0.3759, over 4092483.89 frames. ], batch size: 52, lr: 2.67e-03, grad_scale: 32.0 2024-09-17 04:33:09,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=550417.5, ans=0.1 2024-09-17 04:33:16,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=550417.5, ans=0.04949747468305833 2024-09-17 04:33:25,558 INFO [train.py:1198] (1/2) Epoch 31, batch 2550, loss[loss=0.2257, ctc_loss=0.1473, cr_loss=0.3925, over 20808.00 frames. ], tot_loss[loss=0.2262, ctc_loss=0.151, cr_loss=0.3761, over 4101261.95 frames. ], batch size: 53, lr: 2.67e-03, grad_scale: 16.0 2024-09-17 04:33:28,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=550445.8333333334, ans=0.2 2024-09-17 04:33:28,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=550445.8333333334, ans=0.2 2024-09-17 04:33:29,321 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.19 vs. 
limit=12.0 2024-09-17 04:33:33,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=550445.8333333334, ans=0.125 2024-09-17 04:33:54,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=550502.5, ans=0.0 2024-09-17 04:34:06,408 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.855e+02 2.119e+02 2.280e+02 2.422e+02 3.001e+02, threshold=4.560e+02, percent-clipped=0.0 2024-09-17 04:34:42,682 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 04:34:43,803 INFO [train.py:1198] (1/2) Epoch 31, batch 2600, loss[loss=0.2242, ctc_loss=0.148, cr_loss=0.381, over 20879.00 frames. ], tot_loss[loss=0.2263, ctc_loss=0.1511, cr_loss=0.3761, over 4101208.80 frames. ], batch size: 57, lr: 2.67e-03, grad_scale: 16.0 2024-09-17 04:34:57,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=550615.8333333334, ans=0.0 2024-09-17 04:35:23,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=550644.1666666666, ans=0.2 2024-09-17 04:35:28,185 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.64 vs. limit=15.0 2024-09-17 04:35:59,585 INFO [train.py:1198] (1/2) Epoch 31, batch 2650, loss[loss=0.2409, ctc_loss=0.1614, cr_loss=0.3975, over 21043.00 frames. ], tot_loss[loss=0.2262, ctc_loss=0.151, cr_loss=0.3759, over 4101010.91 frames. ], batch size: 62, lr: 2.67e-03, grad_scale: 16.0 2024-09-17 04:36:07,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=550729.1666666666, ans=0.125 2024-09-17 04:36:12,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=550729.1666666666, ans=0.125 2024-09-17 04:36:26,597 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=550757.5, ans=15.0 2024-09-17 04:36:39,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=550785.8333333334, ans=0.0 2024-09-17 04:36:40,634 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.801e+02 2.154e+02 2.329e+02 2.493e+02 4.391e+02, threshold=4.658e+02, percent-clipped=0.0 2024-09-17 04:37:15,072 INFO [train.py:1198] (1/2) Epoch 31, batch 2700, loss[loss=0.1877, ctc_loss=0.1229, cr_loss=0.3244, over 19885.00 frames. ], tot_loss[loss=0.2265, ctc_loss=0.1513, cr_loss=0.3762, over 4090501.51 frames. ], batch size: 44, lr: 2.67e-03, grad_scale: 16.0 2024-09-17 04:37:21,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=550870.8333333334, ans=0.1 2024-09-17 04:38:33,624 INFO [train.py:1198] (1/2) Epoch 31, batch 2750, loss[loss=0.1901, ctc_loss=0.1236, cr_loss=0.3322, over 20952.00 frames. ], tot_loss[loss=0.2253, ctc_loss=0.1504, cr_loss=0.3747, over 4093665.05 frames. 
], batch size: 48, lr: 2.67e-03, grad_scale: 16.0 2024-09-17 04:38:47,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=551040.8333333334, ans=0.2 2024-09-17 04:38:52,117 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 04:39:13,862 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.884e+02 2.177e+02 2.282e+02 2.437e+02 4.059e+02, threshold=4.563e+02, percent-clipped=0.0 2024-09-17 04:39:24,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=551097.5, ans=0.1 2024-09-17 04:39:27,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=551097.5, ans=0.125 2024-09-17 04:39:48,315 INFO [train.py:1198] (1/2) Epoch 31, batch 2800, loss[loss=0.226, ctc_loss=0.1499, cr_loss=0.3805, over 21071.00 frames. ], tot_loss[loss=0.2248, ctc_loss=0.1499, cr_loss=0.3741, over 4092789.95 frames. ], batch size: 59, lr: 2.67e-03, grad_scale: 32.0 2024-09-17 04:39:48,637 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=551154.1666666666, ans=0.0 2024-09-17 04:39:56,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=551154.1666666666, ans=0.0 2024-09-17 04:40:38,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=551239.1666666666, ans=0.1 2024-09-17 04:41:04,517 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.69 vs. limit=15.0 2024-09-17 04:41:06,994 INFO [train.py:1198] (1/2) Epoch 31, batch 2850, loss[loss=0.2002, ctc_loss=0.1347, cr_loss=0.3273, over 20791.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.1499, cr_loss=0.3739, over 4086412.40 frames. ], batch size: 53, lr: 2.67e-03, grad_scale: 32.0 2024-09-17 04:41:47,973 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.872e+02 2.152e+02 2.291e+02 2.433e+02 3.295e+02, threshold=4.583e+02, percent-clipped=0.0 2024-09-17 04:41:51,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=551380.8333333334, ans=0.125 2024-09-17 04:42:22,101 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.74 vs. limit=22.5 2024-09-17 04:42:22,928 INFO [train.py:1198] (1/2) Epoch 31, batch 2900, loss[loss=0.2134, ctc_loss=0.1388, cr_loss=0.3731, over 21043.00 frames. ], tot_loss[loss=0.2243, ctc_loss=0.1496, cr_loss=0.3732, over 4097222.69 frames. 
], batch size: 56, lr: 2.67e-03, grad_scale: 32.0 2024-09-17 04:42:38,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=551465.8333333334, ans=0.125 2024-09-17 04:42:52,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=551494.1666666666, ans=0.125 2024-09-17 04:42:53,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=551494.1666666666, ans=0.125 2024-09-17 04:43:04,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=551494.1666666666, ans=0.5 2024-09-17 04:43:05,181 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.12 vs. limit=12.0 2024-09-17 04:43:36,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=551550.8333333334, ans=0.125 2024-09-17 04:43:38,865 INFO [train.py:1198] (1/2) Epoch 31, batch 2950, loss[loss=0.1776, ctc_loss=0.1161, cr_loss=0.3073, over 20957.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1486, cr_loss=0.3718, over 4110111.15 frames. ], batch size: 49, lr: 2.67e-03, grad_scale: 32.0 2024-09-17 04:44:03,817 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=3.82 vs. limit=15.0 2024-09-17 04:44:22,339 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.966e+02 2.171e+02 2.322e+02 2.486e+02 3.288e+02, threshold=4.645e+02, percent-clipped=0.0 2024-09-17 04:44:49,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=551692.5, ans=0.125 2024-09-17 04:44:56,183 INFO [train.py:1198] (1/2) Epoch 31, batch 3000, loss[loss=0.201, ctc_loss=0.1349, cr_loss=0.3306, over 21025.00 frames. ], tot_loss[loss=0.2236, ctc_loss=0.1491, cr_loss=0.3722, over 4102550.93 frames. ], batch size: 62, lr: 2.67e-03, grad_scale: 32.0 2024-09-17 04:44:56,183 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 04:45:13,230 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([1.6244, 2.3511, 2.3064, 2.5405, 2.3186, 2.4435, 1.8983, 1.7836], device='cuda:1') 2024-09-17 04:45:16,856 INFO [train.py:1230] (1/2) Epoch 31, validation: loss=0.04047, ctc_loss=0.04047, cr_loss=1.207e-14, over 944034.00 frames. 2024-09-17 04:45:16,856 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-17 04:45:39,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=551749.1666666666, ans=0.0 2024-09-17 04:46:35,418 INFO [train.py:1198] (1/2) Epoch 31, batch 3050, loss[loss=0.1889, ctc_loss=0.1229, cr_loss=0.3301, over 21074.00 frames. ], tot_loss[loss=0.2239, ctc_loss=0.1493, cr_loss=0.3731, over 4106909.35 frames. ], batch size: 53, lr: 2.67e-03, grad_scale: 32.0 2024-09-17 04:46:43,846 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.10 vs. 
limit=15.0 2024-09-17 04:46:58,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=551890.8333333334, ans=0.0 2024-09-17 04:47:16,568 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.790e+02 2.157e+02 2.283e+02 2.453e+02 3.621e+02, threshold=4.565e+02, percent-clipped=0.0 2024-09-17 04:47:50,926 INFO [train.py:1198] (1/2) Epoch 31, batch 3100, loss[loss=0.249, ctc_loss=0.171, cr_loss=0.3901, over 20956.00 frames. ], tot_loss[loss=0.226, ctc_loss=0.1509, cr_loss=0.3752, over 4092559.17 frames. ], batch size: 67, lr: 2.67e-03, grad_scale: 32.0 2024-09-17 04:47:54,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=552004.1666666666, ans=0.125 2024-09-17 04:48:18,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=552032.5, ans=0.125 2024-09-17 04:48:18,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=552032.5, ans=0.1 2024-09-17 04:48:47,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=552089.1666666666, ans=0.0 2024-09-17 04:48:57,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=552117.5, ans=0.0 2024-09-17 04:49:06,381 INFO [train.py:1198] (1/2) Epoch 31, batch 3150, loss[loss=0.1906, ctc_loss=0.125, cr_loss=0.3282, over 20977.00 frames. ], tot_loss[loss=0.2264, ctc_loss=0.1513, cr_loss=0.3754, over 4092258.13 frames. ], batch size: 51, lr: 2.67e-03, grad_scale: 32.0 2024-09-17 04:49:11,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=552145.8333333334, ans=0.125 2024-09-17 04:49:14,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=552145.8333333334, ans=0.1 2024-09-17 04:49:49,880 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.998e+02 2.192e+02 2.298e+02 2.459e+02 5.368e+02, threshold=4.595e+02, percent-clipped=1.0 2024-09-17 04:50:04,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=552230.8333333334, ans=0.1 2024-09-17 04:50:21,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=552259.1666666666, ans=15.0 2024-09-17 04:50:24,968 INFO [train.py:1198] (1/2) Epoch 31, batch 3200, loss[loss=0.2601, ctc_loss=0.1761, cr_loss=0.4202, over 20358.00 frames. ], tot_loss[loss=0.227, ctc_loss=0.1518, cr_loss=0.3758, over 4083061.62 frames. 
], batch size: 74, lr: 2.67e-03, grad_scale: 32.0 2024-09-17 04:50:31,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=552287.5, ans=0.2 2024-09-17 04:50:40,768 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 04:51:16,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=552372.5, ans=0.1 2024-09-17 04:51:34,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=552400.8333333334, ans=0.125 2024-09-17 04:51:35,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=552400.8333333334, ans=0.125 2024-09-17 04:51:43,490 INFO [train.py:1198] (1/2) Epoch 31, batch 3250, loss[loss=0.2773, ctc_loss=0.1969, cr_loss=0.4019, over 14604.00 frames. ], tot_loss[loss=0.2272, ctc_loss=0.152, cr_loss=0.3763, over 4083977.21 frames. ], batch size: 150, lr: 2.67e-03, grad_scale: 32.0 2024-09-17 04:51:45,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=552429.1666666666, ans=0.025 2024-09-17 04:51:51,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=552429.1666666666, ans=0.125 2024-09-17 04:51:52,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=552429.1666666666, ans=0.1 2024-09-17 04:52:01,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=552457.5, ans=0.1 2024-09-17 04:52:04,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=552457.5, ans=0.125 2024-09-17 04:52:24,169 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.773e+02 2.170e+02 2.288e+02 2.542e+02 4.991e+02, threshold=4.575e+02, percent-clipped=1.0 2024-09-17 04:52:42,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=552542.5, ans=0.0 2024-09-17 04:52:59,093 INFO [train.py:1198] (1/2) Epoch 31, batch 3300, loss[loss=0.2297, ctc_loss=0.1524, cr_loss=0.3865, over 20658.00 frames. ], tot_loss[loss=0.2267, ctc_loss=0.1515, cr_loss=0.3758, over 4094989.75 frames. ], batch size: 71, lr: 2.67e-03, grad_scale: 32.0 2024-09-17 04:53:11,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=552570.8333333334, ans=0.125 2024-09-17 04:53:43,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=552655.8333333334, ans=0.125 2024-09-17 04:53:49,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=552655.8333333334, ans=0.0 2024-09-17 04:53:51,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=552655.8333333334, ans=0.125 2024-09-17 04:54:15,005 INFO [train.py:1198] (1/2) Epoch 31, batch 3350, loss[loss=0.2623, ctc_loss=0.1761, cr_loss=0.4309, over 19914.00 frames. ], tot_loss[loss=0.227, ctc_loss=0.1517, cr_loss=0.3767, over 4105838.96 frames. 
], batch size: 80, lr: 2.67e-03, grad_scale: 32.0 2024-09-17 04:54:20,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=552712.5, ans=0.0 2024-09-17 04:54:21,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=552712.5, ans=0.125 2024-09-17 04:54:36,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=552740.8333333334, ans=0.0 2024-09-17 04:54:44,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=552769.1666666666, ans=0.04949747468305833 2024-09-17 04:54:55,962 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.944e+02 2.153e+02 2.240e+02 2.345e+02 3.285e+02, threshold=4.480e+02, percent-clipped=0.0 2024-09-17 04:55:21,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=552825.8333333334, ans=0.2 2024-09-17 04:55:33,751 INFO [train.py:1198] (1/2) Epoch 31, batch 3400, loss[loss=0.208, ctc_loss=0.139, cr_loss=0.3452, over 21062.00 frames. ], tot_loss[loss=0.2267, ctc_loss=0.1515, cr_loss=0.3758, over 4093777.92 frames. ], batch size: 53, lr: 2.67e-03, grad_scale: 32.0 2024-09-17 04:55:43,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=552854.1666666666, ans=0.0 2024-09-17 04:56:00,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=552882.5, ans=0.04949747468305833 2024-09-17 04:56:09,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=552910.8333333334, ans=0.1 2024-09-17 04:56:13,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=552910.8333333334, ans=0.2 2024-09-17 04:56:26,156 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.40 vs. limit=22.5 2024-09-17 04:56:39,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=552967.5, ans=0.125 2024-09-17 04:56:49,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=552967.5, ans=0.95 2024-09-17 04:56:52,197 INFO [train.py:1198] (1/2) Epoch 31, batch 3450, loss[loss=0.2499, ctc_loss=0.1682, cr_loss=0.4083, over 17889.00 frames. ], tot_loss[loss=0.2268, ctc_loss=0.1516, cr_loss=0.3761, over 4089283.77 frames. ], batch size: 108, lr: 2.67e-03, grad_scale: 32.0 2024-09-17 04:57:32,683 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.43 vs. limit=15.0 2024-09-17 04:57:33,583 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.877e+02 2.132e+02 2.281e+02 2.481e+02 3.658e+02, threshold=4.562e+02, percent-clipped=0.0 2024-09-17 04:58:02,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=553109.1666666666, ans=0.2 2024-09-17 04:58:08,121 INFO [train.py:1198] (1/2) Epoch 31, batch 3500, loss[loss=0.2465, ctc_loss=0.1622, cr_loss=0.4212, over 21028.00 frames. 
], tot_loss[loss=0.2268, ctc_loss=0.1516, cr_loss=0.3759, over 4082258.05 frames. ], batch size: 62, lr: 2.67e-03, grad_scale: 32.0 2024-09-17 04:58:11,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=553137.5, ans=0.0 2024-09-17 04:58:40,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=553194.1666666666, ans=0.125 2024-09-17 04:59:20,738 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.38 vs. limit=22.5 2024-09-17 04:59:24,513 INFO [train.py:1198] (1/2) Epoch 31, batch 3550, loss[loss=0.2726, ctc_loss=0.1871, cr_loss=0.4275, over 18100.00 frames. ], tot_loss[loss=0.2266, ctc_loss=0.1517, cr_loss=0.3748, over 4051665.35 frames. ], batch size: 108, lr: 2.67e-03, grad_scale: 32.0 2024-09-17 04:59:44,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=553307.5, ans=0.125 2024-09-17 05:00:05,225 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.925e+02 2.151e+02 2.293e+02 2.461e+02 6.303e+02, threshold=4.586e+02, percent-clipped=1.0 2024-09-17 05:00:42,614 INFO [train.py:1198] (1/2) Epoch 31, batch 3600, loss[loss=0.2388, ctc_loss=0.1607, cr_loss=0.3906, over 20852.00 frames. ], tot_loss[loss=0.2269, ctc_loss=0.1517, cr_loss=0.3756, over 4057391.44 frames. ], batch size: 65, lr: 2.67e-03, grad_scale: 32.0 2024-09-17 05:00:49,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=553420.8333333334, ans=0.125 2024-09-17 05:00:50,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=553420.8333333334, ans=0.125 2024-09-17 05:01:23,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=553477.5, ans=0.125 2024-09-17 05:01:34,487 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=553505.8333333334, ans=0.125 2024-09-17 05:01:46,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=553534.1666666666, ans=0.0 2024-09-17 05:01:49,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=553534.1666666666, ans=0.1 2024-09-17 05:01:58,446 INFO [train.py:1198] (1/2) Epoch 31, batch 3650, loss[loss=0.2537, ctc_loss=0.171, cr_loss=0.4132, over 21001.00 frames. ], tot_loss[loss=0.2252, ctc_loss=0.1505, cr_loss=0.3739, over 4075427.88 frames. ], batch size: 63, lr: 2.67e-03, grad_scale: 32.0 2024-09-17 05:02:07,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=553562.5, ans=0.0 2024-09-17 05:02:21,485 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 05:02:37,033 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.12 vs. 
limit=6.0 2024-09-17 05:02:42,324 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.895e+02 2.122e+02 2.251e+02 2.423e+02 3.673e+02, threshold=4.502e+02, percent-clipped=0.0 2024-09-17 05:02:44,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=553619.1666666666, ans=0.2 2024-09-17 05:03:16,913 INFO [train.py:1198] (1/2) Epoch 31, batch 3700, loss[loss=0.2146, ctc_loss=0.1428, cr_loss=0.359, over 20795.00 frames. ], tot_loss[loss=0.2254, ctc_loss=0.1504, cr_loss=0.3747, over 4084592.48 frames. ], batch size: 53, lr: 2.67e-03, grad_scale: 32.0 2024-09-17 05:03:26,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=553704.1666666666, ans=0.0 2024-09-17 05:04:31,824 INFO [train.py:1198] (1/2) Epoch 31, batch 3750, loss[loss=0.246, ctc_loss=0.1661, cr_loss=0.3994, over 20811.00 frames. ], tot_loss[loss=0.2261, ctc_loss=0.1509, cr_loss=0.376, over 4089601.06 frames. ], batch size: 65, lr: 2.67e-03, grad_scale: 16.0 2024-09-17 05:04:43,052 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.69 vs. limit=15.0 2024-09-17 05:04:57,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=553874.1666666666, ans=0.125 2024-09-17 05:05:00,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=553902.5, ans=0.0 2024-09-17 05:05:07,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=553902.5, ans=0.125 2024-09-17 05:05:13,571 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.903e+02 2.142e+02 2.232e+02 2.468e+02 3.336e+02, threshold=4.463e+02, percent-clipped=0.0 2024-09-17 05:05:17,406 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.03 vs. limit=12.0 2024-09-17 05:05:46,950 INFO [train.py:1198] (1/2) Epoch 31, batch 3800, loss[loss=0.2608, ctc_loss=0.1762, cr_loss=0.4229, over 20725.00 frames. ], tot_loss[loss=0.2271, ctc_loss=0.1517, cr_loss=0.3769, over 4091807.25 frames. ], batch size: 71, lr: 2.67e-03, grad_scale: 16.0 2024-09-17 05:05:53,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=553987.5, ans=0.125 2024-09-17 05:05:56,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=553987.5, ans=0.125 2024-09-17 05:06:28,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=554044.1666666666, ans=0.07 2024-09-17 05:06:46,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=554072.5, ans=0.125 2024-09-17 05:06:55,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=554100.8333333334, ans=0.025 2024-09-17 05:06:55,984 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=7.52 vs. 
limit=22.5 2024-09-17 05:07:03,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=554100.8333333334, ans=0.5 2024-09-17 05:07:05,735 INFO [train.py:1198] (1/2) Epoch 31, batch 3850, loss[loss=0.2027, ctc_loss=0.1344, cr_loss=0.3415, over 20984.00 frames. ], tot_loss[loss=0.2254, ctc_loss=0.1505, cr_loss=0.3749, over 4094971.40 frames. ], batch size: 48, lr: 2.67e-03, grad_scale: 16.0 2024-09-17 05:07:08,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=554129.1666666666, ans=0.025 2024-09-17 05:07:16,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=554129.1666666666, ans=0.1 2024-09-17 05:07:17,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=554129.1666666666, ans=0.1 2024-09-17 05:07:38,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=554185.8333333334, ans=0.025 2024-09-17 05:07:48,307 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.920e+02 2.132e+02 2.259e+02 2.463e+02 4.213e+02, threshold=4.517e+02, percent-clipped=0.0 2024-09-17 05:08:14,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=554242.5, ans=0.125 2024-09-17 05:08:17,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=554242.5, ans=0.0 2024-09-17 05:08:24,376 INFO [train.py:1198] (1/2) Epoch 31, batch 3900, loss[loss=0.2649, ctc_loss=0.189, cr_loss=0.38, over 14350.00 frames. ], tot_loss[loss=0.2253, ctc_loss=0.1503, cr_loss=0.3747, over 4091698.42 frames. ], batch size: 149, lr: 2.67e-03, grad_scale: 16.0 2024-09-17 05:08:30,651 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=554270.8333333334, ans=0.0 2024-09-17 05:08:32,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=554270.8333333334, ans=0.125 2024-09-17 05:08:38,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=554299.1666666666, ans=0.5 2024-09-17 05:09:05,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=554327.5, ans=0.125 2024-09-17 05:09:05,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=554327.5, ans=0.0 2024-09-17 05:09:07,248 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.20 vs. limit=15.0 2024-09-17 05:09:13,312 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.11 vs. limit=22.5 2024-09-17 05:09:40,111 INFO [train.py:1198] (1/2) Epoch 31, batch 3950, loss[loss=0.2107, ctc_loss=0.1401, cr_loss=0.3528, over 20993.00 frames. ], tot_loss[loss=0.2251, ctc_loss=0.1502, cr_loss=0.3747, over 4090191.43 frames. 
], batch size: 55, lr: 2.66e-03, grad_scale: 16.0 2024-09-17 05:09:50,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=554412.5, ans=0.2 2024-09-17 05:09:51,588 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.13 vs. limit=15.0 2024-09-17 05:09:58,898 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.17 vs. limit=15.0 2024-09-17 05:10:00,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=554440.8333333334, ans=0.125 2024-09-17 05:10:22,002 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.894e+02 2.129e+02 2.258e+02 2.498e+02 3.573e+02, threshold=4.516e+02, percent-clipped=0.0 2024-09-17 05:10:52,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=554525.8333333334, ans=0.0 2024-09-17 05:10:55,414 INFO [train.py:1198] (1/2) Epoch 31, batch 4000, loss[loss=0.2529, ctc_loss=0.1692, cr_loss=0.4185, over 20833.00 frames. ], tot_loss[loss=0.226, ctc_loss=0.1509, cr_loss=0.3757, over 4070653.19 frames. ], batch size: 65, lr: 2.66e-03, grad_scale: 32.0 2024-09-17 05:10:57,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=554554.1666666666, ans=0.125 2024-09-17 05:11:10,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=554582.5, ans=0.2 2024-09-17 05:11:45,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=554639.1666666666, ans=0.0 2024-09-17 05:11:47,413 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.11 vs. limit=15.0 2024-09-17 05:12:13,620 INFO [train.py:1198] (1/2) Epoch 31, batch 4050, loss[loss=0.2667, ctc_loss=0.182, cr_loss=0.4237, over 18297.00 frames. ], tot_loss[loss=0.2245, ctc_loss=0.1497, cr_loss=0.3737, over 4082098.64 frames. 
], batch size: 108, lr: 2.66e-03, grad_scale: 32.0 2024-09-17 05:12:19,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=554695.8333333334, ans=0.125 2024-09-17 05:12:19,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=554695.8333333334, ans=0.1 2024-09-17 05:12:33,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=554724.1666666666, ans=0.0 2024-09-17 05:12:34,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=554724.1666666666, ans=0.125 2024-09-17 05:12:55,756 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.826e+02 2.197e+02 2.312e+02 2.523e+02 4.675e+02, threshold=4.623e+02, percent-clipped=1.0 2024-09-17 05:13:05,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=554780.8333333334, ans=0.1 2024-09-17 05:13:18,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=554809.1666666666, ans=0.95 2024-09-17 05:13:32,285 INFO [train.py:1198] (1/2) Epoch 31, batch 4100, loss[loss=0.2311, ctc_loss=0.1521, cr_loss=0.395, over 21038.00 frames. ], tot_loss[loss=0.2244, ctc_loss=0.1497, cr_loss=0.3734, over 4074502.12 frames. ], batch size: 62, lr: 2.66e-03, grad_scale: 32.0 2024-09-17 05:13:34,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=554837.5, ans=0.0 2024-09-17 05:14:12,352 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.49 vs. limit=15.0 2024-09-17 05:14:16,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=554922.5, ans=0.125 2024-09-17 05:14:20,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=554922.5, ans=0.125 2024-09-17 05:14:20,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=554922.5, ans=0.0 2024-09-17 05:14:36,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=554950.8333333334, ans=0.125 2024-09-17 05:14:47,510 INFO [train.py:1198] (1/2) Epoch 31, batch 4150, loss[loss=0.2577, ctc_loss=0.1723, cr_loss=0.4272, over 20868.00 frames. ], tot_loss[loss=0.2244, ctc_loss=0.1497, cr_loss=0.3737, over 4086464.05 frames. ], batch size: 65, lr: 2.66e-03, grad_scale: 32.0 2024-09-17 05:15:29,599 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.848e+02 2.147e+02 2.282e+02 2.495e+02 3.283e+02, threshold=4.564e+02, percent-clipped=0.0 2024-09-17 05:15:31,850 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.80 vs. limit=15.0 2024-09-17 05:16:00,307 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.63 vs. 
limit=12.0 2024-09-17 05:16:02,546 INFO [train.py:1198] (1/2) Epoch 31, batch 4200, loss[loss=0.1978, ctc_loss=0.129, cr_loss=0.3443, over 20349.00 frames. ], tot_loss[loss=0.2234, ctc_loss=0.149, cr_loss=0.3723, over 4097438.18 frames. ], batch size: 45, lr: 2.66e-03, grad_scale: 32.0 2024-09-17 05:16:04,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=555120.8333333334, ans=0.2 2024-09-17 05:16:36,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=555177.5, ans=0.025 2024-09-17 05:16:39,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=555177.5, ans=0.125 2024-09-17 05:16:41,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=555177.5, ans=0.125 2024-09-17 05:17:11,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=555234.1666666666, ans=0.04949747468305833 2024-09-17 05:17:15,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=555234.1666666666, ans=0.125 2024-09-17 05:17:18,566 INFO [train.py:1198] (1/2) Epoch 31, batch 4250, loss[loss=0.2782, ctc_loss=0.1902, cr_loss=0.4404, over 20004.00 frames. ], tot_loss[loss=0.2241, ctc_loss=0.1495, cr_loss=0.3732, over 4097582.56 frames. ], batch size: 80, lr: 2.66e-03, grad_scale: 32.0 2024-09-17 05:17:23,853 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.87 vs. limit=6.0 2024-09-17 05:17:36,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=555290.8333333334, ans=0.1 2024-09-17 05:17:43,589 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.09 vs. limit=15.0 2024-09-17 05:18:05,334 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.894e+02 2.132e+02 2.267e+02 2.432e+02 3.219e+02, threshold=4.535e+02, percent-clipped=0.0 2024-09-17 05:18:27,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=555375.8333333334, ans=0.125 2024-09-17 05:18:38,687 INFO [train.py:1198] (1/2) Epoch 31, batch 4300, loss[loss=0.1722, ctc_loss=0.1128, cr_loss=0.2971, over 20972.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1479, cr_loss=0.3705, over 4100356.50 frames. ], batch size: 52, lr: 2.66e-03, grad_scale: 32.0 2024-09-17 05:19:02,516 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.39 vs. limit=15.0 2024-09-17 05:19:57,528 INFO [train.py:1198] (1/2) Epoch 31, batch 4350, loss[loss=0.2138, ctc_loss=0.1419, cr_loss=0.3595, over 20995.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1479, cr_loss=0.3707, over 4111770.85 frames. ], batch size: 48, lr: 2.66e-03, grad_scale: 32.0 2024-09-17 05:20:01,265 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.37 vs. 
limit=15.0 2024-09-17 05:20:02,747 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.30 vs. limit=15.0 2024-09-17 05:20:13,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=555574.1666666666, ans=0.125 2024-09-17 05:20:23,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=555574.1666666666, ans=0.95 2024-09-17 05:20:35,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=555602.5, ans=0.125 2024-09-17 05:20:35,938 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.97 vs. limit=6.0 2024-09-17 05:20:36,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=555602.5, ans=0.125 2024-09-17 05:20:39,596 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.881e+02 2.174e+02 2.311e+02 2.470e+02 3.545e+02, threshold=4.622e+02, percent-clipped=0.0 2024-09-17 05:20:54,369 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.45 vs. limit=22.5 2024-09-17 05:21:04,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=555659.1666666666, ans=0.125 2024-09-17 05:21:12,883 INFO [train.py:1198] (1/2) Epoch 31, batch 4400, loss[loss=0.1957, ctc_loss=0.1298, cr_loss=0.3294, over 20989.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1481, cr_loss=0.3706, over 4116481.43 frames. ], batch size: 48, lr: 2.66e-03, grad_scale: 32.0 2024-09-17 05:21:17,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=555687.5, ans=0.0 2024-09-17 05:21:37,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=555715.8333333334, ans=0.125 2024-09-17 05:21:40,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=555715.8333333334, ans=0.125 2024-09-17 05:22:14,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=555800.8333333334, ans=0.1 2024-09-17 05:22:29,047 INFO [train.py:1198] (1/2) Epoch 31, batch 4450, loss[loss=0.2048, ctc_loss=0.1334, cr_loss=0.3572, over 19777.00 frames. ], tot_loss[loss=0.2233, ctc_loss=0.1488, cr_loss=0.3721, over 4108470.87 frames. ], batch size: 44, lr: 2.66e-03, grad_scale: 32.0 2024-09-17 05:22:34,357 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.92 vs. limit=15.0 2024-09-17 05:22:46,857 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.40 vs. 
limit=22.5 2024-09-17 05:22:57,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=555857.5, ans=0.0 2024-09-17 05:23:11,889 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.771e+02 2.145e+02 2.272e+02 2.416e+02 2.988e+02, threshold=4.544e+02, percent-clipped=0.0 2024-09-17 05:23:22,927 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 05:23:45,600 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.55 vs. limit=15.0 2024-09-17 05:23:47,774 INFO [train.py:1198] (1/2) Epoch 31, batch 4500, loss[loss=0.2272, ctc_loss=0.1528, cr_loss=0.3722, over 20710.00 frames. ], tot_loss[loss=0.2244, ctc_loss=0.1496, cr_loss=0.3739, over 4109362.15 frames. ], batch size: 71, lr: 2.66e-03, grad_scale: 32.0 2024-09-17 05:24:26,626 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.66 vs. limit=12.0 2024-09-17 05:24:38,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=556055.8333333334, ans=0.125 2024-09-17 05:24:39,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=556055.8333333334, ans=0.125 2024-09-17 05:24:39,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=556055.8333333334, ans=0.125 2024-09-17 05:24:51,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=556084.1666666666, ans=0.125 2024-09-17 05:24:52,203 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.27 vs. limit=15.0 2024-09-17 05:25:03,824 INFO [train.py:1198] (1/2) Epoch 31, batch 4550, loss[loss=0.2475, ctc_loss=0.1696, cr_loss=0.3896, over 20245.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.1499, cr_loss=0.3742, over 4109559.35 frames. ], batch size: 74, lr: 2.66e-03, grad_scale: 32.0 2024-09-17 05:25:37,396 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.58 vs. limit=15.0 2024-09-17 05:25:48,799 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.974e+02 2.157e+02 2.260e+02 2.457e+02 3.814e+02, threshold=4.520e+02, percent-clipped=0.0 2024-09-17 05:26:22,105 INFO [train.py:1198] (1/2) Epoch 31, batch 4600, loss[loss=0.2015, ctc_loss=0.1357, cr_loss=0.3292, over 20988.00 frames. ], tot_loss[loss=0.2253, ctc_loss=0.1504, cr_loss=0.3747, over 4118510.02 frames. 
], batch size: 58, lr: 2.66e-03, grad_scale: 32.0 2024-09-17 05:26:23,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=556254.1666666666, ans=0.125 2024-09-17 05:26:27,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=556254.1666666666, ans=0.125 2024-09-17 05:26:48,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=556282.5, ans=22.5 2024-09-17 05:26:55,226 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.62 vs. limit=22.5 2024-09-17 05:27:05,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=556310.8333333334, ans=0.0 2024-09-17 05:27:12,719 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=556339.1666666666, ans=0.125 2024-09-17 05:27:32,296 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.37 vs. limit=15.0 2024-09-17 05:27:37,759 INFO [train.py:1198] (1/2) Epoch 31, batch 4650, loss[loss=0.2386, ctc_loss=0.1635, cr_loss=0.3756, over 20825.00 frames. ], tot_loss[loss=0.2253, ctc_loss=0.1503, cr_loss=0.3749, over 4121289.33 frames. ], batch size: 59, lr: 2.66e-03, grad_scale: 32.0 2024-09-17 05:28:08,619 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.62 vs. limit=22.5 2024-09-17 05:28:14,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=556452.5, ans=0.125 2024-09-17 05:28:20,164 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.827e+02 2.132e+02 2.237e+02 2.361e+02 3.543e+02, threshold=4.475e+02, percent-clipped=0.0 2024-09-17 05:28:31,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=556480.8333333334, ans=0.0 2024-09-17 05:28:43,742 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.56 vs. limit=15.0 2024-09-17 05:28:43,937 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.05 vs. limit=15.0 2024-09-17 05:28:53,541 INFO [train.py:1198] (1/2) Epoch 31, batch 4700, loss[loss=0.2094, ctc_loss=0.1415, cr_loss=0.3396, over 20857.00 frames. ], tot_loss[loss=0.2246, ctc_loss=0.1499, cr_loss=0.3737, over 4114195.30 frames. ], batch size: 57, lr: 2.66e-03, grad_scale: 32.0 2024-09-17 05:28:57,538 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.88 vs. 
limit=15.0 2024-09-17 05:29:05,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=556537.5, ans=0.125 2024-09-17 05:29:23,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=556565.8333333334, ans=0.05 2024-09-17 05:29:27,248 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.37 vs. limit=15.0 2024-09-17 05:29:31,993 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.33 vs. limit=15.0 2024-09-17 05:30:08,264 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.85 vs. limit=15.0 2024-09-17 05:30:12,030 INFO [train.py:1198] (1/2) Epoch 31, batch 4750, loss[loss=0.236, ctc_loss=0.1587, cr_loss=0.3863, over 20954.00 frames. ], tot_loss[loss=0.2245, ctc_loss=0.1499, cr_loss=0.3733, over 4097921.74 frames. ], batch size: 60, lr: 2.66e-03, grad_scale: 32.0 2024-09-17 05:30:21,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=556679.1666666666, ans=0.1 2024-09-17 05:30:23,227 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.94 vs. limit=15.0 2024-09-17 05:30:29,223 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.57 vs. limit=10.0 2024-09-17 05:30:33,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=556707.5, ans=0.125 2024-09-17 05:30:57,297 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.786e+02 2.170e+02 2.302e+02 2.491e+02 3.115e+02, threshold=4.604e+02, percent-clipped=0.0 2024-09-17 05:31:17,642 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0 2024-09-17 05:31:29,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=556820.8333333334, ans=0.1 2024-09-17 05:31:30,647 INFO [train.py:1198] (1/2) Epoch 31, batch 4800, loss[loss=0.1918, ctc_loss=0.1266, cr_loss=0.326, over 20956.00 frames. ], tot_loss[loss=0.2236, ctc_loss=0.1492, cr_loss=0.372, over 4107686.52 frames. ], batch size: 49, lr: 2.66e-03, grad_scale: 32.0 2024-09-17 05:31:57,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=556849.1666666666, ans=0.035 2024-09-17 05:32:09,664 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.74 vs. limit=5.0 2024-09-17 05:32:19,948 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.55 vs. limit=15.0 2024-09-17 05:32:46,173 INFO [train.py:1198] (1/2) Epoch 31, batch 4850, loss[loss=0.2241, ctc_loss=0.1496, cr_loss=0.3725, over 21061.00 frames. ], tot_loss[loss=0.2245, ctc_loss=0.1498, cr_loss=0.3739, over 4109951.57 frames. 
], batch size: 53, lr: 2.66e-03, grad_scale: 32.0 2024-09-17 05:33:23,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=557019.1666666666, ans=0.025 2024-09-17 05:33:28,873 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.869e+02 2.148e+02 2.306e+02 2.446e+02 3.239e+02, threshold=4.612e+02, percent-clipped=0.0 2024-09-17 05:33:39,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=557047.5, ans=0.125 2024-09-17 05:33:49,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=557075.8333333334, ans=0.2 2024-09-17 05:34:01,310 INFO [train.py:1198] (1/2) Epoch 31, batch 4900, loss[loss=0.2507, ctc_loss=0.1681, cr_loss=0.413, over 20684.00 frames. ], tot_loss[loss=0.2243, ctc_loss=0.1495, cr_loss=0.3736, over 4108552.28 frames. ], batch size: 68, lr: 2.66e-03, grad_scale: 32.0 2024-09-17 05:34:10,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=557104.1666666666, ans=0.0 2024-09-17 05:34:49,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=557189.1666666666, ans=0.125 2024-09-17 05:35:15,713 INFO [train.py:1198] (1/2) Epoch 31, batch 4950, loss[loss=0.2347, ctc_loss=0.1574, cr_loss=0.3865, over 20738.00 frames. ], tot_loss[loss=0.2251, ctc_loss=0.1502, cr_loss=0.3744, over 4109050.02 frames. ], batch size: 71, lr: 2.66e-03, grad_scale: 32.0 2024-09-17 05:36:00,796 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.866e+02 2.181e+02 2.267e+02 2.406e+02 3.577e+02, threshold=4.533e+02, percent-clipped=0.0 2024-09-17 05:36:11,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=557330.8333333334, ans=0.2 2024-09-17 05:36:34,079 INFO [train.py:1198] (1/2) Epoch 31, batch 5000, loss[loss=0.229, ctc_loss=0.156, cr_loss=0.365, over 20820.00 frames. ], tot_loss[loss=0.2238, ctc_loss=0.1491, cr_loss=0.3732, over 4120723.81 frames. ], batch size: 59, lr: 2.66e-03, grad_scale: 32.0 2024-09-17 05:36:53,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=557415.8333333334, ans=0.125 2024-09-17 05:36:53,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=557415.8333333334, ans=0.125 2024-09-17 05:37:05,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=557444.1666666666, ans=0.1 2024-09-17 05:37:48,146 INFO [train.py:1198] (1/2) Epoch 31, batch 5050, loss[loss=0.2198, ctc_loss=0.145, cr_loss=0.3738, over 20970.00 frames. ], tot_loss[loss=0.2248, ctc_loss=0.1498, cr_loss=0.3749, over 4120812.95 frames. 
], batch size: 48, lr: 2.66e-03, grad_scale: 32.0 2024-09-17 05:38:07,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=557557.5, ans=0.2 2024-09-17 05:38:13,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=557557.5, ans=0.0 2024-09-17 05:38:29,806 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.781e+02 2.123e+02 2.260e+02 2.429e+02 7.931e+02, threshold=4.519e+02, percent-clipped=1.0 2024-09-17 05:38:45,417 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. limit=6.0 2024-09-17 05:38:50,006 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.93 vs. limit=22.5 2024-09-17 05:39:05,129 INFO [train.py:1198] (1/2) Epoch 31, batch 5100, loss[loss=0.2562, ctc_loss=0.1773, cr_loss=0.3947, over 20118.00 frames. ], tot_loss[loss=0.2253, ctc_loss=0.1502, cr_loss=0.3755, over 4108783.00 frames. ], batch size: 80, lr: 2.66e-03, grad_scale: 32.0 2024-09-17 05:39:05,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=557670.8333333334, ans=0.125 2024-09-17 05:40:19,411 INFO [train.py:1198] (1/2) Epoch 31, batch 5150, loss[loss=0.2546, ctc_loss=0.1727, cr_loss=0.4097, over 20324.00 frames. ], tot_loss[loss=0.2252, ctc_loss=0.1503, cr_loss=0.3748, over 4102386.15 frames. ], batch size: 74, lr: 2.66e-03, grad_scale: 32.0 2024-09-17 05:40:20,040 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.78 vs. limit=10.0 2024-09-17 05:41:01,613 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.888e+02 2.161e+02 2.291e+02 2.422e+02 3.607e+02, threshold=4.583e+02, percent-clipped=0.0 2024-09-17 05:41:18,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=557925.8333333334, ans=0.125 2024-09-17 05:41:29,323 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.83 vs. limit=22.5 2024-09-17 05:41:34,433 INFO [train.py:1198] (1/2) Epoch 31, batch 5200, loss[loss=0.2295, ctc_loss=0.1533, cr_loss=0.3812, over 21006.00 frames. ], tot_loss[loss=0.2256, ctc_loss=0.1506, cr_loss=0.3751, over 4096230.17 frames. ], batch size: 61, lr: 2.66e-03, grad_scale: 32.0 2024-09-17 05:41:40,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=557954.1666666666, ans=0.1 2024-09-17 05:41:55,778 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.65 vs. 
limit=15.0 2024-09-17 05:42:02,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=558010.8333333334, ans=0.0 2024-09-17 05:42:10,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=558010.8333333334, ans=0.1 2024-09-17 05:42:19,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=558039.1666666666, ans=0.0 2024-09-17 05:42:28,859 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.46 vs. limit=22.5 2024-09-17 05:42:35,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=558067.5, ans=0.0 2024-09-17 05:42:48,852 INFO [train.py:1198] (1/2) Epoch 31, batch 5250, loss[loss=0.2608, ctc_loss=0.1769, cr_loss=0.4193, over 19969.00 frames. ], tot_loss[loss=0.2256, ctc_loss=0.1506, cr_loss=0.375, over 4096765.71 frames. ], batch size: 80, lr: 2.66e-03, grad_scale: 16.0 2024-09-17 05:43:04,546 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.96 vs. limit=15.0 2024-09-17 05:43:12,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=558124.1666666666, ans=0.125 2024-09-17 05:43:20,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=558152.5, ans=0.2 2024-09-17 05:43:29,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=558152.5, ans=0.05 2024-09-17 05:43:31,583 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.863e+02 2.151e+02 2.267e+02 2.415e+02 5.981e+02, threshold=4.533e+02, percent-clipped=0.0 2024-09-17 05:43:43,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=558180.8333333334, ans=0.125 2024-09-17 05:44:02,429 INFO [train.py:1198] (1/2) Epoch 31, batch 5300, loss[loss=0.1894, ctc_loss=0.1216, cr_loss=0.3393, over 20953.00 frames. ], tot_loss[loss=0.2255, ctc_loss=0.1506, cr_loss=0.3747, over 4092271.92 frames. ], batch size: 50, lr: 2.66e-03, grad_scale: 16.0 2024-09-17 05:44:39,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=558294.1666666666, ans=0.125 2024-09-17 05:44:48,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=558322.5, ans=0.125 2024-09-17 05:44:56,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=558322.5, ans=0.2 2024-09-17 05:45:16,923 INFO [train.py:1198] (1/2) Epoch 31, batch 5350, loss[loss=0.2608, ctc_loss=0.1785, cr_loss=0.4111, over 20641.00 frames. ], tot_loss[loss=0.2253, ctc_loss=0.1504, cr_loss=0.3744, over 4098690.81 frames. 
], batch size: 66, lr: 2.66e-03, grad_scale: 16.0 2024-09-17 05:45:56,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=558435.8333333334, ans=0.2 2024-09-17 05:46:02,498 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.854e+02 2.188e+02 2.311e+02 2.479e+02 3.936e+02, threshold=4.621e+02, percent-clipped=1.0 2024-09-17 05:46:20,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=558492.5, ans=10.0 2024-09-17 05:46:33,782 INFO [train.py:1198] (1/2) Epoch 31, batch 5400, loss[loss=0.2259, ctc_loss=0.1503, cr_loss=0.3784, over 20783.00 frames. ], tot_loss[loss=0.2255, ctc_loss=0.1505, cr_loss=0.3748, over 4094285.50 frames. ], batch size: 53, lr: 2.66e-03, grad_scale: 16.0 2024-09-17 05:46:56,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=558549.1666666666, ans=0.2 2024-09-17 05:46:58,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=558549.1666666666, ans=0.125 2024-09-17 05:47:37,410 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.56 vs. limit=15.0 2024-09-17 05:47:47,312 INFO [train.py:1198] (1/2) Epoch 31, batch 5450, loss[loss=0.2357, ctc_loss=0.1611, cr_loss=0.3729, over 20110.00 frames. ], tot_loss[loss=0.2255, ctc_loss=0.1505, cr_loss=0.3748, over 4088345.98 frames. ], batch size: 80, lr: 2.65e-03, grad_scale: 16.0 2024-09-17 05:47:57,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=558662.5, ans=0.04949747468305833 2024-09-17 05:48:10,917 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.88 vs. limit=15.0 2024-09-17 05:48:24,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=558719.1666666666, ans=0.125 2024-09-17 05:48:32,930 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.896e+02 2.211e+02 2.314e+02 2.502e+02 4.686e+02, threshold=4.629e+02, percent-clipped=1.0 2024-09-17 05:48:52,865 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.50 vs. limit=15.0 2024-09-17 05:49:04,052 INFO [train.py:1198] (1/2) Epoch 31, batch 5500, loss[loss=0.22, ctc_loss=0.1486, cr_loss=0.357, over 21048.00 frames. ], tot_loss[loss=0.2251, ctc_loss=0.1502, cr_loss=0.3743, over 4089347.63 frames. ], batch size: 56, lr: 2.65e-03, grad_scale: 16.0 2024-09-17 05:49:41,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=558860.8333333334, ans=0.125 2024-09-17 05:49:48,083 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.85 vs. limit=15.0 2024-09-17 05:49:54,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=558889.1666666666, ans=0.0 2024-09-17 05:50:18,549 INFO [train.py:1198] (1/2) Epoch 31, batch 5550, loss[loss=0.2216, ctc_loss=0.1495, cr_loss=0.3603, over 20795.00 frames. 
], tot_loss[loss=0.2249, ctc_loss=0.1502, cr_loss=0.3737, over 4078267.63 frames. ], batch size: 56, lr: 2.65e-03, grad_scale: 16.0 2024-09-17 05:50:33,992 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 05:50:48,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=559002.5, ans=0.0 2024-09-17 05:51:01,817 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.862e+02 2.141e+02 2.293e+02 2.462e+02 3.507e+02, threshold=4.586e+02, percent-clipped=0.0 2024-09-17 05:51:05,625 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.66 vs. limit=15.0 2024-09-17 05:51:32,968 INFO [train.py:1198] (1/2) Epoch 31, batch 5600, loss[loss=0.2213, ctc_loss=0.1441, cr_loss=0.3861, over 20975.00 frames. ], tot_loss[loss=0.2256, ctc_loss=0.1507, cr_loss=0.3748, over 4071361.35 frames. ], batch size: 55, lr: 2.65e-03, grad_scale: 32.0 2024-09-17 05:52:07,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=559144.1666666666, ans=0.125 2024-09-17 05:52:19,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=559172.5, ans=0.0 2024-09-17 05:52:23,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=559172.5, ans=0.125 2024-09-17 05:52:47,403 INFO [train.py:1198] (1/2) Epoch 31, batch 5650, loss[loss=0.2485, ctc_loss=0.1708, cr_loss=0.3885, over 19411.00 frames. ], tot_loss[loss=0.2261, ctc_loss=0.151, cr_loss=0.3755, over 4067372.91 frames. ], batch size: 90, lr: 2.65e-03, grad_scale: 32.0 2024-09-17 05:52:56,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=559229.1666666666, ans=0.0 2024-09-17 05:53:04,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=559257.5, ans=15.0 2024-09-17 05:53:23,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=559285.8333333334, ans=0.125 2024-09-17 05:53:30,807 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.904e+02 2.165e+02 2.280e+02 2.460e+02 3.302e+02, threshold=4.561e+02, percent-clipped=0.0 2024-09-17 05:53:47,243 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=559342.5, ans=0.125 2024-09-17 05:54:01,637 INFO [train.py:1198] (1/2) Epoch 31, batch 5700, loss[loss=0.1918, ctc_loss=0.1237, cr_loss=0.3408, over 20983.00 frames. ], tot_loss[loss=0.2259, ctc_loss=0.1509, cr_loss=0.3752, over 4073187.80 frames. ], batch size: 48, lr: 2.65e-03, grad_scale: 32.0 2024-09-17 05:54:30,850 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.69 vs. limit=15.0 2024-09-17 05:55:12,509 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 05:55:19,389 INFO [train.py:1198] (1/2) Epoch 31, batch 5750, loss[loss=0.2446, ctc_loss=0.1619, cr_loss=0.4134, over 20952.00 frames. 
], tot_loss[loss=0.2257, ctc_loss=0.1507, cr_loss=0.375, over 4073287.28 frames. ], batch size: 64, lr: 2.65e-03, grad_scale: 32.0 2024-09-17 05:55:45,187 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.29 vs. limit=22.5 2024-09-17 05:56:02,380 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.952e+02 2.173e+02 2.324e+02 2.491e+02 3.999e+02, threshold=4.648e+02, percent-clipped=0.0 2024-09-17 05:56:33,437 INFO [train.py:1198] (1/2) Epoch 31, batch 5800, loss[loss=0.2245, ctc_loss=0.149, cr_loss=0.3776, over 21061.00 frames. ], tot_loss[loss=0.2259, ctc_loss=0.1508, cr_loss=0.3755, over 4064409.26 frames. ], batch size: 56, lr: 2.65e-03, grad_scale: 32.0 2024-09-17 05:57:49,465 INFO [train.py:1198] (1/2) Epoch 31, batch 5850, loss[loss=0.2159, ctc_loss=0.1435, cr_loss=0.3622, over 21040.00 frames. ], tot_loss[loss=0.2252, ctc_loss=0.1504, cr_loss=0.3739, over 4065608.34 frames. ], batch size: 62, lr: 2.65e-03, grad_scale: 32.0 2024-09-17 05:58:13,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=559824.1666666666, ans=0.1 2024-09-17 05:58:19,850 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.36 vs. limit=22.5 2024-09-17 05:58:32,721 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.869e+02 2.175e+02 2.331e+02 2.525e+02 3.423e+02, threshold=4.662e+02, percent-clipped=0.0 2024-09-17 05:58:40,791 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.79 vs. limit=15.0 2024-09-17 05:58:57,591 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.76 vs. limit=15.0 2024-09-17 05:59:02,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=559937.5, ans=0.125 2024-09-17 05:59:03,926 INFO [train.py:1198] (1/2) Epoch 31, batch 5900, loss[loss=0.2643, ctc_loss=0.1815, cr_loss=0.4138, over 18223.00 frames. ], tot_loss[loss=0.2255, ctc_loss=0.1506, cr_loss=0.3745, over 4071754.84 frames. ], batch size: 108, lr: 2.65e-03, grad_scale: 32.0 2024-09-17 05:59:07,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=559937.5, ans=0.0 2024-09-17 05:59:26,443 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=559965.8333333334, ans=0.0 2024-09-17 06:00:18,616 INFO [train.py:1198] (1/2) Epoch 31, batch 5950, loss[loss=0.2372, ctc_loss=0.1557, cr_loss=0.4076, over 20956.00 frames. ], tot_loss[loss=0.2254, ctc_loss=0.1505, cr_loss=0.3744, over 4079887.27 frames. ], batch size: 64, lr: 2.65e-03, grad_scale: 32.0 2024-09-17 06:01:01,379 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.960e+02 2.192e+02 2.322e+02 2.526e+02 3.100e+02, threshold=4.643e+02, percent-clipped=0.0 2024-09-17 06:01:16,873 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.06 vs. 
limit=15.0 2024-09-17 06:01:19,723 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.01 vs. limit=15.0 2024-09-17 06:01:31,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=560220.8333333334, ans=0.1 2024-09-17 06:01:32,602 INFO [train.py:1198] (1/2) Epoch 31, batch 6000, loss[loss=0.2106, ctc_loss=0.1393, cr_loss=0.3568, over 21013.00 frames. ], tot_loss[loss=0.2255, ctc_loss=0.1506, cr_loss=0.3745, over 4090810.60 frames. ], batch size: 63, lr: 2.65e-03, grad_scale: 32.0 2024-09-17 06:01:32,602 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 06:01:54,234 INFO [train.py:1230] (1/2) Epoch 31, validation: loss=0.04065, ctc_loss=0.04065, cr_loss=1.26e-14, over 944034.00 frames. 2024-09-17 06:01:54,235 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-17 06:02:33,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=560277.5, ans=0.0 2024-09-17 06:02:43,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=560305.8333333334, ans=0.125 2024-09-17 06:03:12,652 INFO [train.py:1198] (1/2) Epoch 31, batch 6050, loss[loss=0.191, ctc_loss=0.1257, cr_loss=0.3266, over 21063.00 frames. ], tot_loss[loss=0.2245, ctc_loss=0.1498, cr_loss=0.3732, over 4092498.21 frames. ], batch size: 53, lr: 2.65e-03, grad_scale: 32.0 2024-09-17 06:03:16,741 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.27 vs. limit=15.0 2024-09-17 06:03:18,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=560362.5, ans=0.125 2024-09-17 06:03:54,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=560419.1666666666, ans=0.2 2024-09-17 06:03:55,296 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.752e+02 2.155e+02 2.299e+02 2.498e+02 3.704e+02, threshold=4.598e+02, percent-clipped=0.0 2024-09-17 06:03:57,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=560447.5, ans=0.125 2024-09-17 06:04:10,824 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.99 vs. limit=12.0 2024-09-17 06:04:13,725 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.73 vs. limit=10.0 2024-09-17 06:04:25,102 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 06:04:27,546 INFO [train.py:1198] (1/2) Epoch 31, batch 6100, loss[loss=0.2421, ctc_loss=0.1623, cr_loss=0.3988, over 20961.00 frames. ], tot_loss[loss=0.2266, ctc_loss=0.1515, cr_loss=0.3756, over 4075757.70 frames. 
], batch size: 64, lr: 2.65e-03, grad_scale: 32.0 2024-09-17 06:05:04,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=560560.8333333334, ans=0.0 2024-09-17 06:05:34,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=560617.5, ans=15.0 2024-09-17 06:05:41,082 INFO [train.py:1198] (1/2) Epoch 31, batch 6150, loss[loss=0.1828, ctc_loss=0.1186, cr_loss=0.3209, over 20353.00 frames. ], tot_loss[loss=0.2267, ctc_loss=0.1515, cr_loss=0.3759, over 4080800.82 frames. ], batch size: 45, lr: 2.65e-03, grad_scale: 32.0 2024-09-17 06:06:02,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=560674.1666666666, ans=0.0 2024-09-17 06:06:12,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=560702.5, ans=0.125 2024-09-17 06:06:15,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=560702.5, ans=0.05 2024-09-17 06:06:23,754 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.802e+02 2.140e+02 2.291e+02 2.415e+02 4.856e+02, threshold=4.582e+02, percent-clipped=1.0 2024-09-17 06:06:41,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=560759.1666666666, ans=0.025 2024-09-17 06:06:54,835 INFO [train.py:1198] (1/2) Epoch 31, batch 6200, loss[loss=0.2019, ctc_loss=0.1305, cr_loss=0.3572, over 20987.00 frames. ], tot_loss[loss=0.2266, ctc_loss=0.1516, cr_loss=0.3754, over 4064774.22 frames. ], batch size: 52, lr: 2.65e-03, grad_scale: 32.0 2024-09-17 06:07:02,907 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.65 vs. limit=15.0 2024-09-17 06:07:15,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=560815.8333333334, ans=0.125 2024-09-17 06:07:52,443 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=560900.8333333334, ans=0.1 2024-09-17 06:08:05,515 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 06:08:07,918 INFO [train.py:1198] (1/2) Epoch 31, batch 6250, loss[loss=0.2683, ctc_loss=0.1921, cr_loss=0.3806, over 14361.00 frames. ], tot_loss[loss=0.2272, ctc_loss=0.1521, cr_loss=0.3754, over 4031461.15 frames. 
], batch size: 150, lr: 2.65e-03, grad_scale: 32.0 2024-09-17 06:08:30,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=560957.5, ans=0.2 2024-09-17 06:08:39,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=560985.8333333334, ans=0.1 2024-09-17 06:08:43,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=560985.8333333334, ans=0.125 2024-09-17 06:08:47,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=560985.8333333334, ans=0.0 2024-09-17 06:08:50,565 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.891e+02 2.170e+02 2.302e+02 2.558e+02 7.310e+02, threshold=4.603e+02, percent-clipped=1.0 2024-09-17 06:09:07,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=561042.5, ans=0.125 2024-09-17 06:09:21,716 INFO [train.py:1198] (1/2) Epoch 31, batch 6300, loss[loss=0.2824, ctc_loss=0.2023, cr_loss=0.4001, over 13632.00 frames. ], tot_loss[loss=0.2271, ctc_loss=0.1522, cr_loss=0.3745, over 3982203.25 frames. ], batch size: 149, lr: 2.65e-03, grad_scale: 32.0 2024-09-17 06:09:23,995 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.03 vs. limit=22.5 2024-09-17 06:09:29,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=561070.8333333334, ans=0.1 2024-09-17 06:09:54,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=561127.5, ans=0.125 2024-09-17 06:10:13,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=561155.8333333334, ans=0.5 2024-09-17 06:10:22,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=561184.1666666666, ans=0.125 2024-09-17 06:10:26,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=561184.1666666666, ans=0.125 2024-09-17 06:10:34,970 INFO [train.py:1198] (1/2) Epoch 31, batch 6350, loss[loss=0.2826, ctc_loss=0.1991, cr_loss=0.4174, over 13978.00 frames. ], tot_loss[loss=0.2285, ctc_loss=0.1535, cr_loss=0.3748, over 3912084.97 frames. ], batch size: 149, lr: 2.65e-03, grad_scale: 32.0 2024-09-17 06:11:16,297 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.834e+02 2.313e+02 2.603e+02 2.829e+02 3.684e+02, threshold=5.206e+02, percent-clipped=0.0 2024-09-17 06:12:20,748 INFO [train.py:1198] (1/2) Epoch 32, batch 0, loss[loss=0.2327, ctc_loss=0.1552, cr_loss=0.3871, over 21079.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1552, cr_loss=0.3871, over 21079.00 frames. ], batch size: 59, lr: 2.61e-03, grad_scale: 32.0 2024-09-17 06:12:20,749 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 06:12:39,140 INFO [train.py:1230] (1/2) Epoch 32, validation: loss=0.04055, ctc_loss=0.04055, cr_loss=1.282e-14, over 944034.00 frames. 
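Note on the loss bookkeeping in these entries: across this section the reported total tracks loss = ctc_loss + 0.2 * cr_loss, with 0.2 evidently the CR-loss scale for this run; e.g. the epoch-32 batch-0 entry just above gives 0.1552 + 0.2 * 0.3871 = 0.2326, matching loss=0.2327 up to display rounding. A minimal sketch of that arithmetic (a hypothetical helper reconstructed from the logged triples, not the actual icefall training code):

    # Hypothetical reconstruction of the 'loss[...]' figures in this log,
    # assuming loss = ctc_loss + CR_LOSS_SCALE * cr_loss.
    CR_LOSS_SCALE = 0.2  # inferred from the logged (loss, ctc_loss, cr_loss) triples

    def combined_loss(ctc_loss: float, cr_loss: float) -> float:
        """Total loss as printed in each batch entry."""
        return ctc_loss + CR_LOSS_SCALE * cr_loss

    # Epoch 32, batch 0 above: loss=0.2327, ctc_loss=0.1552, cr_loss=0.3871
    assert abs(combined_loss(0.1552, 0.3871) - 0.2327) < 5e-4

The validation entries fit the same relation trivially: with cr_loss on the order of 1e-14, loss and ctc_loss coincide (0.04055 here).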
2024-09-17 06:12:39,141 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-17 06:12:48,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=561328.6666666666, ans=0.0 2024-09-17 06:13:14,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=561385.3333333334, ans=0.125 2024-09-17 06:13:24,363 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.25 vs. limit=10.0 2024-09-17 06:13:36,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=561413.6666666666, ans=0.125 2024-09-17 06:13:47,018 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.15 vs. limit=22.5 2024-09-17 06:13:49,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=561442.0, ans=0.125 2024-09-17 06:13:57,078 INFO [train.py:1198] (1/2) Epoch 32, batch 50, loss[loss=0.2232, ctc_loss=0.1496, cr_loss=0.3683, over 20868.00 frames. ], tot_loss[loss=0.2278, ctc_loss=0.1524, cr_loss=0.3767, over 914949.03 frames. ], batch size: 57, lr: 2.61e-03, grad_scale: 16.0 2024-09-17 06:14:07,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=561470.3333333334, ans=0.2 2024-09-17 06:14:38,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=561527.0, ans=0.125 2024-09-17 06:14:56,619 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.935e+02 2.148e+02 2.334e+02 2.565e+02 8.400e+02, threshold=4.668e+02, percent-clipped=2.0 2024-09-17 06:15:12,801 INFO [train.py:1198] (1/2) Epoch 32, batch 100, loss[loss=0.2659, ctc_loss=0.177, cr_loss=0.4445, over 20642.00 frames. ], tot_loss[loss=0.2272, ctc_loss=0.1521, cr_loss=0.3758, over 1621306.74 frames. ], batch size: 68, lr: 2.61e-03, grad_scale: 16.0 2024-09-17 06:15:40,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=561640.3333333334, ans=0.125 2024-09-17 06:15:43,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=561668.6666666666, ans=0.125 2024-09-17 06:15:49,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=561668.6666666666, ans=0.125 2024-09-17 06:15:59,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=561697.0, ans=0.1 2024-09-17 06:16:31,457 INFO [train.py:1198] (1/2) Epoch 32, batch 150, loss[loss=0.2572, ctc_loss=0.1728, cr_loss=0.4218, over 19455.00 frames. ], tot_loss[loss=0.2275, ctc_loss=0.152, cr_loss=0.3772, over 2180867.90 frames. 
], batch size: 90, lr: 2.60e-03, grad_scale: 16.0 2024-09-17 06:16:34,830 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 06:16:52,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=561782.0, ans=0.2 2024-09-17 06:17:29,836 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.841e+02 2.106e+02 2.273e+02 2.464e+02 3.847e+02, threshold=4.547e+02, percent-clipped=0.0 2024-09-17 06:17:39,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=561867.0, ans=0.1 2024-09-17 06:17:46,430 INFO [train.py:1198] (1/2) Epoch 32, batch 200, loss[loss=0.2034, ctc_loss=0.1315, cr_loss=0.3594, over 20982.00 frames. ], tot_loss[loss=0.2243, ctc_loss=0.1497, cr_loss=0.3728, over 2601692.69 frames. ], batch size: 52, lr: 2.60e-03, grad_scale: 16.0 2024-09-17 06:18:09,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=561923.6666666666, ans=0.2 2024-09-17 06:18:09,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=561923.6666666666, ans=0.1 2024-09-17 06:18:13,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=561923.6666666666, ans=0.09899494936611666 2024-09-17 06:19:02,076 INFO [train.py:1198] (1/2) Epoch 32, batch 250, loss[loss=0.2113, ctc_loss=0.1395, cr_loss=0.3594, over 20956.00 frames. ], tot_loss[loss=0.2234, ctc_loss=0.1489, cr_loss=0.3727, over 2939065.35 frames. ], batch size: 50, lr: 2.60e-03, grad_scale: 16.0 2024-09-17 06:19:05,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=562037.0, ans=0.125 2024-09-17 06:20:04,385 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.847e+02 2.137e+02 2.243e+02 2.422e+02 3.810e+02, threshold=4.486e+02, percent-clipped=0.0 2024-09-17 06:20:21,156 INFO [train.py:1198] (1/2) Epoch 32, batch 300, loss[loss=0.2145, ctc_loss=0.1427, cr_loss=0.3594, over 20763.00 frames. ], tot_loss[loss=0.2223, ctc_loss=0.148, cr_loss=0.3712, over 3208662.16 frames. ], batch size: 53, lr: 2.60e-03, grad_scale: 16.0 2024-09-17 06:20:42,874 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 06:21:21,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=562292.0, ans=0.125 2024-09-17 06:21:32,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=562292.0, ans=0.125 2024-09-17 06:21:36,863 INFO [train.py:1198] (1/2) Epoch 32, batch 350, loss[loss=0.2684, ctc_loss=0.1774, cr_loss=0.4548, over 20650.00 frames. ], tot_loss[loss=0.2237, ctc_loss=0.1491, cr_loss=0.3732, over 3406071.76 frames. ], batch size: 66, lr: 2.60e-03, grad_scale: 16.0 2024-09-17 06:21:59,173 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.83 vs. 
limit=12.0 2024-09-17 06:22:25,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=562405.3333333334, ans=0.015 2024-09-17 06:22:38,903 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.839e+02 2.150e+02 2.271e+02 2.406e+02 3.200e+02, threshold=4.542e+02, percent-clipped=0.0 2024-09-17 06:22:52,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=562433.6666666666, ans=0.2 2024-09-17 06:22:55,551 INFO [train.py:1198] (1/2) Epoch 32, batch 400, loss[loss=0.218, ctc_loss=0.1464, cr_loss=0.3584, over 21016.00 frames. ], tot_loss[loss=0.2243, ctc_loss=0.1495, cr_loss=0.3741, over 3563125.26 frames. ], batch size: 61, lr: 2.60e-03, grad_scale: 32.0 2024-09-17 06:22:55,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=562462.0, ans=0.0 2024-09-17 06:23:06,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=562462.0, ans=0.125 2024-09-17 06:23:45,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=562547.0, ans=0.125 2024-09-17 06:24:10,815 INFO [train.py:1198] (1/2) Epoch 32, batch 450, loss[loss=0.2181, ctc_loss=0.1443, cr_loss=0.3688, over 21019.00 frames. ], tot_loss[loss=0.2265, ctc_loss=0.1512, cr_loss=0.3764, over 3685495.43 frames. ], batch size: 61, lr: 2.60e-03, grad_scale: 32.0 2024-09-17 06:24:38,871 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.94 vs. limit=15.0 2024-09-17 06:25:10,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=562688.6666666666, ans=0.125 2024-09-17 06:25:12,921 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.793e+02 2.141e+02 2.298e+02 2.521e+02 4.450e+02, threshold=4.595e+02, percent-clipped=0.0 2024-09-17 06:25:13,822 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.90 vs. limit=10.0 2024-09-17 06:25:29,545 INFO [train.py:1198] (1/2) Epoch 32, batch 500, loss[loss=0.1988, ctc_loss=0.1301, cr_loss=0.3432, over 20962.00 frames. ], tot_loss[loss=0.2251, ctc_loss=0.1501, cr_loss=0.3747, over 3782331.63 frames. ], batch size: 51, lr: 2.60e-03, grad_scale: 32.0 2024-09-17 06:25:37,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=562745.3333333334, ans=0.1 2024-09-17 06:26:03,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=562802.0, ans=0.0 2024-09-17 06:26:45,024 INFO [train.py:1198] (1/2) Epoch 32, batch 550, loss[loss=0.2305, ctc_loss=0.1535, cr_loss=0.3847, over 20680.00 frames. ], tot_loss[loss=0.2238, ctc_loss=0.1493, cr_loss=0.3726, over 3856129.62 frames. ], batch size: 68, lr: 2.60e-03, grad_scale: 32.0 2024-09-17 06:27:14,464 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.11 vs. 
limit=15.0 2024-09-17 06:27:30,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=562972.0, ans=0.0 2024-09-17 06:27:41,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=562972.0, ans=0.1 2024-09-17 06:27:44,112 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.951e+02 2.128e+02 2.252e+02 2.422e+02 3.990e+02, threshold=4.505e+02, percent-clipped=0.0 2024-09-17 06:28:01,107 INFO [train.py:1198] (1/2) Epoch 32, batch 600, loss[loss=0.1989, ctc_loss=0.1319, cr_loss=0.3348, over 20981.00 frames. ], tot_loss[loss=0.223, ctc_loss=0.1487, cr_loss=0.3716, over 3912888.75 frames. ], batch size: 50, lr: 2.60e-03, grad_scale: 32.0 2024-09-17 06:28:10,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=563028.6666666666, ans=0.1 2024-09-17 06:28:14,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=563028.6666666666, ans=0.05 2024-09-17 06:28:18,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=563057.0, ans=0.0 2024-09-17 06:28:57,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=563113.6666666666, ans=0.0 2024-09-17 06:29:01,623 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 06:29:06,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=563142.0, ans=0.125 2024-09-17 06:29:18,615 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.36 vs. limit=22.5 2024-09-17 06:29:19,318 INFO [train.py:1198] (1/2) Epoch 32, batch 650, loss[loss=0.1804, ctc_loss=0.1186, cr_loss=0.3091, over 20956.00 frames. ], tot_loss[loss=0.2232, ctc_loss=0.1488, cr_loss=0.3718, over 3950405.21 frames. ], batch size: 51, lr: 2.60e-03, grad_scale: 32.0 2024-09-17 06:30:09,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=563255.3333333334, ans=0.125 2024-09-17 06:30:18,254 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.970e+02 2.136e+02 2.277e+02 2.441e+02 2.853e+02, threshold=4.554e+02, percent-clipped=0.0 2024-09-17 06:30:24,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=563283.6666666666, ans=0.0 2024-09-17 06:30:28,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=563283.6666666666, ans=0.125 2024-09-17 06:30:34,641 INFO [train.py:1198] (1/2) Epoch 32, batch 700, loss[loss=0.213, ctc_loss=0.139, cr_loss=0.3699, over 20980.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.1481, cr_loss=0.3716, over 3990199.19 frames. 
], batch size: 55, lr: 2.60e-03, grad_scale: 32.0 2024-09-17 06:31:07,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=563368.6666666666, ans=0.2 2024-09-17 06:31:13,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=563368.6666666666, ans=0.0 2024-09-17 06:31:16,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=563368.6666666666, ans=0.125 2024-09-17 06:31:37,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=563425.3333333334, ans=0.0 2024-09-17 06:31:54,041 INFO [train.py:1198] (1/2) Epoch 32, batch 750, loss[loss=0.2052, ctc_loss=0.1337, cr_loss=0.3572, over 20804.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1485, cr_loss=0.3723, over 4012903.23 frames. ], batch size: 53, lr: 2.60e-03, grad_scale: 32.0 2024-09-17 06:32:02,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=563453.6666666666, ans=0.125 2024-09-17 06:32:03,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=563453.6666666666, ans=0.2 2024-09-17 06:32:50,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=563538.6666666666, ans=0.1 2024-09-17 06:32:54,699 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.888e+02 2.150e+02 2.294e+02 2.462e+02 4.152e+02, threshold=4.588e+02, percent-clipped=0.0 2024-09-17 06:33:08,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=563595.3333333334, ans=0.125 2024-09-17 06:33:09,801 INFO [train.py:1198] (1/2) Epoch 32, batch 800, loss[loss=0.2505, ctc_loss=0.165, cr_loss=0.4276, over 20643.00 frames. ], tot_loss[loss=0.2234, ctc_loss=0.1488, cr_loss=0.3729, over 4021499.22 frames. ], batch size: 66, lr: 2.60e-03, grad_scale: 32.0 2024-09-17 06:33:20,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=563595.3333333334, ans=0.02 2024-09-17 06:33:25,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=563623.6666666666, ans=0.0 2024-09-17 06:33:42,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=563652.0, ans=0.2 2024-09-17 06:34:23,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=563708.6666666666, ans=0.125 2024-09-17 06:34:29,027 INFO [train.py:1198] (1/2) Epoch 32, batch 850, loss[loss=0.2347, ctc_loss=0.1581, cr_loss=0.3833, over 20738.00 frames. ], tot_loss[loss=0.2244, ctc_loss=0.1497, cr_loss=0.3735, over 4016754.08 frames. 
], batch size: 71, lr: 2.60e-03, grad_scale: 32.0 2024-09-17 06:35:13,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=563822.0, ans=0.125 2024-09-17 06:35:16,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=563822.0, ans=0.125 2024-09-17 06:35:20,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=563822.0, ans=0.125 2024-09-17 06:35:29,425 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.855e+02 2.123e+02 2.293e+02 2.439e+02 3.333e+02, threshold=4.585e+02, percent-clipped=0.0 2024-09-17 06:35:30,400 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.63 vs. limit=22.5 2024-09-17 06:35:42,107 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.00 vs. limit=15.0 2024-09-17 06:35:44,276 INFO [train.py:1198] (1/2) Epoch 32, batch 900, loss[loss=0.2259, ctc_loss=0.1531, cr_loss=0.3643, over 20359.00 frames. ], tot_loss[loss=0.2257, ctc_loss=0.1507, cr_loss=0.3749, over 4031048.44 frames. ], batch size: 74, lr: 2.60e-03, grad_scale: 32.0 2024-09-17 06:35:47,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=563878.6666666666, ans=0.0 2024-09-17 06:35:59,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=563907.0, ans=0.0 2024-09-17 06:36:18,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=563935.3333333334, ans=0.125 2024-09-17 06:36:43,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=563963.6666666666, ans=0.125 2024-09-17 06:36:49,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=563992.0, ans=0.125 2024-09-17 06:37:03,037 INFO [train.py:1198] (1/2) Epoch 32, batch 950, loss[loss=0.2295, ctc_loss=0.1556, cr_loss=0.3696, over 21063.00 frames. ], tot_loss[loss=0.2245, ctc_loss=0.1499, cr_loss=0.3732, over 4041137.88 frames. ], batch size: 59, lr: 2.60e-03, grad_scale: 16.0 2024-09-17 06:37:13,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=564020.3333333334, ans=10.0 2024-09-17 06:37:19,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=564048.6666666666, ans=0.125 2024-09-17 06:37:29,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=564048.6666666666, ans=0.1 2024-09-17 06:37:51,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=564105.3333333334, ans=0.125 2024-09-17 06:38:04,521 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.896e+02 2.115e+02 2.220e+02 2.319e+02 3.664e+02, threshold=4.441e+02, percent-clipped=0.0 2024-09-17 06:38:18,113 INFO [train.py:1198] (1/2) Epoch 32, batch 1000, loss[loss=0.2366, ctc_loss=0.1611, cr_loss=0.377, over 20945.00 frames. 
], tot_loss[loss=0.225, ctc_loss=0.1501, cr_loss=0.3742, over 4063959.64 frames. ], batch size: 64, lr: 2.60e-03, grad_scale: 16.0 2024-09-17 06:38:59,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=564218.6666666666, ans=0.2 2024-09-17 06:38:59,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=564218.6666666666, ans=0.1 2024-09-17 06:39:00,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=564218.6666666666, ans=0.1 2024-09-17 06:39:05,666 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.81 vs. limit=15.0 2024-09-17 06:39:26,580 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.75 vs. limit=15.0 2024-09-17 06:39:27,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=564275.3333333334, ans=0.125 2024-09-17 06:39:36,165 INFO [train.py:1198] (1/2) Epoch 32, batch 1050, loss[loss=0.2033, ctc_loss=0.1304, cr_loss=0.364, over 20954.00 frames. ], tot_loss[loss=0.2251, ctc_loss=0.1502, cr_loss=0.3746, over 4077112.25 frames. ], batch size: 51, lr: 2.60e-03, grad_scale: 16.0 2024-09-17 06:39:36,899 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=7.41 vs. limit=22.5 2024-09-17 06:39:41,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=564303.6666666666, ans=0.0 2024-09-17 06:39:43,226 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.24 vs. limit=12.0 2024-09-17 06:39:53,592 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.19 vs. limit=10.0 2024-09-17 06:40:02,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=564332.0, ans=0.2 2024-09-17 06:40:20,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=564388.6666666666, ans=0.0 2024-09-17 06:40:37,923 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.922e+02 2.150e+02 2.287e+02 2.402e+02 5.259e+02, threshold=4.574e+02, percent-clipped=1.0 2024-09-17 06:40:42,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=564417.0, ans=0.1 2024-09-17 06:40:51,452 INFO [train.py:1198] (1/2) Epoch 32, batch 1100, loss[loss=0.1891, ctc_loss=0.1235, cr_loss=0.3279, over 21073.00 frames. ], tot_loss[loss=0.2246, ctc_loss=0.1499, cr_loss=0.3738, over 4092992.47 frames. ], batch size: 53, lr: 2.60e-03, grad_scale: 16.0 2024-09-17 06:40:53,614 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.35 vs. 
limit=22.5 2024-09-17 06:40:57,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=564445.3333333334, ans=0.1 2024-09-17 06:41:05,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=564473.6666666666, ans=0.125 2024-09-17 06:41:18,910 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 06:42:07,114 INFO [train.py:1198] (1/2) Epoch 32, batch 1150, loss[loss=0.211, ctc_loss=0.1383, cr_loss=0.3636, over 20812.00 frames. ], tot_loss[loss=0.2248, ctc_loss=0.1501, cr_loss=0.3737, over 4088469.37 frames. ], batch size: 59, lr: 2.60e-03, grad_scale: 16.0 2024-09-17 06:42:16,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=564587.0, ans=0.0 2024-09-17 06:42:30,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=564615.3333333334, ans=0.125 2024-09-17 06:42:40,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=564643.6666666666, ans=0.2 2024-09-17 06:42:42,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=564643.6666666666, ans=0.1 2024-09-17 06:43:11,927 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.910e+02 2.100e+02 2.251e+02 2.429e+02 2.712e+02, threshold=4.503e+02, percent-clipped=0.0 2024-09-17 06:43:19,680 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 06:43:25,179 INFO [train.py:1198] (1/2) Epoch 32, batch 1200, loss[loss=0.2208, ctc_loss=0.1479, cr_loss=0.3644, over 20292.00 frames. ], tot_loss[loss=0.2256, ctc_loss=0.1506, cr_loss=0.3751, over 4094824.89 frames. ], batch size: 74, lr: 2.60e-03, grad_scale: 32.0 2024-09-17 06:43:51,153 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 06:43:51,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=564757.0, ans=0.125 2024-09-17 06:43:52,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=564757.0, ans=0.2 2024-09-17 06:44:37,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=564842.0, ans=0.125 2024-09-17 06:44:40,194 INFO [train.py:1198] (1/2) Epoch 32, batch 1250, loss[loss=0.2084, ctc_loss=0.1353, cr_loss=0.3657, over 20787.00 frames. ], tot_loss[loss=0.2241, ctc_loss=0.1494, cr_loss=0.3732, over 4102480.16 frames. ], batch size: 53, lr: 2.60e-03, grad_scale: 32.0 2024-09-17 06:44:54,431 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.60 vs. 
limit=15.0 2024-09-17 06:45:27,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=564955.3333333334, ans=0.5 2024-09-17 06:45:38,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=564955.3333333334, ans=0.0 2024-09-17 06:45:45,140 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.902e+02 2.191e+02 2.363e+02 2.536e+02 4.105e+02, threshold=4.726e+02, percent-clipped=0.0 2024-09-17 06:45:58,600 INFO [train.py:1198] (1/2) Epoch 32, batch 1300, loss[loss=0.222, ctc_loss=0.1477, cr_loss=0.3715, over 20969.00 frames. ], tot_loss[loss=0.2244, ctc_loss=0.1497, cr_loss=0.3733, over 4087283.34 frames. ], batch size: 58, lr: 2.60e-03, grad_scale: 32.0 2024-09-17 06:46:00,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=565012.0, ans=0.125 2024-09-17 06:46:09,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=565012.0, ans=0.125 2024-09-17 06:46:23,948 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.83 vs. limit=22.5 2024-09-17 06:46:59,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=565125.3333333334, ans=0.125 2024-09-17 06:47:00,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=565125.3333333334, ans=0.125 2024-09-17 06:47:13,870 INFO [train.py:1198] (1/2) Epoch 32, batch 1350, loss[loss=0.214, ctc_loss=0.1426, cr_loss=0.3573, over 21041.00 frames. ], tot_loss[loss=0.225, ctc_loss=0.1502, cr_loss=0.3741, over 4093398.26 frames. ], batch size: 62, lr: 2.60e-03, grad_scale: 32.0 2024-09-17 06:47:36,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=565182.0, ans=0.125 2024-09-17 06:48:06,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=565238.6666666666, ans=0.025 2024-09-17 06:48:18,246 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.804e+02 2.174e+02 2.279e+02 2.413e+02 3.662e+02, threshold=4.557e+02, percent-clipped=0.0 2024-09-17 06:48:20,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=565267.0, ans=0.0 2024-09-17 06:48:24,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=565267.0, ans=0.125 2024-09-17 06:48:30,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=565295.3333333334, ans=0.1 2024-09-17 06:48:31,842 INFO [train.py:1198] (1/2) Epoch 32, batch 1400, loss[loss=0.2259, ctc_loss=0.1495, cr_loss=0.3822, over 20973.00 frames. ], tot_loss[loss=0.2243, ctc_loss=0.1496, cr_loss=0.3736, over 4099430.62 frames. 
], batch size: 48, lr: 2.60e-03, grad_scale: 32.0 2024-09-17 06:48:41,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=565295.3333333334, ans=0.1 2024-09-17 06:48:42,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=565295.3333333334, ans=0.0 2024-09-17 06:49:20,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=565380.3333333334, ans=0.125 2024-09-17 06:49:27,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=565380.3333333334, ans=0.0 2024-09-17 06:49:32,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=565408.6666666666, ans=0.0 2024-09-17 06:49:43,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=565408.6666666666, ans=10.0 2024-09-17 06:49:47,334 INFO [train.py:1198] (1/2) Epoch 32, batch 1450, loss[loss=0.2377, ctc_loss=0.1616, cr_loss=0.38, over 20891.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.1499, cr_loss=0.3741, over 4105400.31 frames. ], batch size: 57, lr: 2.60e-03, grad_scale: 32.0 2024-09-17 06:49:59,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=565437.0, ans=0.1 2024-09-17 06:50:48,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=565550.3333333334, ans=0.125 2024-09-17 06:50:51,662 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.772e+02 2.127e+02 2.261e+02 2.444e+02 5.232e+02, threshold=4.522e+02, percent-clipped=1.0 2024-09-17 06:50:58,259 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.30 vs. limit=15.0 2024-09-17 06:51:05,133 INFO [train.py:1198] (1/2) Epoch 32, batch 1500, loss[loss=0.1854, ctc_loss=0.1219, cr_loss=0.3174, over 20954.00 frames. ], tot_loss[loss=0.2244, ctc_loss=0.1497, cr_loss=0.3736, over 4095871.08 frames. ], batch size: 48, lr: 2.60e-03, grad_scale: 32.0 2024-09-17 06:51:22,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=565607.0, ans=0.125 2024-09-17 06:51:30,367 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.80 vs. limit=10.0 2024-09-17 06:51:50,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=565663.6666666666, ans=0.0 2024-09-17 06:52:05,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=565692.0, ans=0.0 2024-09-17 06:52:20,637 INFO [train.py:1198] (1/2) Epoch 32, batch 1550, loss[loss=0.2168, ctc_loss=0.1451, cr_loss=0.3587, over 21056.00 frames. ], tot_loss[loss=0.2237, ctc_loss=0.1491, cr_loss=0.3729, over 4110064.48 frames. 
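A note on the per-batch summaries above: each reports loss, ctc_loss and cr_loss, and across these entries the totals satisfy loss ≈ ctc_loss + 0.2 × cr_loss (e.g. 0.1499 + 0.2 × 0.3741 ≈ 0.2247 at batch 1450, 0.1497 + 0.2 × 0.3736 ≈ 0.2244 at batch 1500), so the consistency-regularization term appears to enter the objective with a weight of 0.2. A minimal sketch of that combination, with hypothetical names, assuming a plain weighted sum:

    import torch

    def combined_loss(ctc_loss: torch.Tensor,
                      cr_loss: torch.Tensor,
                      cr_loss_scale: float = 0.2) -> torch.Tensor:
        # Weighted sum matching the logged relation
        # loss = ctc_loss + cr_loss_scale * cr_loss, e.g.
        # 0.1499 + 0.2 * 0.3741 = 0.2247 (batch 1450 above).
        return ctc_loss + cr_loss_scale * cr_loss

How the CTC and consistency terms are themselves computed is not visible in the log; the sketch only reproduces the aggregation of the two logged values.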
], batch size: 53, lr: 2.60e-03, grad_scale: 32.0 2024-09-17 06:52:54,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=565777.0, ans=15.0 2024-09-17 06:53:22,246 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.903e+02 2.131e+02 2.244e+02 2.401e+02 3.184e+02, threshold=4.488e+02, percent-clipped=0.0 2024-09-17 06:53:27,203 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 06:53:28,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=565833.6666666666, ans=0.125 2024-09-17 06:53:38,696 INFO [train.py:1198] (1/2) Epoch 32, batch 1600, loss[loss=0.2122, ctc_loss=0.1395, cr_loss=0.3639, over 20986.00 frames. ], tot_loss[loss=0.2245, ctc_loss=0.1498, cr_loss=0.3738, over 4111488.53 frames. ], batch size: 48, lr: 2.60e-03, grad_scale: 32.0 2024-09-17 06:53:42,356 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.52 vs. limit=15.0 2024-09-17 06:53:46,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=565862.0, ans=0.125 2024-09-17 06:53:58,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=565890.3333333334, ans=0.07 2024-09-17 06:54:11,454 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.52 vs. limit=5.0 2024-09-17 06:54:15,762 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.24 vs. limit=15.0 2024-09-17 06:54:28,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=565947.0, ans=0.125 2024-09-17 06:54:28,962 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.13 vs. limit=22.5 2024-09-17 06:54:53,987 INFO [train.py:1198] (1/2) Epoch 32, batch 1650, loss[loss=0.213, ctc_loss=0.1394, cr_loss=0.3678, over 21055.00 frames. ], tot_loss[loss=0.2249, ctc_loss=0.1501, cr_loss=0.3744, over 4104825.49 frames. ], batch size: 53, lr: 2.60e-03, grad_scale: 32.0 2024-09-17 06:55:02,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=566003.6666666666, ans=0.125 2024-09-17 06:55:06,883 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.92 vs. limit=15.0 2024-09-17 06:55:55,811 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.805e+02 2.136e+02 2.269e+02 2.410e+02 3.416e+02, threshold=4.539e+02, percent-clipped=0.0 2024-09-17 06:56:09,240 INFO [train.py:1198] (1/2) Epoch 32, batch 1700, loss[loss=0.2444, ctc_loss=0.1653, cr_loss=0.3955, over 21017.00 frames. ], tot_loss[loss=0.2249, ctc_loss=0.1501, cr_loss=0.374, over 4096732.47 frames. ], batch size: 61, lr: 2.59e-03, grad_scale: 32.0 2024-09-17 06:56:30,939 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.74 vs. 
limit=15.0 2024-09-17 06:56:39,815 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.69 vs. limit=22.5 2024-09-17 06:57:13,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=566258.6666666666, ans=0.125 2024-09-17 06:57:26,865 INFO [train.py:1198] (1/2) Epoch 32, batch 1750, loss[loss=0.2217, ctc_loss=0.1464, cr_loss=0.3765, over 20843.00 frames. ], tot_loss[loss=0.2251, ctc_loss=0.1503, cr_loss=0.3741, over 4105302.49 frames. ], batch size: 59, lr: 2.59e-03, grad_scale: 16.0 2024-09-17 06:57:42,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=566315.3333333334, ans=0.1 2024-09-17 06:57:49,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=566315.3333333334, ans=0.125 2024-09-17 06:58:25,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=566400.3333333334, ans=0.125 2024-09-17 06:58:29,764 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.891e+02 2.163e+02 2.251e+02 2.385e+02 4.582e+02, threshold=4.503e+02, percent-clipped=1.0 2024-09-17 06:58:41,780 INFO [train.py:1198] (1/2) Epoch 32, batch 1800, loss[loss=0.2152, ctc_loss=0.139, cr_loss=0.3807, over 21042.00 frames. ], tot_loss[loss=0.2255, ctc_loss=0.1504, cr_loss=0.3752, over 4108900.54 frames. ], batch size: 56, lr: 2.59e-03, grad_scale: 16.0 2024-09-17 06:58:43,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=566428.6666666666, ans=0.125 2024-09-17 06:59:25,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=566485.3333333334, ans=0.035 2024-09-17 06:59:34,337 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.02 vs. limit=22.5 2024-09-17 06:59:42,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=566513.6666666666, ans=0.0 2024-09-17 07:00:00,258 INFO [train.py:1198] (1/2) Epoch 32, batch 1850, loss[loss=0.2424, ctc_loss=0.1618, cr_loss=0.4031, over 21045.00 frames. ], tot_loss[loss=0.2246, ctc_loss=0.1497, cr_loss=0.3744, over 4115694.54 frames. ], batch size: 56, lr: 2.59e-03, grad_scale: 16.0 2024-09-17 07:00:36,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=566627.0, ans=0.125 2024-09-17 07:00:57,475 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.24 vs. limit=15.0 2024-09-17 07:01:04,178 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.765e+02 2.156e+02 2.256e+02 2.428e+02 3.231e+02, threshold=4.513e+02, percent-clipped=0.0 2024-09-17 07:01:09,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=566683.6666666666, ans=0.1 2024-09-17 07:01:10,737 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.50 vs. 
limit=22.5 2024-09-17 07:01:16,314 INFO [train.py:1198] (1/2) Epoch 32, batch 1900, loss[loss=0.2491, ctc_loss=0.1665, cr_loss=0.4132, over 20703.00 frames. ], tot_loss[loss=0.2234, ctc_loss=0.1489, cr_loss=0.3725, over 4112125.32 frames. ], batch size: 71, lr: 2.59e-03, grad_scale: 16.0 2024-09-17 07:02:19,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=566825.3333333334, ans=0.0 2024-09-17 07:02:35,247 INFO [train.py:1198] (1/2) Epoch 32, batch 1950, loss[loss=0.2081, ctc_loss=0.1365, cr_loss=0.3582, over 20934.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1478, cr_loss=0.371, over 4122144.14 frames. ], batch size: 60, lr: 2.59e-03, grad_scale: 16.0 2024-09-17 07:02:35,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=566853.6666666666, ans=0.125 2024-09-17 07:02:53,950 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.81 vs. limit=15.0 2024-09-17 07:03:01,544 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.27 vs. limit=22.5 2024-09-17 07:03:14,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=566910.3333333334, ans=0.05 2024-09-17 07:03:19,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=566938.6666666666, ans=10.0 2024-09-17 07:03:37,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=566967.0, ans=0.125 2024-09-17 07:03:38,556 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.859e+02 2.153e+02 2.247e+02 2.428e+02 3.636e+02, threshold=4.493e+02, percent-clipped=0.0 2024-09-17 07:03:39,373 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.06 vs. limit=15.0 2024-09-17 07:03:43,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=566967.0, ans=0.025 2024-09-17 07:03:50,997 INFO [train.py:1198] (1/2) Epoch 32, batch 2000, loss[loss=0.2231, ctc_loss=0.151, cr_loss=0.3608, over 20980.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1478, cr_loss=0.3703, over 4105896.35 frames. ], batch size: 55, lr: 2.59e-03, grad_scale: 32.0 2024-09-17 07:04:17,073 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 07:04:20,510 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.92 vs. limit=12.0 2024-09-17 07:04:45,213 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.34 vs. 
limit=15.0 2024-09-17 07:04:48,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=567080.3333333334, ans=0.5 2024-09-17 07:05:02,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=567108.6666666666, ans=0.1 2024-09-17 07:05:09,872 INFO [train.py:1198] (1/2) Epoch 32, batch 2050, loss[loss=0.2605, ctc_loss=0.1858, cr_loss=0.3738, over 14041.00 frames. ], tot_loss[loss=0.2231, ctc_loss=0.1487, cr_loss=0.3718, over 4087009.56 frames. ], batch size: 149, lr: 2.59e-03, grad_scale: 32.0 2024-09-17 07:05:13,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=567137.0, ans=0.025 2024-09-17 07:05:35,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=567165.3333333334, ans=0.2 2024-09-17 07:05:49,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=567193.6666666666, ans=0.2 2024-09-17 07:06:10,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=567250.3333333334, ans=0.2 2024-09-17 07:06:14,384 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.956e+02 2.181e+02 2.301e+02 2.456e+02 3.502e+02, threshold=4.602e+02, percent-clipped=0.0 2024-09-17 07:06:19,495 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.89 vs. limit=15.0 2024-09-17 07:06:24,789 INFO [train.py:1198] (1/2) Epoch 32, batch 2100, loss[loss=0.198, ctc_loss=0.1311, cr_loss=0.3345, over 21054.00 frames. ], tot_loss[loss=0.2231, ctc_loss=0.1486, cr_loss=0.3722, over 4096533.54 frames. ], batch size: 56, lr: 2.59e-03, grad_scale: 16.0 2024-09-17 07:06:47,750 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 07:07:39,933 INFO [train.py:1198] (1/2) Epoch 32, batch 2150, loss[loss=0.2477, ctc_loss=0.1662, cr_loss=0.4076, over 20144.00 frames. ], tot_loss[loss=0.2232, ctc_loss=0.1488, cr_loss=0.3722, over 4088894.50 frames. ], batch size: 80, lr: 2.59e-03, grad_scale: 16.0 2024-09-17 07:08:18,297 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.56 vs. limit=15.0 2024-09-17 07:08:26,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=567505.3333333334, ans=0.125 2024-09-17 07:08:47,492 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.827e+02 2.136e+02 2.257e+02 2.419e+02 4.178e+02, threshold=4.514e+02, percent-clipped=0.0 2024-09-17 07:08:57,204 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.65 vs. limit=15.0 2024-09-17 07:08:57,946 INFO [train.py:1198] (1/2) Epoch 32, batch 2200, loss[loss=0.2322, ctc_loss=0.1532, cr_loss=0.3953, over 21002.00 frames. ], tot_loss[loss=0.2243, ctc_loss=0.1496, cr_loss=0.3737, over 4087721.25 frames. 
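The scaling.py:214 lines above report, for named sub-module hyperparameters (dropout probabilities, skip rates, balancer probabilities, bypass scales), the value "ans" in effect at the given batch_count; these values follow schedules over training rather than staying fixed. The sketch below illustrates the idea with a generic piecewise-linear schedule; it is not the actual ScheduledFloat implementation, and the example breakpoints are made up:

    from bisect import bisect_right

    def scheduled_value(batch_count: float, points) -> float:
        # Piecewise-linear interpolation through (batch_count, value)
        # breakpoints, clamped at both ends, e.g.
        # points = [(0.0, 0.3), (20000.0, 0.1)] anneals 0.3 -> 0.1.
        xs = [x for x, _ in points]
        ys = [y for _, y in points]
        if batch_count <= xs[0]:
            return ys[0]
        if batch_count >= xs[-1]:
            return ys[-1]
        i = bisect_right(xs, batch_count)
        x0, y0, x1, y1 = xs[i - 1], ys[i - 1], xs[i], ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)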
], batch size: 61, lr: 2.59e-03, grad_scale: 16.0 2024-09-17 07:09:12,701 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.16 vs. limit=22.5 2024-09-17 07:09:15,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=567590.3333333334, ans=0.025 2024-09-17 07:09:15,401 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 07:09:19,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=567590.3333333334, ans=0.125 2024-09-17 07:09:19,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=567590.3333333334, ans=0.2 2024-09-17 07:09:44,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=567647.0, ans=0.1 2024-09-17 07:09:45,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=567647.0, ans=0.025 2024-09-17 07:09:49,085 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.01 vs. limit=15.0 2024-09-17 07:10:07,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=567675.3333333334, ans=0.0 2024-09-17 07:10:13,380 INFO [train.py:1198] (1/2) Epoch 32, batch 2250, loss[loss=0.1794, ctc_loss=0.1172, cr_loss=0.3109, over 20967.00 frames. ], tot_loss[loss=0.2237, ctc_loss=0.1491, cr_loss=0.373, over 4093259.68 frames. ], batch size: 49, lr: 2.59e-03, grad_scale: 16.0 2024-09-17 07:10:19,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=567703.6666666666, ans=0.0 2024-09-17 07:10:20,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=567703.6666666666, ans=0.025 2024-09-17 07:10:24,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=567703.6666666666, ans=0.125 2024-09-17 07:11:03,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=567788.6666666666, ans=0.1 2024-09-17 07:11:15,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=567817.0, ans=0.2 2024-09-17 07:11:20,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=567817.0, ans=0.09899494936611666 2024-09-17 07:11:21,377 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.863e+02 2.098e+02 2.214e+02 2.357e+02 4.754e+02, threshold=4.427e+02, percent-clipped=1.0 2024-09-17 07:11:31,824 INFO [train.py:1198] (1/2) Epoch 32, batch 2300, loss[loss=0.202, ctc_loss=0.1307, cr_loss=0.3564, over 21054.00 frames. ], tot_loss[loss=0.224, ctc_loss=0.1493, cr_loss=0.3731, over 4082603.83 frames. 
], batch size: 56, lr: 2.59e-03, grad_scale: 16.0 2024-09-17 07:12:01,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=567902.0, ans=0.125 2024-09-17 07:12:02,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=567902.0, ans=0.0 2024-09-17 07:12:05,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=567902.0, ans=0.035 2024-09-17 07:12:08,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=567902.0, ans=0.125 2024-09-17 07:12:20,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=567930.3333333334, ans=0.025 2024-09-17 07:12:46,148 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=567987.0, ans=0.0 2024-09-17 07:12:47,369 INFO [train.py:1198] (1/2) Epoch 32, batch 2350, loss[loss=0.2326, ctc_loss=0.1535, cr_loss=0.3955, over 21008.00 frames. ], tot_loss[loss=0.2243, ctc_loss=0.1495, cr_loss=0.3737, over 4093738.97 frames. ], batch size: 63, lr: 2.59e-03, grad_scale: 16.0 2024-09-17 07:13:06,135 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.56 vs. limit=15.0 2024-09-17 07:13:07,406 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 07:13:28,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=568043.6666666666, ans=0.125 2024-09-17 07:13:55,164 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.914e+02 2.169e+02 2.283e+02 2.413e+02 5.050e+02, threshold=4.567e+02, percent-clipped=1.0 2024-09-17 07:14:05,652 INFO [train.py:1198] (1/2) Epoch 32, batch 2400, loss[loss=0.2474, ctc_loss=0.165, cr_loss=0.4118, over 20958.00 frames. ], tot_loss[loss=0.2249, ctc_loss=0.15, cr_loss=0.3743, over 4087058.61 frames. ], batch size: 58, lr: 2.59e-03, grad_scale: 32.0 2024-09-17 07:14:19,880 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.45 vs. limit=15.0 2024-09-17 07:14:22,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=568157.0, ans=0.0 2024-09-17 07:14:28,853 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.43 vs. limit=15.0 2024-09-17 07:14:40,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=568185.3333333334, ans=0.125 2024-09-17 07:14:57,586 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.20 vs. limit=10.0 2024-09-17 07:15:20,813 INFO [train.py:1198] (1/2) Epoch 32, batch 2450, loss[loss=0.1947, ctc_loss=0.1259, cr_loss=0.344, over 21002.00 frames. ], tot_loss[loss=0.2236, ctc_loss=0.1489, cr_loss=0.3734, over 4101214.29 frames. 
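The grad_scale field in the batch summaries moves in powers of two (16.0 through batch 2350 after the earlier drop from 32.0 around batch 1750, then 32.0 again from batch 2400), which is the signature of dynamic loss scaling in mixed-precision training: the scale is halved when a step produces inf/nan gradients and doubled again after a long enough run of clean steps. A minimal loop with PyTorch's stock GradScaler illustrating that behavior; the model/optimizer/loss names are placeholders, init_scale is chosen to match this log, and the remaining knobs are PyTorch defaults rather than values read from the log:

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=16.0,     # scales seen in this log are powers of two
        growth_factor=2.0,   # double after `growth_interval` clean steps
        backoff_factor=0.5,  # halve when gradients overflow
        growth_interval=2000,
    )

    def train_step(model, optimizer, batch, compute_loss):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = compute_loss(model, batch)
        scaler.scale(loss).backward()
        scaler.step(optimizer)  # skipped internally if grads overflowed
        scaler.update()         # halves or (eventually) doubles the scale

The actual training loop here may differ; this only shows the mechanism that would produce the logged power-of-two scale changes.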
], batch size: 48, lr: 2.59e-03, grad_scale: 32.0 2024-09-17 07:15:24,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=568270.3333333334, ans=0.125 2024-09-17 07:16:03,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=568327.0, ans=0.0 2024-09-17 07:16:24,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=568383.6666666666, ans=0.0 2024-09-17 07:16:25,933 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.862e+02 2.138e+02 2.262e+02 2.448e+02 4.421e+02, threshold=4.525e+02, percent-clipped=0.0 2024-09-17 07:16:38,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=568412.0, ans=0.1 2024-09-17 07:16:39,426 INFO [train.py:1198] (1/2) Epoch 32, batch 2500, loss[loss=0.1824, ctc_loss=0.1185, cr_loss=0.3197, over 20972.00 frames. ], tot_loss[loss=0.2237, ctc_loss=0.1489, cr_loss=0.3739, over 4104761.84 frames. ], batch size: 48, lr: 2.59e-03, grad_scale: 32.0 2024-09-17 07:17:25,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=568497.0, ans=0.2 2024-09-17 07:17:28,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=568497.0, ans=0.125 2024-09-17 07:17:41,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=568525.3333333334, ans=0.0 2024-09-17 07:17:55,321 INFO [train.py:1198] (1/2) Epoch 32, batch 2550, loss[loss=0.2487, ctc_loss=0.1653, cr_loss=0.4174, over 20674.00 frames. ], tot_loss[loss=0.2235, ctc_loss=0.1489, cr_loss=0.3732, over 4093199.16 frames. ], batch size: 66, lr: 2.59e-03, grad_scale: 32.0 2024-09-17 07:17:55,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=568553.6666666666, ans=0.025 2024-09-17 07:18:18,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=568582.0, ans=0.125 2024-09-17 07:18:54,308 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 07:18:59,934 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.873e+02 2.179e+02 2.332e+02 2.505e+02 3.120e+02, threshold=4.664e+02, percent-clipped=0.0 2024-09-17 07:19:07,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=568667.0, ans=0.0 2024-09-17 07:19:10,435 INFO [train.py:1198] (1/2) Epoch 32, batch 2600, loss[loss=0.2412, ctc_loss=0.1605, cr_loss=0.4036, over 20905.00 frames. ], tot_loss[loss=0.2232, ctc_loss=0.1487, cr_loss=0.3724, over 4102770.65 frames. ], batch size: 60, lr: 2.59e-03, grad_scale: 32.0 2024-09-17 07:19:38,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=568723.6666666666, ans=0.125 2024-09-17 07:20:01,465 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.72 vs. 
limit=15.0 2024-09-17 07:20:20,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=568808.6666666666, ans=0.125 2024-09-17 07:20:29,033 INFO [train.py:1198] (1/2) Epoch 32, batch 2650, loss[loss=0.2134, ctc_loss=0.1417, cr_loss=0.3582, over 20878.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1475, cr_loss=0.3702, over 4105973.44 frames. ], batch size: 54, lr: 2.59e-03, grad_scale: 32.0 2024-09-17 07:20:34,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=568837.0, ans=0.025 2024-09-17 07:20:56,945 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.22 vs. limit=22.5 2024-09-17 07:21:27,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=568922.0, ans=0.0 2024-09-17 07:21:33,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=568950.3333333334, ans=0.125 2024-09-17 07:21:34,594 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.896e+02 2.134e+02 2.251e+02 2.417e+02 3.469e+02, threshold=4.502e+02, percent-clipped=0.0 2024-09-17 07:21:45,411 INFO [train.py:1198] (1/2) Epoch 32, batch 2700, loss[loss=0.1901, ctc_loss=0.1205, cr_loss=0.3478, over 20971.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1477, cr_loss=0.371, over 4114775.51 frames. ], batch size: 51, lr: 2.59e-03, grad_scale: 32.0 2024-09-17 07:22:05,943 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.18 vs. limit=10.0 2024-09-17 07:22:14,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=569035.3333333334, ans=0.2 2024-09-17 07:22:23,917 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.05 vs. limit=15.0 2024-09-17 07:22:33,422 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.75 vs. limit=10.0 2024-09-17 07:22:42,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=569063.6666666666, ans=0.1 2024-09-17 07:23:04,589 INFO [train.py:1198] (1/2) Epoch 32, batch 2750, loss[loss=0.2577, ctc_loss=0.1794, cr_loss=0.3911, over 18260.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1483, cr_loss=0.3717, over 4104770.64 frames. ], batch size: 108, lr: 2.59e-03, grad_scale: 32.0 2024-09-17 07:23:36,406 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.40 vs. 
limit=22.5 2024-09-17 07:23:59,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=569205.3333333334, ans=0.1 2024-09-17 07:24:02,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=569205.3333333334, ans=0.125 2024-09-17 07:24:09,266 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.828e+02 2.139e+02 2.268e+02 2.431e+02 3.732e+02, threshold=4.536e+02, percent-clipped=0.0 2024-09-17 07:24:19,566 INFO [train.py:1198] (1/2) Epoch 32, batch 2800, loss[loss=0.2434, ctc_loss=0.1632, cr_loss=0.4008, over 20300.00 frames. ], tot_loss[loss=0.2228, ctc_loss=0.1484, cr_loss=0.3718, over 4101994.27 frames. ], batch size: 74, lr: 2.59e-03, grad_scale: 32.0 2024-09-17 07:25:21,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=569375.3333333334, ans=0.125 2024-09-17 07:25:26,500 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.21 vs. limit=15.0 2024-09-17 07:25:27,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=569375.3333333334, ans=0.0 2024-09-17 07:25:29,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=569375.3333333334, ans=0.1 2024-09-17 07:25:37,637 INFO [train.py:1198] (1/2) Epoch 32, batch 2850, loss[loss=0.1907, ctc_loss=0.1236, cr_loss=0.3353, over 19902.00 frames. ], tot_loss[loss=0.2228, ctc_loss=0.1485, cr_loss=0.3713, over 4095080.36 frames. ], batch size: 44, lr: 2.59e-03, grad_scale: 32.0 2024-09-17 07:25:56,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=569432.0, ans=0.125 2024-09-17 07:26:17,497 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.10 vs. limit=10.0 2024-09-17 07:26:18,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=569460.3333333334, ans=0.2 2024-09-17 07:26:19,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=569460.3333333334, ans=0.2 2024-09-17 07:26:27,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=569488.6666666666, ans=0.125 2024-09-17 07:26:42,602 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.967e+02 2.164e+02 2.318e+02 2.473e+02 4.370e+02, threshold=4.636e+02, percent-clipped=0.0 2024-09-17 07:26:43,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=569517.0, ans=0.1 2024-09-17 07:26:53,107 INFO [train.py:1198] (1/2) Epoch 32, batch 2900, loss[loss=0.2609, ctc_loss=0.1788, cr_loss=0.4105, over 19366.00 frames. ], tot_loss[loss=0.2235, ctc_loss=0.1491, cr_loss=0.3719, over 4087803.69 frames. 
], batch size: 90, lr: 2.59e-03, grad_scale: 32.0 2024-09-17 07:26:53,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=569545.3333333334, ans=0.125 2024-09-17 07:26:54,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=569545.3333333334, ans=0.125 2024-09-17 07:27:05,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=569545.3333333334, ans=0.125 2024-09-17 07:27:32,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=569602.0, ans=0.125 2024-09-17 07:27:40,244 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 07:27:49,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=569630.3333333334, ans=0.125 2024-09-17 07:28:05,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=569658.6666666666, ans=0.125 2024-09-17 07:28:11,549 INFO [train.py:1198] (1/2) Epoch 32, batch 2950, loss[loss=0.2256, ctc_loss=0.1518, cr_loss=0.3692, over 20668.00 frames. ], tot_loss[loss=0.2238, ctc_loss=0.1493, cr_loss=0.3728, over 4101035.90 frames. ], batch size: 68, lr: 2.59e-03, grad_scale: 32.0 2024-09-17 07:29:16,325 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.863e+02 2.172e+02 2.360e+02 2.593e+02 4.618e+02, threshold=4.720e+02, percent-clipped=0.0 2024-09-17 07:29:21,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=569800.3333333334, ans=0.0 2024-09-17 07:29:26,797 INFO [train.py:1198] (1/2) Epoch 32, batch 3000, loss[loss=0.2519, ctc_loss=0.1695, cr_loss=0.4119, over 20520.00 frames. ], tot_loss[loss=0.2236, ctc_loss=0.1492, cr_loss=0.3719, over 4092177.10 frames. ], batch size: 75, lr: 2.59e-03, grad_scale: 32.0 2024-09-17 07:29:26,798 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 07:29:45,603 INFO [train.py:1230] (1/2) Epoch 32, validation: loss=0.04051, ctc_loss=0.04051, cr_loss=1.305e-14, over 944034.00 frames. 2024-09-17 07:29:45,603 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-17 07:29:50,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=569828.6666666666, ans=0.2 2024-09-17 07:30:05,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=569857.0, ans=0.125 2024-09-17 07:30:32,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=569913.6666666666, ans=0.125 2024-09-17 07:30:38,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=569913.6666666666, ans=0.2 2024-09-17 07:30:38,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=569913.6666666666, ans=0.125 2024-09-17 07:30:46,297 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.51 vs. 
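Note on the validation entry above: the reported cr_loss of 1.305e-14 is numerically zero, so the validation loss reduces to the CTC term alone, 0.04051 + 0.2 × 1.305e-14 ≈ 0.04051. A plausible reading is that the consistency term compares model outputs on two differently masked views of each utterance, and with masking disabled at evaluation time the two views coincide, driving the term to zero up to floating-point noise; the log itself does not spell out the mechanism.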
limit=22.5 2024-09-17 07:31:03,805 INFO [train.py:1198] (1/2) Epoch 32, batch 3050, loss[loss=0.2594, ctc_loss=0.1841, cr_loss=0.3766, over 13761.00 frames. ], tot_loss[loss=0.2238, ctc_loss=0.1493, cr_loss=0.3724, over 4091815.86 frames. ], batch size: 149, lr: 2.59e-03, grad_scale: 32.0 2024-09-17 07:31:05,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=569970.3333333334, ans=0.125 2024-09-17 07:31:08,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=569970.3333333334, ans=0.125 2024-09-17 07:31:09,450 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.82 vs. limit=22.5 2024-09-17 07:31:14,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=569970.3333333334, ans=0.0 2024-09-17 07:31:40,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=570027.0, ans=0.125 2024-09-17 07:32:08,519 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.875e+02 2.213e+02 2.329e+02 2.544e+02 4.650e+02, threshold=4.659e+02, percent-clipped=0.0 2024-09-17 07:32:13,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=570083.6666666666, ans=0.125 2024-09-17 07:32:14,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=570083.6666666666, ans=0.125 2024-09-17 07:32:19,002 INFO [train.py:1198] (1/2) Epoch 32, batch 3100, loss[loss=0.2409, ctc_loss=0.1645, cr_loss=0.3823, over 19271.00 frames. ], tot_loss[loss=0.2239, ctc_loss=0.1493, cr_loss=0.3725, over 4088213.51 frames. ], batch size: 90, lr: 2.59e-03, grad_scale: 32.0 2024-09-17 07:32:47,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=570168.6666666666, ans=0.025 2024-09-17 07:32:52,632 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.86 vs. limit=10.0 2024-09-17 07:32:57,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=570168.6666666666, ans=0.0 2024-09-17 07:33:20,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=570225.3333333334, ans=0.2 2024-09-17 07:33:32,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=570253.6666666666, ans=0.125 2024-09-17 07:33:33,892 INFO [train.py:1198] (1/2) Epoch 32, batch 3150, loss[loss=0.1971, ctc_loss=0.1295, cr_loss=0.338, over 19933.00 frames. ], tot_loss[loss=0.225, ctc_loss=0.1501, cr_loss=0.3742, over 4080230.80 frames. ], batch size: 44, lr: 2.59e-03, grad_scale: 32.0 2024-09-17 07:34:41,187 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.916e+02 2.137e+02 2.259e+02 2.395e+02 4.184e+02, threshold=4.518e+02, percent-clipped=0.0 2024-09-17 07:34:51,750 INFO [train.py:1198] (1/2) Epoch 32, batch 3200, loss[loss=0.2717, ctc_loss=0.1873, cr_loss=0.422, over 17853.00 frames. ], tot_loss[loss=0.2242, ctc_loss=0.1495, cr_loss=0.3735, over 4098265.73 frames. 
], batch size: 108, lr: 2.59e-03, grad_scale: 32.0 2024-09-17 07:34:52,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=570395.3333333334, ans=0.125 2024-09-17 07:34:56,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=570395.3333333334, ans=0.125 2024-09-17 07:35:04,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=570395.3333333334, ans=0.1 2024-09-17 07:35:12,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=570423.6666666666, ans=0.2 2024-09-17 07:35:22,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=570452.0, ans=0.025 2024-09-17 07:36:07,346 INFO [train.py:1198] (1/2) Epoch 32, batch 3250, loss[loss=0.2191, ctc_loss=0.148, cr_loss=0.3554, over 21072.00 frames. ], tot_loss[loss=0.2245, ctc_loss=0.1499, cr_loss=0.3732, over 4087246.82 frames. ], batch size: 53, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 07:36:09,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=570537.0, ans=0.1 2024-09-17 07:36:43,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=570593.6666666666, ans=0.0 2024-09-17 07:36:56,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=570622.0, ans=0.1 2024-09-17 07:37:05,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=570622.0, ans=0.0 2024-09-17 07:37:05,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=570622.0, ans=0.125 2024-09-17 07:37:09,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=570650.3333333334, ans=0.125 2024-09-17 07:37:15,556 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.845e+02 2.125e+02 2.256e+02 2.459e+02 6.121e+02, threshold=4.513e+02, percent-clipped=2.0 2024-09-17 07:37:24,339 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.76 vs. limit=15.0 2024-09-17 07:37:26,316 INFO [train.py:1198] (1/2) Epoch 32, batch 3300, loss[loss=0.2415, ctc_loss=0.1627, cr_loss=0.394, over 20788.00 frames. ], tot_loss[loss=0.2242, ctc_loss=0.1497, cr_loss=0.3727, over 4092835.21 frames. ], batch size: 56, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 07:37:31,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=570678.6666666666, ans=0.0 2024-09-17 07:37:58,647 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.87 vs. limit=15.0 2024-09-17 07:38:05,126 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.56 vs. limit=5.0 2024-09-17 07:38:39,924 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.04 vs. 
limit=12.0 2024-09-17 07:38:40,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=570820.3333333334, ans=0.07 2024-09-17 07:38:41,896 INFO [train.py:1198] (1/2) Epoch 32, batch 3350, loss[loss=0.1928, ctc_loss=0.1267, cr_loss=0.3305, over 20928.00 frames. ], tot_loss[loss=0.2251, ctc_loss=0.1503, cr_loss=0.3738, over 4080550.07 frames. ], batch size: 49, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 07:38:48,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=570820.3333333334, ans=0.125 2024-09-17 07:38:57,785 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.80 vs. limit=10.0 2024-09-17 07:39:49,778 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.862e+02 2.203e+02 2.306e+02 2.500e+02 3.116e+02, threshold=4.612e+02, percent-clipped=0.0 2024-09-17 07:39:55,214 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.96 vs. limit=15.0 2024-09-17 07:40:00,143 INFO [train.py:1198] (1/2) Epoch 32, batch 3400, loss[loss=0.2414, ctc_loss=0.1618, cr_loss=0.3979, over 18292.00 frames. ], tot_loss[loss=0.225, ctc_loss=0.1501, cr_loss=0.3746, over 4095179.54 frames. ], batch size: 108, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 07:40:03,487 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=570962.0, ans=0.2 2024-09-17 07:40:10,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=570962.0, ans=0.2 2024-09-17 07:40:32,249 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=12.23 vs. limit=12.0 2024-09-17 07:41:15,676 INFO [train.py:1198] (1/2) Epoch 32, batch 3450, loss[loss=0.2168, ctc_loss=0.1424, cr_loss=0.3724, over 20962.00 frames. ], tot_loss[loss=0.225, ctc_loss=0.1501, cr_loss=0.3744, over 4101059.13 frames. ], batch size: 55, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 07:41:20,720 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.31 vs. limit=22.5 2024-09-17 07:41:40,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=571132.0, ans=0.125 2024-09-17 07:41:46,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=571160.3333333334, ans=0.09899494936611666 2024-09-17 07:42:16,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=571188.6666666666, ans=10.0 2024-09-17 07:42:23,668 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.908e+02 2.201e+02 2.305e+02 2.525e+02 3.316e+02, threshold=4.610e+02, percent-clipped=0.0 2024-09-17 07:42:34,281 INFO [train.py:1198] (1/2) Epoch 32, batch 3500, loss[loss=0.2369, ctc_loss=0.1601, cr_loss=0.3843, over 20336.00 frames. ], tot_loss[loss=0.2253, ctc_loss=0.1503, cr_loss=0.3749, over 4097447.89 frames. 
], batch size: 74, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 07:43:24,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=571330.3333333334, ans=0.125 2024-09-17 07:43:27,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=571330.3333333334, ans=0.0 2024-09-17 07:43:38,881 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.01 vs. limit=22.5 2024-09-17 07:43:49,871 INFO [train.py:1198] (1/2) Epoch 32, batch 3550, loss[loss=0.24, ctc_loss=0.1618, cr_loss=0.3912, over 21021.00 frames. ], tot_loss[loss=0.2265, ctc_loss=0.1512, cr_loss=0.3762, over 4090391.05 frames. ], batch size: 63, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 07:43:55,207 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.26 vs. limit=15.0 2024-09-17 07:44:01,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=571387.0, ans=0.125 2024-09-17 07:44:12,460 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.76 vs. limit=15.0 2024-09-17 07:44:51,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=571500.3333333334, ans=0.0 2024-09-17 07:44:52,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=571500.3333333334, ans=0.0 2024-09-17 07:44:52,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=571500.3333333334, ans=0.0 2024-09-17 07:44:55,074 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.878e+02 2.179e+02 2.302e+02 2.453e+02 3.233e+02, threshold=4.604e+02, percent-clipped=0.0 2024-09-17 07:45:05,623 INFO [train.py:1198] (1/2) Epoch 32, batch 3600, loss[loss=0.2132, ctc_loss=0.1385, cr_loss=0.3738, over 20980.00 frames. ], tot_loss[loss=0.2263, ctc_loss=0.1511, cr_loss=0.3763, over 4106811.80 frames. ], batch size: 50, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 07:45:40,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=571585.3333333334, ans=0.125 2024-09-17 07:45:55,799 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=7.99 vs. limit=22.5 2024-09-17 07:46:13,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=571642.0, ans=0.1 2024-09-17 07:46:18,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=571642.0, ans=0.125 2024-09-17 07:46:23,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=571670.3333333334, ans=0.04949747468305833 2024-09-17 07:46:23,982 INFO [train.py:1198] (1/2) Epoch 32, batch 3650, loss[loss=0.2066, ctc_loss=0.1356, cr_loss=0.3549, over 20948.00 frames. ], tot_loss[loss=0.2261, ctc_loss=0.1508, cr_loss=0.3762, over 4110124.21 frames. 
], batch size: 48, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 07:47:25,217 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.80 vs. limit=15.0 2024-09-17 07:47:28,837 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.753e+02 2.151e+02 2.272e+02 2.492e+02 3.935e+02, threshold=4.543e+02, percent-clipped=0.0 2024-09-17 07:47:29,593 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.05 vs. limit=15.0 2024-09-17 07:47:42,225 INFO [train.py:1198] (1/2) Epoch 32, batch 3700, loss[loss=0.2268, ctc_loss=0.1491, cr_loss=0.3885, over 20979.00 frames. ], tot_loss[loss=0.2273, ctc_loss=0.1517, cr_loss=0.3779, over 4106161.11 frames. ], batch size: 58, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 07:48:19,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=571868.6666666666, ans=0.0 2024-09-17 07:48:25,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=571868.6666666666, ans=0.125 2024-09-17 07:48:58,068 INFO [train.py:1198] (1/2) Epoch 32, batch 3750, loss[loss=0.2201, ctc_loss=0.1459, cr_loss=0.3709, over 20973.00 frames. ], tot_loss[loss=0.2271, ctc_loss=0.1516, cr_loss=0.3774, over 4098756.63 frames. ], batch size: 55, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 07:49:07,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=571953.6666666666, ans=0.015 2024-09-17 07:49:41,075 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.57 vs. limit=22.5 2024-09-17 07:50:02,663 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.830e+02 2.170e+02 2.309e+02 2.463e+02 4.810e+02, threshold=4.618e+02, percent-clipped=1.0 2024-09-17 07:50:03,341 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.22 vs. limit=15.0 2024-09-17 07:50:13,298 INFO [train.py:1198] (1/2) Epoch 32, batch 3800, loss[loss=0.2352, ctc_loss=0.1549, cr_loss=0.4016, over 21031.00 frames. ], tot_loss[loss=0.2249, ctc_loss=0.1499, cr_loss=0.3747, over 4107146.74 frames. ], batch size: 62, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 07:50:22,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=572095.3333333334, ans=0.025 2024-09-17 07:50:22,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=572095.3333333334, ans=0.0 2024-09-17 07:51:17,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=572208.6666666666, ans=0.025 2024-09-17 07:51:25,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=572208.6666666666, ans=0.125 2024-09-17 07:51:32,478 INFO [train.py:1198] (1/2) Epoch 32, batch 3850, loss[loss=0.2526, ctc_loss=0.1742, cr_loss=0.392, over 18019.00 frames. ], tot_loss[loss=0.2254, ctc_loss=0.1504, cr_loss=0.3752, over 4093374.89 frames. 
], batch size: 108, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 07:51:46,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=572265.3333333334, ans=0.0 2024-09-17 07:52:03,167 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.70 vs. limit=12.0 2024-09-17 07:52:23,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=572322.0, ans=0.1 2024-09-17 07:52:33,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=572350.3333333334, ans=0.015 2024-09-17 07:52:38,229 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.930e+02 2.115e+02 2.269e+02 2.413e+02 5.300e+02, threshold=4.537e+02, percent-clipped=0.0 2024-09-17 07:52:48,890 INFO [train.py:1198] (1/2) Epoch 32, batch 3900, loss[loss=0.2253, ctc_loss=0.1497, cr_loss=0.3781, over 21023.00 frames. ], tot_loss[loss=0.2255, ctc_loss=0.1504, cr_loss=0.3753, over 4084604.32 frames. ], batch size: 62, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 07:52:55,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=572378.6666666666, ans=0.125 2024-09-17 07:53:13,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=572407.0, ans=0.125 2024-09-17 07:53:15,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=572407.0, ans=0.1 2024-09-17 07:53:30,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=572435.3333333334, ans=0.1 2024-09-17 07:54:01,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=572492.0, ans=0.1 2024-09-17 07:54:08,727 INFO [train.py:1198] (1/2) Epoch 32, batch 3950, loss[loss=0.2314, ctc_loss=0.1547, cr_loss=0.3835, over 20980.00 frames. ], tot_loss[loss=0.2255, ctc_loss=0.1503, cr_loss=0.376, over 4092141.23 frames. ], batch size: 58, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 07:54:36,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=572548.6666666666, ans=0.1 2024-09-17 07:55:13,403 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.843e+02 2.147e+02 2.300e+02 2.436e+02 4.605e+02, threshold=4.601e+02, percent-clipped=2.0 2024-09-17 07:55:23,942 INFO [train.py:1198] (1/2) Epoch 32, batch 4000, loss[loss=0.2156, ctc_loss=0.1404, cr_loss=0.376, over 21053.00 frames. ], tot_loss[loss=0.2251, ctc_loss=0.1499, cr_loss=0.3758, over 4093818.61 frames. 
], batch size: 56, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 07:55:34,924 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 07:55:45,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=572690.3333333334, ans=0.125 2024-09-17 07:56:16,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=572747.0, ans=0.125 2024-09-17 07:56:25,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=572775.3333333334, ans=0.0 2024-09-17 07:56:38,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=572775.3333333334, ans=0.0 2024-09-17 07:56:43,123 INFO [train.py:1198] (1/2) Epoch 32, batch 4050, loss[loss=0.2476, ctc_loss=0.1707, cr_loss=0.3844, over 21024.00 frames. ], tot_loss[loss=0.2248, ctc_loss=0.1497, cr_loss=0.3757, over 4103673.24 frames. ], batch size: 61, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 07:57:23,231 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.91 vs. limit=15.0 2024-09-17 07:57:30,782 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.32 vs. limit=15.0 2024-09-17 07:57:48,317 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.919e+02 2.157e+02 2.277e+02 2.428e+02 2.938e+02, threshold=4.555e+02, percent-clipped=0.0 2024-09-17 07:57:59,005 INFO [train.py:1198] (1/2) Epoch 32, batch 4100, loss[loss=0.2353, ctc_loss=0.1537, cr_loss=0.4082, over 21087.00 frames. ], tot_loss[loss=0.2252, ctc_loss=0.1499, cr_loss=0.3765, over 4101277.58 frames. ], batch size: 56, lr: 2.58e-03, grad_scale: 64.0 2024-09-17 07:58:00,262 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.11 vs. limit=8.0 2024-09-17 07:58:09,070 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.87 vs. limit=10.0 2024-09-17 07:58:13,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=572973.6666666666, ans=0.125 2024-09-17 07:58:41,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=573002.0, ans=0.125 2024-09-17 07:59:17,738 INFO [train.py:1198] (1/2) Epoch 32, batch 4150, loss[loss=0.2468, ctc_loss=0.1671, cr_loss=0.3985, over 20649.00 frames. ], tot_loss[loss=0.2264, ctc_loss=0.1509, cr_loss=0.3773, over 4097167.59 frames. ], batch size: 68, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 07:59:18,503 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.37 vs. 
limit=22.5 2024-09-17 07:59:19,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=573087.0, ans=0.125 2024-09-17 07:59:51,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=573143.6666666666, ans=0.125 2024-09-17 08:00:24,715 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.925e+02 2.156e+02 2.322e+02 2.502e+02 5.137e+02, threshold=4.644e+02, percent-clipped=1.0 2024-09-17 08:00:31,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=573200.3333333334, ans=0.1 2024-09-17 08:00:33,721 INFO [train.py:1198] (1/2) Epoch 32, batch 4200, loss[loss=0.2844, ctc_loss=0.1956, cr_loss=0.444, over 18060.00 frames. ], tot_loss[loss=0.2258, ctc_loss=0.1506, cr_loss=0.3761, over 4074496.56 frames. ], batch size: 108, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 08:00:40,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=573228.6666666666, ans=0.1 2024-09-17 08:01:18,250 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 08:01:28,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=573313.6666666666, ans=0.125 2024-09-17 08:01:49,579 INFO [train.py:1198] (1/2) Epoch 32, batch 4250, loss[loss=0.2063, ctc_loss=0.1377, cr_loss=0.3426, over 19995.00 frames. ], tot_loss[loss=0.2254, ctc_loss=0.1503, cr_loss=0.3756, over 4089159.44 frames. ], batch size: 44, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 08:01:53,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=573370.3333333334, ans=0.125 2024-09-17 08:02:02,626 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.71 vs. limit=22.5 2024-09-17 08:02:28,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=573427.0, ans=0.2 2024-09-17 08:03:00,064 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.900e+02 2.170e+02 2.279e+02 2.432e+02 3.476e+02, threshold=4.558e+02, percent-clipped=0.0 2024-09-17 08:03:09,336 INFO [train.py:1198] (1/2) Epoch 32, batch 4300, loss[loss=0.1817, ctc_loss=0.1185, cr_loss=0.3161, over 20956.00 frames. ], tot_loss[loss=0.2244, ctc_loss=0.1494, cr_loss=0.3749, over 4104194.73 frames. ], batch size: 48, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 08:03:33,938 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=573540.3333333334, ans=0.125 2024-09-17 08:04:01,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=573597.0, ans=0.1 2024-09-17 08:04:04,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=573597.0, ans=0.125 2024-09-17 08:04:24,875 INFO [train.py:1198] (1/2) Epoch 32, batch 4350, loss[loss=0.185, ctc_loss=0.1188, cr_loss=0.331, over 20995.00 frames. ], tot_loss[loss=0.2241, ctc_loss=0.1492, cr_loss=0.3744, over 4103641.02 frames. 
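The scaling.py:214 lines report ScheduledFloat values: module hyperparameters (skip rates, balancer probabilities, dropout p) that follow a deterministic schedule over batch_count rather than being learned. A minimal stand-in is sketched below, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; by batch_count ~ 571k most schedules are long past their final breakpoint, which is why the logged ans values sit at constants such as 0.0, 0.1 and 0.125.

```python
class ScheduledFloatSketch:
    """Piecewise-linear value schedule keyed on the global batch count.

    A minimal stand-in for icefall's ScheduledFloat: built from
    (batch_count, value) breakpoints, evaluated by linear interpolation,
    and clamped to the end values outside the breakpoint range.
    """

    def __init__(self, *points):
        self.points = sorted(points)  # e.g. (0.0, 0.5), (4000.0, 0.02)

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

# Hypothetical schedule: far past its last breakpoint at batch ~571k,
# so it reports its final constant value.
skip_rate = ScheduledFloatSketch((0.0, 0.5), (4000.0, 0.02))
print(skip_rate(571387.0))  # -> 0.02
```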
], batch size: 49, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 08:04:29,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=573653.6666666666, ans=0.125 2024-09-17 08:04:57,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=573710.3333333334, ans=0.125 2024-09-17 08:04:57,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=573710.3333333334, ans=0.1 2024-09-17 08:04:58,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=573710.3333333334, ans=0.125 2024-09-17 08:05:02,407 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.87 vs. limit=22.5 2024-09-17 08:05:03,597 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=573710.3333333334, ans=0.125 2024-09-17 08:05:17,630 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.16 vs. limit=10.0 2024-09-17 08:05:35,004 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.848e+02 2.141e+02 2.271e+02 2.417e+02 5.155e+02, threshold=4.542e+02, percent-clipped=1.0 2024-09-17 08:05:42,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=573795.3333333334, ans=0.0 2024-09-17 08:05:43,993 INFO [train.py:1198] (1/2) Epoch 32, batch 4400, loss[loss=0.2019, ctc_loss=0.1304, cr_loss=0.3577, over 21071.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1481, cr_loss=0.3731, over 4104694.60 frames. ], batch size: 53, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 08:06:01,471 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.24 vs. limit=10.0 2024-09-17 08:06:38,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=573880.3333333334, ans=0.125 2024-09-17 08:07:00,079 INFO [train.py:1198] (1/2) Epoch 32, batch 4450, loss[loss=0.2223, ctc_loss=0.1475, cr_loss=0.3738, over 20977.00 frames. ], tot_loss[loss=0.2228, ctc_loss=0.1482, cr_loss=0.3729, over 4111703.43 frames. 
], batch size: 55, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 08:07:00,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=573937.0, ans=0.1 2024-09-17 08:07:06,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=573937.0, ans=0.025 2024-09-17 08:07:14,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=573965.3333333334, ans=0.0 2024-09-17 08:07:18,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=573965.3333333334, ans=0.125 2024-09-17 08:07:30,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=573993.6666666666, ans=0.125 2024-09-17 08:08:10,534 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.857e+02 2.115e+02 2.257e+02 2.394e+02 3.432e+02, threshold=4.514e+02, percent-clipped=0.0 2024-09-17 08:08:15,812 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.98 vs. limit=15.0 2024-09-17 08:08:18,079 INFO [train.py:1198] (1/2) Epoch 32, batch 4500, loss[loss=0.2335, ctc_loss=0.1528, cr_loss=0.4032, over 20785.00 frames. ], tot_loss[loss=0.223, ctc_loss=0.1484, cr_loss=0.3731, over 4111034.59 frames. ], batch size: 56, lr: 2.58e-03, grad_scale: 16.0 2024-09-17 08:08:31,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=574107.0, ans=0.125 2024-09-17 08:08:47,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=574135.3333333334, ans=0.015 2024-09-17 08:08:56,816 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.67 vs. limit=22.5 2024-09-17 08:09:23,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=574192.0, ans=0.125 2024-09-17 08:09:33,775 INFO [train.py:1198] (1/2) Epoch 32, batch 4550, loss[loss=0.2302, ctc_loss=0.152, cr_loss=0.391, over 20879.00 frames. ], tot_loss[loss=0.2243, ctc_loss=0.1494, cr_loss=0.3745, over 4114872.04 frames. ], batch size: 57, lr: 2.58e-03, grad_scale: 16.0 2024-09-17 08:09:56,015 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.39 vs. limit=22.5 2024-09-17 08:10:04,840 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 08:10:45,570 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.896e+02 2.219e+02 2.336e+02 2.528e+02 3.714e+02, threshold=4.672e+02, percent-clipped=0.0 2024-09-17 08:10:47,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=574333.6666666666, ans=0.2 2024-09-17 08:10:52,984 INFO [train.py:1198] (1/2) Epoch 32, batch 4600, loss[loss=0.2874, ctc_loss=0.2016, cr_loss=0.4292, over 18303.00 frames. ], tot_loss[loss=0.2235, ctc_loss=0.1489, cr_loss=0.3732, over 4099593.27 frames. 
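The grad_scale field in the batch records moves in powers of two (32 -> 64 at batch 4100, back to 32 by batch 4150, down to 16 by batch 4500): the signature of dynamic loss scaling in float16 mixed precision, where the scale is halved when a step produces inf/nan gradients and grows again after a run of clean steps. A hedged sketch of the surrounding step pattern using torch.cuda.amp; compute_loss is a placeholder callable:

```python
import torch
from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()  # dynamic loss scale; the log's `grad_scale` field

def train_step(model, optimizer, batch, compute_loss):
    optimizer.zero_grad()
    with autocast():  # fp16 autocast on CUDA
        loss = compute_loss(model, batch)
    # Backward runs on the scaled loss; if any gradient overflowed, the
    # step is skipped and the scale halves, otherwise it may grow.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.detach(), scaler.get_scale()
```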
], batch size: 108, lr: 2.58e-03, grad_scale: 16.0 2024-09-17 08:11:16,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=574390.3333333334, ans=0.1 2024-09-17 08:11:24,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=574418.6666666666, ans=0.1 2024-09-17 08:11:48,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=574447.0, ans=0.95 2024-09-17 08:11:57,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=574475.3333333334, ans=0.02 2024-09-17 08:12:03,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=574475.3333333334, ans=0.125 2024-09-17 08:12:05,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=574475.3333333334, ans=0.125 2024-09-17 08:12:09,634 INFO [train.py:1198] (1/2) Epoch 32, batch 4650, loss[loss=0.2254, ctc_loss=0.1533, cr_loss=0.3605, over 21081.00 frames. ], tot_loss[loss=0.2225, ctc_loss=0.1482, cr_loss=0.3714, over 4109108.43 frames. ], batch size: 59, lr: 2.58e-03, grad_scale: 16.0 2024-09-17 08:12:55,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=574588.6666666666, ans=0.1 2024-09-17 08:13:17,911 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.898e+02 2.145e+02 2.298e+02 2.514e+02 5.438e+02, threshold=4.595e+02, percent-clipped=1.0 2024-09-17 08:13:25,593 INFO [train.py:1198] (1/2) Epoch 32, batch 4700, loss[loss=0.232, ctc_loss=0.1567, cr_loss=0.3765, over 20666.00 frames. ], tot_loss[loss=0.2226, ctc_loss=0.1483, cr_loss=0.3719, over 4111473.02 frames. ], batch size: 71, lr: 2.58e-03, grad_scale: 16.0 2024-09-17 08:13:47,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=574673.6666666666, ans=0.125 2024-09-17 08:14:02,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=574702.0, ans=0.125 2024-09-17 08:14:08,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=574702.0, ans=0.125 2024-09-17 08:14:38,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=574758.6666666666, ans=0.2 2024-09-17 08:14:43,936 INFO [train.py:1198] (1/2) Epoch 32, batch 4750, loss[loss=0.2176, ctc_loss=0.1456, cr_loss=0.3599, over 19563.00 frames. ], tot_loss[loss=0.2231, ctc_loss=0.1486, cr_loss=0.3724, over 4109994.33 frames. ], batch size: 90, lr: 2.58e-03, grad_scale: 16.0 2024-09-17 08:14:50,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=574787.0, ans=0.0 2024-09-17 08:14:53,944 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.16 vs. 
limit=22.5 2024-09-17 08:15:40,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=574872.0, ans=0.0 2024-09-17 08:15:45,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=574900.3333333334, ans=10.0 2024-09-17 08:15:46,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=574900.3333333334, ans=0.0 2024-09-17 08:15:52,401 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.897e+02 2.165e+02 2.283e+02 2.412e+02 3.982e+02, threshold=4.565e+02, percent-clipped=0.0 2024-09-17 08:15:54,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=574900.3333333334, ans=0.125 2024-09-17 08:15:59,761 INFO [train.py:1198] (1/2) Epoch 32, batch 4800, loss[loss=0.2399, ctc_loss=0.1624, cr_loss=0.3877, over 20940.00 frames. ], tot_loss[loss=0.2244, ctc_loss=0.1497, cr_loss=0.3736, over 4103479.08 frames. ], batch size: 64, lr: 2.57e-03, grad_scale: 32.0 2024-09-17 08:16:16,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=574957.0, ans=0.025 2024-09-17 08:16:31,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=574985.3333333334, ans=0.125 2024-09-17 08:17:05,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=575042.0, ans=0.125 2024-09-17 08:17:11,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=575042.0, ans=0.125 2024-09-17 08:17:17,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=575070.3333333334, ans=0.1 2024-09-17 08:17:18,630 INFO [train.py:1198] (1/2) Epoch 32, batch 4850, loss[loss=0.224, ctc_loss=0.1476, cr_loss=0.3822, over 20877.00 frames. ], tot_loss[loss=0.2239, ctc_loss=0.1492, cr_loss=0.3735, over 4103427.49 frames. ], batch size: 57, lr: 2.57e-03, grad_scale: 32.0 2024-09-17 08:17:54,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=575127.0, ans=0.125 2024-09-17 08:17:58,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=575127.0, ans=0.0 2024-09-17 08:18:26,620 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.874e+02 2.177e+02 2.291e+02 2.500e+02 8.857e+02, threshold=4.582e+02, percent-clipped=2.0 2024-09-17 08:18:27,014 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 08:18:34,131 INFO [train.py:1198] (1/2) Epoch 32, batch 4900, loss[loss=0.2316, ctc_loss=0.1551, cr_loss=0.3827, over 21085.00 frames. ], tot_loss[loss=0.2244, ctc_loss=0.1496, cr_loss=0.374, over 4097520.26 frames. 
], batch size: 59, lr: 2.57e-03, grad_scale: 32.0 2024-09-17 08:18:37,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=575212.0, ans=0.0 2024-09-17 08:19:03,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=575268.6666666666, ans=0.125 2024-09-17 08:19:31,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=575297.0, ans=0.1 2024-09-17 08:19:40,037 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=575325.3333333334, ans=0.125 2024-09-17 08:19:48,821 INFO [train.py:1198] (1/2) Epoch 32, batch 4950, loss[loss=0.236, ctc_loss=0.1558, cr_loss=0.4008, over 21079.00 frames. ], tot_loss[loss=0.2241, ctc_loss=0.1494, cr_loss=0.3736, over 4092734.71 frames. ], batch size: 59, lr: 2.57e-03, grad_scale: 32.0 2024-09-17 08:20:59,570 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.812e+02 2.159e+02 2.267e+02 2.468e+02 4.210e+02, threshold=4.533e+02, percent-clipped=0.0 2024-09-17 08:21:06,949 INFO [train.py:1198] (1/2) Epoch 32, batch 5000, loss[loss=0.2791, ctc_loss=0.199, cr_loss=0.4004, over 14365.00 frames. ], tot_loss[loss=0.2238, ctc_loss=0.149, cr_loss=0.3736, over 4086077.93 frames. ], batch size: 149, lr: 2.57e-03, grad_scale: 32.0 2024-09-17 08:21:21,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=575523.6666666666, ans=0.2 2024-09-17 08:21:43,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=575552.0, ans=0.125 2024-09-17 08:21:51,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=575580.3333333334, ans=0.125 2024-09-17 08:21:57,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=575580.3333333334, ans=0.2 2024-09-17 08:22:12,208 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. limit=6.0 2024-09-17 08:22:17,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=575608.6666666666, ans=0.0 2024-09-17 08:22:21,554 INFO [train.py:1198] (1/2) Epoch 32, batch 5050, loss[loss=0.1894, ctc_loss=0.1226, cr_loss=0.3342, over 21058.00 frames. ], tot_loss[loss=0.2238, ctc_loss=0.1491, cr_loss=0.3736, over 4092145.36 frames. ], batch size: 53, lr: 2.57e-03, grad_scale: 32.0 2024-09-17 08:22:30,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=575637.0, ans=0.1 2024-09-17 08:23:09,489 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.54 vs. limit=15.0 2024-09-17 08:23:28,219 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.882e+02 2.163e+02 2.300e+02 2.480e+02 3.319e+02, threshold=4.599e+02, percent-clipped=0.0 2024-09-17 08:23:30,552 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.29 vs. 
limit=15.0 2024-09-17 08:23:35,758 INFO [train.py:1198] (1/2) Epoch 32, batch 5100, loss[loss=0.237, ctc_loss=0.1562, cr_loss=0.4044, over 20964.00 frames. ], tot_loss[loss=0.2249, ctc_loss=0.15, cr_loss=0.3744, over 4084609.49 frames. ], batch size: 64, lr: 2.57e-03, grad_scale: 32.0 2024-09-17 08:24:04,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=575835.3333333334, ans=0.125 2024-09-17 08:24:09,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=575835.3333333334, ans=0.125 2024-09-17 08:24:20,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=575835.3333333334, ans=0.025 2024-09-17 08:24:47,768 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.26 vs. limit=6.0 2024-09-17 08:24:52,860 INFO [train.py:1198] (1/2) Epoch 32, batch 5150, loss[loss=0.2424, ctc_loss=0.1657, cr_loss=0.3835, over 20930.00 frames. ], tot_loss[loss=0.2259, ctc_loss=0.1509, cr_loss=0.375, over 4071721.28 frames. ], batch size: 60, lr: 2.57e-03, grad_scale: 32.0 2024-09-17 08:24:57,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=575920.3333333334, ans=0.125 2024-09-17 08:25:00,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=575920.3333333334, ans=0.05 2024-09-17 08:25:21,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=575977.0, ans=0.125 2024-09-17 08:25:46,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=576005.3333333334, ans=0.05 2024-09-17 08:25:59,937 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.870e+02 2.218e+02 2.339e+02 2.528e+02 3.794e+02, threshold=4.679e+02, percent-clipped=0.0 2024-09-17 08:26:07,279 INFO [train.py:1198] (1/2) Epoch 32, batch 5200, loss[loss=0.246, ctc_loss=0.1635, cr_loss=0.4125, over 21024.00 frames. ], tot_loss[loss=0.2251, ctc_loss=0.1503, cr_loss=0.3739, over 4082005.19 frames. ], batch size: 61, lr: 2.57e-03, grad_scale: 32.0 2024-09-17 08:26:09,450 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.82 vs. limit=15.0 2024-09-17 08:26:40,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=576118.6666666666, ans=0.1 2024-09-17 08:26:55,342 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.69 vs. limit=10.0 2024-09-17 08:27:02,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=576147.0, ans=0.1 2024-09-17 08:27:21,611 INFO [train.py:1198] (1/2) Epoch 32, batch 5250, loss[loss=0.2233, ctc_loss=0.1488, cr_loss=0.3722, over 21013.00 frames. ], tot_loss[loss=0.2249, ctc_loss=0.1501, cr_loss=0.3739, over 4091043.78 frames. 
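The scaling.py:1024 lines compare a whitening metric of some activation against a limit (e.g. metric=11.82 vs. limit=15.0 above); when the metric exceeds its limit, a penalty nudges the feature covariance back toward a multiple of the identity. One scale-invariant way to define such a metric is sketched below: it is exactly 1.0 when all covariance eigenvalues are equal and grows with their spread. This captures the idea only; the exact formula in icefall's scaling.py may differ.

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """Scale-invariant 'whiteness' of feature rows x, shape (N, C).

    With covariance C and eigenvalues l_i, returns
    C_dim * sum(l_i^2) / (sum(l_i))^2, which is 1.0 iff all eigenvalues
    are equal (covariance proportional to identity), larger otherwise.
    """
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.t() @ x) / x.shape[0]
    num_channels = cov.shape[0]
    # sum(l_i^2) = trace(C^2) = squared Frobenius norm for symmetric C.
    return num_channels * (cov * cov).sum() / (cov.trace() ** 2 + 1e-20)
```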
], batch size: 63, lr: 2.57e-03, grad_scale: 32.0 2024-09-17 08:27:44,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=576232.0, ans=0.125 2024-09-17 08:27:45,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=576232.0, ans=0.025 2024-09-17 08:28:28,531 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.818e+02 2.124e+02 2.226e+02 2.349e+02 2.879e+02, threshold=4.453e+02, percent-clipped=0.0 2024-09-17 08:28:35,970 INFO [train.py:1198] (1/2) Epoch 32, batch 5300, loss[loss=0.2321, ctc_loss=0.154, cr_loss=0.3903, over 21011.00 frames. ], tot_loss[loss=0.2253, ctc_loss=0.1504, cr_loss=0.3748, over 4088452.88 frames. ], batch size: 62, lr: 2.57e-03, grad_scale: 32.0 2024-09-17 08:28:42,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=576345.3333333334, ans=0.0 2024-09-17 08:28:57,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=576373.6666666666, ans=0.125 2024-09-17 08:29:16,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=576402.0, ans=0.0 2024-09-17 08:29:28,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=576430.3333333334, ans=0.2 2024-09-17 08:29:32,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=576430.3333333334, ans=0.0 2024-09-17 08:29:50,223 INFO [train.py:1198] (1/2) Epoch 32, batch 5350, loss[loss=0.267, ctc_loss=0.1842, cr_loss=0.4138, over 18391.00 frames. ], tot_loss[loss=0.2251, ctc_loss=0.1502, cr_loss=0.3745, over 4092930.75 frames. ], batch size: 108, lr: 2.57e-03, grad_scale: 32.0 2024-09-17 08:29:50,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=576487.0, ans=0.0 2024-09-17 08:30:03,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=576487.0, ans=0.125 2024-09-17 08:30:24,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=576543.6666666666, ans=0.125 2024-09-17 08:30:45,671 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.42 vs. limit=15.0 2024-09-17 08:30:52,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=576600.3333333334, ans=0.0 2024-09-17 08:30:59,968 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.896e+02 2.153e+02 2.282e+02 2.445e+02 6.157e+02, threshold=4.565e+02, percent-clipped=1.0 2024-09-17 08:31:02,296 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.71 vs. limit=12.0 2024-09-17 08:31:07,414 INFO [train.py:1198] (1/2) Epoch 32, batch 5400, loss[loss=0.2144, ctc_loss=0.141, cr_loss=0.367, over 20780.00 frames. ], tot_loss[loss=0.2254, ctc_loss=0.1503, cr_loss=0.3752, over 4099271.42 frames. 
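The two loss reports per record differ in horizon: loss[... over ~21k frames] is the current batch, while tot_loss[... over ~4.1M frames] is a frame-weighted average over many recent batches, which is why it moves slowly while per-batch losses jump around. A sketch of such a tracker follows; the window size and exact reset behavior are assumptions, and icefall's actual MetricsTracker differs in detail.

```python
from collections import deque

class FrameWeightedLoss:
    """Frame-weighted running average over the most recent batches.

    Mirrors how a `tot_loss[... over F frames]` figure can be kept:
    each batch contributes loss * frames, and the report divides by
    the total frame count in the window.
    """

    def __init__(self, window: int = 200):
        self.items = deque(maxlen=window)  # (loss, num_frames) pairs

    def update(self, loss: float, num_frames: float) -> None:
        self.items.append((loss, num_frames))

    def report(self):
        frames = sum(f for _, f in self.items)
        avg = sum(l * f for l, f in self.items) / max(frames, 1.0)
        return avg, frames
```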
], batch size: 56, lr: 2.57e-03, grad_scale: 32.0 2024-09-17 08:31:30,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=576657.0, ans=10.0 2024-09-17 08:32:11,928 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.00 vs. limit=15.0 2024-09-17 08:32:18,256 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.83 vs. limit=15.0 2024-09-17 08:32:21,566 INFO [train.py:1198] (1/2) Epoch 32, batch 5450, loss[loss=0.1935, ctc_loss=0.1273, cr_loss=0.3312, over 20955.00 frames. ], tot_loss[loss=0.2248, ctc_loss=0.1499, cr_loss=0.3746, over 4107802.33 frames. ], batch size: 48, lr: 2.57e-03, grad_scale: 32.0 2024-09-17 08:32:27,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=576770.3333333334, ans=0.2 2024-09-17 08:33:11,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=576855.3333333334, ans=0.0 2024-09-17 08:33:23,698 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.92 vs. limit=15.0 2024-09-17 08:33:28,647 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.947e+02 2.140e+02 2.263e+02 2.431e+02 3.030e+02, threshold=4.527e+02, percent-clipped=0.0 2024-09-17 08:33:36,086 INFO [train.py:1198] (1/2) Epoch 32, batch 5500, loss[loss=0.1953, ctc_loss=0.1273, cr_loss=0.3404, over 19906.00 frames. ], tot_loss[loss=0.2243, ctc_loss=0.1495, cr_loss=0.3737, over 4103180.82 frames. ], batch size: 44, lr: 2.57e-03, grad_scale: 32.0 2024-09-17 08:33:38,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=576912.0, ans=0.1 2024-09-17 08:33:42,535 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.23 vs. limit=15.0 2024-09-17 08:33:50,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=576912.0, ans=0.125 2024-09-17 08:33:54,390 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.83 vs. limit=10.0 2024-09-17 08:34:35,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=576997.0, ans=0.2 2024-09-17 08:34:53,238 INFO [train.py:1198] (1/2) Epoch 32, batch 5550, loss[loss=0.2251, ctc_loss=0.1501, cr_loss=0.3747, over 21004.00 frames. ], tot_loss[loss=0.225, ctc_loss=0.1501, cr_loss=0.3748, over 4098527.95 frames. ], batch size: 61, lr: 2.57e-03, grad_scale: 32.0 2024-09-17 08:35:19,192 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.96 vs. 
limit=15.0 2024-09-17 08:35:33,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=577110.3333333334, ans=0.125 2024-09-17 08:36:00,072 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.784e+02 2.138e+02 2.254e+02 2.526e+02 5.415e+02, threshold=4.507e+02, percent-clipped=1.0 2024-09-17 08:36:03,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=577167.0, ans=0.125 2024-09-17 08:36:07,470 INFO [train.py:1198] (1/2) Epoch 32, batch 5600, loss[loss=0.1905, ctc_loss=0.1261, cr_loss=0.3222, over 20987.00 frames. ], tot_loss[loss=0.2246, ctc_loss=0.1498, cr_loss=0.3741, over 4096825.30 frames. ], batch size: 48, lr: 2.57e-03, grad_scale: 32.0 2024-09-17 08:36:09,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=577195.3333333334, ans=0.125 2024-09-17 08:36:13,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=577195.3333333334, ans=0.125 2024-09-17 08:36:18,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=577195.3333333334, ans=0.125 2024-09-17 08:36:37,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=577252.0, ans=0.125 2024-09-17 08:37:22,027 INFO [train.py:1198] (1/2) Epoch 32, batch 5650, loss[loss=0.2483, ctc_loss=0.165, cr_loss=0.4166, over 20983.00 frames. ], tot_loss[loss=0.2249, ctc_loss=0.15, cr_loss=0.3745, over 4102283.96 frames. ], batch size: 58, lr: 2.57e-03, grad_scale: 16.0 2024-09-17 08:37:22,398 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 08:37:46,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=577365.3333333334, ans=0.0 2024-09-17 08:37:53,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=577393.6666666666, ans=0.125 2024-09-17 08:38:26,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=577450.3333333334, ans=0.0 2024-09-17 08:38:30,302 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.916e+02 2.140e+02 2.256e+02 2.414e+02 2.895e+02, threshold=4.511e+02, percent-clipped=0.0 2024-09-17 08:38:36,429 INFO [train.py:1198] (1/2) Epoch 32, batch 5700, loss[loss=0.1916, ctc_loss=0.1251, cr_loss=0.3325, over 20948.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.1499, cr_loss=0.3744, over 4100086.66 frames. ], batch size: 48, lr: 2.57e-03, grad_scale: 16.0 2024-09-17 08:39:03,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=577507.0, ans=0.125 2024-09-17 08:39:07,023 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.14 vs. limit=10.0 2024-09-17 08:39:36,323 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. 
limit=6.0 2024-09-17 08:39:42,136 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.69 vs. limit=15.0 2024-09-17 08:39:53,452 INFO [train.py:1198] (1/2) Epoch 32, batch 5750, loss[loss=0.2742, ctc_loss=0.1912, cr_loss=0.4153, over 14075.00 frames. ], tot_loss[loss=0.2245, ctc_loss=0.1497, cr_loss=0.374, over 4095602.53 frames. ], batch size: 149, lr: 2.57e-03, grad_scale: 16.0 2024-09-17 08:39:55,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=577620.3333333334, ans=0.125 2024-09-17 08:39:55,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=577620.3333333334, ans=0.125 2024-09-17 08:40:01,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=577620.3333333334, ans=0.2 2024-09-17 08:40:53,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=577733.6666666666, ans=0.1 2024-09-17 08:41:02,142 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.935e+02 2.147e+02 2.280e+02 2.436e+02 3.460e+02, threshold=4.560e+02, percent-clipped=0.0 2024-09-17 08:41:03,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=577733.6666666666, ans=0.125 2024-09-17 08:41:06,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=577762.0, ans=0.2 2024-09-17 08:41:07,973 INFO [train.py:1198] (1/2) Epoch 32, batch 5800, loss[loss=0.2246, ctc_loss=0.1466, cr_loss=0.3898, over 20793.00 frames. ], tot_loss[loss=0.2237, ctc_loss=0.149, cr_loss=0.3734, over 4103143.41 frames. ], batch size: 53, lr: 2.57e-03, grad_scale: 16.0 2024-09-17 08:41:42,819 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.87 vs. limit=15.0 2024-09-17 08:42:24,424 INFO [train.py:1198] (1/2) Epoch 32, batch 5850, loss[loss=0.25, ctc_loss=0.1724, cr_loss=0.3882, over 20273.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.1498, cr_loss=0.3744, over 4099269.26 frames. ], batch size: 74, lr: 2.57e-03, grad_scale: 16.0 2024-09-17 08:43:10,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=577988.6666666666, ans=0.2 2024-09-17 08:43:32,790 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.839e+02 2.221e+02 2.314e+02 2.487e+02 3.047e+02, threshold=4.628e+02, percent-clipped=0.0 2024-09-17 08:43:33,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=578017.0, ans=0.07 2024-09-17 08:43:38,842 INFO [train.py:1198] (1/2) Epoch 32, batch 5900, loss[loss=0.2488, ctc_loss=0.167, cr_loss=0.4091, over 20715.00 frames. ], tot_loss[loss=0.2264, ctc_loss=0.1512, cr_loss=0.3763, over 4087139.71 frames. 
], batch size: 68, lr: 2.57e-03, grad_scale: 16.0 2024-09-17 08:43:45,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=578045.3333333334, ans=0.0 2024-09-17 08:43:49,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=578045.3333333334, ans=0.1 2024-09-17 08:43:55,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=578073.6666666666, ans=0.1 2024-09-17 08:44:16,756 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.17 vs. limit=10.0 2024-09-17 08:44:40,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=578158.6666666666, ans=0.0 2024-09-17 08:44:43,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=578158.6666666666, ans=0.5 2024-09-17 08:44:53,763 INFO [train.py:1198] (1/2) Epoch 32, batch 5950, loss[loss=0.2334, ctc_loss=0.1544, cr_loss=0.3947, over 20848.00 frames. ], tot_loss[loss=0.2245, ctc_loss=0.1497, cr_loss=0.3738, over 4094437.89 frames. ], batch size: 65, lr: 2.57e-03, grad_scale: 16.0 2024-09-17 08:45:19,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=578215.3333333334, ans=0.1 2024-09-17 08:46:02,073 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.896e+02 2.180e+02 2.291e+02 2.464e+02 3.593e+02, threshold=4.583e+02, percent-clipped=0.0 2024-09-17 08:46:03,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=578300.3333333334, ans=0.07 2024-09-17 08:46:07,996 INFO [train.py:1198] (1/2) Epoch 32, batch 6000, loss[loss=0.2026, ctc_loss=0.1346, cr_loss=0.3402, over 20987.00 frames. ], tot_loss[loss=0.2239, ctc_loss=0.1493, cr_loss=0.3727, over 4081837.48 frames. ], batch size: 55, lr: 2.57e-03, grad_scale: 32.0 2024-09-17 08:46:07,996 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 08:46:34,023 INFO [train.py:1230] (1/2) Epoch 32, validation: loss=0.04093, ctc_loss=0.04093, cr_loss=1.254e-14, over 944034.00 frames. 2024-09-17 08:46:34,024 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-17 08:46:41,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=578328.6666666666, ans=0.125 2024-09-17 08:47:33,125 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.26 vs. limit=15.0 2024-09-17 08:47:51,537 INFO [train.py:1198] (1/2) Epoch 32, batch 6050, loss[loss=0.2499, ctc_loss=0.1645, cr_loss=0.4271, over 20852.00 frames. ], tot_loss[loss=0.2231, ctc_loss=0.1486, cr_loss=0.372, over 4092002.48 frames. 
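The train.py:1221/1230 pairs show training pausing to compute a validation loss over a fixed dev set: the frame count is the same 944034.00 every time, so successive validation losses are directly comparable, and the validation cr_loss is numerical noise (~1e-14), presumably because no second augmented view is used at eval time. A sketch of the pattern, with compute_loss assumed to return a summed loss and a frame count per batch:

```python
import torch

@torch.no_grad()
def compute_validation_loss(model, dev_loader, compute_loss):
    """Frame-weighted loss over a fixed dev set.

    Because the dev set is fixed (944034.00 frames in this log),
    successive validation losses are directly comparable.
    """
    was_training = model.training
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    for batch in dev_loader:
        loss_sum, num_frames = compute_loss(model, batch)
        tot_loss += float(loss_sum)
        tot_frames += num_frames
    if was_training:
        model.train()
    return tot_loss / max(tot_frames, 1.0)
```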
], batch size: 65, lr: 2.57e-03, grad_scale: 32.0 2024-09-17 08:47:57,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=578470.3333333334, ans=0.0 2024-09-17 08:48:55,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=578583.6666666666, ans=0.0 2024-09-17 08:49:01,022 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.877e+02 2.099e+02 2.286e+02 2.447e+02 8.010e+02, threshold=4.571e+02, percent-clipped=1.0 2024-09-17 08:49:07,160 INFO [train.py:1198] (1/2) Epoch 32, batch 6100, loss[loss=0.2081, ctc_loss=0.1403, cr_loss=0.3389, over 20781.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1478, cr_loss=0.3709, over 4105309.56 frames. ], batch size: 53, lr: 2.57e-03, grad_scale: 32.0 2024-09-17 08:49:47,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=578668.6666666666, ans=0.1 2024-09-17 08:50:16,915 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 08:50:17,132 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.63 vs. limit=15.0 2024-09-17 08:50:20,904 INFO [train.py:1198] (1/2) Epoch 32, batch 6150, loss[loss=0.2927, ctc_loss=0.2072, cr_loss=0.4276, over 14455.00 frames. ], tot_loss[loss=0.2249, ctc_loss=0.15, cr_loss=0.3744, over 4083116.73 frames. ], batch size: 149, lr: 2.57e-03, grad_scale: 32.0 2024-09-17 08:50:53,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=578810.3333333334, ans=0.1 2024-09-17 08:50:58,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=578810.3333333334, ans=0.125 2024-09-17 08:51:30,348 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.819e+02 2.191e+02 2.316e+02 2.492e+02 4.494e+02, threshold=4.632e+02, percent-clipped=0.0 2024-09-17 08:51:36,356 INFO [train.py:1198] (1/2) Epoch 32, batch 6200, loss[loss=0.2119, ctc_loss=0.1409, cr_loss=0.3551, over 20967.00 frames. ], tot_loss[loss=0.2248, ctc_loss=0.1499, cr_loss=0.3741, over 4075577.28 frames. ], batch size: 55, lr: 2.57e-03, grad_scale: 32.0 2024-09-17 08:52:03,010 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 08:52:26,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=578980.3333333334, ans=0.125 2024-09-17 08:52:32,766 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.81 vs. limit=15.0 2024-09-17 08:52:45,563 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 08:52:49,569 INFO [train.py:1198] (1/2) Epoch 32, batch 6250, loss[loss=0.2809, ctc_loss=0.2012, cr_loss=0.3983, over 13754.00 frames. ], tot_loss[loss=0.2249, ctc_loss=0.1502, cr_loss=0.3735, over 4026997.51 frames. 
], batch size: 149, lr: 2.57e-03, grad_scale: 32.0 2024-09-17 08:52:54,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=579037.0, ans=0.2 2024-09-17 08:53:50,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=579150.3333333334, ans=0.05 2024-09-17 08:53:57,226 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.857e+02 2.209e+02 2.363e+02 2.577e+02 3.623e+02, threshold=4.726e+02, percent-clipped=0.0 2024-09-17 08:54:02,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=579178.6666666666, ans=0.0 2024-09-17 08:54:03,313 INFO [train.py:1198] (1/2) Epoch 32, batch 6300, loss[loss=0.2627, ctc_loss=0.1868, cr_loss=0.3797, over 13859.00 frames. ], tot_loss[loss=0.2273, ctc_loss=0.1521, cr_loss=0.3758, over 3990734.05 frames. ], batch size: 149, lr: 2.57e-03, grad_scale: 32.0 2024-09-17 08:54:24,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=579207.0, ans=0.125 2024-09-17 08:54:56,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=579263.6666666666, ans=0.025 2024-09-17 08:55:15,925 INFO [train.py:1198] (1/2) Epoch 32, batch 6350, loss[loss=0.2695, ctc_loss=0.1906, cr_loss=0.3941, over 14448.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1578, cr_loss=0.3791, over 3766206.64 frames. ], batch size: 149, lr: 2.57e-03, grad_scale: 32.0 2024-09-17 08:55:32,772 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.41 vs. limit=15.0 2024-09-17 08:55:39,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=579348.6666666666, ans=0.125 2024-09-17 08:57:05,637 INFO [train.py:1198] (1/2) Epoch 33, batch 0, loss[loss=0.1873, ctc_loss=0.1208, cr_loss=0.3324, over 20310.00 frames. ], tot_loss[loss=0.1873, ctc_loss=0.1208, cr_loss=0.3324, over 20310.00 frames. ], batch size: 45, lr: 2.52e-03, grad_scale: 32.0 2024-09-17 08:57:05,637 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 08:57:24,040 INFO [train.py:1230] (1/2) Epoch 33, validation: loss=0.04005, ctc_loss=0.04005, cr_loss=1.295e-14, over 944034.00 frames. 2024-09-17 08:57:24,041 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-17 08:57:31,632 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.090e+02 2.630e+02 2.802e+02 3.019e+02 5.106e+02, threshold=5.605e+02, percent-clipped=1.0 2024-09-17 08:57:48,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=579464.8333333334, ans=0.0 2024-09-17 08:58:27,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=579549.8333333334, ans=0.1 2024-09-17 08:58:27,795 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.60 vs. limit=22.5 2024-09-17 08:58:31,099 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.60 vs. 
limit=15.0
2024-09-17 08:58:42,108 INFO [train.py:1198] (1/2) Epoch 33, batch 50, loss[loss=0.214, ctc_loss=0.1428, cr_loss=0.3556, over 20302.00 frames. ], tot_loss[loss=0.2241, ctc_loss=0.149, cr_loss=0.3758, over 922358.89 frames. ], batch size: 74, lr: 2.52e-03, grad_scale: 32.0
2024-09-17 08:58:43,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=579578.1666666666, ans=15.0
2024-09-17 08:58:48,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=579578.1666666666, ans=0.0
2024-09-17 08:59:09,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=579606.5, ans=0.09899494936611666
2024-09-17 08:59:14,472 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.90 vs. limit=15.0
2024-09-17 08:59:29,602 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.15 vs. limit=22.5
2024-09-17 08:59:52,489 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.32 vs. limit=15.0
2024-09-17 08:59:57,407 INFO [train.py:1198] (1/2) Epoch 33, batch 100, loss[loss=0.2359, ctc_loss=0.1582, cr_loss=0.3884, over 21056.00 frames. ], tot_loss[loss=0.2243, ctc_loss=0.1493, cr_loss=0.3751, over 1618622.87 frames. ], batch size: 56, lr: 2.52e-03, grad_scale: 32.0
2024-09-17 09:00:04,940 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.838e+02 2.123e+02 2.249e+02 2.450e+02 4.320e+02, threshold=4.499e+02, percent-clipped=0.0
2024-09-17 09:00:08,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=579719.8333333334, ans=0.2
2024-09-17 09:00:14,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=579748.1666666666, ans=0.0
2024-09-17 09:00:35,736 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.14 vs. limit=22.5
2024-09-17 09:00:37,126 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.73 vs. limit=12.0
2024-09-17 09:00:51,119 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.61 vs. limit=15.0
2024-09-17 09:01:12,502 INFO [train.py:1198] (1/2) Epoch 33, batch 150, loss[loss=0.2423, ctc_loss=0.162, cr_loss=0.4015, over 20882.00 frames. ], tot_loss[loss=0.2265, ctc_loss=0.151, cr_loss=0.3773, over 2152143.33 frames. ], batch size: 54, lr: 2.52e-03, grad_scale: 16.0
2024-09-17 09:02:03,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=579946.5, ans=0.125
2024-09-17 09:02:05,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=579946.5, ans=0.125
2024-09-17 09:02:20,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=579974.8333333334, ans=0.125
2024-09-17 09:02:24,063 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.96 vs. limit=15.0
2024-09-17 09:02:30,730 INFO [train.py:1198] (1/2) Epoch 33, batch 200, loss[loss=0.2028, ctc_loss=0.1349, cr_loss=0.3394, over 20772.00 frames. ], tot_loss[loss=0.2246, ctc_loss=0.1498, cr_loss=0.3744, over 2568043.97 frames. ], batch size: 56, lr: 2.52e-03, grad_scale: 16.0
2024-09-17 09:02:39,818 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.841e+02 2.186e+02 2.332e+02 2.527e+02 4.428e+02, threshold=4.664e+02, percent-clipped=0.0
2024-09-17 09:03:10,536 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.81 vs. limit=15.0
2024-09-17 09:03:16,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=580088.1666666666, ans=0.125
2024-09-17 09:03:38,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=580116.5, ans=0.125
2024-09-17 09:03:49,518 INFO [train.py:1198] (1/2) Epoch 33, batch 250, loss[loss=0.2546, ctc_loss=0.171, cr_loss=0.4183, over 20322.00 frames. ], tot_loss[loss=0.2237, ctc_loss=0.1491, cr_loss=0.373, over 2906573.38 frames. ], batch size: 74, lr: 2.52e-03, grad_scale: 16.0
2024-09-17 09:04:03,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=580173.1666666666, ans=0.125
2024-09-17 09:04:07,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=580173.1666666666, ans=0.125
2024-09-17 09:04:12,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=580173.1666666666, ans=10.0
2024-09-17 09:04:54,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=580258.1666666666, ans=0.0
2024-09-17 09:05:00,726 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 09:05:04,513 INFO [train.py:1198] (1/2) Epoch 33, batch 300, loss[loss=0.222, ctc_loss=0.147, cr_loss=0.375, over 21074.00 frames. ], tot_loss[loss=0.2256, ctc_loss=0.1506, cr_loss=0.3753, over 3159193.75 frames. ], batch size: 59, lr: 2.52e-03, grad_scale: 16.0
2024-09-17 09:05:04,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=580286.5, ans=0.0
2024-09-17 09:05:13,401 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.990e+02 2.170e+02 2.282e+02 2.417e+02 3.185e+02, threshold=4.563e+02, percent-clipped=0.0
2024-09-17 09:05:24,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=580314.8333333334, ans=0.2
2024-09-17 09:05:24,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=580314.8333333334, ans=0.125
2024-09-17 09:05:26,314 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.58 vs. limit=22.5
2024-09-17 09:05:32,421 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.22 vs. limit=15.0
2024-09-17 09:05:48,856 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.65 vs. limit=15.0
2024-09-17 09:05:57,052 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 09:06:11,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=580399.8333333334, ans=0.125
2024-09-17 09:06:19,099 INFO [train.py:1198] (1/2) Epoch 33, batch 350, loss[loss=0.2392, ctc_loss=0.1615, cr_loss=0.3883, over 21042.00 frames. ], tot_loss[loss=0.2257, ctc_loss=0.1506, cr_loss=0.3756, over 3350506.92 frames. ], batch size: 56, lr: 2.52e-03, grad_scale: 16.0
2024-09-17 09:06:22,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=580428.1666666666, ans=0.05
2024-09-17 09:06:41,391 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.41 vs. limit=15.0
2024-09-17 09:06:43,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=580456.5, ans=0.2
2024-09-17 09:07:10,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=580513.1666666666, ans=0.125
2024-09-17 09:07:37,533 INFO [train.py:1198] (1/2) Epoch 33, batch 400, loss[loss=0.2102, ctc_loss=0.1383, cr_loss=0.3595, over 20765.00 frames. ], tot_loss[loss=0.2251, ctc_loss=0.1502, cr_loss=0.3746, over 3504055.89 frames. ], batch size: 53, lr: 2.52e-03, grad_scale: 32.0
2024-09-17 09:07:46,302 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.880e+02 2.156e+02 2.238e+02 2.437e+02 9.103e+02, threshold=4.477e+02, percent-clipped=1.0
2024-09-17 09:07:47,300 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.35 vs. limit=12.0
2024-09-17 09:07:52,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=580598.1666666666, ans=0.0
2024-09-17 09:08:17,443 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.81 vs. limit=15.0
2024-09-17 09:08:50,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=580683.1666666666, ans=0.125
2024-09-17 09:08:53,064 INFO [train.py:1198] (1/2) Epoch 33, batch 450, loss[loss=0.2422, ctc_loss=0.166, cr_loss=0.381, over 19489.00 frames. ], tot_loss[loss=0.2237, ctc_loss=0.1491, cr_loss=0.3729, over 3646083.29 frames. ], batch size: 90, lr: 2.52e-03, grad_scale: 32.0
2024-09-17 09:09:52,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=580796.5, ans=0.1
2024-09-17 09:10:12,198 INFO [train.py:1198] (1/2) Epoch 33, batch 500, loss[loss=0.2086, ctc_loss=0.1364, cr_loss=0.361, over 20989.00 frames. ], tot_loss[loss=0.2221, ctc_loss=0.148, cr_loss=0.3709, over 3759518.47 frames. ], batch size: 55, lr: 2.52e-03, grad_scale: 32.0
2024-09-17 09:10:20,987 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.981e+02 2.185e+02 2.280e+02 2.478e+02 4.436e+02, threshold=4.559e+02, percent-clipped=0.0
2024-09-17 09:11:27,447 INFO [train.py:1198] (1/2) Epoch 33, batch 550, loss[loss=0.2315, ctc_loss=0.1567, cr_loss=0.3743, over 21028.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1484, cr_loss=0.3717, over 3832687.60 frames. ], batch size: 63, lr: 2.52e-03, grad_scale: 32.0
2024-09-17 09:11:40,220 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.19 vs. limit=15.0
2024-09-17 09:12:16,308 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. limit=6.0
2024-09-17 09:12:42,074 INFO [train.py:1198] (1/2) Epoch 33, batch 600, loss[loss=0.185, ctc_loss=0.1194, cr_loss=0.328, over 20936.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1483, cr_loss=0.3718, over 3897211.30 frames. ], batch size: 51, lr: 2.52e-03, grad_scale: 32.0
2024-09-17 09:12:51,201 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.878e+02 2.159e+02 2.286e+02 2.461e+02 2.995e+02, threshold=4.572e+02, percent-clipped=0.0
2024-09-17 09:12:59,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=581164.8333333334, ans=0.2
2024-09-17 09:13:07,363 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.67 vs. limit=15.0
2024-09-17 09:13:42,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=581221.5, ans=0.0
2024-09-17 09:13:52,599 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.85 vs. limit=22.5
2024-09-17 09:14:00,976 INFO [train.py:1198] (1/2) Epoch 33, batch 650, loss[loss=0.2038, ctc_loss=0.1332, cr_loss=0.3529, over 20958.00 frames. ], tot_loss[loss=0.2239, ctc_loss=0.1493, cr_loss=0.3733, over 3932510.63 frames. ], batch size: 49, lr: 2.52e-03, grad_scale: 32.0
2024-09-17 09:14:17,060 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.98 vs. limit=22.5
2024-09-17 09:14:24,118 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 09:15:07,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=581391.5, ans=0.125
2024-09-17 09:15:20,218 INFO [train.py:1198] (1/2) Epoch 33, batch 700, loss[loss=0.2139, ctc_loss=0.1412, cr_loss=0.3637, over 20967.00 frames. ], tot_loss[loss=0.2238, ctc_loss=0.1492, cr_loss=0.3734, over 3965843.64 frames. ], batch size: 64, lr: 2.52e-03, grad_scale: 32.0
2024-09-17 09:15:21,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=581419.8333333334, ans=0.0
2024-09-17 09:15:29,322 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.867e+02 2.140e+02 2.298e+02 2.476e+02 5.299e+02, threshold=4.596e+02, percent-clipped=1.0
2024-09-17 09:15:43,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=581448.1666666666, ans=0.125
2024-09-17 09:15:57,490 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.36 vs. limit=15.0
2024-09-17 09:16:01,478 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=581476.5, ans=0.125
2024-09-17 09:16:35,770 INFO [train.py:1198] (1/2) Epoch 33, batch 750, loss[loss=0.2189, ctc_loss=0.1463, cr_loss=0.3633, over 20839.00 frames. ], tot_loss[loss=0.2235, ctc_loss=0.1489, cr_loss=0.373, over 3999376.96 frames. ], batch size: 59, lr: 2.52e-03, grad_scale: 32.0
2024-09-17 09:16:58,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=581589.8333333334, ans=0.125
2024-09-17 09:17:29,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=581646.5, ans=0.0
2024-09-17 09:17:51,357 INFO [train.py:1198] (1/2) Epoch 33, batch 800, loss[loss=0.2106, ctc_loss=0.1448, cr_loss=0.3292, over 21064.00 frames. ], tot_loss[loss=0.2235, ctc_loss=0.149, cr_loss=0.3724, over 4014829.53 frames. ], batch size: 53, lr: 2.52e-03, grad_scale: 32.0
2024-09-17 09:18:00,527 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.945e+02 2.168e+02 2.266e+02 2.451e+02 3.548e+02, threshold=4.533e+02, percent-clipped=0.0
2024-09-17 09:18:28,384 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.30 vs. limit=15.0
2024-09-17 09:19:10,019 INFO [train.py:1198] (1/2) Epoch 33, batch 850, loss[loss=0.2405, ctc_loss=0.1632, cr_loss=0.3866, over 20719.00 frames. ], tot_loss[loss=0.2236, ctc_loss=0.1491, cr_loss=0.3729, over 4040821.64 frames. ], batch size: 71, lr: 2.52e-03, grad_scale: 32.0
2024-09-17 09:19:47,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=581901.5, ans=0.125
2024-09-17 09:20:25,245 INFO [train.py:1198] (1/2) Epoch 33, batch 900, loss[loss=0.217, ctc_loss=0.1433, cr_loss=0.3687, over 20690.00 frames. ], tot_loss[loss=0.2242, ctc_loss=0.1494, cr_loss=0.3737, over 4042797.40 frames. ], batch size: 68, lr: 2.52e-03, grad_scale: 32.0
2024-09-17 09:20:34,513 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.892e+02 2.172e+02 2.301e+02 2.463e+02 3.738e+02, threshold=4.603e+02, percent-clipped=0.0
2024-09-17 09:20:51,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=582014.8333333334, ans=0.0
2024-09-17 09:20:57,586 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 09:21:06,554 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 09:21:26,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=582071.5, ans=0.1
2024-09-17 09:21:37,309 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.87 vs. limit=10.0
2024-09-17 09:21:43,796 INFO [train.py:1198] (1/2) Epoch 33, batch 950, loss[loss=0.2404, ctc_loss=0.1609, cr_loss=0.3976, over 19725.00 frames. ], tot_loss[loss=0.224, ctc_loss=0.1492, cr_loss=0.3736, over 4059711.70 frames. ], batch size: 90, lr: 2.52e-03, grad_scale: 32.0
2024-09-17 09:22:16,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=582184.8333333334, ans=0.125
2024-09-17 09:22:59,012 INFO [train.py:1198] (1/2) Epoch 33, batch 1000, loss[loss=0.2345, ctc_loss=0.157, cr_loss=0.3872, over 20667.00 frames. ], tot_loss[loss=0.2242, ctc_loss=0.1494, cr_loss=0.3738, over 4072291.61 frames. ], batch size: 71, lr: 2.52e-03, grad_scale: 32.0
2024-09-17 09:23:08,020 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.746e+02 2.176e+02 2.318e+02 2.450e+02 2.982e+02, threshold=4.637e+02, percent-clipped=0.0
2024-09-17 09:23:17,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=582298.1666666666, ans=0.125
2024-09-17 09:23:52,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=582354.8333333334, ans=0.0
2024-09-17 09:23:55,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=582354.8333333334, ans=0.0
2024-09-17 09:23:57,193 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.24 vs. limit=15.0
2024-09-17 09:24:14,641 INFO [train.py:1198] (1/2) Epoch 33, batch 1050, loss[loss=0.2068, ctc_loss=0.1364, cr_loss=0.3518, over 20950.00 frames. ], tot_loss[loss=0.2225, ctc_loss=0.1482, cr_loss=0.3717, over 4085299.14 frames. ], batch size: 49, lr: 2.52e-03, grad_scale: 32.0
2024-09-17 09:25:29,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=582524.8333333334, ans=0.0
2024-09-17 09:25:33,468 INFO [train.py:1198] (1/2) Epoch 33, batch 1100, loss[loss=0.2618, ctc_loss=0.1779, cr_loss=0.4193, over 20824.00 frames. ], tot_loss[loss=0.2235, ctc_loss=0.1489, cr_loss=0.3729, over 4085266.43 frames. ], batch size: 65, lr: 2.52e-03, grad_scale: 32.0
2024-09-17 09:25:42,163 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.912e+02 2.126e+02 2.267e+02 2.389e+02 3.726e+02, threshold=4.534e+02, percent-clipped=0.0
2024-09-17 09:25:46,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=582581.5, ans=0.125
2024-09-17 09:26:14,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=582609.8333333334, ans=0.2
2024-09-17 09:26:20,855 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.60 vs. limit=15.0
2024-09-17 09:26:23,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=582638.1666666666, ans=0.0
2024-09-17 09:26:49,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=582666.5, ans=0.125
2024-09-17 09:26:52,408 INFO [train.py:1198] (1/2) Epoch 33, batch 1150, loss[loss=0.255, ctc_loss=0.1718, cr_loss=0.416, over 19369.00 frames. ], tot_loss[loss=0.2231, ctc_loss=0.1487, cr_loss=0.3721, over 4083373.65 frames. ], batch size: 90, lr: 2.52e-03, grad_scale: 32.0
2024-09-17 09:26:55,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=582694.8333333334, ans=0.0
2024-09-17 09:27:09,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=582723.1666666666, ans=0.0
2024-09-17 09:27:23,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=582751.5, ans=0.0
2024-09-17 09:28:08,365 INFO [train.py:1198] (1/2) Epoch 33, batch 1200, loss[loss=0.2162, ctc_loss=0.1441, cr_loss=0.3603, over 20761.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1477, cr_loss=0.3704, over 4090429.62 frames. ], batch size: 56, lr: 2.52e-03, grad_scale: 32.0
2024-09-17 09:28:17,474 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.898e+02 2.181e+02 2.326e+02 2.412e+02 3.196e+02, threshold=4.652e+02, percent-clipped=0.0
2024-09-17 09:29:24,148 INFO [train.py:1198] (1/2) Epoch 33, batch 1250, loss[loss=0.2464, ctc_loss=0.1667, cr_loss=0.3984, over 20328.00 frames. ], tot_loss[loss=0.2209, ctc_loss=0.147, cr_loss=0.3694, over 4092957.17 frames. ], batch size: 74, lr: 2.52e-03, grad_scale: 32.0
2024-09-17 09:29:27,691 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.16 vs. limit=22.5
2024-09-17 09:29:37,144 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.96 vs. limit=15.0
2024-09-17 09:29:39,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=583006.5, ans=0.125
2024-09-17 09:30:38,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=583091.5, ans=0.0
2024-09-17 09:30:43,140 INFO [train.py:1198] (1/2) Epoch 33, batch 1300, loss[loss=0.1945, ctc_loss=0.1232, cr_loss=0.3568, over 20945.00 frames. ], tot_loss[loss=0.2208, ctc_loss=0.147, cr_loss=0.3692, over 4096568.64 frames. ], batch size: 49, lr: 2.52e-03, grad_scale: 32.0
2024-09-17 09:30:43,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=583119.8333333334, ans=0.125
2024-09-17 09:30:52,293 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.865e+02 2.183e+02 2.292e+02 2.459e+02 3.044e+02, threshold=4.583e+02, percent-clipped=0.0
2024-09-17 09:30:54,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=583119.8333333334, ans=0.0
2024-09-17 09:30:57,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=583148.1666666666, ans=0.125
2024-09-17 09:31:05,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=583148.1666666666, ans=0.2
2024-09-17 09:31:09,470 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 09:31:33,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=583204.8333333334, ans=0.125
2024-09-17 09:31:39,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=583204.8333333334, ans=0.0
2024-09-17 09:31:46,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=583233.1666666666, ans=0.125
2024-09-17 09:31:58,674 INFO [train.py:1198] (1/2) Epoch 33, batch 1350, loss[loss=0.2158, ctc_loss=0.1443, cr_loss=0.3574, over 20961.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1477, cr_loss=0.3708, over 4104392.67 frames. ], batch size: 64, lr: 2.52e-03, grad_scale: 32.0
2024-09-17 09:32:16,080 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.82 vs. limit=15.0
2024-09-17 09:32:22,461 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.20 vs. limit=15.0
2024-09-17 09:32:38,878 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.37 vs. limit=15.0
2024-09-17 09:32:57,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=583346.5, ans=0.0
2024-09-17 09:33:17,330 INFO [train.py:1198] (1/2) Epoch 33, batch 1400, loss[loss=0.2457, ctc_loss=0.1654, cr_loss=0.4014, over 20323.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1478, cr_loss=0.3706, over 4095580.66 frames. ], batch size: 74, lr: 2.52e-03, grad_scale: 32.0
2024-09-17 09:33:26,442 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.860e+02 2.133e+02 2.260e+02 2.453e+02 4.298e+02, threshold=4.520e+02, percent-clipped=0.0
2024-09-17 09:34:33,339 INFO [train.py:1198] (1/2) Epoch 33, batch 1450, loss[loss=0.2085, ctc_loss=0.1377, cr_loss=0.354, over 20976.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1478, cr_loss=0.3708, over 4106158.56 frames. ], batch size: 52, lr: 2.52e-03, grad_scale: 32.0
2024-09-17 09:34:41,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=583544.8333333334, ans=0.1
2024-09-17 09:34:53,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=583573.1666666666, ans=0.1
2024-09-17 09:34:56,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=583573.1666666666, ans=0.125
2024-09-17 09:35:10,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=583601.5, ans=0.125
2024-09-17 09:35:49,330 INFO [train.py:1198] (1/2) Epoch 33, batch 1500, loss[loss=0.1907, ctc_loss=0.1267, cr_loss=0.3199, over 20873.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1479, cr_loss=0.3712, over 4100893.56 frames. ], batch size: 54, lr: 2.52e-03, grad_scale: 32.0
2024-09-17 09:35:59,947 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.914e+02 2.174e+02 2.285e+02 2.430e+02 3.782e+02, threshold=4.571e+02, percent-clipped=0.0
2024-09-17 09:36:24,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=583743.1666666666, ans=0.0
2024-09-17 09:36:30,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=583743.1666666666, ans=0.1
2024-09-17 09:36:41,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=583771.5, ans=0.2
2024-09-17 09:36:48,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=583771.5, ans=0.05
2024-09-17 09:37:02,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=583799.8333333334, ans=0.125
2024-09-17 09:37:08,464 INFO [train.py:1198] (1/2) Epoch 33, batch 1550, loss[loss=0.2434, ctc_loss=0.1625, cr_loss=0.4044, over 20652.00 frames. ], tot_loss[loss=0.2226, ctc_loss=0.1483, cr_loss=0.3715, over 4102658.46 frames. ], batch size: 71, lr: 2.52e-03, grad_scale: 16.0
2024-09-17 09:38:06,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=583913.1666666666, ans=0.2
2024-09-17 09:38:07,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=583941.5, ans=0.1
2024-09-17 09:38:27,009 INFO [train.py:1198] (1/2) Epoch 33, batch 1600, loss[loss=0.2344, ctc_loss=0.159, cr_loss=0.3772, over 20669.00 frames. ], tot_loss[loss=0.2243, ctc_loss=0.1494, cr_loss=0.3741, over 4109035.47 frames. ], batch size: 66, lr: 2.52e-03, grad_scale: 32.0
2024-09-17 09:38:37,365 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.917e+02 2.139e+02 2.280e+02 2.404e+02 3.218e+02, threshold=4.560e+02, percent-clipped=0.0
2024-09-17 09:39:42,803 INFO [train.py:1198] (1/2) Epoch 33, batch 1650, loss[loss=0.204, ctc_loss=0.1339, cr_loss=0.3507, over 20870.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1484, cr_loss=0.3723, over 4112774.87 frames. ], batch size: 57, lr: 2.51e-03, grad_scale: 32.0
2024-09-17 09:39:49,797 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. limit=6.0
2024-09-17 09:40:48,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=584224.8333333334, ans=0.125
2024-09-17 09:40:49,380 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.94 vs. limit=15.0
2024-09-17 09:40:57,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=584253.1666666666, ans=0.1
2024-09-17 09:40:59,016 INFO [train.py:1198] (1/2) Epoch 33, batch 1700, loss[loss=0.2048, ctc_loss=0.1344, cr_loss=0.3518, over 21090.00 frames. ], tot_loss[loss=0.2225, ctc_loss=0.1482, cr_loss=0.3715, over 4116766.14 frames. ], batch size: 59, lr: 2.51e-03, grad_scale: 32.0
2024-09-17 09:41:09,452 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.897e+02 2.145e+02 2.250e+02 2.407e+02 6.829e+02, threshold=4.500e+02, percent-clipped=1.0
2024-09-17 09:41:18,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=584281.5, ans=0.125
2024-09-17 09:41:28,148 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.03 vs. limit=15.0
2024-09-17 09:41:48,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=584338.1666666666, ans=0.0
2024-09-17 09:41:58,533 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.16 vs. limit=22.5
2024-09-17 09:42:17,532 INFO [train.py:1198] (1/2) Epoch 33, batch 1750, loss[loss=0.2143, ctc_loss=0.1439, cr_loss=0.3521, over 20995.00 frames. ], tot_loss[loss=0.2237, ctc_loss=0.1491, cr_loss=0.3731, over 4110229.75 frames. ], batch size: 52, lr: 2.51e-03, grad_scale: 32.0
2024-09-17 09:42:30,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=584394.8333333334, ans=15.0
2024-09-17 09:42:52,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=584451.5, ans=0.025
2024-09-17 09:43:06,969 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.46 vs. limit=15.0
2024-09-17 09:43:29,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=584508.1666666666, ans=0.125
2024-09-17 09:43:33,240 INFO [train.py:1198] (1/2) Epoch 33, batch 1800, loss[loss=0.199, ctc_loss=0.1331, cr_loss=0.3292, over 20858.00 frames. ], tot_loss[loss=0.2241, ctc_loss=0.1494, cr_loss=0.3737, over 4095685.15 frames. ], batch size: 57, lr: 2.51e-03, grad_scale: 32.0
2024-09-17 09:43:33,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=584536.5, ans=0.0
2024-09-17 09:43:43,733 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.915e+02 2.196e+02 2.310e+02 2.481e+02 3.292e+02, threshold=4.620e+02, percent-clipped=0.0
2024-09-17 09:43:45,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=584536.5, ans=0.2
2024-09-17 09:43:51,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=584564.8333333334, ans=0.125
2024-09-17 09:43:51,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=584564.8333333334, ans=0.125
2024-09-17 09:43:54,953 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 09:44:43,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=584649.8333333334, ans=0.125
2024-09-17 09:44:50,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=584649.8333333334, ans=15.0
2024-09-17 09:44:52,565 INFO [train.py:1198] (1/2) Epoch 33, batch 1850, loss[loss=0.1806, ctc_loss=0.12, cr_loss=0.3032, over 20930.00 frames. ], tot_loss[loss=0.2234, ctc_loss=0.1488, cr_loss=0.3727, over 4091778.47 frames. ], batch size: 50, lr: 2.51e-03, grad_scale: 32.0
2024-09-17 09:45:11,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=584706.5, ans=0.1
2024-09-17 09:45:29,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=584734.8333333334, ans=0.2
2024-09-17 09:46:03,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=584791.5, ans=0.125
2024-09-17 09:46:08,673 INFO [train.py:1198] (1/2) Epoch 33, batch 1900, loss[loss=0.1937, ctc_loss=0.1255, cr_loss=0.3411, over 19938.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1475, cr_loss=0.3702, over 4099912.17 frames. ], batch size: 44, lr: 2.51e-03, grad_scale: 32.0
2024-09-17 09:46:16,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=584819.8333333334, ans=0.2
2024-09-17 09:46:19,421 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.962e+02 2.126e+02 2.319e+02 2.432e+02 3.467e+02, threshold=4.638e+02, percent-clipped=0.0
2024-09-17 09:47:23,703 INFO [train.py:1198] (1/2) Epoch 33, batch 1950, loss[loss=0.2317, ctc_loss=0.153, cr_loss=0.3935, over 20956.00 frames. ], tot_loss[loss=0.2228, ctc_loss=0.1485, cr_loss=0.3715, over 4108425.26 frames. ], batch size: 58, lr: 2.51e-03, grad_scale: 32.0
2024-09-17 09:47:42,752 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.33 vs. limit=15.0
2024-09-17 09:47:47,484 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. limit=6.0
2024-09-17 09:47:49,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=584989.8333333334, ans=0.05
2024-09-17 09:48:15,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=585046.5, ans=0.0
2024-09-17 09:48:42,427 INFO [train.py:1198] (1/2) Epoch 33, batch 2000, loss[loss=0.194, ctc_loss=0.1259, cr_loss=0.3406, over 20282.00 frames. ], tot_loss[loss=0.2223, ctc_loss=0.1481, cr_loss=0.3711, over 4116629.45 frames. ], batch size: 45, lr: 2.51e-03, grad_scale: 32.0
2024-09-17 09:48:53,068 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.842e+02 2.192e+02 2.288e+02 2.478e+02 4.432e+02, threshold=4.575e+02, percent-clipped=0.0
2024-09-17 09:49:28,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=585188.1666666666, ans=0.2
2024-09-17 09:49:50,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=585216.5, ans=0.125
2024-09-17 09:50:01,305 INFO [train.py:1198] (1/2) Epoch 33, batch 2050, loss[loss=0.222, ctc_loss=0.1455, cr_loss=0.3823, over 20882.00 frames. ], tot_loss[loss=0.2212, ctc_loss=0.1471, cr_loss=0.3701, over 4107721.71 frames. ], batch size: 54, lr: 2.51e-03, grad_scale: 8.0
2024-09-17 09:50:04,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=585244.8333333334, ans=0.125
2024-09-17 09:50:21,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=585273.1666666666, ans=0.1
2024-09-17 09:50:22,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=585273.1666666666, ans=0.125
2024-09-17 09:51:00,600 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.52 vs. limit=22.5
2024-09-17 09:51:06,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=585358.1666666666, ans=0.95
2024-09-17 09:51:16,726 INFO [train.py:1198] (1/2) Epoch 33, batch 2100, loss[loss=0.2029, ctc_loss=0.1336, cr_loss=0.3466, over 20995.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1473, cr_loss=0.3703, over 4101326.54 frames. ], batch size: 52, lr: 2.51e-03, grad_scale: 8.0
2024-09-17 09:51:30,379 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.808e+02 2.177e+02 2.299e+02 2.464e+02 4.987e+02, threshold=4.598e+02, percent-clipped=1.0
2024-09-17 09:51:46,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=585443.1666666666, ans=0.125
2024-09-17 09:52:20,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=585499.8333333334, ans=0.125
2024-09-17 09:52:33,009 INFO [train.py:1198] (1/2) Epoch 33, batch 2150, loss[loss=0.1888, ctc_loss=0.1219, cr_loss=0.3346, over 20997.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1477, cr_loss=0.3708, over 4091831.59 frames. ], batch size: 52, lr: 2.51e-03, grad_scale: 8.0
2024-09-17 09:52:44,269 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 09:53:15,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=585584.8333333334, ans=0.125
2024-09-17 09:53:41,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=585641.5, ans=0.125
2024-09-17 09:53:44,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=585641.5, ans=0.125
2024-09-17 09:53:51,972 INFO [train.py:1198] (1/2) Epoch 33, batch 2200, loss[loss=0.2374, ctc_loss=0.1594, cr_loss=0.3904, over 20621.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.1474, cr_loss=0.3699, over 4093792.07 frames. ], batch size: 68, lr: 2.51e-03, grad_scale: 8.0
2024-09-17 09:54:05,437 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.951e+02 2.158e+02 2.266e+02 2.414e+02 3.128e+02, threshold=4.532e+02, percent-clipped=0.0
2024-09-17 09:54:53,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=585783.1666666666, ans=0.025
2024-09-17 09:54:53,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=585783.1666666666, ans=0.2
2024-09-17 09:55:06,512 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 09:55:07,623 INFO [train.py:1198] (1/2) Epoch 33, batch 2250, loss[loss=0.2094, ctc_loss=0.1357, cr_loss=0.3684, over 21030.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1477, cr_loss=0.3711, over 4086650.03 frames. ], batch size: 63, lr: 2.51e-03, grad_scale: 8.0
2024-09-17 09:55:41,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=585868.1666666666, ans=0.1
2024-09-17 09:56:11,236 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.32 vs. limit=22.5
2024-09-17 09:56:15,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=585924.8333333334, ans=0.2
2024-09-17 09:56:25,612 INFO [train.py:1198] (1/2) Epoch 33, batch 2300, loss[loss=0.211, ctc_loss=0.1423, cr_loss=0.3437, over 20971.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.1481, cr_loss=0.3716, over 4089162.78 frames. ], batch size: 52, lr: 2.51e-03, grad_scale: 8.0
2024-09-17 09:56:30,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=585953.1666666666, ans=0.2
2024-09-17 09:56:39,175 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.926e+02 2.154e+02 2.293e+02 2.449e+02 3.027e+02, threshold=4.585e+02, percent-clipped=0.0
2024-09-17 09:56:56,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=586009.8333333334, ans=0.0
2024-09-17 09:57:29,681 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.94 vs. limit=15.0
2024-09-17 09:57:41,092 INFO [train.py:1198] (1/2) Epoch 33, batch 2350, loss[loss=0.2191, ctc_loss=0.1459, cr_loss=0.3662, over 20976.00 frames. ], tot_loss[loss=0.2226, ctc_loss=0.1483, cr_loss=0.3717, over 4089846.96 frames. ], batch size: 49, lr: 2.51e-03, grad_scale: 8.0
2024-09-17 09:58:04,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=586123.1666666666, ans=0.95
2024-09-17 09:58:52,088 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. limit=6.0
2024-09-17 09:58:57,216 INFO [train.py:1198] (1/2) Epoch 33, batch 2400, loss[loss=0.2351, ctc_loss=0.1585, cr_loss=0.3827, over 20780.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1483, cr_loss=0.3724, over 4091579.67 frames. ], batch size: 56, lr: 2.51e-03, grad_scale: 16.0
2024-09-17 09:59:14,026 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.718e+02 2.113e+02 2.273e+02 2.441e+02 1.109e+03, threshold=4.547e+02, percent-clipped=1.0
2024-09-17 09:59:15,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=586264.8333333334, ans=0.0
2024-09-17 09:59:19,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=586264.8333333334, ans=0.2
2024-09-17 09:59:34,465 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.43 vs. limit=15.0
2024-09-17 09:59:44,880 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.35 vs. limit=22.5
2024-09-17 10:00:15,757 INFO [train.py:1198] (1/2) Epoch 33, batch 2450, loss[loss=0.2375, ctc_loss=0.1582, cr_loss=0.3964, over 20258.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1482, cr_loss=0.3723, over 4077186.00 frames. ], batch size: 74, lr: 2.51e-03, grad_scale: 16.0
2024-09-17 10:00:34,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=586406.5, ans=0.025
2024-09-17 10:01:06,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=586463.1666666666, ans=0.0
2024-09-17 10:01:34,411 INFO [train.py:1198] (1/2) Epoch 33, batch 2500, loss[loss=0.1937, ctc_loss=0.124, cr_loss=0.3484, over 20784.00 frames. ], tot_loss[loss=0.2228, ctc_loss=0.1484, cr_loss=0.3718, over 4072588.31 frames. ], batch size: 53, lr: 2.51e-03, grad_scale: 16.0
2024-09-17 10:01:34,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=586519.8333333334, ans=0.0
2024-09-17 10:01:48,131 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.792e+02 2.145e+02 2.313e+02 2.499e+02 4.124e+02, threshold=4.626e+02, percent-clipped=0.0
2024-09-17 10:01:49,079 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.91 vs. limit=15.0
2024-09-17 10:02:03,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=586576.5, ans=0.0
2024-09-17 10:02:05,644 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.24 vs. limit=15.0
2024-09-17 10:02:17,581 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.60 vs. limit=15.0
2024-09-17 10:02:21,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=586604.8333333334, ans=0.95
2024-09-17 10:02:38,284 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.80 vs. limit=15.0
2024-09-17 10:02:49,582 INFO [train.py:1198] (1/2) Epoch 33, batch 2550, loss[loss=0.2441, ctc_loss=0.1626, cr_loss=0.4077, over 20834.00 frames. ], tot_loss[loss=0.2231, ctc_loss=0.1487, cr_loss=0.372, over 4072367.04 frames. ], batch size: 59, lr: 2.51e-03, grad_scale: 16.0
2024-09-17 10:02:58,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=586661.5, ans=0.04949747468305833
2024-09-17 10:04:05,436 INFO [train.py:1198] (1/2) Epoch 33, batch 2600, loss[loss=0.282, ctc_loss=0.1965, cr_loss=0.4276, over 14309.00 frames. ], tot_loss[loss=0.2237, ctc_loss=0.1492, cr_loss=0.3723, over 4063836.68 frames. ], batch size: 149, lr: 2.51e-03, grad_scale: 16.0
2024-09-17 10:04:16,620 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 10:04:19,168 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.945e+02 2.162e+02 2.322e+02 2.472e+02 3.448e+02, threshold=4.643e+02, percent-clipped=0.0
2024-09-17 10:04:26,076 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.92 vs. limit=15.0
2024-09-17 10:04:57,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=586888.1666666666, ans=0.125
2024-09-17 10:05:24,502 INFO [train.py:1198] (1/2) Epoch 33, batch 2650, loss[loss=0.2179, ctc_loss=0.1456, cr_loss=0.3616, over 20775.00 frames. ], tot_loss[loss=0.2228, ctc_loss=0.1485, cr_loss=0.3717, over 4084460.82 frames. ], batch size: 53, lr: 2.51e-03, grad_scale: 16.0
2024-09-17 10:05:52,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=586973.1666666666, ans=0.0
2024-09-17 10:06:08,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=587029.8333333334, ans=0.0
2024-09-17 10:06:09,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=587029.8333333334, ans=0.125
2024-09-17 10:06:30,917 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.07 vs. limit=15.0
2024-09-17 10:06:40,565 INFO [train.py:1198] (1/2) Epoch 33, batch 2700, loss[loss=0.2057, ctc_loss=0.1384, cr_loss=0.3365, over 20894.00 frames. ], tot_loss[loss=0.2231, ctc_loss=0.1487, cr_loss=0.372, over 4088189.95 frames. ], batch size: 54, lr: 2.51e-03, grad_scale: 16.0
2024-09-17 10:06:51,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=587086.5, ans=0.125
2024-09-17 10:06:54,192 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.827e+02 2.152e+02 2.291e+02 2.421e+02 3.428e+02, threshold=4.582e+02, percent-clipped=0.0
2024-09-17 10:06:57,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=587114.8333333334, ans=0.0
2024-09-17 10:07:08,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=587114.8333333334, ans=0.2
2024-09-17 10:07:09,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=587114.8333333334, ans=0.0
2024-09-17 10:07:33,467 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.50 vs. limit=15.0
2024-09-17 10:07:59,770 INFO [train.py:1198] (1/2) Epoch 33, batch 2750, loss[loss=0.1701, ctc_loss=0.1101, cr_loss=0.3001, over 20948.00 frames. ], tot_loss[loss=0.2226, ctc_loss=0.1483, cr_loss=0.3717, over 4090677.37 frames. ], batch size: 50, lr: 2.51e-03, grad_scale: 16.0
2024-09-17 10:08:03,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=587228.1666666666, ans=0.0
2024-09-17 10:08:41,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=587284.8333333334, ans=0.125
2024-09-17 10:08:51,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=587313.1666666666, ans=0.2
2024-09-17 10:09:15,224 INFO [train.py:1198] (1/2) Epoch 33, batch 2800, loss[loss=0.2431, ctc_loss=0.1642, cr_loss=0.3945, over 20834.00 frames. ], tot_loss[loss=0.2221, ctc_loss=0.1478, cr_loss=0.3714, over 4096669.20 frames. ], batch size: 65, lr: 2.51e-03, grad_scale: 32.0
2024-09-17 10:09:18,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=587369.8333333334, ans=0.0
2024-09-17 10:09:24,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=587369.8333333334, ans=0.125
2024-09-17 10:09:28,816 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.845e+02 2.127e+02 2.313e+02 2.493e+02 3.541e+02, threshold=4.627e+02, percent-clipped=0.0
2024-09-17 10:09:38,589 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.22 vs. limit=12.0
2024-09-17 10:10:34,147 INFO [train.py:1198] (1/2) Epoch 33, batch 2850, loss[loss=0.2199, ctc_loss=0.1487, cr_loss=0.3558, over 20929.00 frames. ], tot_loss[loss=0.2225, ctc_loss=0.1483, cr_loss=0.371, over 4095660.49 frames. ], batch size: 60, lr: 2.51e-03, grad_scale: 32.0
2024-09-17 10:10:35,233 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.86 vs. limit=15.0
2024-09-17 10:11:09,260 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.32 vs. limit=12.0
2024-09-17 10:11:14,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=587568.1666666666, ans=0.125
2024-09-17 10:11:25,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=587596.5, ans=0.025
2024-09-17 10:11:43,927 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.32 vs. limit=12.0
2024-09-17 10:11:49,449 INFO [train.py:1198] (1/2) Epoch 33, batch 2900, loss[loss=0.2203, ctc_loss=0.1475, cr_loss=0.3643, over 20971.00 frames. ], tot_loss[loss=0.2236, ctc_loss=0.1491, cr_loss=0.3725, over 4082494.78 frames. ], batch size: 58, lr: 2.51e-03, grad_scale: 32.0
2024-09-17 10:12:00,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=587653.1666666666, ans=0.2
2024-09-17 10:12:03,140 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.892e+02 2.160e+02 2.292e+02 2.490e+02 6.948e+02, threshold=4.584e+02, percent-clipped=1.0
2024-09-17 10:12:23,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=587709.8333333334, ans=0.0
2024-09-17 10:12:27,610 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 10:12:54,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=587766.5, ans=0.125
2024-09-17 10:13:07,677 INFO [train.py:1198] (1/2) Epoch 33, batch 2950, loss[loss=0.253, ctc_loss=0.1719, cr_loss=0.4053, over 20831.00 frames. ], tot_loss[loss=0.2238, ctc_loss=0.1491, cr_loss=0.3733, over 4099851.61 frames. ], batch size: 59, lr: 2.51e-03, grad_scale: 32.0
2024-09-17 10:13:46,253 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 10:13:49,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=587851.5, ans=0.125
2024-09-17 10:13:56,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=587879.8333333334, ans=0.0
2024-09-17 10:14:20,859 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.32 vs. limit=6.0
2024-09-17 10:14:23,078 INFO [train.py:1198] (1/2) Epoch 33, batch 3000, loss[loss=0.2627, ctc_loss=0.1888, cr_loss=0.3691, over 14438.00 frames. ], tot_loss[loss=0.224, ctc_loss=0.1494, cr_loss=0.3732, over 4089418.88 frames. ], batch size: 150, lr: 2.51e-03, grad_scale: 32.0
2024-09-17 10:14:23,078 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-17 10:14:47,786 INFO [train.py:1230] (1/2) Epoch 33, validation: loss=0.03983, ctc_loss=0.03983, cr_loss=1.313e-14, over 944034.00 frames.
2024-09-17 10:14:47,787 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB
2024-09-17 10:15:01,331 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.751e+02 2.169e+02 2.293e+02 2.458e+02 5.072e+02, threshold=4.586e+02, percent-clipped=1.0
2024-09-17 10:15:31,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=588021.5, ans=0.025
2024-09-17 10:15:41,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=588021.5, ans=0.2
2024-09-17 10:16:06,193 INFO [train.py:1198] (1/2) Epoch 33, batch 3050, loss[loss=0.2329, ctc_loss=0.1534, cr_loss=0.3976, over 20108.00 frames. ], tot_loss[loss=0.2244, ctc_loss=0.1496, cr_loss=0.374, over 4094466.17 frames. ], batch size: 80, lr: 2.51e-03, grad_scale: 32.0
2024-09-17 10:16:14,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=588078.1666666666, ans=15.0
2024-09-17 10:16:18,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=588078.1666666666, ans=0.125
2024-09-17 10:16:45,316 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.14 vs. limit=10.0
2024-09-17 10:17:07,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=588191.5, ans=0.04949747468305833
2024-09-17 10:17:21,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=588219.8333333334, ans=0.1
2024-09-17 10:17:22,611 INFO [train.py:1198] (1/2) Epoch 33, batch 3100, loss[loss=0.2231, ctc_loss=0.1476, cr_loss=0.3777, over 20290.00 frames. ], tot_loss[loss=0.223, ctc_loss=0.1485, cr_loss=0.3723, over 4108982.55 frames. ], batch size: 74, lr: 2.51e-03, grad_scale: 32.0
2024-09-17 10:17:36,251 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.889e+02 2.138e+02 2.243e+02 2.404e+02 3.026e+02, threshold=4.485e+02, percent-clipped=0.0
2024-09-17 10:17:41,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=588248.1666666666, ans=0.125
2024-09-17 10:18:25,806 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.42 vs. limit=22.5
2024-09-17 10:18:28,489 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.92 vs. limit=6.0
2024-09-17 10:18:33,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=588333.1666666666, ans=0.125
2024-09-17 10:18:41,262 INFO [train.py:1198] (1/2) Epoch 33, batch 3150, loss[loss=0.2402, ctc_loss=0.159, cr_loss=0.4061, over 19281.00 frames. ], tot_loss[loss=0.2236, ctc_loss=0.149, cr_loss=0.3732, over 4110630.21 frames. ], batch size: 90, lr: 2.51e-03, grad_scale: 32.0
2024-09-17 10:18:43,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=588361.5, ans=0.125
2024-09-17 10:19:18,353 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.39 vs. limit=15.0
2024-09-17 10:19:55,845 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.81 vs. limit=15.0
2024-09-17 10:19:56,716 INFO [train.py:1198] (1/2) Epoch 33, batch 3200, loss[loss=0.2233, ctc_loss=0.152, cr_loss=0.3567, over 20306.00 frames. ], tot_loss[loss=0.2235, ctc_loss=0.1489, cr_loss=0.373, over 4107307.67 frames. ], batch size: 74, lr: 2.51e-03, grad_scale: 32.0
2024-09-17 10:20:06,800 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. limit=6.0
2024-09-17 10:20:10,673 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.931e+02 2.222e+02 2.327e+02 2.511e+02 3.661e+02, threshold=4.653e+02, percent-clipped=0.0
2024-09-17 10:20:13,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=588531.5, ans=0.5
2024-09-17 10:20:16,975 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 10:20:30,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=588559.8333333334, ans=0.125
2024-09-17 10:20:37,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=588559.8333333334, ans=0.1
2024-09-17 10:21:07,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=588616.5, ans=0.0
2024-09-17 10:21:08,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=588616.5, ans=0.025
2024-09-17 10:21:12,121 INFO [train.py:1198] (1/2) Epoch 33, batch 3250, loss[loss=0.2524, ctc_loss=0.1704, cr_loss=0.4098, over 19985.00 frames. ], tot_loss[loss=0.2238, ctc_loss=0.1491, cr_loss=0.3735, over 4105040.79 frames. ], batch size: 80, lr: 2.51e-03, grad_scale: 32.0
2024-09-17 10:21:16,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=588644.8333333334, ans=0.0
2024-09-17 10:21:17,497 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.98 vs. limit=15.0
2024-09-17 10:21:20,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=588644.8333333334, ans=0.0
2024-09-17 10:21:56,435 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 10:22:01,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=588729.8333333334, ans=0.125
2024-09-17 10:22:17,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=588758.1666666666, ans=0.0
2024-09-17 10:22:31,068 INFO [train.py:1198] (1/2) Epoch 33, batch 3300, loss[loss=0.1914, ctc_loss=0.125, cr_loss=0.332, over 20968.00 frames. ], tot_loss[loss=0.2236, ctc_loss=0.149, cr_loss=0.3734, over 4102210.54 frames. ], batch size: 51, lr: 2.50e-03, grad_scale: 32.0
2024-09-17 10:22:39,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=588786.5, ans=0.0
2024-09-17 10:22:44,565 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.800e+02 2.213e+02 2.316e+02 2.471e+02 4.382e+02, threshold=4.632e+02, percent-clipped=0.0
2024-09-17 10:23:46,654 INFO [train.py:1198] (1/2) Epoch 33, batch 3350, loss[loss=0.1852, ctc_loss=0.1236, cr_loss=0.3078, over 20996.00 frames. ], tot_loss[loss=0.2242, ctc_loss=0.1494, cr_loss=0.3735, over 4088462.08 frames. ], batch size: 50, lr: 2.50e-03, grad_scale: 32.0
2024-09-17 10:23:47,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=588928.1666666666, ans=0.1
2024-09-17 10:23:58,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=588928.1666666666, ans=0.1
2024-09-17 10:24:00,695 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.31 vs. limit=10.0
2024-09-17 10:24:23,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=588984.8333333334, ans=0.125
2024-09-17 10:24:40,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=589013.1666666666, ans=0.2
2024-09-17 10:25:04,438 INFO [train.py:1198] (1/2) Epoch 33, batch 3400, loss[loss=0.2552, ctc_loss=0.1746, cr_loss=0.403, over 18336.00 frames. ], tot_loss[loss=0.2241, ctc_loss=0.1494, cr_loss=0.3734, over 4081652.61 frames. ], batch size: 108, lr: 2.50e-03, grad_scale: 32.0
2024-09-17 10:25:07,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=589069.8333333334, ans=0.125
2024-09-17 10:25:17,787 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.880e+02 2.148e+02 2.304e+02 2.436e+02 5.081e+02, threshold=4.607e+02, percent-clipped=1.0
2024-09-17 10:25:22,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=589098.1666666666, ans=0.0
2024-09-17 10:25:24,692 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.13 vs. limit=15.0
2024-09-17 10:25:30,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=589098.1666666666, ans=0.0
2024-09-17 10:25:46,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=589126.5, ans=0.07
2024-09-17 10:25:55,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=589154.8333333334, ans=0.1
2024-09-17 10:26:08,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=589183.1666666666, ans=0.125
2024-09-17 10:26:19,979 INFO [train.py:1198] (1/2) Epoch 33, batch 3450, loss[loss=0.2222, ctc_loss=0.1459, cr_loss=0.3813, over 20883.00 frames. ], tot_loss[loss=0.225, ctc_loss=0.1501, cr_loss=0.3745, over 4091580.92 frames. ], batch size: 57, lr: 2.50e-03, grad_scale: 32.0
2024-09-17 10:26:20,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=589211.5, ans=0.125
2024-09-17 10:26:20,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=589211.5, ans=0.125
2024-09-17 10:26:21,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=589211.5, ans=0.1
2024-09-17 10:26:21,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=589211.5, ans=0.2
2024-09-17 10:26:28,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=589211.5, ans=0.0
2024-09-17 10:26:37,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=589239.8333333334, ans=0.0
2024-09-17 10:26:48,023 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 10:26:54,005 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 10:26:58,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=589268.1666666666, ans=0.125
2024-09-17 10:27:06,305 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 10:27:06,652 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.79 vs. limit=15.0
2024-09-17 10:27:40,135 INFO [train.py:1198] (1/2) Epoch 33, batch 3500, loss[loss=0.2375, ctc_loss=0.1582, cr_loss=0.3963, over 20672.00 frames. ], tot_loss[loss=0.2237, ctc_loss=0.1491, cr_loss=0.3728, over 4084493.27 frames. ], batch size: 68, lr: 2.50e-03, grad_scale: 32.0
2024-09-17 10:27:40,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=589353.1666666666, ans=0.125
2024-09-17 10:27:49,917 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 10:27:54,134 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.748e+02 2.160e+02 2.311e+02 2.466e+02 3.491e+02, threshold=4.622e+02, percent-clipped=0.0
2024-09-17 10:28:02,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=589381.5, ans=0.125
2024-09-17 10:28:40,038 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 10:28:56,390 INFO [train.py:1198] (1/2) Epoch 33, batch 3550, loss[loss=0.1873, ctc_loss=0.1233, cr_loss=0.32, over 19943.00 frames. ], tot_loss[loss=0.2226, ctc_loss=0.1482, cr_loss=0.3717, over 4093233.61 frames. ], batch size: 44, lr: 2.50e-03, grad_scale: 32.0
2024-09-17 10:29:02,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=589494.8333333334, ans=0.0
2024-09-17 10:29:12,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=589523.1666666666, ans=0.125
2024-09-17 10:29:17,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=589523.1666666666, ans=0.2
2024-09-17 10:29:46,492 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.34 vs. limit=15.0
2024-09-17 10:29:47,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=589579.8333333334, ans=0.2
2024-09-17 10:29:59,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=589608.1666666666, ans=0.125
2024-09-17 10:30:01,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=589608.1666666666, ans=0.1
2024-09-17 10:30:03,235 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.57 vs. limit=15.0
2024-09-17 10:30:14,409 INFO [train.py:1198] (1/2) Epoch 33, batch 3600, loss[loss=0.246, ctc_loss=0.1637, cr_loss=0.4114, over 20969.00 frames. ], tot_loss[loss=0.2239, ctc_loss=0.1492, cr_loss=0.3736, over 4101763.39 frames. ], batch size: 64, lr: 2.50e-03, grad_scale: 32.0
2024-09-17 10:30:14,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=589636.5, ans=0.0
2024-09-17 10:30:28,012 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.793e+02 2.194e+02 2.309e+02 2.464e+02 3.603e+02, threshold=4.618e+02, percent-clipped=0.0
2024-09-17 10:30:31,647 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.47 vs.
limit=15.0 2024-09-17 10:30:56,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=589693.1666666666, ans=0.125 2024-09-17 10:31:10,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=589721.5, ans=0.1 2024-09-17 10:31:29,641 INFO [train.py:1198] (1/2) Epoch 33, batch 3650, loss[loss=0.2334, ctc_loss=0.1546, cr_loss=0.394, over 20679.00 frames. ], tot_loss[loss=0.2241, ctc_loss=0.1494, cr_loss=0.3736, over 4092161.38 frames. ], batch size: 68, lr: 2.50e-03, grad_scale: 32.0 2024-09-17 10:31:29,938 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=589778.1666666666, ans=0.125 2024-09-17 10:31:29,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=589778.1666666666, ans=0.125 2024-09-17 10:31:45,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=589806.5, ans=0.1 2024-09-17 10:31:58,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=589834.8333333334, ans=0.125 2024-09-17 10:32:45,741 INFO [train.py:1198] (1/2) Epoch 33, batch 3700, loss[loss=0.2319, ctc_loss=0.1556, cr_loss=0.3812, over 21051.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.1499, cr_loss=0.3744, over 4087149.81 frames. ], batch size: 62, lr: 2.50e-03, grad_scale: 16.0 2024-09-17 10:32:55,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=589919.8333333334, ans=0.025 2024-09-17 10:33:01,132 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.915e+02 2.167e+02 2.352e+02 2.549e+02 4.743e+02, threshold=4.705e+02, percent-clipped=1.0 2024-09-17 10:33:21,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=589976.5, ans=0.0 2024-09-17 10:33:21,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=589976.5, ans=0.0 2024-09-17 10:33:45,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=590004.8333333334, ans=0.07 2024-09-17 10:33:50,345 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.34 vs. limit=12.0 2024-09-17 10:33:57,821 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 10:34:05,138 INFO [train.py:1198] (1/2) Epoch 33, batch 3750, loss[loss=0.2251, ctc_loss=0.1503, cr_loss=0.3738, over 20872.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.1497, cr_loss=0.3746, over 4091826.90 frames. ], batch size: 57, lr: 2.50e-03, grad_scale: 16.0 2024-09-17 10:34:18,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=590089.8333333334, ans=0.2 2024-09-17 10:34:25,212 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.23 vs. 
limit=15.0 2024-09-17 10:34:33,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=590118.1666666666, ans=0.125 2024-09-17 10:35:20,480 INFO [train.py:1198] (1/2) Epoch 33, batch 3800, loss[loss=0.2287, ctc_loss=0.155, cr_loss=0.3682, over 20676.00 frames. ], tot_loss[loss=0.2243, ctc_loss=0.1495, cr_loss=0.3739, over 4088247.80 frames. ], batch size: 66, lr: 2.50e-03, grad_scale: 16.0 2024-09-17 10:35:38,348 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.908e+02 2.140e+02 2.270e+02 2.410e+02 3.531e+02, threshold=4.541e+02, percent-clipped=0.0 2024-09-17 10:36:27,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=590316.5, ans=0.125 2024-09-17 10:36:38,811 INFO [train.py:1198] (1/2) Epoch 33, batch 3850, loss[loss=0.1993, ctc_loss=0.1277, cr_loss=0.3581, over 20958.00 frames. ], tot_loss[loss=0.2257, ctc_loss=0.1505, cr_loss=0.3759, over 4082652.38 frames. ], batch size: 51, lr: 2.50e-03, grad_scale: 16.0 2024-09-17 10:37:24,983 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.14 vs. limit=22.5 2024-09-17 10:37:26,185 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.90 vs. limit=15.0 2024-09-17 10:37:28,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=590429.8333333334, ans=0.2 2024-09-17 10:37:38,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=590458.1666666666, ans=0.2 2024-09-17 10:37:54,760 INFO [train.py:1198] (1/2) Epoch 33, batch 3900, loss[loss=0.2118, ctc_loss=0.1399, cr_loss=0.3598, over 20882.00 frames. ], tot_loss[loss=0.2253, ctc_loss=0.1503, cr_loss=0.375, over 4089750.40 frames. ], batch size: 57, lr: 2.50e-03, grad_scale: 16.0 2024-09-17 10:38:09,943 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.882e+02 2.174e+02 2.267e+02 2.471e+02 3.008e+02, threshold=4.535e+02, percent-clipped=0.0 2024-09-17 10:38:39,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=590571.5, ans=0.0 2024-09-17 10:38:50,287 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. limit=6.0 2024-09-17 10:39:01,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=590599.8333333334, ans=0.125 2024-09-17 10:39:06,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=590599.8333333334, ans=0.125 2024-09-17 10:39:12,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=590628.1666666666, ans=0.125 2024-09-17 10:39:13,711 INFO [train.py:1198] (1/2) Epoch 33, batch 3950, loss[loss=0.1968, ctc_loss=0.128, cr_loss=0.3439, over 20966.00 frames. ], tot_loss[loss=0.2261, ctc_loss=0.151, cr_loss=0.3759, over 4079373.26 frames. 
], batch size: 50, lr: 2.50e-03, grad_scale: 16.0 2024-09-17 10:39:17,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=590628.1666666666, ans=0.125 2024-09-17 10:39:27,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=590656.5, ans=0.125 2024-09-17 10:39:48,895 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.26 vs. limit=15.0 2024-09-17 10:39:59,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=590713.1666666666, ans=0.1 2024-09-17 10:40:14,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=590741.5, ans=0.125 2024-09-17 10:40:26,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=590741.5, ans=0.125 2024-09-17 10:40:29,081 INFO [train.py:1198] (1/2) Epoch 33, batch 4000, loss[loss=0.226, ctc_loss=0.1513, cr_loss=0.3738, over 20822.00 frames. ], tot_loss[loss=0.2252, ctc_loss=0.1503, cr_loss=0.3748, over 4093975.81 frames. ], batch size: 65, lr: 2.50e-03, grad_scale: 32.0 2024-09-17 10:40:43,786 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.838e+02 2.235e+02 2.348e+02 2.541e+02 3.717e+02, threshold=4.697e+02, percent-clipped=0.0 2024-09-17 10:40:59,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=590826.5, ans=0.125 2024-09-17 10:41:23,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=590854.8333333334, ans=0.1 2024-09-17 10:41:39,376 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.44 vs. limit=22.5 2024-09-17 10:41:41,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=590883.1666666666, ans=0.1 2024-09-17 10:41:47,670 INFO [train.py:1198] (1/2) Epoch 33, batch 4050, loss[loss=0.2479, ctc_loss=0.1678, cr_loss=0.4002, over 19458.00 frames. ], tot_loss[loss=0.2255, ctc_loss=0.1505, cr_loss=0.3752, over 4084832.43 frames. ], batch size: 90, lr: 2.50e-03, grad_scale: 32.0 2024-09-17 10:41:50,306 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.96 vs. limit=6.0 2024-09-17 10:42:18,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=590968.1666666666, ans=0.0 2024-09-17 10:42:34,322 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.82 vs. limit=15.0 2024-09-17 10:42:46,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=590996.5, ans=10.0 2024-09-17 10:43:03,932 INFO [train.py:1198] (1/2) Epoch 33, batch 4100, loss[loss=0.2463, ctc_loss=0.1655, cr_loss=0.404, over 20692.00 frames. ], tot_loss[loss=0.2245, ctc_loss=0.1496, cr_loss=0.3741, over 4090836.10 frames. 
], batch size: 71, lr: 2.50e-03, grad_scale: 32.0 2024-09-17 10:43:18,825 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.907e+02 2.171e+02 2.292e+02 2.440e+02 3.120e+02, threshold=4.584e+02, percent-clipped=0.0 2024-09-17 10:43:49,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=591138.1666666666, ans=0.0 2024-09-17 10:43:56,872 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=591138.1666666666, ans=0.125 2024-09-17 10:44:01,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=591138.1666666666, ans=0.2 2024-09-17 10:44:06,600 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.12 vs. limit=15.0 2024-09-17 10:44:08,909 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 10:44:08,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=591166.5, ans=0.2 2024-09-17 10:44:19,227 INFO [train.py:1198] (1/2) Epoch 33, batch 4150, loss[loss=0.2029, ctc_loss=0.1327, cr_loss=0.3512, over 20973.00 frames. ], tot_loss[loss=0.2232, ctc_loss=0.1487, cr_loss=0.3724, over 4084532.68 frames. ], batch size: 48, lr: 2.50e-03, grad_scale: 32.0 2024-09-17 10:44:39,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=591223.1666666666, ans=10.0 2024-09-17 10:44:50,041 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.43 vs. limit=22.5 2024-09-17 10:45:00,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=591251.5, ans=0.125 2024-09-17 10:45:09,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=591279.8333333334, ans=0.0 2024-09-17 10:45:27,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=591308.1666666666, ans=0.0 2024-09-17 10:45:37,785 INFO [train.py:1198] (1/2) Epoch 33, batch 4200, loss[loss=0.2393, ctc_loss=0.1624, cr_loss=0.3842, over 20574.00 frames. ], tot_loss[loss=0.223, ctc_loss=0.1486, cr_loss=0.3722, over 4094582.79 frames. ], batch size: 75, lr: 2.50e-03, grad_scale: 32.0 2024-09-17 10:45:39,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=591336.5, ans=0.0 2024-09-17 10:45:45,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=591336.5, ans=0.05 2024-09-17 10:45:52,929 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.896e+02 2.187e+02 2.279e+02 2.480e+02 3.393e+02, threshold=4.558e+02, percent-clipped=0.0 2024-09-17 10:46:17,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=591393.1666666666, ans=0.125 2024-09-17 10:46:36,817 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.33 vs. 
limit=22.5 2024-09-17 10:46:40,727 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.16 vs. limit=10.0 2024-09-17 10:46:41,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=591449.8333333334, ans=0.125 2024-09-17 10:46:51,763 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.87 vs. limit=6.0 2024-09-17 10:46:56,858 INFO [train.py:1198] (1/2) Epoch 33, batch 4250, loss[loss=0.2677, ctc_loss=0.1839, cr_loss=0.4186, over 14705.00 frames. ], tot_loss[loss=0.223, ctc_loss=0.1486, cr_loss=0.3723, over 4089581.95 frames. ], batch size: 149, lr: 2.50e-03, grad_scale: 32.0 2024-09-17 10:47:08,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=591478.1666666666, ans=0.1 2024-09-17 10:47:29,663 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.67 vs. limit=10.0 2024-09-17 10:47:56,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=591591.5, ans=0.125 2024-09-17 10:48:12,875 INFO [train.py:1198] (1/2) Epoch 33, batch 4300, loss[loss=0.1749, ctc_loss=0.1135, cr_loss=0.3068, over 20984.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1486, cr_loss=0.3716, over 4069373.87 frames. ], batch size: 49, lr: 2.50e-03, grad_scale: 32.0 2024-09-17 10:48:28,190 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.866e+02 2.143e+02 2.274e+02 2.457e+02 3.121e+02, threshold=4.548e+02, percent-clipped=0.0 2024-09-17 10:48:32,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=591648.1666666666, ans=0.1 2024-09-17 10:48:36,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=591648.1666666666, ans=0.0 2024-09-17 10:48:40,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=591648.1666666666, ans=0.125 2024-09-17 10:48:56,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=591676.5, ans=0.0 2024-09-17 10:48:59,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=591704.8333333334, ans=0.1 2024-09-17 10:49:13,387 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.41 vs. limit=10.0 2024-09-17 10:49:23,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=591733.1666666666, ans=0.125 2024-09-17 10:49:27,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=591761.5, ans=0.125 2024-09-17 10:49:28,738 INFO [train.py:1198] (1/2) Epoch 33, batch 4350, loss[loss=0.2367, ctc_loss=0.1589, cr_loss=0.3892, over 20745.00 frames. ], tot_loss[loss=0.2233, ctc_loss=0.1488, cr_loss=0.3725, over 4070395.86 frames. 
], batch size: 71, lr: 2.50e-03, grad_scale: 32.0 2024-09-17 10:50:47,188 INFO [train.py:1198] (1/2) Epoch 33, batch 4400, loss[loss=0.2059, ctc_loss=0.1381, cr_loss=0.3391, over 20869.00 frames. ], tot_loss[loss=0.2235, ctc_loss=0.149, cr_loss=0.3725, over 4064660.97 frames. ], batch size: 57, lr: 2.50e-03, grad_scale: 32.0 2024-09-17 10:51:02,380 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.903e+02 2.166e+02 2.312e+02 2.520e+02 4.444e+02, threshold=4.624e+02, percent-clipped=0.0 2024-09-17 10:51:33,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=591988.1666666666, ans=0.1 2024-09-17 10:51:44,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=591988.1666666666, ans=0.1 2024-09-17 10:51:55,095 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=592016.5, ans=0.125 2024-09-17 10:52:00,574 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.63 vs. limit=15.0 2024-09-17 10:52:02,417 INFO [train.py:1198] (1/2) Epoch 33, batch 4450, loss[loss=0.2066, ctc_loss=0.1336, cr_loss=0.3646, over 21008.00 frames. ], tot_loss[loss=0.2233, ctc_loss=0.1489, cr_loss=0.3723, over 4070988.81 frames. ], batch size: 61, lr: 2.50e-03, grad_scale: 32.0 2024-09-17 10:52:39,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.whiten.whitening_limit, batch_count=592101.5, ans=12.0 2024-09-17 10:52:43,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=592101.5, ans=0.2 2024-09-17 10:52:45,478 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=592101.5, ans=0.1 2024-09-17 10:52:51,812 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.61 vs. limit=15.0 2024-09-17 10:53:21,559 INFO [train.py:1198] (1/2) Epoch 33, batch 4500, loss[loss=0.258, ctc_loss=0.1705, cr_loss=0.4372, over 20680.00 frames. ], tot_loss[loss=0.2239, ctc_loss=0.1493, cr_loss=0.373, over 4064989.28 frames. ], batch size: 66, lr: 2.50e-03, grad_scale: 32.0 2024-09-17 10:53:29,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=592186.5, ans=0.1 2024-09-17 10:53:33,194 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.65 vs. limit=15.0 2024-09-17 10:53:36,874 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.921e+02 2.218e+02 2.378e+02 2.567e+02 3.438e+02, threshold=4.756e+02, percent-clipped=0.0 2024-09-17 10:53:40,647 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.21 vs. limit=15.0 2024-09-17 10:54:37,129 INFO [train.py:1198] (1/2) Epoch 33, batch 4550, loss[loss=0.2109, ctc_loss=0.1362, cr_loss=0.3733, over 20835.00 frames. ], tot_loss[loss=0.2238, ctc_loss=0.1492, cr_loss=0.3728, over 4059646.85 frames. 
], batch size: 59, lr: 2.50e-03, grad_scale: 32.0 2024-09-17 10:55:23,293 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.54 vs. limit=15.0 2024-09-17 10:55:56,012 INFO [train.py:1198] (1/2) Epoch 33, batch 4600, loss[loss=0.2076, ctc_loss=0.1367, cr_loss=0.3544, over 20798.00 frames. ], tot_loss[loss=0.2231, ctc_loss=0.1487, cr_loss=0.372, over 4069329.77 frames. ], batch size: 56, lr: 2.50e-03, grad_scale: 32.0 2024-09-17 10:56:11,060 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.905e+02 2.160e+02 2.269e+02 2.458e+02 3.109e+02, threshold=4.539e+02, percent-clipped=0.0 2024-09-17 10:56:20,488 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=592498.1666666666, ans=0.5 2024-09-17 10:56:32,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=592526.5, ans=0.0 2024-09-17 10:56:35,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=592526.5, ans=0.2 2024-09-17 10:56:59,040 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.74 vs. limit=15.0 2024-09-17 10:57:03,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=592583.1666666666, ans=0.125 2024-09-17 10:57:04,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=592583.1666666666, ans=0.125 2024-09-17 10:57:11,851 INFO [train.py:1198] (1/2) Epoch 33, batch 4650, loss[loss=0.224, ctc_loss=0.1486, cr_loss=0.3769, over 20811.00 frames. ], tot_loss[loss=0.2236, ctc_loss=0.1492, cr_loss=0.3717, over 4051845.32 frames. ], batch size: 53, lr: 2.50e-03, grad_scale: 32.0 2024-09-17 10:57:18,641 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.20 vs. limit=15.0 2024-09-17 10:57:30,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=592639.8333333334, ans=0.0 2024-09-17 10:57:31,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=592639.8333333334, ans=0.125 2024-09-17 10:57:34,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=592639.8333333334, ans=0.125 2024-09-17 10:57:54,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=592668.1666666666, ans=0.025 2024-09-17 10:58:23,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=592724.8333333334, ans=0.125 2024-09-17 10:58:30,779 INFO [train.py:1198] (1/2) Epoch 33, batch 4700, loss[loss=0.2187, ctc_loss=0.146, cr_loss=0.3638, over 20836.00 frames. ], tot_loss[loss=0.2245, ctc_loss=0.1498, cr_loss=0.3733, over 4061544.58 frames. 
], batch size: 59, lr: 2.50e-03, grad_scale: 32.0 2024-09-17 10:58:42,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=592753.1666666666, ans=0.025 2024-09-17 10:58:44,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=592781.5, ans=0.0 2024-09-17 10:58:45,761 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.918e+02 2.187e+02 2.314e+02 2.497e+02 7.068e+02, threshold=4.628e+02, percent-clipped=1.0 2024-09-17 10:59:18,145 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.68 vs. limit=15.0 2024-09-17 10:59:46,438 INFO [train.py:1198] (1/2) Epoch 33, batch 4750, loss[loss=0.1921, ctc_loss=0.1265, cr_loss=0.3282, over 20950.00 frames. ], tot_loss[loss=0.223, ctc_loss=0.1487, cr_loss=0.3716, over 4068339.78 frames. ], batch size: 50, lr: 2.50e-03, grad_scale: 32.0 2024-09-17 11:00:32,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=592979.8333333334, ans=0.04949747468305833 2024-09-17 11:00:44,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=592979.8333333334, ans=0.025 2024-09-17 11:00:51,063 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.84 vs. limit=12.0 2024-09-17 11:01:01,426 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.48 vs. limit=15.0 2024-09-17 11:01:02,124 INFO [train.py:1198] (1/2) Epoch 33, batch 4800, loss[loss=0.2252, ctc_loss=0.1493, cr_loss=0.3793, over 20832.00 frames. ], tot_loss[loss=0.2226, ctc_loss=0.1483, cr_loss=0.3713, over 4075831.02 frames. ], batch size: 59, lr: 2.50e-03, grad_scale: 32.0 2024-09-17 11:01:16,781 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.869e+02 2.129e+02 2.272e+02 2.423e+02 3.028e+02, threshold=4.544e+02, percent-clipped=0.0 2024-09-17 11:01:56,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=593121.5, ans=0.0 2024-09-17 11:01:58,269 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.84 vs. limit=15.0 2024-09-17 11:02:19,496 INFO [train.py:1198] (1/2) Epoch 33, batch 4850, loss[loss=0.2529, ctc_loss=0.1664, cr_loss=0.4325, over 20665.00 frames. ], tot_loss[loss=0.224, ctc_loss=0.1493, cr_loss=0.3736, over 4082254.69 frames. ], batch size: 66, lr: 2.50e-03, grad_scale: 32.0 2024-09-17 11:03:34,554 INFO [train.py:1198] (1/2) Epoch 33, batch 4900, loss[loss=0.2022, ctc_loss=0.1334, cr_loss=0.3439, over 21044.00 frames. ], tot_loss[loss=0.2237, ctc_loss=0.149, cr_loss=0.3734, over 4089590.12 frames. 
], batch size: 56, lr: 2.50e-03, grad_scale: 32.0 2024-09-17 11:03:49,426 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.961e+02 2.200e+02 2.295e+02 2.472e+02 3.096e+02, threshold=4.590e+02, percent-clipped=0.0 2024-09-17 11:03:58,734 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 11:04:01,792 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=593348.1666666666, ans=0.125 2024-09-17 11:04:04,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=593376.5, ans=0.125 2024-09-17 11:04:26,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=593404.8333333334, ans=0.2 2024-09-17 11:04:52,630 INFO [train.py:1198] (1/2) Epoch 33, batch 4950, loss[loss=0.2297, ctc_loss=0.1535, cr_loss=0.3812, over 21015.00 frames. ], tot_loss[loss=0.2228, ctc_loss=0.1484, cr_loss=0.3718, over 4083143.89 frames. ], batch size: 62, lr: 2.50e-03, grad_scale: 32.0 2024-09-17 11:04:57,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=593461.5, ans=0.025 2024-09-17 11:05:16,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=593489.8333333334, ans=0.2 2024-09-17 11:05:47,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=593546.5, ans=0.125 2024-09-17 11:06:07,181 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.70 vs. limit=15.0 2024-09-17 11:06:07,819 INFO [train.py:1198] (1/2) Epoch 33, batch 5000, loss[loss=0.2326, ctc_loss=0.1534, cr_loss=0.3957, over 21004.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.1482, cr_loss=0.3715, over 4074480.14 frames. ], batch size: 63, lr: 2.49e-03, grad_scale: 32.0 2024-09-17 11:06:22,641 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.748e+02 2.228e+02 2.357e+02 2.496e+02 5.510e+02, threshold=4.715e+02, percent-clipped=1.0 2024-09-17 11:07:04,487 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=593688.1666666666, ans=0.0 2024-09-17 11:07:22,148 INFO [train.py:1198] (1/2) Epoch 33, batch 5050, loss[loss=0.2466, ctc_loss=0.1717, cr_loss=0.3746, over 14823.00 frames. ], tot_loss[loss=0.2232, ctc_loss=0.1487, cr_loss=0.3725, over 4076228.33 frames. ], batch size: 149, lr: 2.49e-03, grad_scale: 32.0 2024-09-17 11:07:49,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=593773.1666666666, ans=0.125 2024-09-17 11:07:57,823 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 11:08:17,728 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.42 vs. limit=15.0 2024-09-17 11:08:26,695 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.92 vs. 
limit=15.0 2024-09-17 11:08:28,318 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.41 vs. limit=15.0 2024-09-17 11:08:36,412 INFO [train.py:1198] (1/2) Epoch 33, batch 5100, loss[loss=0.2134, ctc_loss=0.1391, cr_loss=0.3711, over 20781.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1485, cr_loss=0.3722, over 4084385.31 frames. ], batch size: 53, lr: 2.49e-03, grad_scale: 16.0 2024-09-17 11:08:52,872 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.883e+02 2.186e+02 2.310e+02 2.516e+02 3.575e+02, threshold=4.619e+02, percent-clipped=0.0 2024-09-17 11:09:13,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=593943.1666666666, ans=0.2 2024-09-17 11:09:28,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=593971.5, ans=0.2 2024-09-17 11:09:50,596 INFO [train.py:1198] (1/2) Epoch 33, batch 5150, loss[loss=0.2429, ctc_loss=0.1646, cr_loss=0.3914, over 20136.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1485, cr_loss=0.3721, over 4080520.60 frames. ], batch size: 80, lr: 2.49e-03, grad_scale: 16.0 2024-09-17 11:10:42,339 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.76 vs. limit=10.0 2024-09-17 11:10:46,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=594113.1666666666, ans=0.0 2024-09-17 11:10:54,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=594141.5, ans=0.1 2024-09-17 11:11:06,416 INFO [train.py:1198] (1/2) Epoch 33, batch 5200, loss[loss=0.2431, ctc_loss=0.1612, cr_loss=0.4093, over 20931.00 frames. ], tot_loss[loss=0.2235, ctc_loss=0.1489, cr_loss=0.3734, over 4078222.00 frames. ], batch size: 60, lr: 2.49e-03, grad_scale: 32.0 2024-09-17 11:11:06,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=594169.8333333334, ans=0.125 2024-09-17 11:11:19,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=594198.1666666666, ans=0.025 2024-09-17 11:11:21,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=594198.1666666666, ans=0.125 2024-09-17 11:11:22,608 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.839e+02 2.138e+02 2.278e+02 2.479e+02 3.410e+02, threshold=4.556e+02, percent-clipped=0.0 2024-09-17 11:11:53,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=594254.8333333334, ans=0.0 2024-09-17 11:12:05,785 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=594283.1666666666, ans=0.0 2024-09-17 11:12:16,815 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.24 vs. 
limit=10.0 2024-09-17 11:12:19,425 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 11:12:19,443 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=594311.5, ans=0.0 2024-09-17 11:12:19,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=594311.5, ans=0.125 2024-09-17 11:12:20,575 INFO [train.py:1198] (1/2) Epoch 33, batch 5250, loss[loss=0.2533, ctc_loss=0.1736, cr_loss=0.3983, over 14627.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1484, cr_loss=0.3725, over 4081565.24 frames. ], batch size: 150, lr: 2.49e-03, grad_scale: 32.0 2024-09-17 11:13:32,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=594424.8333333334, ans=0.2 2024-09-17 11:13:34,247 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.68 vs. limit=15.0 2024-09-17 11:13:35,063 INFO [train.py:1198] (1/2) Epoch 33, batch 5300, loss[loss=0.2179, ctc_loss=0.1449, cr_loss=0.3652, over 21054.00 frames. ], tot_loss[loss=0.2225, ctc_loss=0.1481, cr_loss=0.3717, over 4080328.32 frames. ], batch size: 56, lr: 2.49e-03, grad_scale: 32.0 2024-09-17 11:13:40,165 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.07 vs. limit=15.0 2024-09-17 11:13:51,320 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.875e+02 2.200e+02 2.302e+02 2.433e+02 3.289e+02, threshold=4.604e+02, percent-clipped=0.0 2024-09-17 11:13:54,880 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.45 vs. limit=15.0 2024-09-17 11:14:02,708 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.74 vs. limit=15.0 2024-09-17 11:14:51,767 INFO [train.py:1198] (1/2) Epoch 33, batch 5350, loss[loss=0.2525, ctc_loss=0.1698, cr_loss=0.4137, over 20977.00 frames. ], tot_loss[loss=0.2231, ctc_loss=0.1485, cr_loss=0.3727, over 4082558.71 frames. ], batch size: 64, lr: 2.49e-03, grad_scale: 32.0 2024-09-17 11:15:03,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=594594.8333333334, ans=0.5 2024-09-17 11:15:14,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=594623.1666666666, ans=0.2 2024-09-17 11:15:50,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=594708.1666666666, ans=0.125 2024-09-17 11:16:06,810 INFO [train.py:1198] (1/2) Epoch 33, batch 5400, loss[loss=0.2187, ctc_loss=0.1449, cr_loss=0.3687, over 20898.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1482, cr_loss=0.3724, over 4089933.56 frames. ], batch size: 54, lr: 2.49e-03, grad_scale: 32.0 2024-09-17 11:16:10,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=594736.5, ans=0.2 2024-09-17 11:16:22,430 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.08 vs. 
limit=15.0 2024-09-17 11:16:22,961 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.907e+02 2.196e+02 2.326e+02 2.483e+02 3.555e+02, threshold=4.652e+02, percent-clipped=0.0 2024-09-17 11:17:01,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=594821.5, ans=0.125 2024-09-17 11:17:20,892 INFO [train.py:1198] (1/2) Epoch 33, batch 5450, loss[loss=0.1751, ctc_loss=0.1133, cr_loss=0.3092, over 20333.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1476, cr_loss=0.3714, over 4090515.82 frames. ], batch size: 45, lr: 2.49e-03, grad_scale: 32.0 2024-09-17 11:17:36,130 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 11:17:46,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=594906.5, ans=0.025 2024-09-17 11:18:04,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=594963.1666666666, ans=0.125 2024-09-17 11:18:35,392 INFO [train.py:1198] (1/2) Epoch 33, batch 5500, loss[loss=0.2049, ctc_loss=0.1363, cr_loss=0.343, over 20987.00 frames. ], tot_loss[loss=0.2225, ctc_loss=0.1481, cr_loss=0.3717, over 4096812.62 frames. ], batch size: 52, lr: 2.49e-03, grad_scale: 32.0 2024-09-17 11:18:51,722 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.921e+02 2.209e+02 2.354e+02 2.512e+02 7.886e+02, threshold=4.708e+02, percent-clipped=2.0 2024-09-17 11:19:37,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=595133.1666666666, ans=0.0 2024-09-17 11:19:51,998 INFO [train.py:1198] (1/2) Epoch 33, batch 5550, loss[loss=0.2135, ctc_loss=0.1417, cr_loss=0.359, over 20701.00 frames. ], tot_loss[loss=0.2228, ctc_loss=0.1484, cr_loss=0.372, over 4098680.43 frames. ], batch size: 71, lr: 2.49e-03, grad_scale: 32.0 2024-09-17 11:20:14,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=595189.8333333334, ans=0.025 2024-09-17 11:20:51,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=595274.8333333334, ans=0.125 2024-09-17 11:21:06,313 INFO [train.py:1198] (1/2) Epoch 33, batch 5600, loss[loss=0.171, ctc_loss=0.1111, cr_loss=0.2991, over 19831.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1484, cr_loss=0.3726, over 4085262.63 frames. ], batch size: 44, lr: 2.49e-03, grad_scale: 32.0 2024-09-17 11:21:10,157 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.06 vs. limit=12.0 2024-09-17 11:21:22,682 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.898e+02 2.150e+02 2.319e+02 2.523e+02 3.734e+02, threshold=4.639e+02, percent-clipped=0.0 2024-09-17 11:21:53,021 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 11:22:21,066 INFO [train.py:1198] (1/2) Epoch 33, batch 5650, loss[loss=0.1912, ctc_loss=0.1254, cr_loss=0.3291, over 21065.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.1473, cr_loss=0.3708, over 4084436.11 frames. 
], batch size: 53, lr: 2.49e-03, grad_scale: 16.0 2024-09-17 11:22:24,749 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.44 vs. limit=15.0 2024-09-17 11:22:27,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=595444.8333333334, ans=0.2 2024-09-17 11:22:49,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=595501.5, ans=0.125 2024-09-17 11:23:05,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=595501.5, ans=0.125 2024-09-17 11:23:29,951 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.79 vs. limit=12.0 2024-09-17 11:23:30,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=595558.1666666666, ans=0.0 2024-09-17 11:23:37,933 INFO [train.py:1198] (1/2) Epoch 33, batch 5700, loss[loss=0.2935, ctc_loss=0.2077, cr_loss=0.429, over 14501.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1482, cr_loss=0.3723, over 4083745.93 frames. ], batch size: 149, lr: 2.49e-03, grad_scale: 16.0 2024-09-17 11:23:55,773 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.959e+02 2.209e+02 2.343e+02 2.527e+02 3.731e+02, threshold=4.685e+02, percent-clipped=0.0 2024-09-17 11:24:52,047 INFO [train.py:1198] (1/2) Epoch 33, batch 5750, loss[loss=0.2228, ctc_loss=0.1467, cr_loss=0.3803, over 21073.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1474, cr_loss=0.372, over 4093354.00 frames. ], batch size: 59, lr: 2.49e-03, grad_scale: 16.0 2024-09-17 11:25:08,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=595756.5, ans=0.125 2024-09-17 11:25:10,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=595756.5, ans=0.0 2024-09-17 11:25:33,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=595784.8333333334, ans=0.0 2024-09-17 11:25:53,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=595841.5, ans=0.125 2024-09-17 11:26:06,178 INFO [train.py:1198] (1/2) Epoch 33, batch 5800, loss[loss=0.2474, ctc_loss=0.1668, cr_loss=0.4027, over 20676.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1481, cr_loss=0.3731, over 4094933.31 frames. ], batch size: 68, lr: 2.49e-03, grad_scale: 16.0 2024-09-17 11:26:24,191 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.882e+02 2.134e+02 2.304e+02 2.444e+02 4.548e+02, threshold=4.608e+02, percent-clipped=0.0 2024-09-17 11:26:32,231 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.80 vs. 
limit=10.0 2024-09-17 11:26:48,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=595926.5, ans=0.125 2024-09-17 11:27:04,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=595983.1666666666, ans=0.035 2024-09-17 11:27:07,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=595983.1666666666, ans=0.0 2024-09-17 11:27:10,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=595983.1666666666, ans=0.125 2024-09-17 11:27:14,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=595983.1666666666, ans=0.04949747468305833 2024-09-17 11:27:22,647 INFO [train.py:1198] (1/2) Epoch 33, batch 5850, loss[loss=0.2431, ctc_loss=0.1644, cr_loss=0.3936, over 20357.00 frames. ], tot_loss[loss=0.2237, ctc_loss=0.1489, cr_loss=0.374, over 4079455.84 frames. ], batch size: 74, lr: 2.49e-03, grad_scale: 16.0 2024-09-17 11:27:24,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=596011.5, ans=0.2 2024-09-17 11:27:49,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=596039.8333333334, ans=0.125 2024-09-17 11:27:55,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=596068.1666666666, ans=0.05 2024-09-17 11:28:37,380 INFO [train.py:1198] (1/2) Epoch 33, batch 5900, loss[loss=0.2367, ctc_loss=0.1594, cr_loss=0.3866, over 20967.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.1497, cr_loss=0.3749, over 4070796.15 frames. ], batch size: 64, lr: 2.49e-03, grad_scale: 16.0 2024-09-17 11:28:43,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=596153.1666666666, ans=0.0 2024-09-17 11:28:51,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=596181.5, ans=0.125 2024-09-17 11:28:55,043 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.907e+02 2.186e+02 2.299e+02 2.506e+02 6.411e+02, threshold=4.599e+02, percent-clipped=1.0 2024-09-17 11:29:09,348 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.11 vs. limit=22.5 2024-09-17 11:29:16,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=596209.8333333334, ans=0.09899494936611666 2024-09-17 11:29:25,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=596238.1666666666, ans=0.0 2024-09-17 11:29:40,784 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.13 vs. limit=22.5 2024-09-17 11:29:51,593 INFO [train.py:1198] (1/2) Epoch 33, batch 5950, loss[loss=0.2198, ctc_loss=0.1465, cr_loss=0.3664, over 21074.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.1498, cr_loss=0.3746, over 4067749.42 frames. 
], batch size: 59, lr: 2.49e-03, grad_scale: 16.0 2024-09-17 11:29:52,327 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.32 vs. limit=15.0 2024-09-17 11:30:23,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=596351.5, ans=0.125 2024-09-17 11:30:59,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=596408.1666666666, ans=0.0 2024-09-17 11:31:05,517 INFO [train.py:1198] (1/2) Epoch 33, batch 6000, loss[loss=0.1809, ctc_loss=0.1174, cr_loss=0.3176, over 20963.00 frames. ], tot_loss[loss=0.2241, ctc_loss=0.1493, cr_loss=0.3742, over 4082716.92 frames. ], batch size: 51, lr: 2.49e-03, grad_scale: 32.0 2024-09-17 11:31:05,518 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 11:31:27,792 INFO [train.py:1230] (1/2) Epoch 33, validation: loss=0.04003, ctc_loss=0.04003, cr_loss=1.289e-14, over 944034.00 frames. 2024-09-17 11:31:27,792 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-17 11:31:41,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=596464.8333333334, ans=0.05 2024-09-17 11:31:45,820 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.833e+02 2.162e+02 2.293e+02 2.434e+02 5.495e+02, threshold=4.587e+02, percent-clipped=1.0 2024-09-17 11:31:56,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=596493.1666666666, ans=0.0 2024-09-17 11:31:56,790 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.98 vs. limit=15.0 2024-09-17 11:32:13,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=596521.5, ans=0.05 2024-09-17 11:32:22,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=596521.5, ans=0.125 2024-09-17 11:32:42,831 INFO [train.py:1198] (1/2) Epoch 33, batch 6050, loss[loss=0.2105, ctc_loss=0.1381, cr_loss=0.3616, over 21036.00 frames. ], tot_loss[loss=0.2225, ctc_loss=0.1481, cr_loss=0.3722, over 4091720.55 frames. 
], batch size: 63, lr: 2.49e-03, grad_scale: 32.0 2024-09-17 11:32:46,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=596578.1666666666, ans=0.05 2024-09-17 11:33:03,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=596606.5, ans=0.125 2024-09-17 11:33:05,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=596606.5, ans=0.125 2024-09-17 11:33:09,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=596606.5, ans=0.125 2024-09-17 11:33:17,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=596634.8333333334, ans=0.0 2024-09-17 11:33:42,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=596691.5, ans=0.025 2024-09-17 11:33:46,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=596691.5, ans=0.1 2024-09-17 11:33:46,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=596691.5, ans=0.0 2024-09-17 11:33:57,061 INFO [train.py:1198] (1/2) Epoch 33, batch 6100, loss[loss=0.2366, ctc_loss=0.1569, cr_loss=0.3984, over 20836.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1478, cr_loss=0.3721, over 4098601.07 frames. ], batch size: 59, lr: 2.49e-03, grad_scale: 16.0 2024-09-17 11:34:00,710 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=7.84 vs. limit=22.5 2024-09-17 11:34:16,624 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.913e+02 2.193e+02 2.300e+02 2.454e+02 3.475e+02, threshold=4.600e+02, percent-clipped=0.0 2024-09-17 11:34:33,621 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.46 vs. limit=15.0 2024-09-17 11:34:36,215 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=596776.5, ans=0.125 2024-09-17 11:34:45,071 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 11:34:58,486 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.06 vs. limit=15.0 2024-09-17 11:35:12,499 INFO [train.py:1198] (1/2) Epoch 33, batch 6150, loss[loss=0.276, ctc_loss=0.194, cr_loss=0.41, over 14129.00 frames. ], tot_loss[loss=0.2245, ctc_loss=0.1496, cr_loss=0.3748, over 4086637.86 frames. ], batch size: 150, lr: 2.49e-03, grad_scale: 16.0 2024-09-17 11:35:29,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=596889.8333333334, ans=0.1 2024-09-17 11:36:26,607 INFO [train.py:1198] (1/2) Epoch 33, batch 6200, loss[loss=0.2012, ctc_loss=0.133, cr_loss=0.341, over 20832.00 frames. ], tot_loss[loss=0.2266, ctc_loss=0.1512, cr_loss=0.3771, over 4072705.76 frames. 
], batch size: 59, lr: 2.49e-03, grad_scale: 16.0 2024-09-17 11:36:41,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=597031.5, ans=0.0 2024-09-17 11:36:45,826 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.908e+02 2.172e+02 2.361e+02 2.503e+02 4.032e+02, threshold=4.722e+02, percent-clipped=0.0 2024-09-17 11:37:17,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=597088.1666666666, ans=0.125 2024-09-17 11:37:41,363 INFO [train.py:1198] (1/2) Epoch 33, batch 6250, loss[loss=0.214, ctc_loss=0.1391, cr_loss=0.3745, over 20978.00 frames. ], tot_loss[loss=0.2272, ctc_loss=0.1516, cr_loss=0.3778, over 4065761.68 frames. ], batch size: 55, lr: 2.49e-03, grad_scale: 16.0 2024-09-17 11:38:45,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=597258.1666666666, ans=0.125 2024-09-17 11:38:46,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=597258.1666666666, ans=0.0 2024-09-17 11:38:56,976 INFO [train.py:1198] (1/2) Epoch 33, batch 6300, loss[loss=0.2358, ctc_loss=0.1574, cr_loss=0.3923, over 20986.00 frames. ], tot_loss[loss=0.2274, ctc_loss=0.1518, cr_loss=0.3779, over 4033922.46 frames. ], batch size: 55, lr: 2.49e-03, grad_scale: 16.0 2024-09-17 11:39:14,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=597314.8333333334, ans=0.125 2024-09-17 11:39:16,366 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.934e+02 2.162e+02 2.295e+02 2.523e+02 3.244e+02, threshold=4.590e+02, percent-clipped=0.0 2024-09-17 11:39:25,703 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 11:39:56,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=597399.8333333334, ans=0.1 2024-09-17 11:40:07,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=597399.8333333334, ans=0.125 2024-09-17 11:40:10,061 INFO [train.py:1198] (1/2) Epoch 33, batch 6350, loss[loss=0.2502, ctc_loss=0.1756, cr_loss=0.3727, over 14204.00 frames. ], tot_loss[loss=0.2294, ctc_loss=0.1537, cr_loss=0.3784, over 3925767.05 frames. ], batch size: 150, lr: 2.49e-03, grad_scale: 16.0 2024-09-17 11:40:29,034 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.04 vs. limit=15.0 2024-09-17 11:40:59,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=597513.1666666666, ans=0.125 2024-09-17 11:41:57,288 INFO [train.py:1198] (1/2) Epoch 34, batch 0, loss[loss=0.2339, ctc_loss=0.1551, cr_loss=0.394, over 20670.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1551, cr_loss=0.394, over 20670.00 frames. ], batch size: 71, lr: 2.45e-03, grad_scale: 32.0 2024-09-17 11:41:57,289 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 11:42:15,795 INFO [train.py:1230] (1/2) Epoch 34, validation: loss=0.04044, ctc_loss=0.04044, cr_loss=1.35e-14, over 944034.00 frames. 
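A note on the loss records: each per-batch entry logs a combined loss together with its ctc_loss and cr_loss components, and the logged values are consistent with the combination loss = ctc_loss + 0.2 * cr_loss (for example, 0.1493 + 0.2 * 0.3742 = 0.2241 for the batch 6000 running average above). The near-zero cr_loss at validation (about 1e-14) likewise suggests the consistency-regularization term is effectively inactive when no augmentation is applied. Below is a minimal sketch that checks this relation against triplets copied from the records; the 0.2 weight is inferred from the logged numbers here, not read from the training code.

```python
# Sanity-check sketch: verify that logged tot_loss values are consistent with
# loss = ctc_loss + CR_SCALE * cr_loss. The 0.2 weight is inferred from the
# numbers in this log, not taken from the training script.
CR_SCALE = 0.2  # assumed CR-CTC weight, inferred from the log

def combined_loss(ctc_loss: float, cr_loss: float, cr_scale: float = CR_SCALE) -> float:
    """Reconstruct the combined loss from its logged components."""
    return ctc_loss + cr_scale * cr_loss

# (loss, ctc_loss, cr_loss) triplets copied from tot_loss records above.
logged = [
    (0.2241, 0.1493, 0.3742),  # Epoch 33, batch 6000
    (0.2225, 0.1481, 0.3722),  # Epoch 33, batch 6050
    (0.2246, 0.1496, 0.3750),  # Epoch 34, batch 1050
]
for loss, ctc, cr in logged:
    # Allow for the 4-significant-figure rounding in the log.
    assert abs(combined_loss(ctc, cr) - loss) < 5e-4, (loss, ctc, cr)
```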
2024-09-17 11:42:15,796 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-17 11:42:20,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=597541.5, ans=0.125 2024-09-17 11:42:52,373 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.835e+02 2.346e+02 2.669e+02 2.868e+02 6.385e+02, threshold=5.337e+02, percent-clipped=1.0 2024-09-17 11:42:56,180 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.93 vs. limit=15.0 2024-09-17 11:43:07,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=597626.5, ans=10.0 2024-09-17 11:43:15,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=597626.5, ans=0.2 2024-09-17 11:43:20,453 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.38 vs. limit=15.0 2024-09-17 11:43:32,913 INFO [train.py:1198] (1/2) Epoch 34, batch 50, loss[loss=0.2357, ctc_loss=0.1544, cr_loss=0.4066, over 20683.00 frames. ], tot_loss[loss=0.2241, ctc_loss=0.1492, cr_loss=0.3746, over 928317.63 frames. ], batch size: 68, lr: 2.45e-03, grad_scale: 32.0 2024-09-17 11:44:03,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=597739.8333333334, ans=0.0 2024-09-17 11:44:11,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=597739.8333333334, ans=0.125 2024-09-17 11:44:25,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=597768.1666666666, ans=0.0 2024-09-17 11:44:43,647 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.06 vs. limit=22.5 2024-09-17 11:44:49,238 INFO [train.py:1198] (1/2) Epoch 34, batch 100, loss[loss=0.2077, ctc_loss=0.14, cr_loss=0.3386, over 20770.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1476, cr_loss=0.3713, over 1632098.51 frames. ], batch size: 56, lr: 2.45e-03, grad_scale: 32.0 2024-09-17 11:45:20,382 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.49 vs. limit=22.5 2024-09-17 11:45:24,288 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.892e+02 2.127e+02 2.241e+02 2.411e+02 3.379e+02, threshold=4.483e+02, percent-clipped=0.0 2024-09-17 11:45:55,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=597938.1666666666, ans=0.125 2024-09-17 11:46:05,621 INFO [train.py:1198] (1/2) Epoch 34, batch 150, loss[loss=0.2568, ctc_loss=0.1722, cr_loss=0.423, over 19847.00 frames. ], tot_loss[loss=0.2236, ctc_loss=0.149, cr_loss=0.3728, over 2158253.53 frames. 
], batch size: 80, lr: 2.45e-03, grad_scale: 32.0 2024-09-17 11:46:40,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=598023.1666666666, ans=0.1 2024-09-17 11:46:42,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=598023.1666666666, ans=15.0 2024-09-17 11:47:23,786 INFO [train.py:1198] (1/2) Epoch 34, batch 200, loss[loss=0.2145, ctc_loss=0.1404, cr_loss=0.3701, over 20839.00 frames. ], tot_loss[loss=0.2216, ctc_loss=0.1474, cr_loss=0.3708, over 2592751.37 frames. ], batch size: 59, lr: 2.45e-03, grad_scale: 32.0 2024-09-17 11:47:28,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=598108.1666666666, ans=0.125 2024-09-17 11:47:34,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=598108.1666666666, ans=0.125 2024-09-17 11:47:53,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=598164.8333333334, ans=0.2 2024-09-17 11:47:58,277 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.860e+02 2.160e+02 2.334e+02 2.503e+02 3.108e+02, threshold=4.668e+02, percent-clipped=0.0 2024-09-17 11:48:05,226 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.00 vs. limit=15.0 2024-09-17 11:48:07,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=598193.1666666666, ans=0.2 2024-09-17 11:48:28,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=598221.5, ans=0.2 2024-09-17 11:48:29,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=598221.5, ans=0.0 2024-09-17 11:48:36,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=598221.5, ans=0.125 2024-09-17 11:48:41,970 INFO [train.py:1198] (1/2) Epoch 34, batch 250, loss[loss=0.2771, ctc_loss=0.1937, cr_loss=0.4168, over 14459.00 frames. ], tot_loss[loss=0.2237, ctc_loss=0.149, cr_loss=0.3736, over 2915262.09 frames. ], batch size: 149, lr: 2.45e-03, grad_scale: 32.0 2024-09-17 11:48:52,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=598249.8333333334, ans=0.125 2024-09-17 11:49:27,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=598334.8333333334, ans=0.2 2024-09-17 11:49:38,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=598334.8333333334, ans=0.0 2024-09-17 11:49:56,974 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.17 vs. limit=15.0 2024-09-17 11:49:57,327 INFO [train.py:1198] (1/2) Epoch 34, batch 300, loss[loss=0.1936, ctc_loss=0.127, cr_loss=0.3328, over 20973.00 frames. ], tot_loss[loss=0.2232, ctc_loss=0.1485, cr_loss=0.3733, over 3176082.25 frames. 
], batch size: 49, lr: 2.45e-03, grad_scale: 32.0 2024-09-17 11:50:11,828 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.72 vs. limit=15.0 2024-09-17 11:50:32,124 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.876e+02 2.223e+02 2.344e+02 2.528e+02 4.380e+02, threshold=4.687e+02, percent-clipped=0.0 2024-09-17 11:50:46,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=598476.5, ans=0.125 2024-09-17 11:51:12,862 INFO [train.py:1198] (1/2) Epoch 34, batch 350, loss[loss=0.2168, ctc_loss=0.1426, cr_loss=0.3708, over 21054.00 frames. ], tot_loss[loss=0.2241, ctc_loss=0.1492, cr_loss=0.3744, over 3367687.69 frames. ], batch size: 62, lr: 2.45e-03, grad_scale: 32.0 2024-09-17 11:51:13,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=598533.1666666666, ans=0.0 2024-09-17 11:51:13,796 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.70 vs. limit=15.0 2024-09-17 11:51:55,573 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.01 vs. limit=15.0 2024-09-17 11:51:58,674 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.33 vs. limit=15.0 2024-09-17 11:52:13,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=598646.5, ans=0.0 2024-09-17 11:52:28,315 INFO [train.py:1198] (1/2) Epoch 34, batch 400, loss[loss=0.2187, ctc_loss=0.1448, cr_loss=0.3696, over 20830.00 frames. ], tot_loss[loss=0.2237, ctc_loss=0.149, cr_loss=0.3738, over 3534571.36 frames. ], batch size: 59, lr: 2.45e-03, grad_scale: 32.0 2024-09-17 11:52:31,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=598674.8333333334, ans=0.1 2024-09-17 11:52:33,667 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.47 vs. 
limit=22.5 2024-09-17 11:52:34,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=598674.8333333334, ans=0.125 2024-09-17 11:52:36,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=598674.8333333334, ans=0.025 2024-09-17 11:53:02,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=598731.5, ans=0.125 2024-09-17 11:53:05,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=598731.5, ans=0.125 2024-09-17 11:53:06,276 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.851e+02 2.144e+02 2.252e+02 2.429e+02 3.054e+02, threshold=4.504e+02, percent-clipped=0.0 2024-09-17 11:53:21,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=598759.8333333334, ans=0.2 2024-09-17 11:53:41,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=598788.1666666666, ans=0.0 2024-09-17 11:53:43,246 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.07 vs. limit=15.0 2024-09-17 11:53:47,064 INFO [train.py:1198] (1/2) Epoch 34, batch 450, loss[loss=0.2201, ctc_loss=0.1495, cr_loss=0.3531, over 20891.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1483, cr_loss=0.3723, over 3667887.94 frames. ], batch size: 54, lr: 2.45e-03, grad_scale: 32.0 2024-09-17 11:54:31,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=598873.1666666666, ans=0.125 2024-09-17 11:55:06,014 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.08 vs. limit=10.0 2024-09-17 11:55:06,505 INFO [train.py:1198] (1/2) Epoch 34, batch 500, loss[loss=0.2218, ctc_loss=0.1489, cr_loss=0.3647, over 20689.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1477, cr_loss=0.3714, over 3760200.59 frames. ], batch size: 71, lr: 2.45e-03, grad_scale: 32.0 2024-09-17 11:55:41,587 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.849e+02 2.164e+02 2.301e+02 2.454e+02 3.791e+02, threshold=4.601e+02, percent-clipped=0.0 2024-09-17 11:56:15,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=599071.5, ans=0.1 2024-09-17 11:56:22,229 INFO [train.py:1198] (1/2) Epoch 34, batch 550, loss[loss=0.2617, ctc_loss=0.1765, cr_loss=0.426, over 20964.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1475, cr_loss=0.3717, over 3840347.70 frames. 
], batch size: 64, lr: 2.45e-03, grad_scale: 32.0 2024-09-17 11:56:34,637 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=599099.8333333334, ans=0.2 2024-09-17 11:56:54,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=599156.5, ans=0.0 2024-09-17 11:56:57,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=599156.5, ans=0.125 2024-09-17 11:57:14,331 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.81 vs. limit=15.0 2024-09-17 11:57:15,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=599184.8333333334, ans=0.125 2024-09-17 11:57:19,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=599184.8333333334, ans=0.5 2024-09-17 11:57:30,496 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.81 vs. limit=22.5 2024-09-17 11:57:37,678 INFO [train.py:1198] (1/2) Epoch 34, batch 600, loss[loss=0.2121, ctc_loss=0.1404, cr_loss=0.3582, over 20785.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1474, cr_loss=0.3715, over 3908414.15 frames. ], batch size: 53, lr: 2.45e-03, grad_scale: 32.0 2024-09-17 11:58:02,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=599269.8333333334, ans=0.125 2024-09-17 11:58:02,335 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.44 vs. limit=22.5 2024-09-17 11:58:12,105 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.922e+02 2.172e+02 2.291e+02 2.524e+02 3.924e+02, threshold=4.581e+02, percent-clipped=0.0 2024-09-17 11:58:28,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=599326.5, ans=0.125 2024-09-17 11:58:56,079 INFO [train.py:1198] (1/2) Epoch 34, batch 650, loss[loss=0.2337, ctc_loss=0.1589, cr_loss=0.374, over 21074.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1476, cr_loss=0.3706, over 3940914.09 frames. ], batch size: 59, lr: 2.45e-03, grad_scale: 32.0 2024-09-17 11:58:56,399 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=599383.1666666666, ans=0.1 2024-09-17 11:58:58,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=599383.1666666666, ans=0.125 2024-09-17 11:59:25,264 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 11:59:44,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=599468.1666666666, ans=0.0 2024-09-17 12:00:04,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=599496.5, ans=0.0 2024-09-17 12:00:14,814 INFO [train.py:1198] (1/2) Epoch 34, batch 700, loss[loss=0.2222, ctc_loss=0.1484, cr_loss=0.369, over 21062.00 frames. 
], tot_loss[loss=0.2193, ctc_loss=0.1458, cr_loss=0.3674, over 3980276.22 frames. ], batch size: 53, lr: 2.44e-03, grad_scale: 32.0 2024-09-17 12:00:49,599 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.922e+02 2.162e+02 2.272e+02 2.432e+02 7.544e+02, threshold=4.544e+02, percent-clipped=1.0 2024-09-17 12:00:51,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=599581.5, ans=0.025 2024-09-17 12:01:17,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=599638.1666666666, ans=0.125 2024-09-17 12:01:30,761 INFO [train.py:1198] (1/2) Epoch 34, batch 750, loss[loss=0.2822, ctc_loss=0.2006, cr_loss=0.4078, over 14465.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1467, cr_loss=0.3686, over 3997650.32 frames. ], batch size: 149, lr: 2.44e-03, grad_scale: 32.0 2024-09-17 12:01:42,053 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.93 vs. limit=12.0 2024-09-17 12:01:46,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=599694.8333333334, ans=0.0 2024-09-17 12:02:02,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=599723.1666666666, ans=0.125 2024-09-17 12:02:36,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=599779.8333333334, ans=0.0 2024-09-17 12:02:38,140 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.85 vs. limit=10.0 2024-09-17 12:02:39,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=599779.8333333334, ans=0.2 2024-09-17 12:02:46,499 INFO [train.py:1198] (1/2) Epoch 34, batch 800, loss[loss=0.2177, ctc_loss=0.1429, cr_loss=0.374, over 21029.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1476, cr_loss=0.3709, over 4020706.45 frames. 
], batch size: 63, lr: 2.44e-03, grad_scale: 32.0 2024-09-17 12:02:54,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=599808.1666666666, ans=0.125 2024-09-17 12:03:09,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=599836.5, ans=0.0 2024-09-17 12:03:09,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=599836.5, ans=0.125 2024-09-17 12:03:18,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=599864.8333333334, ans=0.0 2024-09-17 12:03:20,911 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.954e+02 2.189e+02 2.313e+02 2.479e+02 6.233e+02, threshold=4.625e+02, percent-clipped=1.0 2024-09-17 12:03:30,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=599893.1666666666, ans=0.0 2024-09-17 12:03:31,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=599893.1666666666, ans=0.125 2024-09-17 12:03:39,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=599893.1666666666, ans=0.0 2024-09-17 12:04:01,513 INFO [train.py:1198] (1/2) Epoch 34, batch 850, loss[loss=0.2387, ctc_loss=0.1596, cr_loss=0.3954, over 20697.00 frames. ], tot_loss[loss=0.223, ctc_loss=0.1485, cr_loss=0.3725, over 4036346.75 frames. ], batch size: 71, lr: 2.44e-03, grad_scale: 32.0 2024-09-17 12:05:19,421 INFO [train.py:1198] (1/2) Epoch 34, batch 900, loss[loss=0.2217, ctc_loss=0.1464, cr_loss=0.3765, over 20889.00 frames. ], tot_loss[loss=0.223, ctc_loss=0.1484, cr_loss=0.3731, over 4052350.07 frames. ], batch size: 54, lr: 2.44e-03, grad_scale: 32.0 2024-09-17 12:05:38,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=600119.8333333334, ans=0.125 2024-09-17 12:05:57,385 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.863e+02 2.140e+02 2.258e+02 2.425e+02 4.265e+02, threshold=4.516e+02, percent-clipped=0.0 2024-09-17 12:06:00,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=600148.1666666666, ans=0.125 2024-09-17 12:06:14,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=600176.5, ans=0.0 2024-09-17 12:06:38,369 INFO [train.py:1198] (1/2) Epoch 34, batch 950, loss[loss=0.236, ctc_loss=0.1565, cr_loss=0.3972, over 20840.00 frames. ], tot_loss[loss=0.2235, ctc_loss=0.1489, cr_loss=0.3732, over 4043381.83 frames. ], batch size: 59, lr: 2.44e-03, grad_scale: 32.0 2024-09-17 12:06:40,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=600233.1666666666, ans=0.125 2024-09-17 12:06:54,049 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.39 vs. 
limit=15.0 2024-09-17 12:06:59,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=600261.5, ans=0.125 2024-09-17 12:07:40,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=600346.5, ans=0.0 2024-09-17 12:07:45,248 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.14 vs. limit=15.0 2024-09-17 12:07:53,672 INFO [train.py:1198] (1/2) Epoch 34, batch 1000, loss[loss=0.234, ctc_loss=0.1614, cr_loss=0.363, over 20821.00 frames. ], tot_loss[loss=0.2239, ctc_loss=0.1491, cr_loss=0.3739, over 4066310.14 frames. ], batch size: 59, lr: 2.44e-03, grad_scale: 32.0 2024-09-17 12:08:28,494 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.876e+02 2.189e+02 2.311e+02 2.508e+02 3.416e+02, threshold=4.623e+02, percent-clipped=0.0 2024-09-17 12:08:29,101 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.54 vs. limit=15.0 2024-09-17 12:08:48,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=600459.8333333334, ans=0.125 2024-09-17 12:09:05,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=600488.1666666666, ans=0.125 2024-09-17 12:09:08,756 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.81 vs. limit=15.0 2024-09-17 12:09:09,394 INFO [train.py:1198] (1/2) Epoch 34, batch 1050, loss[loss=0.2223, ctc_loss=0.1471, cr_loss=0.3757, over 21067.00 frames. ], tot_loss[loss=0.2246, ctc_loss=0.1496, cr_loss=0.375, over 4075360.51 frames. ], batch size: 59, lr: 2.44e-03, grad_scale: 32.0 2024-09-17 12:09:31,905 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.74 vs. limit=22.5 2024-09-17 12:09:42,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=600573.1666666666, ans=0.2 2024-09-17 12:10:04,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=600601.5, ans=0.125 2024-09-17 12:10:28,483 INFO [train.py:1198] (1/2) Epoch 34, batch 1100, loss[loss=0.1902, ctc_loss=0.1249, cr_loss=0.3265, over 21033.00 frames. ], tot_loss[loss=0.2241, ctc_loss=0.1493, cr_loss=0.3739, over 4078433.14 frames. ], batch size: 61, lr: 2.44e-03, grad_scale: 32.0 2024-09-17 12:10:41,186 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.87 vs. 
limit=22.5 2024-09-17 12:10:51,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=600686.5, ans=0.125 2024-09-17 12:10:55,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=600686.5, ans=0.125 2024-09-17 12:11:04,569 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.866e+02 2.160e+02 2.290e+02 2.475e+02 4.873e+02, threshold=4.579e+02, percent-clipped=1.0 2024-09-17 12:11:10,018 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.34 vs. limit=15.0 2024-09-17 12:11:21,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=600743.1666666666, ans=0.1 2024-09-17 12:11:29,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=600743.1666666666, ans=0.0 2024-09-17 12:11:48,663 INFO [train.py:1198] (1/2) Epoch 34, batch 1150, loss[loss=0.2221, ctc_loss=0.1475, cr_loss=0.3727, over 20971.00 frames. ], tot_loss[loss=0.224, ctc_loss=0.1492, cr_loss=0.3737, over 4078687.88 frames. ], batch size: 55, lr: 2.44e-03, grad_scale: 32.0 2024-09-17 12:12:22,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=600856.5, ans=0.0 2024-09-17 12:12:42,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=600884.8333333334, ans=0.125 2024-09-17 12:13:05,158 INFO [train.py:1198] (1/2) Epoch 34, batch 1200, loss[loss=0.2463, ctc_loss=0.1632, cr_loss=0.4154, over 21077.00 frames. ], tot_loss[loss=0.2235, ctc_loss=0.1487, cr_loss=0.3738, over 4085219.79 frames. ], batch size: 59, lr: 2.44e-03, grad_scale: 32.0 2024-09-17 12:13:29,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=600969.8333333334, ans=0.0 2024-09-17 12:13:39,621 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.858e+02 2.188e+02 2.327e+02 2.488e+02 4.386e+02, threshold=4.655e+02, percent-clipped=0.0 2024-09-17 12:13:42,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=600998.1666666666, ans=0.1 2024-09-17 12:13:44,840 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.83 vs. limit=12.0 2024-09-17 12:13:45,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=600998.1666666666, ans=0.125 2024-09-17 12:13:55,739 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.40 vs. limit=15.0 2024-09-17 12:13:56,717 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 12:14:13,849 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.79 vs. limit=15.0 2024-09-17 12:14:18,007 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.70 vs. 
limit=15.0 2024-09-17 12:14:20,376 INFO [train.py:1198] (1/2) Epoch 34, batch 1250, loss[loss=0.225, ctc_loss=0.1503, cr_loss=0.3735, over 20934.00 frames. ], tot_loss[loss=0.2246, ctc_loss=0.1496, cr_loss=0.375, over 4077335.91 frames. ], batch size: 67, lr: 2.44e-03, grad_scale: 32.0 2024-09-17 12:14:42,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=601111.5, ans=0.0 2024-09-17 12:14:49,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=601139.8333333334, ans=0.125 2024-09-17 12:14:52,938 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=601139.8333333334, ans=0.125 2024-09-17 12:15:36,370 INFO [train.py:1198] (1/2) Epoch 34, batch 1300, loss[loss=0.2564, ctc_loss=0.1715, cr_loss=0.4244, over 20725.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1485, cr_loss=0.3724, over 4089957.98 frames. ], batch size: 71, lr: 2.44e-03, grad_scale: 32.0 2024-09-17 12:15:41,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=601224.8333333334, ans=0.125 2024-09-17 12:16:01,243 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=601253.1666666666, ans=0.125 2024-09-17 12:16:02,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=601253.1666666666, ans=0.125 2024-09-17 12:16:14,817 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.858e+02 2.131e+02 2.259e+02 2.447e+02 8.802e+02, threshold=4.518e+02, percent-clipped=1.0 2024-09-17 12:16:19,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=601281.5, ans=0.125 2024-09-17 12:16:27,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=601309.8333333334, ans=0.0 2024-09-17 12:16:34,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=601309.8333333334, ans=0.0 2024-09-17 12:16:39,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=601338.1666666666, ans=0.125 2024-09-17 12:16:55,656 INFO [train.py:1198] (1/2) Epoch 34, batch 1350, loss[loss=0.2326, ctc_loss=0.1567, cr_loss=0.3795, over 21026.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1483, cr_loss=0.3718, over 4089775.90 frames. ], batch size: 62, lr: 2.44e-03, grad_scale: 32.0 2024-09-17 12:17:32,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=601423.1666666666, ans=0.0 2024-09-17 12:17:39,004 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.45 vs. 
limit=15.0 2024-09-17 12:17:42,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=601451.5, ans=0.125 2024-09-17 12:18:01,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=601479.8333333334, ans=0.125 2024-09-17 12:18:14,636 INFO [train.py:1198] (1/2) Epoch 34, batch 1400, loss[loss=0.2073, ctc_loss=0.1368, cr_loss=0.3525, over 20785.00 frames. ], tot_loss[loss=0.2225, ctc_loss=0.1481, cr_loss=0.3718, over 4097696.19 frames. ], batch size: 53, lr: 2.44e-03, grad_scale: 32.0 2024-09-17 12:18:27,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=601508.1666666666, ans=0.025 2024-09-17 12:18:43,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=601564.8333333334, ans=0.0 2024-09-17 12:18:45,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=601564.8333333334, ans=0.2 2024-09-17 12:18:49,351 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.956e+02 2.171e+02 2.295e+02 2.442e+02 3.235e+02, threshold=4.590e+02, percent-clipped=0.0 2024-09-17 12:19:29,938 INFO [train.py:1198] (1/2) Epoch 34, batch 1450, loss[loss=0.2087, ctc_loss=0.1401, cr_loss=0.3426, over 21083.00 frames. ], tot_loss[loss=0.2242, ctc_loss=0.1494, cr_loss=0.3741, over 4088365.47 frames. ], batch size: 59, lr: 2.44e-03, grad_scale: 32.0 2024-09-17 12:19:37,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=601649.8333333334, ans=0.125 2024-09-17 12:20:01,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=601706.5, ans=0.0 2024-09-17 12:20:41,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=601763.1666666666, ans=0.125 2024-09-17 12:20:45,510 INFO [train.py:1198] (1/2) Epoch 34, batch 1500, loss[loss=0.2572, ctc_loss=0.1735, cr_loss=0.4185, over 19470.00 frames. ], tot_loss[loss=0.2233, ctc_loss=0.1487, cr_loss=0.3731, over 4097481.90 frames. ], batch size: 90, lr: 2.44e-03, grad_scale: 32.0 2024-09-17 12:21:00,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=601819.8333333334, ans=0.125 2024-09-17 12:21:19,689 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.791e+02 2.192e+02 2.293e+02 2.474e+02 3.224e+02, threshold=4.586e+02, percent-clipped=0.0 2024-09-17 12:21:32,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=601876.5, ans=0.125 2024-09-17 12:22:00,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=601904.8333333334, ans=0.0 2024-09-17 12:22:03,835 INFO [train.py:1198] (1/2) Epoch 34, batch 1550, loss[loss=0.2021, ctc_loss=0.1333, cr_loss=0.3443, over 21076.00 frames. ], tot_loss[loss=0.2245, ctc_loss=0.1495, cr_loss=0.375, over 4101638.59 frames. 
], batch size: 56, lr: 2.44e-03, grad_scale: 32.0 2024-09-17 12:22:14,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=601933.1666666666, ans=0.125 2024-09-17 12:22:31,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=601961.5, ans=0.1 2024-09-17 12:22:40,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=601989.8333333334, ans=0.125 2024-09-17 12:22:56,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=602018.1666666666, ans=0.125 2024-09-17 12:23:22,665 INFO [train.py:1198] (1/2) Epoch 34, batch 1600, loss[loss=0.2096, ctc_loss=0.1352, cr_loss=0.3719, over 20869.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.1497, cr_loss=0.3751, over 4100478.59 frames. ], batch size: 54, lr: 2.44e-03, grad_scale: 32.0 2024-09-17 12:23:57,769 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.890e+02 2.176e+02 2.326e+02 2.475e+02 3.570e+02, threshold=4.653e+02, percent-clipped=0.0 2024-09-17 12:24:14,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=602159.8333333334, ans=0.125 2024-09-17 12:24:21,202 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.23 vs. limit=6.0 2024-09-17 12:24:38,683 INFO [train.py:1198] (1/2) Epoch 34, batch 1650, loss[loss=0.2065, ctc_loss=0.1354, cr_loss=0.3555, over 20883.00 frames. ], tot_loss[loss=0.2251, ctc_loss=0.1499, cr_loss=0.3757, over 4081580.77 frames. ], batch size: 54, lr: 2.44e-03, grad_scale: 32.0 2024-09-17 12:24:43,925 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.22 vs. limit=15.0 2024-09-17 12:24:50,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=602216.5, ans=0.0 2024-09-17 12:24:57,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=602244.8333333334, ans=0.0 2024-09-17 12:25:00,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=602244.8333333334, ans=0.125 2024-09-17 12:25:14,333 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.61 vs. limit=15.0 2024-09-17 12:25:34,078 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.34 vs. limit=12.0 2024-09-17 12:25:48,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=602329.8333333334, ans=0.1 2024-09-17 12:25:54,739 INFO [train.py:1198] (1/2) Epoch 34, batch 1700, loss[loss=0.2081, ctc_loss=0.1357, cr_loss=0.3621, over 21047.00 frames. ], tot_loss[loss=0.2234, ctc_loss=0.1487, cr_loss=0.3735, over 4086411.67 frames. 
], batch size: 62, lr: 2.44e-03, grad_scale: 64.0 2024-09-17 12:26:29,697 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.844e+02 2.136e+02 2.268e+02 2.431e+02 3.384e+02, threshold=4.536e+02, percent-clipped=0.0 2024-09-17 12:26:42,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=602443.1666666666, ans=0.125 2024-09-17 12:26:49,872 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=602443.1666666666, ans=0.2 2024-09-17 12:27:09,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=602471.5, ans=0.0 2024-09-17 12:27:13,567 INFO [train.py:1198] (1/2) Epoch 34, batch 1750, loss[loss=0.2301, ctc_loss=0.1543, cr_loss=0.3786, over 20955.00 frames. ], tot_loss[loss=0.2239, ctc_loss=0.149, cr_loss=0.3743, over 4104164.67 frames. ], batch size: 58, lr: 2.44e-03, grad_scale: 64.0 2024-09-17 12:27:42,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=602556.5, ans=0.125 2024-09-17 12:27:44,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=602556.5, ans=0.0 2024-09-17 12:28:28,932 INFO [train.py:1198] (1/2) Epoch 34, batch 1800, loss[loss=0.2556, ctc_loss=0.1771, cr_loss=0.3924, over 14625.00 frames. ], tot_loss[loss=0.2233, ctc_loss=0.1485, cr_loss=0.3738, over 4102498.32 frames. ], batch size: 149, lr: 2.44e-03, grad_scale: 32.0 2024-09-17 12:28:30,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=602641.5, ans=0.05 2024-09-17 12:29:08,453 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.780e+02 2.178e+02 2.311e+02 2.479e+02 3.589e+02, threshold=4.622e+02, percent-clipped=0.0 2024-09-17 12:29:11,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=602698.1666666666, ans=0.95 2024-09-17 12:29:44,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=602754.8333333334, ans=0.125 2024-09-17 12:29:47,779 INFO [train.py:1198] (1/2) Epoch 34, batch 1850, loss[loss=0.2152, ctc_loss=0.1414, cr_loss=0.3692, over 20769.00 frames. ], tot_loss[loss=0.2225, ctc_loss=0.148, cr_loss=0.3726, over 4089610.71 frames. ], batch size: 56, lr: 2.44e-03, grad_scale: 32.0 2024-09-17 12:29:51,212 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 12:30:12,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=602811.5, ans=0.125 2024-09-17 12:30:24,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=602839.8333333334, ans=0.0 2024-09-17 12:30:25,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=602839.8333333334, ans=0.125 2024-09-17 12:30:40,060 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.84 vs. 
limit=12.0 2024-09-17 12:30:41,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=602868.1666666666, ans=0.0 2024-09-17 12:31:03,603 INFO [train.py:1198] (1/2) Epoch 34, batch 1900, loss[loss=0.2521, ctc_loss=0.1678, cr_loss=0.4213, over 19259.00 frames. ], tot_loss[loss=0.2225, ctc_loss=0.148, cr_loss=0.3724, over 4083083.29 frames. ], batch size: 90, lr: 2.44e-03, grad_scale: 32.0 2024-09-17 12:31:10,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=602924.8333333334, ans=0.0 2024-09-17 12:31:17,818 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.05 vs. limit=15.0 2024-09-17 12:31:32,180 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.68 vs. limit=15.0 2024-09-17 12:31:40,422 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.894e+02 2.188e+02 2.308e+02 2.457e+02 4.132e+02, threshold=4.616e+02, percent-clipped=0.0 2024-09-17 12:32:03,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=603038.1666666666, ans=0.2 2024-09-17 12:32:19,631 INFO [train.py:1198] (1/2) Epoch 34, batch 1950, loss[loss=0.228, ctc_loss=0.1511, cr_loss=0.3847, over 20786.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1477, cr_loss=0.3721, over 4093226.60 frames. ], batch size: 53, lr: 2.44e-03, grad_scale: 32.0 2024-09-17 12:32:41,548 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.29 vs. limit=15.0 2024-09-17 12:33:33,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=603179.8333333334, ans=0.0 2024-09-17 12:33:37,973 INFO [train.py:1198] (1/2) Epoch 34, batch 2000, loss[loss=0.2278, ctc_loss=0.1526, cr_loss=0.3758, over 21009.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1475, cr_loss=0.372, over 4102370.81 frames. ], batch size: 61, lr: 2.44e-03, grad_scale: 32.0 2024-09-17 12:33:57,300 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.19 vs. limit=12.0 2024-09-17 12:34:14,686 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.966e+02 2.186e+02 2.309e+02 2.461e+02 3.400e+02, threshold=4.618e+02, percent-clipped=0.0 2024-09-17 12:34:39,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=603293.1666666666, ans=0.1 2024-09-17 12:34:44,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=603321.5, ans=0.0 2024-09-17 12:34:48,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=603321.5, ans=0.0 2024-09-17 12:34:54,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=603321.5, ans=0.125 2024-09-17 12:34:57,200 INFO [train.py:1198] (1/2) Epoch 34, batch 2050, loss[loss=0.2198, ctc_loss=0.1451, cr_loss=0.3733, over 20772.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1478, cr_loss=0.3716, over 4088286.94 frames. 
], batch size: 53, lr: 2.44e-03, grad_scale: 32.0 2024-09-17 12:35:30,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=603406.5, ans=0.2 2024-09-17 12:35:34,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=603406.5, ans=0.015 2024-09-17 12:36:12,646 INFO [train.py:1198] (1/2) Epoch 34, batch 2100, loss[loss=0.1805, ctc_loss=0.1187, cr_loss=0.3091, over 20789.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1473, cr_loss=0.3707, over 4095151.78 frames. ], batch size: 53, lr: 2.44e-03, grad_scale: 32.0 2024-09-17 12:36:17,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=603491.5, ans=0.125 2024-09-17 12:36:20,629 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 12:36:31,566 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.54 vs. limit=15.0 2024-09-17 12:36:46,539 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.02 vs. limit=15.0 2024-09-17 12:36:48,797 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.951e+02 2.178e+02 2.294e+02 2.445e+02 3.418e+02, threshold=4.588e+02, percent-clipped=0.0 2024-09-17 12:37:10,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=603576.5, ans=0.125 2024-09-17 12:37:28,131 INFO [train.py:1198] (1/2) Epoch 34, batch 2150, loss[loss=0.2192, ctc_loss=0.1453, cr_loss=0.3698, over 20969.00 frames. ], tot_loss[loss=0.2216, ctc_loss=0.1474, cr_loss=0.3712, over 4107583.49 frames. ], batch size: 55, lr: 2.44e-03, grad_scale: 32.0 2024-09-17 12:37:53,374 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.97 vs. limit=15.0 2024-09-17 12:38:23,517 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.97 vs. limit=15.0 2024-09-17 12:38:47,027 INFO [train.py:1198] (1/2) Epoch 34, batch 2200, loss[loss=0.2716, ctc_loss=0.1857, cr_loss=0.4297, over 17992.00 frames. ], tot_loss[loss=0.223, ctc_loss=0.1485, cr_loss=0.3729, over 4105521.60 frames. ], batch size: 108, lr: 2.44e-03, grad_scale: 32.0 2024-09-17 12:39:04,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=603803.1666666666, ans=0.125 2024-09-17 12:39:13,250 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.85 vs. limit=22.5 2024-09-17 12:39:22,899 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.985e+02 2.165e+02 2.320e+02 2.433e+02 4.912e+02, threshold=4.641e+02, percent-clipped=1.0 2024-09-17 12:39:38,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=603859.8333333334, ans=0.0 2024-09-17 12:40:02,359 INFO [train.py:1198] (1/2) Epoch 34, batch 2250, loss[loss=0.2138, ctc_loss=0.1425, cr_loss=0.3564, over 21072.00 frames. ], tot_loss[loss=0.2225, ctc_loss=0.1481, cr_loss=0.372, over 4103331.50 frames. 
], batch size: 59, lr: 2.44e-03, grad_scale: 32.0 2024-09-17 12:40:21,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=603944.8333333334, ans=0.1 2024-09-17 12:40:48,185 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2024-09-17 12:41:09,630 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.09 vs. limit=22.5 2024-09-17 12:41:15,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=604029.8333333334, ans=0.125 2024-09-17 12:41:20,844 INFO [train.py:1198] (1/2) Epoch 34, batch 2300, loss[loss=0.2301, ctc_loss=0.1522, cr_loss=0.3894, over 21035.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1476, cr_loss=0.3712, over 4092711.49 frames. ], batch size: 62, lr: 2.44e-03, grad_scale: 32.0 2024-09-17 12:41:57,108 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.912e+02 2.174e+02 2.294e+02 2.464e+02 3.145e+02, threshold=4.587e+02, percent-clipped=0.0 2024-09-17 12:42:03,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=604114.8333333334, ans=0.0 2024-09-17 12:42:04,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=604143.1666666666, ans=10.0 2024-09-17 12:42:26,999 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.05 vs. limit=15.0 2024-09-17 12:42:36,728 INFO [train.py:1198] (1/2) Epoch 34, batch 2350, loss[loss=0.1734, ctc_loss=0.1132, cr_loss=0.3009, over 20962.00 frames. ], tot_loss[loss=0.2223, ctc_loss=0.148, cr_loss=0.3713, over 4091694.42 frames. ], batch size: 50, lr: 2.44e-03, grad_scale: 32.0 2024-09-17 12:42:50,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=604228.1666666666, ans=0.125 2024-09-17 12:43:18,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=604256.5, ans=10.0 2024-09-17 12:43:26,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.whiten.whitening_limit, batch_count=604284.8333333334, ans=15.0 2024-09-17 12:43:36,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=604313.1666666666, ans=0.125 2024-09-17 12:43:53,068 INFO [train.py:1198] (1/2) Epoch 34, batch 2400, loss[loss=0.239, ctc_loss=0.1609, cr_loss=0.3904, over 20841.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1478, cr_loss=0.3711, over 4093477.46 frames. 
], batch size: 59, lr: 2.44e-03, grad_scale: 32.0 2024-09-17 12:44:22,599 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 12:44:32,771 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.885e+02 2.181e+02 2.329e+02 2.467e+02 3.409e+02, threshold=4.659e+02, percent-clipped=0.0 2024-09-17 12:44:57,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=604454.8333333334, ans=0.125 2024-09-17 12:45:12,244 INFO [train.py:1198] (1/2) Epoch 34, batch 2450, loss[loss=0.244, ctc_loss=0.1635, cr_loss=0.4021, over 20692.00 frames. ], tot_loss[loss=0.2226, ctc_loss=0.1483, cr_loss=0.3716, over 4081976.83 frames. ], batch size: 68, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 12:45:15,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=604483.1666666666, ans=0.125 2024-09-17 12:45:33,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=604511.5, ans=0.125 2024-09-17 12:46:07,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=604568.1666666666, ans=15.0 2024-09-17 12:46:12,378 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.73 vs. limit=15.0 2024-09-17 12:46:31,331 INFO [train.py:1198] (1/2) Epoch 34, batch 2500, loss[loss=0.2235, ctc_loss=0.15, cr_loss=0.3679, over 20675.00 frames. ], tot_loss[loss=0.2228, ctc_loss=0.1483, cr_loss=0.3724, over 4082016.58 frames. ], batch size: 71, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 12:46:33,520 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.99 vs. limit=15.0 2024-09-17 12:47:07,822 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.828e+02 2.142e+02 2.289e+02 2.463e+02 3.544e+02, threshold=4.578e+02, percent-clipped=0.0 2024-09-17 12:47:12,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=604681.5, ans=0.125 2024-09-17 12:47:16,344 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.58 vs. limit=22.5 2024-09-17 12:47:26,690 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.58 vs. limit=10.0 2024-09-17 12:47:46,871 INFO [train.py:1198] (1/2) Epoch 34, batch 2550, loss[loss=0.2019, ctc_loss=0.1304, cr_loss=0.3575, over 20984.00 frames. ], tot_loss[loss=0.2216, ctc_loss=0.1474, cr_loss=0.3711, over 4089736.67 frames. 
], batch size: 51, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 12:48:22,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=604823.1666666666, ans=0.125 2024-09-17 12:48:32,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=604851.5, ans=0.0 2024-09-17 12:48:33,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=604851.5, ans=0.0 2024-09-17 12:48:54,870 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 12:49:02,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=604908.1666666666, ans=0.125 2024-09-17 12:49:03,435 INFO [train.py:1198] (1/2) Epoch 34, batch 2600, loss[loss=0.238, ctc_loss=0.1613, cr_loss=0.3833, over 20687.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1476, cr_loss=0.3713, over 4091509.87 frames. ], batch size: 68, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 12:49:08,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=604908.1666666666, ans=0.015 2024-09-17 12:49:40,075 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.817e+02 2.105e+02 2.240e+02 2.428e+02 3.435e+02, threshold=4.480e+02, percent-clipped=0.0 2024-09-17 12:50:22,782 INFO [train.py:1198] (1/2) Epoch 34, batch 2650, loss[loss=0.1845, ctc_loss=0.1195, cr_loss=0.3251, over 19862.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1465, cr_loss=0.3698, over 4091108.40 frames. ], batch size: 44, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 12:50:32,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=605049.8333333334, ans=0.2 2024-09-17 12:50:50,856 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.25 vs. limit=15.0 2024-09-17 12:50:58,829 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.54 vs. limit=22.5 2024-09-17 12:51:27,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=605163.1666666666, ans=0.0 2024-09-17 12:51:33,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=605163.1666666666, ans=10.0 2024-09-17 12:51:37,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=605191.5, ans=0.04949747468305833 2024-09-17 12:51:38,879 INFO [train.py:1198] (1/2) Epoch 34, batch 2700, loss[loss=0.2485, ctc_loss=0.1656, cr_loss=0.4144, over 20676.00 frames. ], tot_loss[loss=0.2223, ctc_loss=0.1479, cr_loss=0.372, over 4083285.55 frames. ], batch size: 71, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 12:51:51,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=605191.5, ans=0.1 2024-09-17 12:52:18,259 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.934e+02 2.223e+02 2.340e+02 2.514e+02 4.321e+02, threshold=4.679e+02, percent-clipped=0.0 2024-09-17 12:52:45,961 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.61 vs. 
limit=12.0 2024-09-17 12:52:46,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=605304.8333333334, ans=15.0 2024-09-17 12:52:57,113 INFO [train.py:1198] (1/2) Epoch 34, batch 2750, loss[loss=0.2168, ctc_loss=0.141, cr_loss=0.3791, over 20933.00 frames. ], tot_loss[loss=0.2223, ctc_loss=0.148, cr_loss=0.3716, over 4077514.35 frames. ], batch size: 50, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 12:53:17,567 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.34 vs. limit=15.0 2024-09-17 12:53:24,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=605361.5, ans=0.0 2024-09-17 12:53:33,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=605389.8333333334, ans=0.0 2024-09-17 12:53:38,571 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.94 vs. limit=15.0 2024-09-17 12:53:56,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=605446.5, ans=0.025 2024-09-17 12:53:56,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=605446.5, ans=0.025 2024-09-17 12:53:57,078 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.39 vs. limit=15.0 2024-09-17 12:54:12,993 INFO [train.py:1198] (1/2) Epoch 34, batch 2800, loss[loss=0.2017, ctc_loss=0.1329, cr_loss=0.344, over 20989.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1484, cr_loss=0.3725, over 4077220.33 frames. ], batch size: 52, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 12:54:14,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=605474.8333333334, ans=0.5 2024-09-17 12:54:17,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=605474.8333333334, ans=0.125 2024-09-17 12:54:22,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=605474.8333333334, ans=0.125 2024-09-17 12:54:25,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=605474.8333333334, ans=0.125 2024-09-17 12:54:46,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=605531.5, ans=0.125 2024-09-17 12:54:49,556 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.843e+02 2.122e+02 2.280e+02 2.445e+02 4.334e+02, threshold=4.560e+02, percent-clipped=0.0 2024-09-17 12:54:54,884 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.19 vs. limit=6.0 2024-09-17 12:55:28,708 INFO [train.py:1198] (1/2) Epoch 34, batch 2850, loss[loss=0.2063, ctc_loss=0.1358, cr_loss=0.3525, over 21076.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1482, cr_loss=0.3725, over 4079216.90 frames. 
], batch size: 53, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 12:55:35,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=605616.5, ans=0.2 2024-09-17 12:56:12,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=605673.1666666666, ans=0.1 2024-09-17 12:56:37,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=605729.8333333334, ans=0.0 2024-09-17 12:56:41,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=605729.8333333334, ans=0.125 2024-09-17 12:56:47,701 INFO [train.py:1198] (1/2) Epoch 34, batch 2900, loss[loss=0.2294, ctc_loss=0.1507, cr_loss=0.3933, over 20830.00 frames. ], tot_loss[loss=0.2221, ctc_loss=0.1476, cr_loss=0.3721, over 4092246.14 frames. ], batch size: 59, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 12:57:12,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=605786.5, ans=0.1 2024-09-17 12:57:24,336 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.909e+02 2.176e+02 2.315e+02 2.536e+02 3.273e+02, threshold=4.629e+02, percent-clipped=0.0 2024-09-17 12:57:47,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=605843.1666666666, ans=0.125 2024-09-17 12:58:07,468 INFO [train.py:1198] (1/2) Epoch 34, batch 2950, loss[loss=0.2411, ctc_loss=0.1625, cr_loss=0.393, over 19540.00 frames. ], tot_loss[loss=0.2226, ctc_loss=0.1481, cr_loss=0.3725, over 4077871.23 frames. ], batch size: 90, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 12:58:21,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=605928.1666666666, ans=0.0 2024-09-17 12:58:22,258 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.81 vs. limit=15.0 2024-09-17 12:58:36,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=605956.5, ans=0.125 2024-09-17 12:58:50,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=605956.5, ans=0.0 2024-09-17 12:59:23,955 INFO [train.py:1198] (1/2) Epoch 34, batch 3000, loss[loss=0.2225, ctc_loss=0.1422, cr_loss=0.4014, over 21041.00 frames. ], tot_loss[loss=0.2226, ctc_loss=0.148, cr_loss=0.3727, over 4082643.55 frames. ], batch size: 63, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 12:59:23,956 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 12:59:32,869 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.5096, 4.1133, 4.1154, 4.1646], device='cuda:1') 2024-09-17 12:59:34,661 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.3581, 3.9710, 3.9991, 3.9931], device='cuda:1') 2024-09-17 12:59:45,050 INFO [train.py:1230] (1/2) Epoch 34, validation: loss=0.04044, ctc_loss=0.04044, cr_loss=1.35e-14, over 944034.00 frames. 
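The validation pass just logged cr_loss=1.35e-14 against training-time values around 0.37: with the masking-based augmentation inactive at eval time, the two views of each utterance coincide, and a consistency term between their CTC posteriors collapses to floating-point noise. The sketch below uses a symmetric KL between the two views' log-posteriors; this particular form is an illustrative assumption, not necessarily the exact loss in train.py.

    # Sketch: consistency regularization between two augmented views.
    # With identical views (as at validation) the result is ~0.
    import torch
    import torch.nn.functional as F

    def cr_loss(log_probs_a: torch.Tensor, log_probs_b: torch.Tensor) -> torch.Tensor:
        # log_probs_*: (N, T, V) log-posteriors from two views of one batch.
        kl_ab = F.kl_div(log_probs_a, log_probs_b, log_target=True, reduction="batchmean")
        kl_ba = F.kl_div(log_probs_b, log_probs_a, log_target=True, reduction="batchmean")
        return 0.5 * (kl_ab + kl_ba)

    x = torch.randn(2, 50, 500).log_softmax(dim=-1)
    print(cr_loss(x, x))  # tensor(0.)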
2024-09-17 12:59:45,051 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-17 12:59:57,650 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.69 vs. limit=10.0 2024-09-17 12:59:58,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=606069.8333333334, ans=0.2 2024-09-17 13:00:03,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=606069.8333333334, ans=0.5 2024-09-17 13:00:16,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=606098.1666666666, ans=0.025 2024-09-17 13:00:18,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=606098.1666666666, ans=0.1 2024-09-17 13:00:21,292 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.861e+02 2.196e+02 2.327e+02 2.458e+02 4.975e+02, threshold=4.653e+02, percent-clipped=1.0 2024-09-17 13:00:35,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=606126.5, ans=0.2 2024-09-17 13:00:40,812 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.89 vs. limit=15.0 2024-09-17 13:01:00,629 INFO [train.py:1198] (1/2) Epoch 34, batch 3050, loss[loss=0.2378, ctc_loss=0.1623, cr_loss=0.3775, over 20967.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1481, cr_loss=0.3731, over 4078820.09 frames. ], batch size: 64, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 13:01:07,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=606183.1666666666, ans=0.0 2024-09-17 13:01:30,289 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.16 vs. limit=22.5 2024-09-17 13:01:48,799 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.99 vs. limit=8.0 2024-09-17 13:02:19,219 INFO [train.py:1198] (1/2) Epoch 34, batch 3100, loss[loss=0.2371, ctc_loss=0.1611, cr_loss=0.38, over 19970.00 frames. ], tot_loss[loss=0.2225, ctc_loss=0.1479, cr_loss=0.3728, over 4093592.42 frames. ], batch size: 80, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 13:02:21,525 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.35 vs. limit=15.0 2024-09-17 13:02:42,861 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.93 vs. 
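The scaling.py:214 lines print ScheduledFloat values: named scalar hyperparameters (skip rates, balancer probabilities, dropout p) that follow a schedule keyed on batch_count, with most *_skip_rate entries long since annealed to 0.0 by batch ~606k. A minimal piecewise-linear version is sketched below; the interface and breakpoints are assumptions, and only the "scalar scheduled by batch count" behaviour is taken from the log.

    # Sketch: a float hyperparameter interpolated piecewise-linearly in
    # batch count. Breakpoints are placeholders, not values from this run.
    import bisect

    class ScheduledFloat:
        def __init__(self, *points):  # points: (batch_count, value) pairs
            self.xs = [p[0] for p in points]
            self.ys = [p[1] for p in points]

        def value(self, batch_count: float) -> float:
            if batch_count <= self.xs[0]:
                return self.ys[0]
            if batch_count >= self.xs[-1]:
                return self.ys[-1]
            i = bisect.bisect_right(self.xs, batch_count)
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    conv_skip_rate = ScheduledFloat((0.0, 0.5), (4000.0, 0.05), (16000.0, 0.0))
    print(conv_skip_rate.value(606069.83))  # 0.0, far past the last breakpoint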
limit=22.5 2024-09-17 13:02:55,827 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.887e+02 2.167e+02 2.286e+02 2.394e+02 3.278e+02, threshold=4.572e+02, percent-clipped=0.0 2024-09-17 13:03:00,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=606381.5, ans=0.125 2024-09-17 13:03:15,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=606409.8333333334, ans=0.1 2024-09-17 13:03:27,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=606438.1666666666, ans=0.0 2024-09-17 13:03:38,052 INFO [train.py:1198] (1/2) Epoch 34, batch 3150, loss[loss=0.2472, ctc_loss=0.1652, cr_loss=0.4099, over 20955.00 frames. ], tot_loss[loss=0.2239, ctc_loss=0.1489, cr_loss=0.3749, over 4102824.57 frames. ], batch size: 64, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 13:03:41,608 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.12 vs. limit=15.0 2024-09-17 13:03:45,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=606466.5, ans=0.0 2024-09-17 13:03:48,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=606466.5, ans=0.025 2024-09-17 13:04:18,415 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.54 vs. limit=10.0 2024-09-17 13:04:54,091 INFO [train.py:1198] (1/2) Epoch 34, batch 3200, loss[loss=0.236, ctc_loss=0.16, cr_loss=0.3796, over 20661.00 frames. ], tot_loss[loss=0.2236, ctc_loss=0.1487, cr_loss=0.3745, over 4094856.89 frames. ], batch size: 71, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 13:05:03,942 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.57 vs. limit=12.0 2024-09-17 13:05:20,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=606636.5, ans=0.125 2024-09-17 13:05:28,513 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.80 vs. limit=15.0 2024-09-17 13:05:30,470 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.920e+02 2.113e+02 2.249e+02 2.408e+02 5.715e+02, threshold=4.498e+02, percent-clipped=1.0 2024-09-17 13:05:35,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=606664.8333333334, ans=0.0 2024-09-17 13:05:38,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=606693.1666666666, ans=0.125 2024-09-17 13:05:43,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=606693.1666666666, ans=0.2 2024-09-17 13:06:09,852 INFO [train.py:1198] (1/2) Epoch 34, batch 3250, loss[loss=0.2445, ctc_loss=0.1649, cr_loss=0.3984, over 20854.00 frames. ], tot_loss[loss=0.2233, ctc_loss=0.1484, cr_loss=0.3741, over 4105631.72 frames. 
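The scaling.py:1024 Whitening lines compare a per-module metric against a limit (e.g. 8.93 vs. 22.5 above); the metric behaves as if it were 1.0 when the channel covariance is isotropic ("white") and grows with anisotropy, with action taken only past the limit. The reconstruction below is normalized so a scaled identity covariance scores exactly 1; this normalization is an assumption, and the grouping seen in the logs (num_groups) is omitted.

    # Sketch: a whitening metric that is 1.0 for isotropic channel covariance
    # and larger otherwise (assumed normalization; groups omitted).
    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (N, C) feature rows; returns a scalar >= 1.
        x = x - x.mean(dim=0)
        cov = (x.t() @ x) / x.shape[0]
        c = cov.shape[0]
        return (cov ** 2).sum() * c / cov.diagonal().sum() ** 2

    print(whitening_metric(torch.randn(10000, 256)))  # ~1.03 for near-white input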
], batch size: 65, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 13:06:22,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=606749.8333333334, ans=0.0 2024-09-17 13:06:28,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=606778.1666666666, ans=0.1 2024-09-17 13:07:15,076 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.03 vs. limit=10.0 2024-09-17 13:07:27,818 INFO [train.py:1198] (1/2) Epoch 34, batch 3300, loss[loss=0.2383, ctc_loss=0.1605, cr_loss=0.3895, over 21047.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1482, cr_loss=0.3726, over 4095978.97 frames. ], batch size: 62, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 13:07:34,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=606891.5, ans=0.125 2024-09-17 13:07:39,326 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.20 vs. limit=22.5 2024-09-17 13:08:05,536 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.928e+02 2.213e+02 2.360e+02 2.548e+02 3.998e+02, threshold=4.719e+02, percent-clipped=0.0 2024-09-17 13:08:15,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=606976.5, ans=0.1 2024-09-17 13:08:22,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=606976.5, ans=0.125 2024-09-17 13:08:32,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=607004.8333333334, ans=0.0 2024-09-17 13:08:43,881 INFO [train.py:1198] (1/2) Epoch 34, batch 3350, loss[loss=0.2683, ctc_loss=0.1883, cr_loss=0.4, over 14423.00 frames. ], tot_loss[loss=0.2239, ctc_loss=0.1491, cr_loss=0.3743, over 4084113.85 frames. ], batch size: 149, lr: 2.43e-03, grad_scale: 16.0 2024-09-17 13:09:04,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=607061.5, ans=0.2 2024-09-17 13:09:22,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=607089.8333333334, ans=0.0 2024-09-17 13:09:57,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=607146.5, ans=0.1 2024-09-17 13:10:00,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=607146.5, ans=0.125 2024-09-17 13:10:03,126 INFO [train.py:1198] (1/2) Epoch 34, batch 3400, loss[loss=0.2202, ctc_loss=0.1484, cr_loss=0.3589, over 20953.00 frames. ], tot_loss[loss=0.2248, ctc_loss=0.1498, cr_loss=0.3751, over 4076939.47 frames. 
], batch size: 64, lr: 2.43e-03, grad_scale: 16.0 2024-09-17 13:10:41,083 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.886e+02 2.194e+02 2.342e+02 2.524e+02 4.486e+02, threshold=4.685e+02, percent-clipped=0.0 2024-09-17 13:10:46,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=607231.5, ans=0.125 2024-09-17 13:10:47,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=607259.8333333334, ans=0.0 2024-09-17 13:10:52,203 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=607259.8333333334, ans=0.125 2024-09-17 13:11:05,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=607288.1666666666, ans=0.125 2024-09-17 13:11:10,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=607288.1666666666, ans=0.125 2024-09-17 13:11:19,186 INFO [train.py:1198] (1/2) Epoch 34, batch 3450, loss[loss=0.2297, ctc_loss=0.1539, cr_loss=0.3788, over 21030.00 frames. ], tot_loss[loss=0.2251, ctc_loss=0.1501, cr_loss=0.3752, over 4074498.08 frames. ], batch size: 61, lr: 2.43e-03, grad_scale: 16.0 2024-09-17 13:11:22,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=607316.5, ans=0.0 2024-09-17 13:11:25,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=607316.5, ans=0.125 2024-09-17 13:11:40,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=607344.8333333334, ans=0.125 2024-09-17 13:11:48,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=607373.1666666666, ans=0.125 2024-09-17 13:12:34,736 INFO [train.py:1198] (1/2) Epoch 34, batch 3500, loss[loss=0.1903, ctc_loss=0.122, cr_loss=0.3413, over 20956.00 frames. ], tot_loss[loss=0.2242, ctc_loss=0.1493, cr_loss=0.3743, over 4075470.76 frames. ], batch size: 50, lr: 2.43e-03, grad_scale: 16.0 2024-09-17 13:12:41,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=607458.1666666666, ans=0.2 2024-09-17 13:13:15,987 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.914e+02 2.189e+02 2.290e+02 2.442e+02 5.551e+02, threshold=4.579e+02, percent-clipped=1.0 2024-09-17 13:13:35,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=607543.1666666666, ans=0.1 2024-09-17 13:13:37,971 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.58 vs. limit=6.0 2024-09-17 13:13:53,696 INFO [train.py:1198] (1/2) Epoch 34, batch 3550, loss[loss=0.2737, ctc_loss=0.1964, cr_loss=0.3868, over 13615.00 frames. ], tot_loss[loss=0.224, ctc_loss=0.1494, cr_loss=0.3732, over 4067206.18 frames. 
], batch size: 149, lr: 2.43e-03, grad_scale: 16.0 2024-09-17 13:14:09,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=607628.1666666666, ans=0.2 2024-09-17 13:14:28,412 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.02 vs. limit=10.0 2024-09-17 13:14:30,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=607656.5, ans=0.5 2024-09-17 13:14:40,109 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.56 vs. limit=15.0 2024-09-17 13:14:59,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=607713.1666666666, ans=0.125 2024-09-17 13:15:12,596 INFO [train.py:1198] (1/2) Epoch 34, batch 3600, loss[loss=0.2375, ctc_loss=0.157, cr_loss=0.4024, over 20821.00 frames. ], tot_loss[loss=0.2243, ctc_loss=0.1495, cr_loss=0.374, over 4074204.02 frames. ], batch size: 59, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 13:15:50,527 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.946e+02 2.214e+02 2.345e+02 2.506e+02 3.046e+02, threshold=4.691e+02, percent-clipped=0.0 2024-09-17 13:15:53,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=607798.1666666666, ans=0.125 2024-09-17 13:16:06,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=607826.5, ans=0.125 2024-09-17 13:16:15,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=607854.8333333334, ans=0.125 2024-09-17 13:16:15,586 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.75 vs. limit=15.0 2024-09-17 13:16:28,410 INFO [train.py:1198] (1/2) Epoch 34, batch 3650, loss[loss=0.2531, ctc_loss=0.1701, cr_loss=0.4145, over 20665.00 frames. ], tot_loss[loss=0.2241, ctc_loss=0.1493, cr_loss=0.3739, over 4075203.20 frames. ], batch size: 68, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 13:16:28,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=607883.1666666666, ans=0.1 2024-09-17 13:16:42,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=607911.5, ans=0.125 2024-09-17 13:17:04,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=607939.8333333334, ans=0.125 2024-09-17 13:17:44,281 INFO [train.py:1198] (1/2) Epoch 34, batch 3700, loss[loss=0.2423, ctc_loss=0.1635, cr_loss=0.3941, over 20706.00 frames. ], tot_loss[loss=0.2242, ctc_loss=0.1494, cr_loss=0.3742, over 4085902.90 frames. 
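grad_scale in these records dipped from 32.0 to 16.0 around batch 3350 and is back at 32.0 by batch 3600, the signature of dynamic loss scaling under mixed precision: halve on an overflowing step, raise again after a run of clean steps. A standard-PyTorch sketch follows; the growth and backoff constants are library defaults shown for illustration, and this run's exact growth policy is not recoverable from the log.

    # Sketch: AMP step with dynamic loss scaling, which yields the
    # 32 -> 16 -> 32 grad_scale pattern above (constants are illustrative).
    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=32.0, growth_factor=2.0, backoff_factor=0.5, growth_interval=2000
    )

    def train_step(model, optimizer, features, targets, criterion):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = criterion(model(features), targets)
        scaler.scale(loss).backward()
        scaler.step(optimizer)  # silently skipped if inf/nan grads were found
        scaler.update()         # backs off or grows the scale for the next step
        return loss.detach()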
], batch size: 71, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 13:18:11,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=608053.1666666666, ans=0.125 2024-09-17 13:18:11,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=608053.1666666666, ans=0.1 2024-09-17 13:18:24,951 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.862e+02 2.180e+02 2.320e+02 2.477e+02 4.347e+02, threshold=4.640e+02, percent-clipped=0.0 2024-09-17 13:18:29,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=608081.5, ans=0.07 2024-09-17 13:18:48,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=608138.1666666666, ans=0.025 2024-09-17 13:19:02,807 INFO [train.py:1198] (1/2) Epoch 34, batch 3750, loss[loss=0.2172, ctc_loss=0.1431, cr_loss=0.3703, over 20789.00 frames. ], tot_loss[loss=0.2241, ctc_loss=0.1492, cr_loss=0.3746, over 4084301.18 frames. ], batch size: 53, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 13:19:13,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=608166.5, ans=10.0 2024-09-17 13:19:21,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=608194.8333333334, ans=0.2 2024-09-17 13:19:44,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=608223.1666666666, ans=0.1 2024-09-17 13:19:44,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=608223.1666666666, ans=0.0 2024-09-17 13:20:08,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=608279.8333333334, ans=0.2 2024-09-17 13:20:18,272 INFO [train.py:1198] (1/2) Epoch 34, batch 3800, loss[loss=0.2532, ctc_loss=0.1775, cr_loss=0.3786, over 14195.00 frames. ], tot_loss[loss=0.2232, ctc_loss=0.1485, cr_loss=0.3737, over 4089795.56 frames. ], batch size: 149, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 13:20:30,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=608308.1666666666, ans=0.125 2024-09-17 13:20:59,698 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.938e+02 2.150e+02 2.272e+02 2.403e+02 2.988e+02, threshold=4.544e+02, percent-clipped=0.0 2024-09-17 13:21:01,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=608364.8333333334, ans=0.1 2024-09-17 13:21:37,789 INFO [train.py:1198] (1/2) Epoch 34, batch 3850, loss[loss=0.2242, ctc_loss=0.151, cr_loss=0.3658, over 20973.00 frames. ], tot_loss[loss=0.2241, ctc_loss=0.1492, cr_loss=0.3746, over 4087819.28 frames. 
], batch size: 58, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 13:21:39,691 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=608449.8333333334, ans=0.125 2024-09-17 13:22:08,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=608506.5, ans=0.2 2024-09-17 13:22:52,506 INFO [train.py:1198] (1/2) Epoch 34, batch 3900, loss[loss=0.2045, ctc_loss=0.1349, cr_loss=0.3481, over 20945.00 frames. ], tot_loss[loss=0.2249, ctc_loss=0.1499, cr_loss=0.3751, over 4071624.82 frames. ], batch size: 50, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 13:23:09,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=608619.8333333334, ans=0.125 2024-09-17 13:23:09,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=608619.8333333334, ans=0.2 2024-09-17 13:23:19,210 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.10 vs. limit=15.0 2024-09-17 13:23:19,266 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.35 vs. limit=15.0 2024-09-17 13:23:21,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=608648.1666666666, ans=0.125 2024-09-17 13:23:28,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=608648.1666666666, ans=0.1 2024-09-17 13:23:30,555 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.880e+02 2.185e+02 2.295e+02 2.500e+02 3.064e+02, threshold=4.591e+02, percent-clipped=0.0 2024-09-17 13:24:11,289 INFO [train.py:1198] (1/2) Epoch 34, batch 3950, loss[loss=0.2321, ctc_loss=0.1533, cr_loss=0.394, over 21081.00 frames. ], tot_loss[loss=0.2241, ctc_loss=0.1494, cr_loss=0.3736, over 4074879.61 frames. ], batch size: 59, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 13:24:25,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=608761.5, ans=0.125 2024-09-17 13:24:34,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=608761.5, ans=0.0 2024-09-17 13:25:27,187 INFO [train.py:1198] (1/2) Epoch 34, batch 4000, loss[loss=0.2134, ctc_loss=0.1396, cr_loss=0.3689, over 20828.00 frames. ], tot_loss[loss=0.224, ctc_loss=0.1492, cr_loss=0.3738, over 4075401.62 frames. 
], batch size: 59, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 13:25:48,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=608903.1666666666, ans=0.0 2024-09-17 13:25:50,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=608903.1666666666, ans=0.125 2024-09-17 13:26:05,414 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.925e+02 2.156e+02 2.298e+02 2.524e+02 3.297e+02, threshold=4.596e+02, percent-clipped=0.0 2024-09-17 13:26:37,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=608988.1666666666, ans=0.1 2024-09-17 13:26:40,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=608988.1666666666, ans=0.0 2024-09-17 13:26:45,807 INFO [train.py:1198] (1/2) Epoch 34, batch 4050, loss[loss=0.2559, ctc_loss=0.1701, cr_loss=0.4288, over 20105.00 frames. ], tot_loss[loss=0.2238, ctc_loss=0.149, cr_loss=0.3739, over 4082722.63 frames. ], batch size: 80, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 13:27:44,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=609129.8333333334, ans=0.125 2024-09-17 13:28:00,778 INFO [train.py:1198] (1/2) Epoch 34, batch 4100, loss[loss=0.1958, ctc_loss=0.1294, cr_loss=0.332, over 20888.00 frames. ], tot_loss[loss=0.2236, ctc_loss=0.1489, cr_loss=0.3737, over 4088075.02 frames. ], batch size: 57, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 13:28:04,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=609158.1666666666, ans=0.035 2024-09-17 13:28:10,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=609158.1666666666, ans=0.125 2024-09-17 13:28:31,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=609214.8333333334, ans=0.0 2024-09-17 13:28:31,821 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.04 vs. limit=15.0 2024-09-17 13:28:38,260 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.029e+02 2.179e+02 2.331e+02 2.481e+02 7.163e+02, threshold=4.661e+02, percent-clipped=1.0 2024-09-17 13:28:38,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=609214.8333333334, ans=0.0 2024-09-17 13:28:43,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=609214.8333333334, ans=0.0 2024-09-17 13:29:15,887 INFO [train.py:1198] (1/2) Epoch 34, batch 4150, loss[loss=0.2342, ctc_loss=0.1548, cr_loss=0.3967, over 20949.00 frames. ], tot_loss[loss=0.2228, ctc_loss=0.1483, cr_loss=0.3728, over 4092637.11 frames. ], batch size: 58, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 13:29:29,112 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.56 vs. 
limit=15.0 2024-09-17 13:29:31,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=609328.1666666666, ans=0.0 2024-09-17 13:30:24,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=609413.1666666666, ans=0.2 2024-09-17 13:30:35,025 INFO [train.py:1198] (1/2) Epoch 34, batch 4200, loss[loss=0.2314, ctc_loss=0.1571, cr_loss=0.3718, over 20344.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.147, cr_loss=0.3712, over 4108833.67 frames. ], batch size: 74, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 13:31:08,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=609498.1666666666, ans=0.125 2024-09-17 13:31:13,067 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.775e+02 2.207e+02 2.340e+02 2.535e+02 3.723e+02, threshold=4.680e+02, percent-clipped=0.0 2024-09-17 13:31:16,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=609498.1666666666, ans=0.1 2024-09-17 13:31:53,655 INFO [train.py:1198] (1/2) Epoch 34, batch 4250, loss[loss=0.2513, ctc_loss=0.168, cr_loss=0.4161, over 20831.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1475, cr_loss=0.3723, over 4098942.12 frames. ], batch size: 65, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 13:31:58,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=609583.1666666666, ans=0.0 2024-09-17 13:32:10,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=609611.5, ans=0.0 2024-09-17 13:32:24,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=609639.8333333334, ans=0.2 2024-09-17 13:32:30,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=609639.8333333334, ans=0.125 2024-09-17 13:32:39,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=609668.1666666666, ans=0.05 2024-09-17 13:32:44,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=609668.1666666666, ans=0.125 2024-09-17 13:33:09,310 INFO [train.py:1198] (1/2) Epoch 34, batch 4300, loss[loss=0.2192, ctc_loss=0.1451, cr_loss=0.3703, over 20872.00 frames. ], tot_loss[loss=0.2223, ctc_loss=0.1478, cr_loss=0.3726, over 4108073.92 frames. ], batch size: 57, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 13:33:12,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=609724.8333333334, ans=0.0 2024-09-17 13:33:21,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=609724.8333333334, ans=0.0 2024-09-17 13:33:22,299 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.52 vs. 
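The learning rate ticks down from 2.43e-03 to 2.42e-03 around batch 4200, consistent with a schedule that decays smoothly in both the batch and epoch counters. If this run uses icefall's Eden scheduler, the factor has the form below; that attribution and the constants shown are assumptions, and the logged value cannot be re-derived from this excerpt alone.

    # Sketch: Eden-style learning-rate factor (assumed scheduler; the
    # constants are illustrative, not read from this log).
    def eden_lr(base_lr: float, batch: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    # The factor shrinks slightly every few thousand batches, matching the
    # slow 2.43e-03 -> 2.42e-03 drift in these records.
    for b in (600_000, 605_000, 610_000):
        print(b, eden_lr(0.04, batch=b, epoch=34.0))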
limit=15.0 2024-09-17 13:33:47,297 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.741e+02 2.170e+02 2.319e+02 2.498e+02 3.591e+02, threshold=4.639e+02, percent-clipped=0.0 2024-09-17 13:33:48,049 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.39 vs. limit=22.5 2024-09-17 13:33:50,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=609781.5, ans=0.125 2024-09-17 13:34:25,509 INFO [train.py:1198] (1/2) Epoch 34, batch 4350, loss[loss=0.1908, ctc_loss=0.1235, cr_loss=0.3366, over 21005.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1482, cr_loss=0.3732, over 4103624.30 frames. ], batch size: 52, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 13:34:34,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=609866.5, ans=0.125 2024-09-17 13:34:40,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=609894.8333333334, ans=0.125 2024-09-17 13:35:14,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=609951.5, ans=0.0 2024-09-17 13:35:23,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=609951.5, ans=0.125 2024-09-17 13:35:44,811 INFO [train.py:1198] (1/2) Epoch 34, batch 4400, loss[loss=0.246, ctc_loss=0.1653, cr_loss=0.4034, over 20676.00 frames. ], tot_loss[loss=0.2228, ctc_loss=0.1482, cr_loss=0.3729, over 4101480.64 frames. ], batch size: 68, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 13:35:49,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=610008.1666666666, ans=0.0 2024-09-17 13:36:00,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=610036.5, ans=0.1 2024-09-17 13:36:14,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=610064.8333333334, ans=22.5 2024-09-17 13:36:22,960 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.911e+02 2.182e+02 2.345e+02 2.533e+02 5.369e+02, threshold=4.690e+02, percent-clipped=1.0 2024-09-17 13:36:35,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=610093.1666666666, ans=0.07 2024-09-17 13:37:00,570 INFO [train.py:1198] (1/2) Epoch 34, batch 4450, loss[loss=0.2392, ctc_loss=0.1582, cr_loss=0.4052, over 19470.00 frames. ], tot_loss[loss=0.2233, ctc_loss=0.1486, cr_loss=0.3734, over 4093015.09 frames. ], batch size: 90, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 13:37:27,069 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.16 vs. limit=10.0 2024-09-17 13:37:41,438 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 13:38:19,083 INFO [train.py:1198] (1/2) Epoch 34, batch 4500, loss[loss=0.2405, ctc_loss=0.166, cr_loss=0.3725, over 21071.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1482, cr_loss=0.3726, over 4087180.81 frames. 
], batch size: 59, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 13:38:24,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=610291.5, ans=0.0 2024-09-17 13:38:32,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=610291.5, ans=0.2 2024-09-17 13:38:35,079 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 13:38:36,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=610319.8333333334, ans=0.5 2024-09-17 13:38:57,293 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.938e+02 2.222e+02 2.306e+02 2.536e+02 4.017e+02, threshold=4.612e+02, percent-clipped=0.0 2024-09-17 13:39:16,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=610376.5, ans=0.125 2024-09-17 13:39:22,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=610404.8333333334, ans=0.2 2024-09-17 13:39:35,277 INFO [train.py:1198] (1/2) Epoch 34, batch 4550, loss[loss=0.2288, ctc_loss=0.1521, cr_loss=0.3835, over 20651.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1478, cr_loss=0.3724, over 4093227.60 frames. ], batch size: 68, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 13:40:14,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=610489.8333333334, ans=0.2 2024-09-17 13:40:48,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=610546.5, ans=10.0 2024-09-17 13:40:50,465 INFO [train.py:1198] (1/2) Epoch 34, batch 4600, loss[loss=0.2498, ctc_loss=0.1655, cr_loss=0.4213, over 20685.00 frames. ], tot_loss[loss=0.2237, ctc_loss=0.1488, cr_loss=0.3744, over 4089274.72 frames. ], batch size: 71, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 13:41:31,860 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.866e+02 2.138e+02 2.254e+02 2.371e+02 3.030e+02, threshold=4.507e+02, percent-clipped=0.0 2024-09-17 13:42:10,400 INFO [train.py:1198] (1/2) Epoch 34, batch 4650, loss[loss=0.2318, ctc_loss=0.1545, cr_loss=0.3864, over 20874.00 frames. ], tot_loss[loss=0.2221, ctc_loss=0.1477, cr_loss=0.3722, over 4104469.80 frames. ], batch size: 57, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 13:42:18,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=610716.5, ans=0.04949747468305833 2024-09-17 13:42:32,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=610744.8333333334, ans=0.2 2024-09-17 13:42:50,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=610773.1666666666, ans=0.0 2024-09-17 13:43:28,822 INFO [train.py:1198] (1/2) Epoch 34, batch 4700, loss[loss=0.2404, ctc_loss=0.1679, cr_loss=0.3621, over 19263.00 frames. ], tot_loss[loss=0.2226, ctc_loss=0.1481, cr_loss=0.3727, over 4104609.40 frames. 
], batch size: 90, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 13:43:32,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=610858.1666666666, ans=0.125 2024-09-17 13:43:54,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=610886.5, ans=0.125 2024-09-17 13:43:59,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=610914.8333333334, ans=0.125 2024-09-17 13:44:05,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=610914.8333333334, ans=0.125 2024-09-17 13:44:06,442 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.879e+02 2.173e+02 2.272e+02 2.414e+02 3.059e+02, threshold=4.543e+02, percent-clipped=0.0 2024-09-17 13:44:09,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=610914.8333333334, ans=0.1 2024-09-17 13:44:35,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=610971.5, ans=0.125 2024-09-17 13:44:41,851 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.66 vs. limit=15.0 2024-09-17 13:44:43,838 INFO [train.py:1198] (1/2) Epoch 34, batch 4750, loss[loss=0.2259, ctc_loss=0.1502, cr_loss=0.3785, over 20952.00 frames. ], tot_loss[loss=0.2231, ctc_loss=0.1485, cr_loss=0.373, over 4101474.21 frames. ], batch size: 60, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 13:45:00,052 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.13 vs. limit=15.0 2024-09-17 13:45:25,515 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.66 vs. limit=22.5 2024-09-17 13:45:35,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=611084.8333333334, ans=0.125 2024-09-17 13:45:59,811 INFO [train.py:1198] (1/2) Epoch 34, batch 4800, loss[loss=0.2035, ctc_loss=0.1292, cr_loss=0.3717, over 19839.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.1472, cr_loss=0.3709, over 4095819.04 frames. ], batch size: 44, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 13:46:00,362 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=12.39 vs. 
limit=12.0 2024-09-17 13:46:10,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=611141.5, ans=0.125 2024-09-17 13:46:18,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=611169.8333333334, ans=0.0 2024-09-17 13:46:18,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=611169.8333333334, ans=0.125 2024-09-17 13:46:37,620 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.925e+02 2.186e+02 2.296e+02 2.536e+02 6.072e+02, threshold=4.591e+02, percent-clipped=1.0 2024-09-17 13:46:44,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=611226.5, ans=0.125 2024-09-17 13:47:18,193 INFO [train.py:1198] (1/2) Epoch 34, batch 4850, loss[loss=0.2443, ctc_loss=0.1633, cr_loss=0.4053, over 21054.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.1479, cr_loss=0.3721, over 4085881.80 frames. ], batch size: 62, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 13:47:30,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=611283.1666666666, ans=0.125 2024-09-17 13:47:30,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=611283.1666666666, ans=0.125 2024-09-17 13:47:36,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=611311.5, ans=0.2 2024-09-17 13:47:59,446 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.14 vs. limit=15.0 2024-09-17 13:48:33,861 INFO [train.py:1198] (1/2) Epoch 34, batch 4900, loss[loss=0.2511, ctc_loss=0.1671, cr_loss=0.4201, over 20820.00 frames. ], tot_loss[loss=0.2226, ctc_loss=0.148, cr_loss=0.3728, over 4090069.26 frames. ], batch size: 59, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 13:48:53,171 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.70 vs. limit=15.0 2024-09-17 13:49:11,597 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.906e+02 2.145e+02 2.313e+02 2.528e+02 4.796e+02, threshold=4.626e+02, percent-clipped=1.0 2024-09-17 13:49:35,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=611538.1666666666, ans=0.2 2024-09-17 13:49:35,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=611538.1666666666, ans=0.125 2024-09-17 13:49:37,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=611538.1666666666, ans=0.025 2024-09-17 13:49:46,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=611538.1666666666, ans=0.125 2024-09-17 13:49:51,933 INFO [train.py:1198] (1/2) Epoch 34, batch 4950, loss[loss=0.2592, ctc_loss=0.1789, cr_loss=0.4015, over 18248.00 frames. ], tot_loss[loss=0.2233, ctc_loss=0.1487, cr_loss=0.3731, over 4074421.33 frames. 
], batch size: 108, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 13:49:55,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=611566.5, ans=0.025 2024-09-17 13:50:31,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=611623.1666666666, ans=0.0 2024-09-17 13:50:41,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=611651.5, ans=0.0 2024-09-17 13:50:45,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=611651.5, ans=0.0 2024-09-17 13:50:49,940 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.83 vs. limit=15.0 2024-09-17 13:51:06,986 INFO [train.py:1198] (1/2) Epoch 34, batch 5000, loss[loss=0.2256, ctc_loss=0.149, cr_loss=0.3831, over 20964.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1477, cr_loss=0.3714, over 4079509.54 frames. ], batch size: 58, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 13:51:19,520 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.89 vs. limit=15.0 2024-09-17 13:51:29,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=611736.5, ans=0.125 2024-09-17 13:51:43,491 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.870e+02 2.204e+02 2.316e+02 2.456e+02 4.302e+02, threshold=4.633e+02, percent-clipped=0.0 2024-09-17 13:51:51,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=611793.1666666666, ans=0.2 2024-09-17 13:52:06,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=611821.5, ans=0.125 2024-09-17 13:52:20,766 INFO [train.py:1198] (1/2) Epoch 34, batch 5050, loss[loss=0.2843, ctc_loss=0.2002, cr_loss=0.4204, over 14487.00 frames. ], tot_loss[loss=0.2221, ctc_loss=0.1478, cr_loss=0.3715, over 4079797.63 frames. ], batch size: 149, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 13:52:49,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=611906.5, ans=0.125 2024-09-17 13:52:50,582 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 13:52:55,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=611906.5, ans=0.2 2024-09-17 13:53:14,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=611934.8333333334, ans=0.1 2024-09-17 13:53:34,818 INFO [train.py:1198] (1/2) Epoch 34, batch 5100, loss[loss=0.2086, ctc_loss=0.1351, cr_loss=0.3672, over 20829.00 frames. ], tot_loss[loss=0.2233, ctc_loss=0.1487, cr_loss=0.3732, over 4083852.78 frames. 
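Batch sizes in this stretch range from about 44 to 149 cuts while per-batch frame totals stay within a band, and the largest batches (e.g. 149 short cuts at batch 5050) tend to report the highest losses. That pattern points to duration-budgeted batching: each batch is filled up to a duration cap, so the cut count varies inversely with utterance length. The sketch below is a generic reconstruction of that idea, not the actual sampler used in this run.

    # Sketch: fill each batch up to a duration budget, so batch *size*
    # (number of cuts) varies inversely with utterance length.
    from typing import Iterable, Iterator, List, Tuple

    def duration_batches(cuts: Iterable[Tuple[str, float]],
                         max_duration: float) -> Iterator[List[Tuple[str, float]]]:
        batch: List[Tuple[str, float]] = []
        total = 0.0
        for cut_id, dur in cuts:
            if batch and total + dur > max_duration:
                yield batch
                batch, total = [], 0.0
            batch.append((cut_id, dur))
            total += dur
        if batch:
            yield batch

    cuts = [(f"cut-{i}", 12.0 if i % 2 == 0 else 1.0) for i in range(20)]
    for b in duration_batches(cuts, max_duration=30.0):
        print(len(b), sum(d for _, d in b))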
], batch size: 59, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 13:54:13,203 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.848e+02 2.198e+02 2.370e+02 2.602e+02 4.625e+02, threshold=4.741e+02, percent-clipped=0.0 2024-09-17 13:54:50,290 INFO [train.py:1198] (1/2) Epoch 34, batch 5150, loss[loss=0.2118, ctc_loss=0.1386, cr_loss=0.3662, over 20989.00 frames. ], tot_loss[loss=0.2246, ctc_loss=0.1497, cr_loss=0.3748, over 4077970.34 frames. ], batch size: 52, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 13:55:04,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=612161.5, ans=0.0 2024-09-17 13:55:26,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=612189.8333333334, ans=0.1 2024-09-17 13:55:28,544 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.60 vs. limit=22.5 2024-09-17 13:55:35,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=612218.1666666666, ans=0.2 2024-09-17 13:55:40,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=612218.1666666666, ans=0.125 2024-09-17 13:55:48,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=612218.1666666666, ans=0.125 2024-09-17 13:56:07,608 INFO [train.py:1198] (1/2) Epoch 34, batch 5200, loss[loss=0.1765, ctc_loss=0.1147, cr_loss=0.309, over 20989.00 frames. ], tot_loss[loss=0.2251, ctc_loss=0.15, cr_loss=0.3755, over 4075343.85 frames. ], batch size: 51, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 13:56:12,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=612274.8333333334, ans=0.025 2024-09-17 13:56:45,571 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.770e+02 2.166e+02 2.330e+02 2.489e+02 8.863e+02, threshold=4.659e+02, percent-clipped=1.0 2024-09-17 13:56:56,593 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.74 vs. limit=15.0 2024-09-17 13:56:59,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=612359.8333333334, ans=0.1 2024-09-17 13:57:22,370 INFO [train.py:1198] (1/2) Epoch 34, batch 5250, loss[loss=0.2248, ctc_loss=0.1499, cr_loss=0.3747, over 20778.00 frames. ], tot_loss[loss=0.2241, ctc_loss=0.1493, cr_loss=0.3742, over 4081137.85 frames. ], batch size: 56, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 13:57:52,845 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.50 vs. limit=15.0 2024-09-17 13:58:20,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=612529.8333333334, ans=0.125 2024-09-17 13:58:34,588 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.20 vs. limit=15.0 2024-09-17 13:58:36,687 INFO [train.py:1198] (1/2) Epoch 34, batch 5300, loss[loss=0.2014, ctc_loss=0.1321, cr_loss=0.3462, over 20990.00 frames. 
], tot_loss[loss=0.2235, ctc_loss=0.1487, cr_loss=0.3736, over 4085737.14 frames. ], batch size: 52, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 13:58:38,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=612558.1666666666, ans=0.2 2024-09-17 13:58:46,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=612558.1666666666, ans=0.125 2024-09-17 13:59:16,372 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.834e+02 2.156e+02 2.275e+02 2.449e+02 5.228e+02, threshold=4.550e+02, percent-clipped=1.0 2024-09-17 13:59:39,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=612671.5, ans=0.2 2024-09-17 13:59:53,791 INFO [train.py:1198] (1/2) Epoch 34, batch 5350, loss[loss=0.2499, ctc_loss=0.1643, cr_loss=0.428, over 20032.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1476, cr_loss=0.3716, over 4092157.07 frames. ], batch size: 80, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 14:00:31,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=612756.5, ans=0.1 2024-09-17 14:01:08,333 INFO [train.py:1198] (1/2) Epoch 34, batch 5400, loss[loss=0.2404, ctc_loss=0.1596, cr_loss=0.4043, over 21064.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1483, cr_loss=0.3725, over 4089771.56 frames. ], batch size: 59, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 14:01:19,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=612841.5, ans=0.125 2024-09-17 14:01:28,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=612869.8333333334, ans=0.125 2024-09-17 14:01:47,610 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.905e+02 2.167e+02 2.303e+02 2.469e+02 5.351e+02, threshold=4.605e+02, percent-clipped=1.0 2024-09-17 14:01:56,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=612926.5, ans=0.125 2024-09-17 14:02:23,192 INFO [train.py:1198] (1/2) Epoch 34, batch 5450, loss[loss=0.2488, ctc_loss=0.1653, cr_loss=0.4175, over 20704.00 frames. ], tot_loss[loss=0.2231, ctc_loss=0.1485, cr_loss=0.3731, over 4085689.43 frames. ], batch size: 68, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 14:03:06,583 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 14:03:37,784 INFO [train.py:1198] (1/2) Epoch 34, batch 5500, loss[loss=0.2479, ctc_loss=0.1681, cr_loss=0.3989, over 20855.00 frames. ], tot_loss[loss=0.2233, ctc_loss=0.1486, cr_loss=0.3736, over 4091618.55 frames. ], batch size: 65, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 14:03:39,959 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.79 vs. 
limit=15.0 2024-09-17 14:04:02,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=613153.1666666666, ans=0.125 2024-09-17 14:04:06,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=613181.5, ans=0.2 2024-09-17 14:04:08,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=613181.5, ans=0.125 2024-09-17 14:04:16,823 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.876e+02 2.199e+02 2.307e+02 2.476e+02 3.144e+02, threshold=4.613e+02, percent-clipped=0.0 2024-09-17 14:04:38,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=613238.1666666666, ans=0.0 2024-09-17 14:04:55,190 INFO [train.py:1198] (1/2) Epoch 34, batch 5550, loss[loss=0.1978, ctc_loss=0.1293, cr_loss=0.3426, over 21081.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1484, cr_loss=0.3726, over 4091195.11 frames. ], batch size: 53, lr: 2.42e-03, grad_scale: 16.0 2024-09-17 14:06:09,901 INFO [train.py:1198] (1/2) Epoch 34, batch 5600, loss[loss=0.2125, ctc_loss=0.1391, cr_loss=0.3669, over 20983.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1481, cr_loss=0.3728, over 4093579.03 frames. ], batch size: 51, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 14:06:16,884 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.79 vs. limit=15.0 2024-09-17 14:06:17,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=613408.1666666666, ans=0.125 2024-09-17 14:06:29,827 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.97 vs. limit=6.0 2024-09-17 14:06:50,084 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.939e+02 2.188e+02 2.330e+02 2.477e+02 4.047e+02, threshold=4.661e+02, percent-clipped=0.0 2024-09-17 14:06:54,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=613493.1666666666, ans=0.125 2024-09-17 14:07:02,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=613493.1666666666, ans=0.0 2024-09-17 14:07:23,934 INFO [train.py:1198] (1/2) Epoch 34, batch 5650, loss[loss=0.2319, ctc_loss=0.1573, cr_loss=0.373, over 21019.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1474, cr_loss=0.3711, over 4095232.56 frames. ], batch size: 63, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 14:07:38,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=613578.1666666666, ans=0.125 2024-09-17 14:07:56,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=613606.5, ans=0.09899494936611666 2024-09-17 14:08:29,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=613663.1666666666, ans=0.0 2024-09-17 14:08:41,336 INFO [train.py:1198] (1/2) Epoch 34, batch 5700, loss[loss=0.2403, ctc_loss=0.1622, cr_loss=0.3905, over 20959.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.1473, cr_loss=0.3708, over 4095712.24 frames. 
], batch size: 64, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 14:09:07,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=613719.8333333334, ans=0.125 2024-09-17 14:09:21,030 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.844e+02 2.163e+02 2.283e+02 2.479e+02 3.080e+02, threshold=4.566e+02, percent-clipped=0.0 2024-09-17 14:09:39,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=613804.8333333334, ans=0.125 2024-09-17 14:09:48,478 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=613804.8333333334, ans=0.125 2024-09-17 14:09:55,693 INFO [train.py:1198] (1/2) Epoch 34, batch 5750, loss[loss=0.2565, ctc_loss=0.1782, cr_loss=0.3916, over 14688.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1478, cr_loss=0.372, over 4090753.04 frames. ], batch size: 149, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 14:10:04,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=613833.1666666666, ans=0.0 2024-09-17 14:10:05,234 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.65 vs. limit=15.0 2024-09-17 14:10:07,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=613833.1666666666, ans=0.0 2024-09-17 14:10:18,737 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.13 vs. limit=12.0 2024-09-17 14:10:22,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=613861.5, ans=0.2 2024-09-17 14:10:30,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=613889.8333333334, ans=0.0 2024-09-17 14:11:08,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=613974.8333333334, ans=0.0 2024-09-17 14:11:10,052 INFO [train.py:1198] (1/2) Epoch 34, batch 5800, loss[loss=0.2134, ctc_loss=0.1402, cr_loss=0.366, over 20959.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1475, cr_loss=0.3709, over 4089469.09 frames. 
], batch size: 55, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 14:11:15,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=613974.8333333334, ans=0.1 2024-09-17 14:11:22,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=613974.8333333334, ans=0.0 2024-09-17 14:11:23,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=614003.1666666666, ans=0.1 2024-09-17 14:11:28,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=614003.1666666666, ans=0.09899494936611666 2024-09-17 14:11:50,080 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.909e+02 2.219e+02 2.320e+02 2.534e+02 4.392e+02, threshold=4.641e+02, percent-clipped=0.0 2024-09-17 14:12:24,588 INFO [train.py:1198] (1/2) Epoch 34, batch 5850, loss[loss=0.2145, ctc_loss=0.1431, cr_loss=0.3572, over 20770.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1477, cr_loss=0.371, over 4093353.26 frames. ], batch size: 56, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 14:12:32,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=614116.5, ans=0.0 2024-09-17 14:13:06,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=614173.1666666666, ans=0.1 2024-09-17 14:13:11,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=614201.5, ans=0.0 2024-09-17 14:13:19,955 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.34 vs. limit=15.0 2024-09-17 14:13:27,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=614229.8333333334, ans=0.125 2024-09-17 14:13:31,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=614229.8333333334, ans=0.0 2024-09-17 14:13:31,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn2.whiten.whitening_limit, batch_count=614229.8333333334, ans=22.5 2024-09-17 14:13:40,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=614258.1666666666, ans=0.125 2024-09-17 14:13:41,579 INFO [train.py:1198] (1/2) Epoch 34, batch 5900, loss[loss=0.2268, ctc_loss=0.1491, cr_loss=0.3886, over 20935.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1473, cr_loss=0.3703, over 4098239.91 frames. 
], batch size: 67, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 14:13:45,058 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 14:13:59,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=614286.5, ans=0.125 2024-09-17 14:14:10,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=614314.8333333334, ans=0.04949747468305833 2024-09-17 14:14:17,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=614314.8333333334, ans=0.2 2024-09-17 14:14:21,636 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.914e+02 2.170e+02 2.303e+02 2.431e+02 5.111e+02, threshold=4.605e+02, percent-clipped=1.0 2024-09-17 14:14:46,141 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.85 vs. limit=12.0 2024-09-17 14:14:53,844 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.46 vs. limit=15.0 2024-09-17 14:14:55,688 INFO [train.py:1198] (1/2) Epoch 34, batch 5950, loss[loss=0.2316, ctc_loss=0.1509, cr_loss=0.4036, over 20990.00 frames. ], tot_loss[loss=0.2211, ctc_loss=0.1471, cr_loss=0.37, over 4087262.72 frames. ], batch size: 55, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 14:15:33,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=614456.5, ans=0.05 2024-09-17 14:15:49,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=614484.8333333334, ans=0.0 2024-09-17 14:16:11,991 INFO [train.py:1198] (1/2) Epoch 34, batch 6000, loss[loss=0.2294, ctc_loss=0.1517, cr_loss=0.3889, over 20901.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1474, cr_loss=0.3704, over 4079670.61 frames. ], batch size: 54, lr: 2.41e-03, grad_scale: 32.0 2024-09-17 14:16:11,992 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 14:16:33,503 INFO [train.py:1230] (1/2) Epoch 34, validation: loss=0.04046, ctc_loss=0.04046, cr_loss=1.343e-14, over 944034.00 frames. 2024-09-17 14:16:33,504 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-17 14:17:14,110 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.004e+02 2.225e+02 2.353e+02 2.545e+02 3.255e+02, threshold=4.707e+02, percent-clipped=0.0 2024-09-17 14:17:24,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=614626.5, ans=0.125 2024-09-17 14:17:48,387 INFO [train.py:1198] (1/2) Epoch 34, batch 6050, loss[loss=0.1944, ctc_loss=0.1277, cr_loss=0.3332, over 20886.00 frames. ], tot_loss[loss=0.2211, ctc_loss=0.1471, cr_loss=0.3697, over 4080552.53 frames. ], batch size: 54, lr: 2.41e-03, grad_scale: 32.0 2024-09-17 14:18:14,419 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.37 vs. 
limit=22.5 2024-09-17 14:18:28,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=614739.8333333334, ans=0.2 2024-09-17 14:18:30,418 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.96 vs. limit=15.0 2024-09-17 14:18:36,353 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.70 vs. limit=15.0 2024-09-17 14:19:04,237 INFO [train.py:1198] (1/2) Epoch 34, batch 6100, loss[loss=0.2632, ctc_loss=0.1838, cr_loss=0.3972, over 14378.00 frames. ], tot_loss[loss=0.2207, ctc_loss=0.1468, cr_loss=0.3696, over 4082963.14 frames. ], batch size: 149, lr: 2.41e-03, grad_scale: 32.0 2024-09-17 14:19:10,903 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.26 vs. limit=15.0 2024-09-17 14:19:28,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=614853.1666666666, ans=0.0 2024-09-17 14:19:44,503 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.749e+02 2.172e+02 2.312e+02 2.473e+02 3.608e+02, threshold=4.623e+02, percent-clipped=0.0 2024-09-17 14:20:18,551 INFO [train.py:1198] (1/2) Epoch 34, batch 6150, loss[loss=0.2249, ctc_loss=0.1504, cr_loss=0.3728, over 21040.00 frames. ], tot_loss[loss=0.223, ctc_loss=0.1485, cr_loss=0.3722, over 4073740.25 frames. ], batch size: 62, lr: 2.41e-03, grad_scale: 32.0 2024-09-17 14:20:20,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=614966.5, ans=0.125 2024-09-17 14:20:25,540 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.11 vs. limit=15.0 2024-09-17 14:20:29,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=614966.5, ans=0.0 2024-09-17 14:20:42,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=614994.8333333334, ans=0.0 2024-09-17 14:20:43,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=614994.8333333334, ans=0.125 2024-09-17 14:20:46,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=615023.1666666666, ans=0.025 2024-09-17 14:20:53,095 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.48 vs. limit=22.5 2024-09-17 14:20:54,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=615023.1666666666, ans=0.0 2024-09-17 14:20:56,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=615023.1666666666, ans=0.125 2024-09-17 14:21:24,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=615079.8333333334, ans=0.2 2024-09-17 14:21:33,265 INFO [train.py:1198] (1/2) Epoch 34, batch 6200, loss[loss=0.2, ctc_loss=0.1285, cr_loss=0.3577, over 20985.00 frames. 
], tot_loss[loss=0.2243, ctc_loss=0.1496, cr_loss=0.3738, over 4066896.42 frames. ], batch size: 55, lr: 2.41e-03, grad_scale: 32.0 2024-09-17 14:22:13,213 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.916e+02 2.184e+02 2.290e+02 2.420e+02 3.320e+02, threshold=4.580e+02, percent-clipped=0.0 2024-09-17 14:22:39,189 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.81 vs. limit=22.5 2024-09-17 14:22:47,222 INFO [train.py:1198] (1/2) Epoch 34, batch 6250, loss[loss=0.2848, ctc_loss=0.1994, cr_loss=0.427, over 13936.00 frames. ], tot_loss[loss=0.2254, ctc_loss=0.1504, cr_loss=0.3753, over 4052482.56 frames. ], batch size: 149, lr: 2.41e-03, grad_scale: 32.0 2024-09-17 14:22:54,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=615249.8333333334, ans=0.1 2024-09-17 14:23:40,266 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 14:24:01,151 INFO [train.py:1198] (1/2) Epoch 34, batch 6300, loss[loss=0.2279, ctc_loss=0.1537, cr_loss=0.371, over 20728.00 frames. ], tot_loss[loss=0.2278, ctc_loss=0.1524, cr_loss=0.3768, over 3993133.17 frames. ], batch size: 71, lr: 2.41e-03, grad_scale: 32.0 2024-09-17 14:24:19,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=615419.8333333334, ans=0.125 2024-09-17 14:24:29,349 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 14:24:39,829 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.865e+02 2.278e+02 2.453e+02 2.629e+02 4.068e+02, threshold=4.906e+02, percent-clipped=0.0 2024-09-17 14:24:40,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=615448.1666666666, ans=0.125 2024-09-17 14:24:41,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=615448.1666666666, ans=0.125 2024-09-17 14:24:56,018 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.86 vs. limit=15.0 2024-09-17 14:25:08,986 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.07 vs. limit=22.5 2024-09-17 14:25:12,462 INFO [train.py:1198] (1/2) Epoch 34, batch 6350, loss[loss=0.2751, ctc_loss=0.1941, cr_loss=0.4052, over 14201.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1581, cr_loss=0.3827, over 3840015.57 frames. ], batch size: 149, lr: 2.41e-03, grad_scale: 32.0 2024-09-17 14:25:37,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=615561.5, ans=0.125 2024-09-17 14:25:41,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=615589.8333333334, ans=0.2 2024-09-17 14:25:42,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=615589.8333333334, ans=0.125 2024-09-17 14:27:02,488 INFO [train.py:1198] (1/2) Epoch 35, batch 0, loss[loss=0.229, ctc_loss=0.1514, cr_loss=0.3877, over 21044.00 frames. 
], tot_loss[loss=0.229, ctc_loss=0.1514, cr_loss=0.3877, over 21044.00 frames. ], batch size: 62, lr: 2.38e-03, grad_scale: 32.0 2024-09-17 14:27:02,489 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 14:27:20,889 INFO [train.py:1230] (1/2) Epoch 35, validation: loss=0.04033, ctc_loss=0.04033, cr_loss=1.343e-14, over 944034.00 frames. 2024-09-17 14:27:20,890 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-17 14:27:44,283 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.60 vs. limit=12.0 2024-09-17 14:27:57,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=615706.0, ans=0.125 2024-09-17 14:28:15,412 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.897e+02 2.346e+02 2.639e+02 2.882e+02 4.248e+02, threshold=5.277e+02, percent-clipped=0.0 2024-09-17 14:28:26,605 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.08 vs. limit=15.0 2024-09-17 14:28:36,651 INFO [train.py:1198] (1/2) Epoch 35, batch 50, loss[loss=0.2121, ctc_loss=0.1409, cr_loss=0.3559, over 20817.00 frames. ], tot_loss[loss=0.2238, ctc_loss=0.149, cr_loss=0.3742, over 923903.68 frames. ], batch size: 59, lr: 2.38e-03, grad_scale: 32.0 2024-09-17 14:29:24,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=615876.0, ans=15.0 2024-09-17 14:29:49,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=615904.3333333334, ans=0.0 2024-09-17 14:29:52,571 INFO [train.py:1198] (1/2) Epoch 35, batch 100, loss[loss=0.1739, ctc_loss=0.1114, cr_loss=0.3125, over 20945.00 frames. ], tot_loss[loss=0.2226, ctc_loss=0.1478, cr_loss=0.3741, over 1631201.40 frames. ], batch size: 50, lr: 2.38e-03, grad_scale: 32.0 2024-09-17 14:29:57,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=615932.6666666666, ans=0.0 2024-09-17 14:30:46,825 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.853e+02 2.125e+02 2.216e+02 2.377e+02 3.062e+02, threshold=4.432e+02, percent-clipped=0.0 2024-09-17 14:30:50,444 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.14 vs. limit=22.5 2024-09-17 14:31:07,905 INFO [train.py:1198] (1/2) Epoch 35, batch 150, loss[loss=0.2097, ctc_loss=0.1385, cr_loss=0.3558, over 21020.00 frames. ], tot_loss[loss=0.2225, ctc_loss=0.1478, cr_loss=0.3732, over 2180760.48 frames. 
], batch size: 61, lr: 2.38e-03, grad_scale: 32.0 2024-09-17 14:31:18,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=616074.3333333334, ans=0.2 2024-09-17 14:31:32,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=616102.6666666666, ans=0.125 2024-09-17 14:31:32,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=616102.6666666666, ans=0.1 2024-09-17 14:31:45,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=616131.0, ans=0.125 2024-09-17 14:31:46,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=616131.0, ans=0.125 2024-09-17 14:32:04,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=616159.3333333334, ans=0.0 2024-09-17 14:32:26,983 INFO [train.py:1198] (1/2) Epoch 35, batch 200, loss[loss=0.2183, ctc_loss=0.1445, cr_loss=0.369, over 21071.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1472, cr_loss=0.3726, over 2612698.10 frames. ], batch size: 59, lr: 2.38e-03, grad_scale: 32.0 2024-09-17 14:32:33,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=616216.0, ans=0.0 2024-09-17 14:32:53,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=616244.3333333334, ans=0.07 2024-09-17 14:33:26,687 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.921e+02 2.197e+02 2.340e+02 2.528e+02 4.891e+02, threshold=4.681e+02, percent-clipped=1.0 2024-09-17 14:33:45,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=616357.6666666666, ans=0.0 2024-09-17 14:33:46,218 INFO [train.py:1198] (1/2) Epoch 35, batch 250, loss[loss=0.1662, ctc_loss=0.1092, cr_loss=0.2848, over 20974.00 frames. ], tot_loss[loss=0.2202, ctc_loss=0.146, cr_loss=0.3707, over 2947036.80 frames. ], batch size: 48, lr: 2.38e-03, grad_scale: 16.0 2024-09-17 14:33:58,863 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.60 vs. limit=22.5 2024-09-17 14:34:01,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=616386.0, ans=0.0 2024-09-17 14:34:06,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=616386.0, ans=0.1 2024-09-17 14:34:21,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=616414.3333333334, ans=0.1 2024-09-17 14:34:23,366 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.94 vs. 
limit=22.5 2024-09-17 14:34:30,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=616442.6666666666, ans=0.0 2024-09-17 14:34:32,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=616442.6666666666, ans=0.1 2024-09-17 14:34:40,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=616442.6666666666, ans=0.125 2024-09-17 14:34:49,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=616471.0, ans=0.125 2024-09-17 14:35:01,888 INFO [train.py:1198] (1/2) Epoch 35, batch 300, loss[loss=0.2304, ctc_loss=0.1561, cr_loss=0.3714, over 20922.00 frames. ], tot_loss[loss=0.2209, ctc_loss=0.1467, cr_loss=0.371, over 3199520.85 frames. ], batch size: 60, lr: 2.38e-03, grad_scale: 16.0 2024-09-17 14:35:17,806 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.97 vs. limit=22.5 2024-09-17 14:35:57,694 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.908e+02 2.179e+02 2.311e+02 2.475e+02 3.658e+02, threshold=4.621e+02, percent-clipped=0.0 2024-09-17 14:36:10,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=616612.6666666666, ans=0.07 2024-09-17 14:36:17,700 INFO [train.py:1198] (1/2) Epoch 35, batch 350, loss[loss=0.238, ctc_loss=0.1581, cr_loss=0.3998, over 20655.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.147, cr_loss=0.3713, over 3411597.89 frames. ], batch size: 66, lr: 2.38e-03, grad_scale: 16.0 2024-09-17 14:36:33,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=616669.3333333334, ans=0.125 2024-09-17 14:37:14,203 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=616726.0, ans=0.025 2024-09-17 14:37:33,483 INFO [train.py:1198] (1/2) Epoch 35, batch 400, loss[loss=0.2533, ctc_loss=0.1715, cr_loss=0.409, over 20694.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1468, cr_loss=0.3711, over 3575318.35 frames. ], batch size: 71, lr: 2.38e-03, grad_scale: 32.0 2024-09-17 14:37:46,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=616782.6666666666, ans=0.125 2024-09-17 14:37:50,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=616811.0, ans=0.0 2024-09-17 14:38:08,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=616839.3333333334, ans=0.0 2024-09-17 14:38:32,498 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.909e+02 2.223e+02 2.348e+02 2.513e+02 3.156e+02, threshold=4.697e+02, percent-clipped=0.0 2024-09-17 14:38:54,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=616924.3333333334, ans=0.125 2024-09-17 14:38:55,740 INFO [train.py:1198] (1/2) Epoch 35, batch 450, loss[loss=0.2023, ctc_loss=0.1342, cr_loss=0.3406, over 20968.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1473, cr_loss=0.3717, over 3683622.61 frames. 
], batch size: 49, lr: 2.37e-03, grad_scale: 32.0 2024-09-17 14:39:03,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=616924.3333333334, ans=0.0 2024-09-17 14:39:06,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=616924.3333333334, ans=0.125 2024-09-17 14:39:19,557 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.95 vs. limit=15.0 2024-09-17 14:40:11,662 INFO [train.py:1198] (1/2) Epoch 35, batch 500, loss[loss=0.1954, ctc_loss=0.128, cr_loss=0.3369, over 20948.00 frames. ], tot_loss[loss=0.2207, ctc_loss=0.1467, cr_loss=0.37, over 3777273.02 frames. ], batch size: 55, lr: 2.37e-03, grad_scale: 32.0 2024-09-17 14:40:27,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=617094.3333333334, ans=0.0 2024-09-17 14:41:07,709 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.908e+02 2.163e+02 2.284e+02 2.476e+02 3.109e+02, threshold=4.569e+02, percent-clipped=0.0 2024-09-17 14:41:21,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=617179.3333333334, ans=0.025 2024-09-17 14:41:27,199 INFO [train.py:1198] (1/2) Epoch 35, batch 550, loss[loss=0.2414, ctc_loss=0.1604, cr_loss=0.405, over 20639.00 frames. ], tot_loss[loss=0.2211, ctc_loss=0.147, cr_loss=0.3705, over 3848619.81 frames. ], batch size: 68, lr: 2.37e-03, grad_scale: 32.0 2024-09-17 14:41:42,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=617236.0, ans=0.1 2024-09-17 14:41:46,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=617236.0, ans=0.0 2024-09-17 14:42:17,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=617292.6666666666, ans=0.125 2024-09-17 14:42:35,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=617321.0, ans=0.0 2024-09-17 14:42:35,846 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.88 vs. limit=15.0 2024-09-17 14:42:41,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=617349.3333333334, ans=0.125 2024-09-17 14:42:42,702 INFO [train.py:1198] (1/2) Epoch 35, batch 600, loss[loss=0.2369, ctc_loss=0.1577, cr_loss=0.3962, over 20962.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1478, cr_loss=0.3722, over 3901914.18 frames. ], batch size: 64, lr: 2.37e-03, grad_scale: 32.0 2024-09-17 14:42:42,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=617349.3333333334, ans=0.2 2024-09-17 14:43:41,990 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.922e+02 2.137e+02 2.286e+02 2.474e+02 5.128e+02, threshold=4.572e+02, percent-clipped=1.0 2024-09-17 14:44:01,870 INFO [train.py:1198] (1/2) Epoch 35, batch 650, loss[loss=0.1951, ctc_loss=0.1291, cr_loss=0.3297, over 20980.00 frames. ], tot_loss[loss=0.2225, ctc_loss=0.148, cr_loss=0.3725, over 3934622.80 frames. 
], batch size: 49, lr: 2.37e-03, grad_scale: 32.0 2024-09-17 14:44:05,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=617491.0, ans=0.0 2024-09-17 14:44:23,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=617519.3333333334, ans=0.125 2024-09-17 14:44:28,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=617519.3333333334, ans=0.125 2024-09-17 14:44:37,602 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.38 vs. limit=15.0 2024-09-17 14:45:11,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=617604.3333333334, ans=0.125 2024-09-17 14:45:20,650 INFO [train.py:1198] (1/2) Epoch 35, batch 700, loss[loss=0.199, ctc_loss=0.1308, cr_loss=0.3412, over 20979.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.148, cr_loss=0.3716, over 3970882.11 frames. ], batch size: 52, lr: 2.37e-03, grad_scale: 32.0 2024-09-17 14:45:30,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=617632.6666666666, ans=22.5 2024-09-17 14:46:06,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=617717.6666666666, ans=0.125 2024-09-17 14:46:16,846 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.834e+02 2.142e+02 2.223e+02 2.439e+02 3.650e+02, threshold=4.447e+02, percent-clipped=0.0 2024-09-17 14:46:24,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=617746.0, ans=0.125 2024-09-17 14:46:27,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=617746.0, ans=0.07 2024-09-17 14:46:30,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=617746.0, ans=0.1 2024-09-17 14:46:36,678 INFO [train.py:1198] (1/2) Epoch 35, batch 750, loss[loss=0.2046, ctc_loss=0.1368, cr_loss=0.3392, over 21067.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1477, cr_loss=0.3707, over 3984609.19 frames. ], batch size: 53, lr: 2.37e-03, grad_scale: 32.0 2024-09-17 14:46:46,546 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.13 vs. limit=6.0 2024-09-17 14:46:49,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=617774.3333333334, ans=0.025 2024-09-17 14:47:09,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=617831.0, ans=0.125 2024-09-17 14:47:09,602 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.62 vs. 
limit=22.5 2024-09-17 14:47:33,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=617859.3333333334, ans=0.025 2024-09-17 14:47:40,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=617887.6666666666, ans=0.1 2024-09-17 14:47:52,445 INFO [train.py:1198] (1/2) Epoch 35, batch 800, loss[loss=0.2003, ctc_loss=0.1315, cr_loss=0.3438, over 21023.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1477, cr_loss=0.3715, over 4018106.21 frames. ], batch size: 62, lr: 2.37e-03, grad_scale: 32.0 2024-09-17 14:48:16,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=617944.3333333334, ans=0.1 2024-09-17 14:48:48,642 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.892e+02 2.187e+02 2.316e+02 2.474e+02 3.498e+02, threshold=4.631e+02, percent-clipped=0.0 2024-09-17 14:49:08,730 INFO [train.py:1198] (1/2) Epoch 35, batch 850, loss[loss=0.2486, ctc_loss=0.1636, cr_loss=0.4248, over 20939.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1474, cr_loss=0.3721, over 4047550.37 frames. ], batch size: 64, lr: 2.37e-03, grad_scale: 32.0 2024-09-17 14:49:54,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=618114.3333333334, ans=0.0 2024-09-17 14:50:30,284 INFO [train.py:1198] (1/2) Epoch 35, batch 900, loss[loss=0.1753, ctc_loss=0.1127, cr_loss=0.3132, over 20950.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1462, cr_loss=0.3696, over 4068762.56 frames. ], batch size: 49, lr: 2.37e-03, grad_scale: 32.0 2024-09-17 14:50:35,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=618199.3333333334, ans=0.1 2024-09-17 14:50:47,416 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.44 vs. limit=15.0 2024-09-17 14:51:07,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=618256.0, ans=0.125 2024-09-17 14:51:13,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=618256.0, ans=0.0 2024-09-17 14:51:25,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=618284.3333333334, ans=0.0 2024-09-17 14:51:26,661 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.820e+02 2.155e+02 2.255e+02 2.427e+02 3.618e+02, threshold=4.510e+02, percent-clipped=0.0 2024-09-17 14:51:46,176 INFO [train.py:1198] (1/2) Epoch 35, batch 950, loss[loss=0.2137, ctc_loss=0.1403, cr_loss=0.3669, over 20896.00 frames. ], tot_loss[loss=0.2203, ctc_loss=0.1464, cr_loss=0.3696, over 4079580.02 frames. ], batch size: 54, lr: 2.37e-03, grad_scale: 32.0 2024-09-17 14:51:53,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=618341.0, ans=0.2 2024-09-17 14:52:04,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=618369.3333333334, ans=0.125 2024-09-17 14:52:06,515 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.38 vs. 
limit=15.0 2024-09-17 14:53:02,153 INFO [train.py:1198] (1/2) Epoch 35, batch 1000, loss[loss=0.2405, ctc_loss=0.1602, cr_loss=0.4011, over 20836.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.1472, cr_loss=0.371, over 4077116.69 frames. ], batch size: 65, lr: 2.37e-03, grad_scale: 32.0 2024-09-17 14:53:08,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=618482.6666666666, ans=0.125 2024-09-17 14:53:12,046 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.35 vs. limit=10.0 2024-09-17 14:53:23,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=618511.0, ans=0.125 2024-09-17 14:53:57,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=618567.6666666666, ans=0.0 2024-09-17 14:53:58,285 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.900e+02 2.192e+02 2.316e+02 2.461e+02 4.578e+02, threshold=4.632e+02, percent-clipped=1.0 2024-09-17 14:54:17,848 INFO [train.py:1198] (1/2) Epoch 35, batch 1050, loss[loss=0.2513, ctc_loss=0.1703, cr_loss=0.4047, over 21024.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1484, cr_loss=0.3724, over 4085694.80 frames. ], batch size: 61, lr: 2.37e-03, grad_scale: 32.0 2024-09-17 14:54:30,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=618624.3333333334, ans=0.1 2024-09-17 14:54:34,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=618652.6666666666, ans=0.0 2024-09-17 14:55:33,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=618737.6666666666, ans=0.0 2024-09-17 14:55:36,570 INFO [train.py:1198] (1/2) Epoch 35, batch 1100, loss[loss=0.2559, ctc_loss=0.171, cr_loss=0.4247, over 20856.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1476, cr_loss=0.3717, over 4095452.72 frames. ], batch size: 65, lr: 2.37e-03, grad_scale: 32.0 2024-09-17 14:55:59,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=618794.3333333334, ans=0.125 2024-09-17 14:56:34,773 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.830e+02 2.196e+02 2.311e+02 2.530e+02 3.718e+02, threshold=4.622e+02, percent-clipped=0.0 2024-09-17 14:56:36,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=618851.0, ans=0.125 2024-09-17 14:56:54,231 INFO [train.py:1198] (1/2) Epoch 35, batch 1150, loss[loss=0.2076, ctc_loss=0.1342, cr_loss=0.3668, over 20889.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1475, cr_loss=0.3718, over 4092722.73 frames. 
], batch size: 57, lr: 2.37e-03, grad_scale: 32.0 2024-09-17 14:57:03,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=618907.6666666666, ans=0.125 2024-09-17 14:57:40,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=618992.6666666666, ans=0.125 2024-09-17 14:57:59,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=619021.0, ans=0.04949747468305833 2024-09-17 14:58:08,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=619049.3333333334, ans=0.1 2024-09-17 14:58:09,197 INFO [train.py:1198] (1/2) Epoch 35, batch 1200, loss[loss=0.2202, ctc_loss=0.1489, cr_loss=0.3565, over 21013.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1476, cr_loss=0.3714, over 4092986.97 frames. ], batch size: 61, lr: 2.37e-03, grad_scale: 32.0 2024-09-17 14:58:31,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=619077.6666666666, ans=0.2 2024-09-17 14:59:05,038 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.794e+02 2.162e+02 2.302e+02 2.491e+02 3.463e+02, threshold=4.605e+02, percent-clipped=0.0 2024-09-17 14:59:11,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=619162.6666666666, ans=0.025 2024-09-17 14:59:13,669 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.95 vs. limit=15.0 2024-09-17 14:59:25,057 INFO [train.py:1198] (1/2) Epoch 35, batch 1250, loss[loss=0.2125, ctc_loss=0.1397, cr_loss=0.3642, over 20850.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1469, cr_loss=0.3704, over 4099839.32 frames. ], batch size: 59, lr: 2.37e-03, grad_scale: 32.0 2024-09-17 14:59:29,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=619191.0, ans=0.1 2024-09-17 14:59:31,501 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 14:59:47,291 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.16 vs. limit=8.0 2024-09-17 15:00:18,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=619276.0, ans=0.125 2024-09-17 15:00:35,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=619304.3333333334, ans=0.0 2024-09-17 15:00:43,953 INFO [train.py:1198] (1/2) Epoch 35, batch 1300, loss[loss=0.2312, ctc_loss=0.1528, cr_loss=0.3921, over 20944.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.146, cr_loss=0.369, over 4107304.96 frames. ], batch size: 60, lr: 2.37e-03, grad_scale: 32.0 2024-09-17 15:00:56,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=619332.6666666666, ans=0.125 2024-09-17 15:01:10,778 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.40 vs. 
limit=22.5 2024-09-17 15:01:22,601 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.92 vs. limit=22.5 2024-09-17 15:01:40,143 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.886e+02 2.131e+02 2.259e+02 2.433e+02 4.361e+02, threshold=4.517e+02, percent-clipped=0.0 2024-09-17 15:01:42,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=619417.6666666666, ans=0.0 2024-09-17 15:01:49,510 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 15:01:52,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=619446.0, ans=0.125 2024-09-17 15:02:02,511 INFO [train.py:1198] (1/2) Epoch 35, batch 1350, loss[loss=0.2382, ctc_loss=0.158, cr_loss=0.4014, over 20964.00 frames. ], tot_loss[loss=0.2202, ctc_loss=0.1463, cr_loss=0.3694, over 4114994.59 frames. ], batch size: 58, lr: 2.37e-03, grad_scale: 32.0 2024-09-17 15:02:10,717 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.99 vs. limit=22.5 2024-09-17 15:02:17,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=619502.6666666666, ans=0.0 2024-09-17 15:02:40,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=619531.0, ans=0.2 2024-09-17 15:03:18,023 INFO [train.py:1198] (1/2) Epoch 35, batch 1400, loss[loss=0.1982, ctc_loss=0.1297, cr_loss=0.3427, over 20809.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1477, cr_loss=0.3723, over 4109552.11 frames. ], batch size: 53, lr: 2.37e-03, grad_scale: 32.0 2024-09-17 15:03:54,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=619672.6666666666, ans=0.0 2024-09-17 15:04:03,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=619701.0, ans=0.2 2024-09-17 15:04:14,298 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.921e+02 2.162e+02 2.295e+02 2.482e+02 5.420e+02, threshold=4.591e+02, percent-clipped=3.0 2024-09-17 15:04:33,859 INFO [train.py:1198] (1/2) Epoch 35, batch 1450, loss[loss=0.2135, ctc_loss=0.14, cr_loss=0.3675, over 20778.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1483, cr_loss=0.3727, over 4093962.46 frames. ], batch size: 53, lr: 2.37e-03, grad_scale: 32.0 2024-09-17 15:04:49,573 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 15:05:28,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=619842.6666666666, ans=0.0 2024-09-17 15:05:49,874 INFO [train.py:1198] (1/2) Epoch 35, batch 1500, loss[loss=0.2444, ctc_loss=0.1626, cr_loss=0.4091, over 20994.00 frames. ], tot_loss[loss=0.2238, ctc_loss=0.1491, cr_loss=0.3736, over 4078481.09 frames. 
], batch size: 61, lr: 2.37e-03, grad_scale: 32.0 2024-09-17 15:05:54,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=619899.3333333334, ans=0.125 2024-09-17 15:06:08,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=619927.6666666666, ans=0.1 2024-09-17 15:06:49,211 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.933e+02 2.135e+02 2.284e+02 2.403e+02 4.344e+02, threshold=4.569e+02, percent-clipped=0.0 2024-09-17 15:06:50,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=619984.3333333334, ans=0.5 2024-09-17 15:07:05,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=620012.6666666666, ans=0.0 2024-09-17 15:07:09,056 INFO [train.py:1198] (1/2) Epoch 35, batch 1550, loss[loss=0.2097, ctc_loss=0.1373, cr_loss=0.362, over 21050.00 frames. ], tot_loss[loss=0.2235, ctc_loss=0.1489, cr_loss=0.3733, over 4080165.49 frames. ], batch size: 56, lr: 2.37e-03, grad_scale: 32.0 2024-09-17 15:07:15,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=620041.0, ans=0.125 2024-09-17 15:07:48,686 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.21 vs. limit=15.0 2024-09-17 15:07:56,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=620126.0, ans=0.125 2024-09-17 15:08:05,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=620126.0, ans=0.125 2024-09-17 15:08:07,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=620126.0, ans=0.0 2024-09-17 15:08:07,526 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.06 vs. limit=15.0 2024-09-17 15:08:17,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=620154.3333333334, ans=0.1 2024-09-17 15:08:23,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=620154.3333333334, ans=0.035 2024-09-17 15:08:26,485 INFO [train.py:1198] (1/2) Epoch 35, batch 1600, loss[loss=0.2573, ctc_loss=0.1776, cr_loss=0.3983, over 18194.00 frames. ], tot_loss[loss=0.223, ctc_loss=0.1484, cr_loss=0.3729, over 4088958.23 frames. ], batch size: 108, lr: 2.37e-03, grad_scale: 32.0 2024-09-17 15:08:28,726 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.64 vs. 
limit=15.0 2024-09-17 15:08:36,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=620182.6666666666, ans=0.125 2024-09-17 15:09:00,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=620239.3333333334, ans=0.125 2024-09-17 15:09:22,225 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.940e+02 2.194e+02 2.316e+02 2.550e+02 5.334e+02, threshold=4.632e+02, percent-clipped=1.0 2024-09-17 15:09:42,077 INFO [train.py:1198] (1/2) Epoch 35, batch 1650, loss[loss=0.2225, ctc_loss=0.1493, cr_loss=0.3656, over 21013.00 frames. ], tot_loss[loss=0.2236, ctc_loss=0.1489, cr_loss=0.3736, over 4090254.07 frames. ], batch size: 61, lr: 2.37e-03, grad_scale: 32.0 2024-09-17 15:10:04,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=620352.6666666666, ans=0.125 2024-09-17 15:10:06,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=620352.6666666666, ans=0.125 2024-09-17 15:10:26,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=620409.3333333334, ans=0.0 2024-09-17 15:10:58,399 INFO [train.py:1198] (1/2) Epoch 35, batch 1700, loss[loss=0.2325, ctc_loss=0.1541, cr_loss=0.392, over 19331.00 frames. ], tot_loss[loss=0.2236, ctc_loss=0.1488, cr_loss=0.3743, over 4099336.17 frames. ], batch size: 90, lr: 2.37e-03, grad_scale: 32.0 2024-09-17 15:11:12,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=620494.3333333334, ans=0.125 2024-09-17 15:11:17,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=620494.3333333334, ans=0.1 2024-09-17 15:11:45,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=620551.0, ans=0.1 2024-09-17 15:11:54,423 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.858e+02 2.152e+02 2.290e+02 2.430e+02 3.451e+02, threshold=4.580e+02, percent-clipped=0.0 2024-09-17 15:11:56,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=620551.0, ans=0.125 2024-09-17 15:11:59,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=620579.3333333334, ans=0.125 2024-09-17 15:12:13,975 INFO [train.py:1198] (1/2) Epoch 35, batch 1750, loss[loss=0.2547, ctc_loss=0.1695, cr_loss=0.426, over 18387.00 frames. ], tot_loss[loss=0.223, ctc_loss=0.1482, cr_loss=0.3738, over 4105444.00 frames. 
], batch size: 108, lr: 2.37e-03, grad_scale: 32.0 2024-09-17 15:12:23,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=620607.6666666666, ans=0.0 2024-09-17 15:13:26,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=620721.0, ans=0.2 2024-09-17 15:13:34,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=620749.3333333334, ans=0.1 2024-09-17 15:13:35,328 INFO [train.py:1198] (1/2) Epoch 35, batch 1800, loss[loss=0.1817, ctc_loss=0.1209, cr_loss=0.3041, over 20978.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.1479, cr_loss=0.3725, over 4090577.84 frames. ], batch size: 51, lr: 2.37e-03, grad_scale: 32.0 2024-09-17 15:13:35,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=620749.3333333334, ans=0.0 2024-09-17 15:14:25,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=620834.3333333334, ans=0.125 2024-09-17 15:14:31,592 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.979e+02 2.180e+02 2.316e+02 2.502e+02 4.613e+02, threshold=4.632e+02, percent-clipped=1.0 2024-09-17 15:14:51,170 INFO [train.py:1198] (1/2) Epoch 35, batch 1850, loss[loss=0.2675, ctc_loss=0.1868, cr_loss=0.4037, over 14618.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1477, cr_loss=0.3719, over 4094147.86 frames. ], batch size: 152, lr: 2.37e-03, grad_scale: 32.0 2024-09-17 15:15:38,742 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.00 vs. limit=12.0 2024-09-17 15:15:40,327 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.78 vs. limit=15.0 2024-09-17 15:16:06,339 INFO [train.py:1198] (1/2) Epoch 35, batch 1900, loss[loss=0.2491, ctc_loss=0.1665, cr_loss=0.4128, over 19466.00 frames. ], tot_loss[loss=0.223, ctc_loss=0.1483, cr_loss=0.3734, over 4093207.84 frames. ], batch size: 90, lr: 2.37e-03, grad_scale: 32.0 2024-09-17 15:16:13,081 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.33 vs. limit=15.0 2024-09-17 15:16:15,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=621032.6666666666, ans=0.2 2024-09-17 15:16:20,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=621061.0, ans=0.125 2024-09-17 15:16:43,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=621089.3333333334, ans=0.0 2024-09-17 15:16:49,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=621089.3333333334, ans=0.125 2024-09-17 15:16:53,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=621117.6666666666, ans=0.125 2024-09-17 15:16:59,356 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.32 vs. 
2024-09-17 15:17:02,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=621117.6666666666, ans=15.0 2024-09-17 15:17:02,750 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.882e+02 2.203e+02 2.324e+02 2.466e+02 3.414e+02, threshold=4.648e+02, percent-clipped=0.0 2024-09-17 15:17:08,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=621146.0, ans=0.1 2024-09-17 15:17:22,335 INFO [train.py:1198] (1/2) Epoch 35, batch 1950, loss[loss=0.2408, ctc_loss=0.1609, cr_loss=0.3992, over 20679.00 frames. ], tot_loss[loss=0.2237, ctc_loss=0.1488, cr_loss=0.3744, over 4095865.03 frames. ], batch size: 66, lr: 2.37e-03, grad_scale: 16.0 2024-09-17 15:17:22,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=621174.3333333334, ans=0.0 2024-09-17 15:17:24,629 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.63 vs. limit=15.0 2024-09-17 15:17:37,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=621202.6666666666, ans=0.125 2024-09-17 15:17:39,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=621202.6666666666, ans=0.0 2024-09-17 15:18:12,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=621259.3333333334, ans=0.0 2024-09-17 15:18:15,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=621259.3333333334, ans=0.125 2024-09-17 15:18:24,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=621287.6666666666, ans=0.1 2024-09-17 15:18:41,246 INFO [train.py:1198] (1/2) Epoch 35, batch 2000, loss[loss=0.222, ctc_loss=0.1468, cr_loss=0.3758, over 21019.00 frames. ], tot_loss[loss=0.2231, ctc_loss=0.1484, cr_loss=0.3737, over 4104657.58 frames. ], batch size: 63, lr: 2.37e-03, grad_scale: 32.0 2024-09-17 15:19:17,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=621372.6666666666, ans=0.125 2024-09-17 15:19:29,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=621401.0, ans=0.0 2024-09-17 15:19:32,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=621401.0, ans=0.125 2024-09-17 15:19:34,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=621401.0, ans=0.0 2024-09-17 15:19:36,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=621401.0, ans=6.0 2024-09-17 15:19:41,588 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.937e+02 2.171e+02 2.282e+02 2.466e+02 4.272e+02, threshold=4.565e+02, percent-clipped=0.0 2024-09-17 15:19:59,809 INFO [train.py:1198] (1/2) Epoch 35, batch 2050, loss[loss=0.2539, ctc_loss=0.1724, cr_loss=0.4079, over 20692.00 frames.
], tot_loss[loss=0.2233, ctc_loss=0.1485, cr_loss=0.3742, over 4112048.24 frames. ], batch size: 71, lr: 2.37e-03, grad_scale: 32.0 2024-09-17 15:20:13,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=621486.0, ans=0.1 2024-09-17 15:20:18,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=621486.0, ans=0.0 2024-09-17 15:20:38,495 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.16 vs. limit=6.0 2024-09-17 15:20:53,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=621542.6666666666, ans=0.025 2024-09-17 15:21:11,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=621571.0, ans=0.125 2024-09-17 15:21:15,655 INFO [train.py:1198] (1/2) Epoch 35, batch 2100, loss[loss=0.2137, ctc_loss=0.1416, cr_loss=0.3602, over 20790.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.1478, cr_loss=0.3729, over 4111903.92 frames. ], batch size: 53, lr: 2.37e-03, grad_scale: 32.0 2024-09-17 15:21:23,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=621599.3333333334, ans=0.125 2024-09-17 15:21:27,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=621599.3333333334, ans=0.125 2024-09-17 15:21:29,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=621627.6666666666, ans=0.125 2024-09-17 15:21:30,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=621627.6666666666, ans=0.125 2024-09-17 15:21:35,409 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 15:21:41,861 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.12 vs. limit=12.0 2024-09-17 15:22:05,683 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.35 vs. limit=15.0 2024-09-17 15:22:12,553 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.930e+02 2.173e+02 2.293e+02 2.439e+02 5.713e+02, threshold=4.587e+02, percent-clipped=1.0 2024-09-17 15:22:30,926 INFO [train.py:1198] (1/2) Epoch 35, batch 2150, loss[loss=0.2361, ctc_loss=0.1588, cr_loss=0.3864, over 21036.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1476, cr_loss=0.3727, over 4114489.40 frames. ], batch size: 62, lr: 2.37e-03, grad_scale: 32.0 2024-09-17 15:22:34,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=621741.0, ans=0.125 2024-09-17 15:22:52,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=621769.3333333334, ans=0.125 2024-09-17 15:22:59,118 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.22 vs. 
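limit=15.0

On the optim.py warnings: the five grad-norm quartiles read as (min, 25%, median, 75%, max) of a buffer of recent gradient norms, and the reported threshold is consistently Clipping_scale times the median, e.g. 2.0 * 2.324e+02 = 4.648e+02 in the warning above; percent-clipped then reports how often the norm actually exceeded the threshold. A hedged sketch of that bookkeeping, using the five summary values themselves as a stand-in for the buffer of recent norms:

```python
import torch

def clipping_report(recent_grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    """Quartiles of recent grad norms and a median-based clipping threshold,
    following the pattern of the optim.py warnings in this log."""
    q = torch.quantile(recent_grad_norms,
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]  # Clipping_scale times the median
    return q, threshold

q, thr = clipping_report(torch.tensor([188.2, 220.3, 232.4, 246.6, 341.4]))
print(thr.item())  # 464.8, i.e. 4.648e+02 as logged
```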
2024-09-17 15:23:49,241 INFO [train.py:1198] (1/2) Epoch 35, batch 2200, loss[loss=0.2566, ctc_loss=0.1736, cr_loss=0.415, over 20070.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.147, cr_loss=0.3715, over 4099431.54 frames. ], batch size: 80, lr: 2.37e-03, grad_scale: 16.0 2024-09-17 15:24:21,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=621939.3333333334, ans=0.0 2024-09-17 15:24:39,332 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 15:24:50,721 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.807e+02 2.142e+02 2.263e+02 2.441e+02 4.069e+02, threshold=4.526e+02, percent-clipped=0.0 2024-09-17 15:25:00,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=621996.0, ans=0.125 2024-09-17 15:25:07,429 INFO [train.py:1198] (1/2) Epoch 35, batch 2250, loss[loss=0.2151, ctc_loss=0.1415, cr_loss=0.3677, over 20891.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1474, cr_loss=0.3722, over 4098645.79 frames. ], batch size: 54, lr: 2.37e-03, grad_scale: 16.0 2024-09-17 15:25:34,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=622052.6666666666, ans=0.125 2024-09-17 15:25:39,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=622081.0, ans=0.0 2024-09-17 15:25:41,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=622081.0, ans=0.125 2024-09-17 15:25:52,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=622109.3333333334, ans=0.025 2024-09-17 15:26:02,660 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=3.97 vs. limit=15.0 2024-09-17 15:26:22,379 INFO [train.py:1198] (1/2) Epoch 35, batch 2300, loss[loss=0.2285, ctc_loss=0.1505, cr_loss=0.3899, over 21070.00 frames. ], tot_loss[loss=0.2212, ctc_loss=0.1469, cr_loss=0.3719, over 4096344.71 frames. ], batch size: 59, lr: 2.36e-03, grad_scale: 16.0 2024-09-17 15:26:27,256 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 15:26:30,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=622166.0, ans=0.125 2024-09-17 15:26:55,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=622222.6666666666, ans=0.0 2024-09-17 15:26:58,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=622222.6666666666, ans=0.0 2024-09-17 15:27:14,420 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.24 vs.
limit=15.0 2024-09-17 15:27:21,223 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.924e+02 2.148e+02 2.271e+02 2.470e+02 4.144e+02, threshold=4.542e+02, percent-clipped=0.0 2024-09-17 15:27:33,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=622279.3333333334, ans=0.1 2024-09-17 15:27:37,701 INFO [train.py:1198] (1/2) Epoch 35, batch 2350, loss[loss=0.2303, ctc_loss=0.1525, cr_loss=0.3889, over 20310.00 frames. ], tot_loss[loss=0.2212, ctc_loss=0.1469, cr_loss=0.3716, over 4096914.27 frames. ], batch size: 74, lr: 2.36e-03, grad_scale: 16.0 2024-09-17 15:28:06,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=622364.3333333334, ans=0.125 2024-09-17 15:28:26,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=622392.6666666666, ans=0.125 2024-09-17 15:28:32,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=622392.6666666666, ans=0.0 2024-09-17 15:28:52,933 INFO [train.py:1198] (1/2) Epoch 35, batch 2400, loss[loss=0.2329, ctc_loss=0.1569, cr_loss=0.38, over 21056.00 frames. ], tot_loss[loss=0.2212, ctc_loss=0.1469, cr_loss=0.3717, over 4113300.00 frames. ], batch size: 62, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 15:29:02,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=622449.3333333334, ans=0.04949747468305833 2024-09-17 15:29:55,157 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.985e+02 2.145e+02 2.299e+02 2.502e+02 3.196e+02, threshold=4.597e+02, percent-clipped=0.0 2024-09-17 15:30:01,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=622562.6666666666, ans=0.125 2024-09-17 15:30:04,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=622562.6666666666, ans=0.0 2024-09-17 15:30:14,790 INFO [train.py:1198] (1/2) Epoch 35, batch 2450, loss[loss=0.2222, ctc_loss=0.1475, cr_loss=0.3736, over 20883.00 frames. ], tot_loss[loss=0.2209, ctc_loss=0.1467, cr_loss=0.3708, over 4114634.50 frames. ], batch size: 54, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 15:30:15,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=622591.0, ans=0.125 2024-09-17 15:30:19,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=622591.0, ans=0.125 2024-09-17 15:30:38,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=622619.3333333334, ans=0.05 2024-09-17 15:30:52,031 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.90 vs. limit=10.0 2024-09-17 15:31:25,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=622704.3333333334, ans=0.0 2024-09-17 15:31:29,805 INFO [train.py:1198] (1/2) Epoch 35, batch 2500, loss[loss=0.2072, ctc_loss=0.1395, cr_loss=0.3384, over 20913.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.147, cr_loss=0.3715, over 4112102.75 frames. 
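], batch size: 60, lr: 2.36e-03, grad_scale: 32.0

The scaling.py ScheduledFloat lines each report a module parameter (name=...), the current schedule position (batch_count=...) and the value currently in effect (ans=...), so quantities such as skip-rates and balancer probabilities can be annealed as training progresses. As a sketch of the idea only, not icefall's actual class: a piecewise-linear schedule over the batch count:

```python
def scheduled_float(batch_count: float, schedule) -> float:
    """Piecewise-linear interpolation over (batch_count, value) breakpoints;
    past the last breakpoint the final value is held."""
    x0, y0 = schedule[0]
    if batch_count <= x0:
        return y0
    for x1, y1 in schedule[1:]:
        if batch_count <= x1:
            return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
        x0, y0 = x1, y1
    return y0

# Hypothetical schedule: start at 0.3, settle at 0.125 after 20k batches.
print(scheduled_float(622591.0, [(0.0, 0.3), (20000.0, 0.125)]))  # 0.125
```

By this epoch the batch counts are far past any breakpoints, which is why the same ans values recur throughout the section.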
2024-09-17 15:31:51,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=622761.0, ans=0.1 2024-09-17 15:32:06,713 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.25 vs. limit=22.5 2024-09-17 15:32:20,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=622817.6666666666, ans=0.2 2024-09-17 15:32:28,901 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.936e+02 2.182e+02 2.280e+02 2.396e+02 3.155e+02, threshold=4.560e+02, percent-clipped=0.0 2024-09-17 15:32:45,767 INFO [train.py:1198] (1/2) Epoch 35, batch 2550, loss[loss=0.2398, ctc_loss=0.1587, cr_loss=0.4055, over 20724.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.147, cr_loss=0.3715, over 4109395.45 frames. ], batch size: 71, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 15:33:04,535 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=622902.6666666666, ans=0.125 2024-09-17 15:33:15,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=622931.0, ans=0.05 2024-09-17 15:33:38,549 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.07 vs. limit=15.0 2024-09-17 15:33:48,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=622987.6666666666, ans=0.125 2024-09-17 15:34:02,077 INFO [train.py:1198] (1/2) Epoch 35, batch 2600, loss[loss=0.2541, ctc_loss=0.1691, cr_loss=0.4254, over 21008.00 frames. ], tot_loss[loss=0.2209, ctc_loss=0.1467, cr_loss=0.3711, over 4112439.34 frames. ], batch size: 63, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 15:34:14,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=623016.0, ans=0.1 2024-09-17 15:34:34,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=623072.6666666666, ans=0.09899494936611666 2024-09-17 15:34:50,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=623101.0, ans=0.1 2024-09-17 15:34:52,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=623101.0, ans=0.125 2024-09-17 15:35:05,143 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.901e+02 2.182e+02 2.307e+02 2.495e+02 3.528e+02, threshold=4.613e+02, percent-clipped=0.0 2024-09-17 15:35:05,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=623129.3333333334, ans=0.0 2024-09-17 15:35:20,431 INFO [train.py:1198] (1/2) Epoch 35, batch 2650, loss[loss=0.2233, ctc_loss=0.1474, cr_loss=0.3794, over 21072.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1474, cr_loss=0.3724, over 4109924.30 frames.
], batch size: 59, lr: 2.36e-03, grad_scale: 16.0 2024-09-17 15:36:00,777 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.43 vs. limit=15.0 2024-09-17 15:36:39,811 INFO [train.py:1198] (1/2) Epoch 35, batch 2700, loss[loss=0.218, ctc_loss=0.1442, cr_loss=0.3686, over 20963.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1471, cr_loss=0.3719, over 4116301.36 frames. ], batch size: 60, lr: 2.36e-03, grad_scale: 16.0 2024-09-17 15:36:59,925 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.79 vs. limit=12.0 2024-09-17 15:37:03,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=623327.6666666666, ans=0.0 2024-09-17 15:37:37,420 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.68 vs. limit=15.0 2024-09-17 15:37:41,294 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.892e+02 2.194e+02 2.296e+02 2.509e+02 4.831e+02, threshold=4.592e+02, percent-clipped=1.0 2024-09-17 15:37:56,535 INFO [train.py:1198] (1/2) Epoch 35, batch 2750, loss[loss=0.1831, ctc_loss=0.1185, cr_loss=0.3232, over 21007.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1473, cr_loss=0.3724, over 4111016.26 frames. ], batch size: 51, lr: 2.36e-03, grad_scale: 16.0 2024-09-17 15:38:03,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=623441.0, ans=10.0 2024-09-17 15:38:45,646 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.44 vs. limit=15.0 2024-09-17 15:38:46,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=623526.0, ans=0.0 2024-09-17 15:38:48,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=623526.0, ans=10.0 2024-09-17 15:39:12,056 INFO [train.py:1198] (1/2) Epoch 35, batch 2800, loss[loss=0.2137, ctc_loss=0.1402, cr_loss=0.3674, over 20907.00 frames. ], tot_loss[loss=0.2202, ctc_loss=0.1462, cr_loss=0.3697, over 4101927.67 frames. ], batch size: 54, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 15:39:50,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=623639.3333333334, ans=0.07 2024-09-17 15:40:04,508 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.07 vs. limit=15.0 2024-09-17 15:40:07,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=623667.6666666666, ans=0.0 2024-09-17 15:40:12,763 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.925e+02 2.171e+02 2.308e+02 2.428e+02 3.905e+02, threshold=4.616e+02, percent-clipped=0.0 2024-09-17 15:40:16,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=623696.0, ans=0.125 2024-09-17 15:40:27,638 INFO [train.py:1198] (1/2) Epoch 35, batch 2850, loss[loss=0.1761, ctc_loss=0.1143, cr_loss=0.309, over 20978.00 frames. 
], tot_loss[loss=0.22, ctc_loss=0.146, cr_loss=0.37, over 4108611.84 frames. ], batch size: 48, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 15:41:19,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=623809.3333333334, ans=0.125 2024-09-17 15:41:31,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=623837.6666666666, ans=0.025 2024-09-17 15:41:49,029 INFO [train.py:1198] (1/2) Epoch 35, batch 2900, loss[loss=0.2262, ctc_loss=0.1498, cr_loss=0.3818, over 20994.00 frames. ], tot_loss[loss=0.2195, ctc_loss=0.1457, cr_loss=0.3692, over 4111378.54 frames. ], batch size: 63, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 15:41:53,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=623866.0, ans=0.1 2024-09-17 15:42:03,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=623894.3333333334, ans=0.125 2024-09-17 15:42:05,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=623894.3333333334, ans=15.0 2024-09-17 15:42:17,306 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.84 vs. limit=15.0 2024-09-17 15:42:33,037 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=623951.0, ans=0.125 2024-09-17 15:42:49,626 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.906e+02 2.198e+02 2.335e+02 2.468e+02 3.737e+02, threshold=4.670e+02, percent-clipped=0.0 2024-09-17 15:42:59,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=623979.3333333334, ans=0.0 2024-09-17 15:43:01,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=623979.3333333334, ans=0.125 2024-09-17 15:43:04,595 INFO [train.py:1198] (1/2) Epoch 35, batch 2950, loss[loss=0.2436, ctc_loss=0.1626, cr_loss=0.4051, over 21048.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.1459, cr_loss=0.3693, over 4108933.02 frames. ], batch size: 62, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 15:43:20,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=624036.0, ans=0.0 2024-09-17 15:43:27,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=624036.0, ans=0.125 2024-09-17 15:44:13,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=624121.0, ans=0.125 2024-09-17 15:44:20,635 INFO [train.py:1198] (1/2) Epoch 35, batch 3000, loss[loss=0.2466, ctc_loss=0.166, cr_loss=0.4031, over 20681.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1465, cr_loss=0.3699, over 4107060.47 frames. ], batch size: 71, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 15:44:20,636 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 15:44:40,499 INFO [train.py:1230] (1/2) Epoch 35, validation: loss=0.04061, ctc_loss=0.04061, cr_loss=1.315e-14, over 944034.00 frames. 
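Note how the validation record differs from the surrounding training records: cr_loss collapses to numerical noise (1.315e-14), so the reported loss is just ctc_loss=0.04061. That is what you would expect if the consistency-regularization term compares the model's per-frame CTC posteriors on two differently time-masked views of each utterance: with augmentation off at validation the two views coincide and their divergence is zero up to float error. A hedged sketch of such a symmetric-KL consistency term (illustrative, not necessarily the exact cr_loss used by this recipe):

```python
import torch
import torch.nn.functional as F

def consistency_loss(log_probs_a: torch.Tensor,
                     log_probs_b: torch.Tensor) -> torch.Tensor:
    """Symmetric KL between per-frame CTC posteriors of two augmented views.
    Both inputs are (num_frames, vocab) log-probabilities."""
    kl_ab = F.kl_div(log_probs_b, log_probs_a, log_target=True,
                     reduction="batchmean")
    kl_ba = F.kl_div(log_probs_a, log_probs_b, log_target=True,
                     reduction="batchmean")
    return 0.5 * (kl_ab + kl_ba)

x = torch.randn(100, 500).log_softmax(dim=-1)
print(consistency_loss(x, x).item())  # identical views -> 0.0, as at validation
```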
2024-09-17 15:44:40,499 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-17 15:44:50,465 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.24 vs. limit=15.0 2024-09-17 15:45:00,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=624177.6666666666, ans=0.125 2024-09-17 15:45:15,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=624206.0, ans=0.0 2024-09-17 15:45:40,439 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.888e+02 2.174e+02 2.285e+02 2.453e+02 8.705e+02, threshold=4.570e+02, percent-clipped=1.0 2024-09-17 15:45:55,631 INFO [train.py:1198] (1/2) Epoch 35, batch 3050, loss[loss=0.1923, ctc_loss=0.1252, cr_loss=0.3356, over 20815.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1466, cr_loss=0.37, over 4094737.60 frames. ], batch size: 59, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 15:46:11,796 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.60 vs. limit=15.0 2024-09-17 15:46:39,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=624347.6666666666, ans=0.125 2024-09-17 15:47:01,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=624404.3333333334, ans=0.125 2024-09-17 15:47:17,360 INFO [train.py:1198] (1/2) Epoch 35, batch 3100, loss[loss=0.2876, ctc_loss=0.2006, cr_loss=0.4353, over 18338.00 frames. ], tot_loss[loss=0.2231, ctc_loss=0.1486, cr_loss=0.3726, over 4098935.86 frames. ], batch size: 108, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 15:47:49,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=624489.3333333334, ans=0.125 2024-09-17 15:47:55,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=624489.3333333334, ans=0.025 2024-09-17 15:48:17,484 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.972e+02 2.186e+02 2.317e+02 2.450e+02 4.076e+02, threshold=4.634e+02, percent-clipped=0.0 2024-09-17 15:48:19,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=624546.0, ans=0.125 2024-09-17 15:48:32,231 INFO [train.py:1198] (1/2) Epoch 35, batch 3150, loss[loss=0.2106, ctc_loss=0.1416, cr_loss=0.3446, over 20898.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.148, cr_loss=0.3721, over 4102372.41 frames. ], batch size: 54, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 15:49:25,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=624659.3333333334, ans=0.125 2024-09-17 15:49:47,712 INFO [train.py:1198] (1/2) Epoch 35, batch 3200, loss[loss=0.2394, ctc_loss=0.1594, cr_loss=0.3997, over 20691.00 frames. ], tot_loss[loss=0.2232, ctc_loss=0.1487, cr_loss=0.3728, over 4091483.02 frames. 
], batch size: 66, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 15:49:49,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=624716.0, ans=0.125 2024-09-17 15:49:49,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=624716.0, ans=0.2 2024-09-17 15:50:00,742 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.43 vs. limit=22.5 2024-09-17 15:50:32,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=624801.0, ans=0.125 2024-09-17 15:50:39,077 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.18 vs. limit=12.0 2024-09-17 15:50:49,135 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.806e+02 2.202e+02 2.301e+02 2.445e+02 3.308e+02, threshold=4.601e+02, percent-clipped=0.0 2024-09-17 15:50:59,484 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=11.10 vs. limit=10.0 2024-09-17 15:51:04,457 INFO [train.py:1198] (1/2) Epoch 35, batch 3250, loss[loss=0.2551, ctc_loss=0.172, cr_loss=0.4156, over 20337.00 frames. ], tot_loss[loss=0.2231, ctc_loss=0.1485, cr_loss=0.3732, over 4100457.06 frames. ], batch size: 74, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 15:51:05,106 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.41 vs. limit=12.0 2024-09-17 15:51:06,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=624857.6666666666, ans=0.125 2024-09-17 15:51:26,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=624886.0, ans=0.125 2024-09-17 15:51:52,728 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.74 vs. limit=6.0 2024-09-17 15:52:13,692 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.67 vs. limit=22.5 2024-09-17 15:52:16,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=624971.0, ans=0.0 2024-09-17 15:52:21,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=624971.0, ans=0.125 2024-09-17 15:52:23,527 INFO [train.py:1198] (1/2) Epoch 35, batch 3300, loss[loss=0.23, ctc_loss=0.1515, cr_loss=0.3926, over 21065.00 frames. ], tot_loss[loss=0.2241, ctc_loss=0.1493, cr_loss=0.3743, over 4091992.00 frames. ], batch size: 62, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 15:52:35,124 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.25 vs. limit=15.0 2024-09-17 15:52:36,555 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. 
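limit=6.0

The scaling.py Whitening lines fire when a decorrelation metric for a module's activations exceeds its configured limit (metric=X vs. limit=Y), at which point the module pushes the feature covariance back toward isotropy. One plausible reading of metric, offered as an assumption rather than icefall's exact formula, is the whiteness ratio E[lambda^2] / (E[lambda])^2 over the eigenvalues lambda of the feature covariance: it equals 1.0 for a perfectly white covariance and grows as variance concentrates in a few directions. A sketch under that assumption:

```python
import torch

def whitening_metric(feats: torch.Tensor) -> torch.Tensor:
    """feats: (N, C). Whiteness ratio E[lambda^2] / E[lambda]^2 of the feature
    covariance: 1.0 when isotropic, larger when ill-conditioned."""
    feats = feats - feats.mean(dim=0)
    cov = feats.t() @ feats / feats.shape[0]
    eig = torch.linalg.eigvalsh(cov)  # real eigenvalues of the symmetric cov
    return (eig ** 2).mean() / eig.mean() ** 2

white = torch.randn(4096, 256)
print(whitening_metric(white).item())  # close to the ideal 1.0
print(whitening_metric(white * torch.linspace(0.1, 3.0, 256)).item())  # larger
```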
2024-09-17 15:53:04,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=625056.0, ans=0.1 2024-09-17 15:53:10,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=625084.3333333334, ans=0.035 2024-09-17 15:53:27,324 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.962e+02 2.205e+02 2.322e+02 2.508e+02 2.970e+02, threshold=4.645e+02, percent-clipped=0.0 2024-09-17 15:53:29,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=625112.6666666666, ans=0.125 2024-09-17 15:53:42,438 INFO [train.py:1198] (1/2) Epoch 35, batch 3350, loss[loss=0.21, ctc_loss=0.1362, cr_loss=0.3689, over 20999.00 frames. ], tot_loss[loss=0.2234, ctc_loss=0.1487, cr_loss=0.3733, over 4082941.45 frames. ], batch size: 63, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 15:53:49,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=625141.0, ans=0.125 2024-09-17 15:53:50,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=625141.0, ans=0.2 2024-09-17 15:54:17,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=625197.6666666666, ans=0.2 2024-09-17 15:54:17,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=625197.6666666666, ans=0.05 2024-09-17 15:54:28,704 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.15 vs. limit=22.5 2024-09-17 15:54:38,881 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.78 vs. limit=15.0 2024-09-17 15:54:58,024 INFO [train.py:1198] (1/2) Epoch 35, batch 3400, loss[loss=0.2661, ctc_loss=0.1753, cr_loss=0.454, over 20707.00 frames. ], tot_loss[loss=0.2231, ctc_loss=0.1485, cr_loss=0.3727, over 4083607.55 frames. ], batch size: 68, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 15:54:58,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.whiten.whitening_limit, batch_count=625282.6666666666, ans=12.0 2024-09-17 15:55:37,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=625339.3333333334, ans=0.125 2024-09-17 15:55:46,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=625367.6666666666, ans=0.07 2024-09-17 15:55:51,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=625367.6666666666, ans=0.125 2024-09-17 15:55:52,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=625367.6666666666, ans=0.1 2024-09-17 15:55:58,493 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.843e+02 2.170e+02 2.299e+02 2.445e+02 5.340e+02, threshold=4.598e+02, percent-clipped=1.0 2024-09-17 15:56:13,505 INFO [train.py:1198] (1/2) Epoch 35, batch 3450, loss[loss=0.2432, ctc_loss=0.1621, cr_loss=0.4056, over 20824.00 frames.
], tot_loss[loss=0.2224, ctc_loss=0.1479, cr_loss=0.3723, over 4087365.58 frames. ], batch size: 65, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 15:56:16,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=625424.3333333334, ans=0.125 2024-09-17 15:56:18,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=625424.3333333334, ans=0.125 2024-09-17 15:56:19,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=625424.3333333334, ans=0.2 2024-09-17 15:56:33,401 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 15:56:37,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=625452.6666666666, ans=0.125 2024-09-17 15:56:39,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=625452.6666666666, ans=0.125 2024-09-17 15:56:48,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=625481.0, ans=0.0 2024-09-17 15:57:00,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=625509.3333333334, ans=0.1 2024-09-17 15:57:06,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=625509.3333333334, ans=0.125 2024-09-17 15:57:21,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=625537.6666666666, ans=0.125 2024-09-17 15:57:29,019 INFO [train.py:1198] (1/2) Epoch 35, batch 3500, loss[loss=0.2348, ctc_loss=0.1596, cr_loss=0.3757, over 20962.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1475, cr_loss=0.3718, over 4097504.08 frames. 
], batch size: 58, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 15:57:32,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=625566.0, ans=0.125 2024-09-17 15:58:01,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=625622.6666666666, ans=0.125 2024-09-17 15:58:10,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=625622.6666666666, ans=0.02 2024-09-17 15:58:10,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=625622.6666666666, ans=0.125 2024-09-17 15:58:14,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=625622.6666666666, ans=0.025 2024-09-17 15:58:32,394 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.013e+02 2.172e+02 2.338e+02 2.458e+02 5.405e+02, threshold=4.675e+02, percent-clipped=1.0 2024-09-17 15:58:48,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=625679.3333333334, ans=0.0 2024-09-17 15:58:49,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=625707.6666666666, ans=0.0 2024-09-17 15:58:51,006 INFO [train.py:1198] (1/2) Epoch 35, batch 3550, loss[loss=0.2132, ctc_loss=0.1392, cr_loss=0.3699, over 20842.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1477, cr_loss=0.3723, over 4084520.08 frames. ], batch size: 59, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 15:59:06,416 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 15:59:11,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=625736.0, ans=0.1 2024-09-17 16:00:06,457 INFO [train.py:1198] (1/2) Epoch 35, batch 3600, loss[loss=0.2244, ctc_loss=0.1498, cr_loss=0.3726, over 21065.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.1479, cr_loss=0.3725, over 4096124.95 frames. ], batch size: 56, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 16:00:34,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=625906.0, ans=0.0 2024-09-17 16:01:05,656 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.864e+02 2.193e+02 2.318e+02 2.457e+02 2.955e+02, threshold=4.635e+02, percent-clipped=0.0 2024-09-17 16:01:07,927 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.21 vs. limit=6.0 2024-09-17 16:01:20,755 INFO [train.py:1198] (1/2) Epoch 35, batch 3650, loss[loss=0.1793, ctc_loss=0.1177, cr_loss=0.3082, over 20959.00 frames. ], tot_loss[loss=0.2221, ctc_loss=0.1476, cr_loss=0.3727, over 4104597.99 frames. ], batch size: 50, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 16:02:11,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=626076.0, ans=0.0 2024-09-17 16:02:36,805 INFO [train.py:1198] (1/2) Epoch 35, batch 3700, loss[loss=0.1678, ctc_loss=0.109, cr_loss=0.2941, over 20999.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.1471, cr_loss=0.3715, over 4108464.38 frames. 
], batch size: 51, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 16:02:38,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=626132.6666666666, ans=0.0 2024-09-17 16:02:47,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=626132.6666666666, ans=0.02 2024-09-17 16:02:52,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=626161.0, ans=0.0 2024-09-17 16:03:18,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=626189.3333333334, ans=0.1 2024-09-17 16:03:34,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=626217.6666666666, ans=0.125 2024-09-17 16:03:40,401 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.927e+02 2.148e+02 2.309e+02 2.452e+02 3.277e+02, threshold=4.617e+02, percent-clipped=0.0 2024-09-17 16:03:55,550 INFO [train.py:1198] (1/2) Epoch 35, batch 3750, loss[loss=0.2529, ctc_loss=0.174, cr_loss=0.3944, over 14269.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1467, cr_loss=0.3712, over 4115283.59 frames. ], batch size: 152, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 16:04:15,476 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 16:05:13,555 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 16:05:14,654 INFO [train.py:1198] (1/2) Epoch 35, batch 3800, loss[loss=0.1913, ctc_loss=0.1241, cr_loss=0.3361, over 20955.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1475, cr_loss=0.3717, over 4093816.85 frames. ], batch size: 50, lr: 2.36e-03, grad_scale: 16.0 2024-09-17 16:05:36,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=626444.3333333334, ans=0.025 2024-09-17 16:06:08,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=626501.0, ans=0.1 2024-09-17 16:06:18,325 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.881e+02 2.197e+02 2.330e+02 2.463e+02 5.185e+02, threshold=4.661e+02, percent-clipped=1.0 2024-09-17 16:06:30,867 INFO [train.py:1198] (1/2) Epoch 35, batch 3850, loss[loss=0.1991, ctc_loss=0.1309, cr_loss=0.341, over 20975.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1476, cr_loss=0.3715, over 4097867.51 frames. ], batch size: 51, lr: 2.36e-03, grad_scale: 8.0 2024-09-17 16:07:31,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=626671.0, ans=15.0 2024-09-17 16:07:38,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=626671.0, ans=0.125 2024-09-17 16:07:46,868 INFO [train.py:1198] (1/2) Epoch 35, batch 3900, loss[loss=0.2425, ctc_loss=0.1627, cr_loss=0.3993, over 19498.00 frames. ], tot_loss[loss=0.2223, ctc_loss=0.1479, cr_loss=0.3719, over 4100167.46 frames. 
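], batch size: 90, lr: 2.36e-03, grad_scale: 8.0

The grad_scale field in these records drops 32.0 -> 16.0 -> 8.0 between batches 3750 and 3850, then climbs back through 16.0 (batch 4000) to 32.0 later in the epoch. That is the signature of dynamic loss scaling under fp16 autocast: the scale is halved whenever a step produces inf/nan gradients and doubled again after a long enough run of clean steps (PyTorch's GradScaler uses backoff_factor=0.5 and growth_factor=2.0 by default). A minimal sketch of the update rule, with an illustrative growth_interval:

```python
def update_grad_scale(scale: float, found_inf: bool,
                      clean_steps: int, growth_interval: int = 200):
    """Dynamic loss-scale update: halve on overflow, double after a run of
    clean steps. Returns (new_scale, new_clean_steps)."""
    if found_inf:
        return scale * 0.5, 0       # e.g. 32.0 -> 16.0 -> 8.0 in this log
    clean_steps += 1
    if clean_steps >= growth_interval:
        return scale * 2.0, 0       # e.g. 8.0 -> 16.0 -> 32.0 afterwards
    return scale, clean_steps
```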
2024-09-17 16:08:49,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=626812.6666666666, ans=0.1 2024-09-17 16:08:50,485 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.926e+02 2.188e+02 2.328e+02 2.499e+02 4.813e+02, threshold=4.655e+02, percent-clipped=1.0 2024-09-17 16:09:02,355 INFO [train.py:1198] (1/2) Epoch 35, batch 3950, loss[loss=0.2689, ctc_loss=0.1819, cr_loss=0.4349, over 19935.00 frames. ], tot_loss[loss=0.2221, ctc_loss=0.1477, cr_loss=0.3718, over 4098541.35 frames. ], batch size: 80, lr: 2.36e-03, grad_scale: 8.0 2024-09-17 16:09:04,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=626841.0, ans=0.0 2024-09-17 16:09:13,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=626841.0, ans=0.02 2024-09-17 16:09:22,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=626869.3333333334, ans=0.0 2024-09-17 16:09:31,201 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 16:09:56,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=626926.0, ans=0.2 2024-09-17 16:09:56,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=626926.0, ans=0.2 2024-09-17 16:10:14,452 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.86 vs. limit=15.0 2024-09-17 16:10:15,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=626954.3333333334, ans=0.125 2024-09-17 16:10:21,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=626954.3333333334, ans=0.125 2024-09-17 16:10:23,792 INFO [train.py:1198] (1/2) Epoch 35, batch 4000, loss[loss=0.203, ctc_loss=0.1337, cr_loss=0.3467, over 20914.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1484, cr_loss=0.3728, over 4101148.83 frames. ], batch size: 60, lr: 2.36e-03, grad_scale: 16.0 2024-09-17 16:10:37,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=627011.0, ans=0.2 2024-09-17 16:10:52,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=627039.3333333334, ans=0.04949747468305833 2024-09-17 16:11:27,060 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.964e+02 2.189e+02 2.287e+02 2.402e+02 3.769e+02, threshold=4.574e+02, percent-clipped=0.0 2024-09-17 16:11:39,266 INFO [train.py:1198] (1/2) Epoch 35, batch 4050, loss[loss=0.251, ctc_loss=0.1663, cr_loss=0.4235, over 20954.00 frames.
], batch size: 64, lr: 2.36e-03, grad_scale: 16.0 2024-09-17 16:12:08,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=627181.0, ans=0.2 2024-09-17 16:12:22,509 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 16:12:28,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=627209.3333333334, ans=0.125 2024-09-17 16:12:35,233 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.81 vs. limit=6.0 2024-09-17 16:12:49,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=627237.6666666666, ans=0.025 2024-09-17 16:12:52,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=627237.6666666666, ans=0.5 2024-09-17 16:12:55,441 INFO [train.py:1198] (1/2) Epoch 35, batch 4100, loss[loss=0.1892, ctc_loss=0.1249, cr_loss=0.3217, over 20985.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.1472, cr_loss=0.3708, over 4094805.50 frames. ], batch size: 50, lr: 2.36e-03, grad_scale: 16.0 2024-09-17 16:13:29,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=627322.6666666666, ans=0.125 2024-09-17 16:13:59,110 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.945e+02 2.199e+02 2.308e+02 2.462e+02 3.670e+02, threshold=4.616e+02, percent-clipped=0.0 2024-09-17 16:14:07,606 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.14 vs. limit=10.0 2024-09-17 16:14:11,469 INFO [train.py:1198] (1/2) Epoch 35, batch 4150, loss[loss=0.2436, ctc_loss=0.1662, cr_loss=0.3868, over 21034.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1477, cr_loss=0.3723, over 4107222.28 frames. ], batch size: 62, lr: 2.35e-03, grad_scale: 16.0 2024-09-17 16:14:33,298 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.43 vs. limit=12.0 2024-09-17 16:14:37,835 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.89 vs. limit=15.0 2024-09-17 16:14:48,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=627464.3333333334, ans=0.125 2024-09-17 16:15:17,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=627521.0, ans=0.125 2024-09-17 16:15:17,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=627521.0, ans=0.025 2024-09-17 16:15:17,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=627521.0, ans=0.2 2024-09-17 16:15:30,580 INFO [train.py:1198] (1/2) Epoch 35, batch 4200, loss[loss=0.25, ctc_loss=0.1666, cr_loss=0.4167, over 20979.00 frames. ], tot_loss[loss=0.2228, ctc_loss=0.1483, cr_loss=0.3728, over 4089042.06 frames. 
], batch size: 64, lr: 2.35e-03, grad_scale: 16.0 2024-09-17 16:15:35,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=627549.3333333334, ans=0.125 2024-09-17 16:15:38,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=627549.3333333334, ans=0.0 2024-09-17 16:16:11,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=627606.0, ans=0.025 2024-09-17 16:16:17,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=627634.3333333334, ans=0.125 2024-09-17 16:16:27,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=627634.3333333334, ans=0.0 2024-09-17 16:16:37,620 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.010e+02 2.171e+02 2.331e+02 2.481e+02 3.624e+02, threshold=4.662e+02, percent-clipped=0.0 2024-09-17 16:16:49,853 INFO [train.py:1198] (1/2) Epoch 35, batch 4250, loss[loss=0.2294, ctc_loss=0.1512, cr_loss=0.3907, over 20674.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1483, cr_loss=0.3729, over 4093844.24 frames. ], batch size: 68, lr: 2.35e-03, grad_scale: 16.0 2024-09-17 16:16:53,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=627691.0, ans=0.0 2024-09-17 16:16:56,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=627691.0, ans=0.04949747468305833 2024-09-17 16:17:25,627 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.69 vs. limit=15.0 2024-09-17 16:17:48,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=627776.0, ans=0.125 2024-09-17 16:18:05,732 INFO [train.py:1198] (1/2) Epoch 35, batch 4300, loss[loss=0.235, ctc_loss=0.1563, cr_loss=0.3932, over 20959.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1484, cr_loss=0.3727, over 4089532.00 frames. ], batch size: 64, lr: 2.35e-03, grad_scale: 16.0 2024-09-17 16:18:27,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=627861.0, ans=0.0 2024-09-17 16:18:58,139 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 16:19:09,821 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.869e+02 2.188e+02 2.309e+02 2.518e+02 1.008e+03, threshold=4.618e+02, percent-clipped=1.0 2024-09-17 16:19:21,955 INFO [train.py:1198] (1/2) Epoch 35, batch 4350, loss[loss=0.1892, ctc_loss=0.1249, cr_loss=0.3214, over 21062.00 frames. ], tot_loss[loss=0.2226, ctc_loss=0.148, cr_loss=0.3729, over 4096310.73 frames. ], batch size: 53, lr: 2.35e-03, grad_scale: 16.0 2024-09-17 16:19:25,850 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.52 vs. 
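limit=15.0

The lr field creeps down smoothly within the epoch (2.37e-03 at the start of this section, 2.36e-03 from batch 2300, 2.35e-03 from batch 4150) rather than stepping once per epoch, which matches a schedule that decays in both the batch index and the fractional epoch. icefall's Eden scheduler has roughly the following form (warmup factor omitted); the step count below is a hypothetical value chosen only to land in the logged range:

```python
def eden_lr(base_lr: float, step: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    """Eden-style decay in both batch index and (fractional) epoch."""
    batch_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# Hypothetical step count around epoch 35 of this run:
print(f"{eden_lr(0.04, 213000, 35.0):.2e}")  # 2.37e-03, in the logged range
```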
2024-09-17 16:20:27,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=628087.6666666666, ans=0.125 2024-09-17 16:20:27,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=628087.6666666666, ans=0.125 2024-09-17 16:20:35,194 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.75 vs. limit=15.0 2024-09-17 16:20:37,617 INFO [train.py:1198] (1/2) Epoch 35, batch 4400, loss[loss=0.1915, ctc_loss=0.1241, cr_loss=0.3371, over 20979.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1481, cr_loss=0.3729, over 4104999.39 frames. ], batch size: 48, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 16:20:47,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=628116.0, ans=0.0 2024-09-17 16:21:17,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=628172.6666666666, ans=0.1 2024-09-17 16:21:33,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=628201.0, ans=0.025 2024-09-17 16:21:36,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=628201.0, ans=0.0 2024-09-17 16:21:47,096 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.952e+02 2.193e+02 2.302e+02 2.500e+02 3.112e+02, threshold=4.603e+02, percent-clipped=0.0 2024-09-17 16:21:51,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=628229.3333333334, ans=0.125 2024-09-17 16:21:59,277 INFO [train.py:1198] (1/2) Epoch 35, batch 4450, loss[loss=0.2202, ctc_loss=0.146, cr_loss=0.3713, over 21038.00 frames. ], tot_loss[loss=0.2233, ctc_loss=0.1485, cr_loss=0.3737, over 4099839.52 frames. ], batch size: 63, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 16:22:13,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=628286.0, ans=0.125 2024-09-17 16:22:29,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=628314.3333333334, ans=0.1 2024-09-17 16:23:14,763 INFO [train.py:1198] (1/2) Epoch 35, batch 4500, loss[loss=0.1955, ctc_loss=0.1275, cr_loss=0.3397, over 20354.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1482, cr_loss=0.3732, over 4096345.74 frames. ], batch size: 45, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 16:23:21,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=628399.3333333334, ans=0.2 2024-09-17 16:23:30,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=628427.6666666666, ans=0.1 2024-09-17 16:23:59,562 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.40 vs. limit=15.0 2024-09-17 16:23:59,586 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.14 vs.
limit=10.0 2024-09-17 16:24:18,439 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.875e+02 2.150e+02 2.320e+02 2.465e+02 3.181e+02, threshold=4.639e+02, percent-clipped=0.0 2024-09-17 16:24:30,734 INFO [train.py:1198] (1/2) Epoch 35, batch 4550, loss[loss=0.248, ctc_loss=0.1658, cr_loss=0.4111, over 21072.00 frames. ], tot_loss[loss=0.2235, ctc_loss=0.1486, cr_loss=0.3741, over 4100752.43 frames. ], batch size: 59, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 16:24:38,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=628541.0, ans=0.025 2024-09-17 16:25:31,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=628654.3333333334, ans=0.125 2024-09-17 16:25:46,960 INFO [train.py:1198] (1/2) Epoch 35, batch 4600, loss[loss=0.2253, ctc_loss=0.1496, cr_loss=0.3786, over 21034.00 frames. ], tot_loss[loss=0.2238, ctc_loss=0.1489, cr_loss=0.3748, over 4107472.65 frames. ], batch size: 61, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 16:25:59,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=628682.6666666666, ans=0.2 2024-09-17 16:26:17,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=628739.3333333334, ans=0.1 2024-09-17 16:26:26,438 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 16:26:53,391 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.997e+02 2.147e+02 2.321e+02 2.529e+02 3.159e+02, threshold=4.641e+02, percent-clipped=0.0 2024-09-17 16:27:05,500 INFO [train.py:1198] (1/2) Epoch 35, batch 4650, loss[loss=0.2313, ctc_loss=0.1568, cr_loss=0.3726, over 19336.00 frames. ], tot_loss[loss=0.2239, ctc_loss=0.1489, cr_loss=0.375, over 4106159.29 frames. ], batch size: 90, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 16:27:26,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=628852.6666666666, ans=0.0 2024-09-17 16:27:29,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=628852.6666666666, ans=0.0 2024-09-17 16:27:40,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=628881.0, ans=0.125 2024-09-17 16:27:45,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=628881.0, ans=0.125 2024-09-17 16:28:23,756 INFO [train.py:1198] (1/2) Epoch 35, batch 4700, loss[loss=0.2326, ctc_loss=0.1556, cr_loss=0.385, over 20973.00 frames. ], tot_loss[loss=0.2237, ctc_loss=0.1488, cr_loss=0.3744, over 4105984.44 frames. ], batch size: 64, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 16:28:49,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=628994.3333333334, ans=0.125 2024-09-17 16:29:27,079 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.992e+02 2.147e+02 2.302e+02 2.455e+02 3.164e+02, threshold=4.605e+02, percent-clipped=0.0 2024-09-17 16:29:39,473 INFO [train.py:1198] (1/2) Epoch 35, batch 4750, loss[loss=0.2519, ctc_loss=0.1699, cr_loss=0.4102, over 20843.00 frames. 
], tot_loss[loss=0.2235, ctc_loss=0.1487, cr_loss=0.3739, over 4098522.67 frames. ], batch size: 65, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 16:29:44,652 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.30 vs. limit=22.5 2024-09-17 16:30:01,339 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.84 vs. limit=22.5 2024-09-17 16:30:20,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=629164.3333333334, ans=0.125 2024-09-17 16:30:26,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=629192.6666666666, ans=0.125 2024-09-17 16:30:33,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=629192.6666666666, ans=0.1 2024-09-17 16:30:36,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=629192.6666666666, ans=0.0 2024-09-17 16:30:51,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=629221.0, ans=10.0 2024-09-17 16:30:54,656 INFO [train.py:1198] (1/2) Epoch 35, batch 4800, loss[loss=0.17, ctc_loss=0.1078, cr_loss=0.311, over 20960.00 frames. ], tot_loss[loss=0.2226, ctc_loss=0.1481, cr_loss=0.3726, over 4101608.33 frames. ], batch size: 50, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 16:31:07,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=629249.3333333334, ans=0.125 2024-09-17 16:31:29,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=629306.0, ans=0.2 2024-09-17 16:31:35,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=629306.0, ans=0.0 2024-09-17 16:31:50,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=629334.3333333334, ans=22.5 2024-09-17 16:31:54,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=629362.6666666666, ans=0.09899494936611666 2024-09-17 16:31:58,131 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.824e+02 2.176e+02 2.327e+02 2.496e+02 3.136e+02, threshold=4.655e+02, percent-clipped=0.0 2024-09-17 16:32:10,407 INFO [train.py:1198] (1/2) Epoch 35, batch 4850, loss[loss=0.244, ctc_loss=0.1637, cr_loss=0.4014, over 20865.00 frames. ], tot_loss[loss=0.2231, ctc_loss=0.1485, cr_loss=0.3729, over 4080809.02 frames. ], batch size: 65, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 16:32:29,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=629419.3333333334, ans=0.0 2024-09-17 16:32:40,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=629419.3333333334, ans=10.0 2024-09-17 16:32:45,144 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.96 vs. 
limit=15.0 2024-09-17 16:32:46,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=629447.6666666666, ans=0.125 2024-09-17 16:33:32,395 INFO [train.py:1198] (1/2) Epoch 35, batch 4900, loss[loss=0.2187, ctc_loss=0.1445, cr_loss=0.3706, over 20765.00 frames. ], tot_loss[loss=0.223, ctc_loss=0.1484, cr_loss=0.3729, over 4082414.90 frames. ], batch size: 56, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 16:33:54,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=629561.0, ans=0.125 2024-09-17 16:33:54,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=629561.0, ans=0.0 2024-09-17 16:34:09,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=629589.3333333334, ans=0.0 2024-09-17 16:34:22,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=629617.6666666666, ans=0.125 2024-09-17 16:34:35,240 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.899e+02 2.199e+02 2.313e+02 2.504e+02 3.110e+02, threshold=4.625e+02, percent-clipped=0.0 2024-09-17 16:34:37,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=629646.0, ans=10.0 2024-09-17 16:34:47,121 INFO [train.py:1198] (1/2) Epoch 35, batch 4950, loss[loss=0.2354, ctc_loss=0.1574, cr_loss=0.3902, over 20705.00 frames. ], tot_loss[loss=0.2221, ctc_loss=0.1477, cr_loss=0.3721, over 4089763.36 frames. ], batch size: 71, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 16:34:55,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=629674.3333333334, ans=0.125 2024-09-17 16:35:11,597 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=629702.6666666666, ans=0.125 2024-09-17 16:36:01,760 INFO [train.py:1198] (1/2) Epoch 35, batch 5000, loss[loss=0.2155, ctc_loss=0.1399, cr_loss=0.378, over 20984.00 frames. ], tot_loss[loss=0.2221, ctc_loss=0.1477, cr_loss=0.3719, over 4070741.10 frames. ], batch size: 51, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 16:36:51,676 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 16:36:54,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=629901.0, ans=0.125 2024-09-17 16:36:55,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=629901.0, ans=0.125 2024-09-17 16:37:04,785 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.835e+02 2.157e+02 2.310e+02 2.447e+02 8.280e+02, threshold=4.620e+02, percent-clipped=1.0 2024-09-17 16:37:11,561 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=2.97 vs. 
limit=15.0 2024-09-17 16:37:13,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=629929.3333333334, ans=0.0 2024-09-17 16:37:16,753 INFO [train.py:1198] (1/2) Epoch 35, batch 5050, loss[loss=0.2111, ctc_loss=0.137, cr_loss=0.3708, over 20296.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1472, cr_loss=0.3713, over 4083593.60 frames. ], batch size: 45, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 16:37:31,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=629986.0, ans=0.025 2024-09-17 16:37:33,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=629986.0, ans=0.0 2024-09-17 16:38:06,811 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.15 vs. limit=10.0 2024-09-17 16:38:31,015 INFO [train.py:1198] (1/2) Epoch 35, batch 5100, loss[loss=0.1842, ctc_loss=0.1189, cr_loss=0.3267, over 20968.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1481, cr_loss=0.3736, over 4081220.02 frames. ], batch size: 51, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 16:38:33,246 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.84 vs. limit=10.0 2024-09-17 16:39:01,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=630156.0, ans=0.0 2024-09-17 16:39:11,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=630156.0, ans=0.0 2024-09-17 16:39:16,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=630184.3333333334, ans=0.2 2024-09-17 16:39:24,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=630184.3333333334, ans=0.125 2024-09-17 16:39:32,931 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. limit=6.0 2024-09-17 16:39:33,534 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.945e+02 2.233e+02 2.373e+02 2.558e+02 1.219e+03, threshold=4.746e+02, percent-clipped=1.0 2024-09-17 16:39:43,292 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.72 vs. limit=15.0 2024-09-17 16:39:45,568 INFO [train.py:1198] (1/2) Epoch 35, batch 5150, loss[loss=0.2156, ctc_loss=0.1423, cr_loss=0.3665, over 20825.00 frames. ], tot_loss[loss=0.2225, ctc_loss=0.1478, cr_loss=0.3734, over 4086698.34 frames. ], batch size: 59, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 16:40:00,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=630269.3333333334, ans=0.1 2024-09-17 16:40:01,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=630269.3333333334, ans=0.0 2024-09-17 16:41:00,148 INFO [train.py:1198] (1/2) Epoch 35, batch 5200, loss[loss=0.2377, ctc_loss=0.1583, cr_loss=0.397, over 20981.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1473, cr_loss=0.3724, over 4081139.92 frames. 
], batch size: 64, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 16:41:58,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=630467.6666666666, ans=0.2 2024-09-17 16:42:03,221 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 16:42:04,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=630496.0, ans=0.025 2024-09-17 16:42:08,638 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.829e+02 2.162e+02 2.301e+02 2.494e+02 3.436e+02, threshold=4.602e+02, percent-clipped=0.0 2024-09-17 16:42:13,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=630496.0, ans=0.125 2024-09-17 16:42:20,582 INFO [train.py:1198] (1/2) Epoch 35, batch 5250, loss[loss=0.2084, ctc_loss=0.139, cr_loss=0.3471, over 20967.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1474, cr_loss=0.373, over 4097315.61 frames. ], batch size: 50, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 16:42:31,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=630524.3333333334, ans=0.0 2024-09-17 16:42:46,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=630552.6666666666, ans=0.125 2024-09-17 16:42:55,317 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 16:43:35,001 INFO [train.py:1198] (1/2) Epoch 35, batch 5300, loss[loss=0.2252, ctc_loss=0.1499, cr_loss=0.3767, over 20439.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1479, cr_loss=0.3735, over 4100137.72 frames. ], batch size: 74, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 16:43:36,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=630666.0, ans=0.125 2024-09-17 16:43:48,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=630694.3333333334, ans=0.0 2024-09-17 16:44:09,191 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.78 vs. limit=10.0 2024-09-17 16:44:18,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=630751.0, ans=0.1 2024-09-17 16:44:39,668 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.978e+02 2.153e+02 2.272e+02 2.432e+02 4.479e+02, threshold=4.543e+02, percent-clipped=0.0 2024-09-17 16:44:39,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=630779.3333333334, ans=10.0 2024-09-17 16:44:45,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=630779.3333333334, ans=0.125 2024-09-17 16:44:50,145 INFO [train.py:1198] (1/2) Epoch 35, batch 5350, loss[loss=0.2052, ctc_loss=0.1357, cr_loss=0.3474, over 20961.00 frames. ], tot_loss[loss=0.2209, ctc_loss=0.1465, cr_loss=0.3716, over 4103171.32 frames. 
], batch size: 50, lr: 2.35e-03, grad_scale: 16.0 2024-09-17 16:45:07,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=630836.0, ans=0.2 2024-09-17 16:45:35,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=630892.6666666666, ans=0.125 2024-09-17 16:45:41,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=630892.6666666666, ans=0.07 2024-09-17 16:46:03,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=630949.3333333334, ans=0.125 2024-09-17 16:46:04,715 INFO [train.py:1198] (1/2) Epoch 35, batch 5400, loss[loss=0.2547, ctc_loss=0.1721, cr_loss=0.4129, over 19490.00 frames. ], tot_loss[loss=0.2212, ctc_loss=0.1469, cr_loss=0.3719, over 4110742.69 frames. ], batch size: 90, lr: 2.35e-03, grad_scale: 16.0 2024-09-17 16:46:16,072 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.84 vs. limit=22.5 2024-09-17 16:46:20,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=630977.6666666666, ans=0.0 2024-09-17 16:46:48,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=631034.3333333334, ans=0.5 2024-09-17 16:47:08,267 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.886e+02 2.185e+02 2.325e+02 2.494e+02 2.975e+02, threshold=4.650e+02, percent-clipped=0.0 2024-09-17 16:47:15,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=631062.6666666666, ans=0.125 2024-09-17 16:47:18,617 INFO [train.py:1198] (1/2) Epoch 35, batch 5450, loss[loss=0.2229, ctc_loss=0.1481, cr_loss=0.3738, over 20984.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1467, cr_loss=0.3714, over 4114600.22 frames. ], batch size: 64, lr: 2.35e-03, grad_scale: 16.0 2024-09-17 16:48:10,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=631176.0, ans=0.125 2024-09-17 16:48:32,703 INFO [train.py:1198] (1/2) Epoch 35, batch 5500, loss[loss=0.2629, ctc_loss=0.1796, cr_loss=0.4166, over 19384.00 frames. ], tot_loss[loss=0.2207, ctc_loss=0.1465, cr_loss=0.3712, over 4111158.40 frames. ], batch size: 90, lr: 2.35e-03, grad_scale: 16.0 2024-09-17 16:48:40,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=631232.6666666666, ans=0.125 2024-09-17 16:49:36,495 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.831e+02 2.176e+02 2.294e+02 2.454e+02 3.147e+02, threshold=4.588e+02, percent-clipped=0.0 2024-09-17 16:49:47,152 INFO [train.py:1198] (1/2) Epoch 35, batch 5550, loss[loss=0.1953, ctc_loss=0.1288, cr_loss=0.3328, over 20981.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1471, cr_loss=0.3718, over 4096064.50 frames. 
], batch size: 55, lr: 2.35e-03, grad_scale: 16.0 2024-09-17 16:50:03,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=631402.6666666666, ans=0.0 2024-09-17 16:50:16,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=631431.0, ans=0.0 2024-09-17 16:50:30,088 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 16:50:30,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=631431.0, ans=0.1 2024-09-17 16:51:05,961 INFO [train.py:1198] (1/2) Epoch 35, batch 5600, loss[loss=0.2264, ctc_loss=0.1507, cr_loss=0.3787, over 21087.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1477, cr_loss=0.3728, over 4087854.81 frames. ], batch size: 59, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 16:51:08,093 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.14 vs. limit=22.5 2024-09-17 16:51:41,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=631572.6666666666, ans=0.125 2024-09-17 16:51:45,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=631572.6666666666, ans=0.125 2024-09-17 16:51:53,339 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 16:51:57,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=631601.0, ans=0.125 2024-09-17 16:52:09,018 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.817e+02 2.232e+02 2.365e+02 2.537e+02 6.394e+02, threshold=4.729e+02, percent-clipped=2.0 2024-09-17 16:52:16,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=631629.3333333334, ans=0.025 2024-09-17 16:52:19,472 INFO [train.py:1198] (1/2) Epoch 35, batch 5650, loss[loss=0.2541, ctc_loss=0.1717, cr_loss=0.4122, over 20820.00 frames. ], tot_loss[loss=0.2221, ctc_loss=0.1476, cr_loss=0.3724, over 4096705.19 frames. ], batch size: 65, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 16:52:29,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=631657.6666666666, ans=0.0 2024-09-17 16:53:30,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=631771.0, ans=0.1 2024-09-17 16:53:32,857 INFO [train.py:1198] (1/2) Epoch 35, batch 5700, loss[loss=0.2391, ctc_loss=0.1613, cr_loss=0.3892, over 19315.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1476, cr_loss=0.3724, over 4085776.64 frames. 
], batch size: 90, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 16:53:51,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=631827.6666666666, ans=0.1 2024-09-17 16:53:58,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=631827.6666666666, ans=0.125 2024-09-17 16:54:00,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=631827.6666666666, ans=0.2 2024-09-17 16:54:10,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=631856.0, ans=0.125 2024-09-17 16:54:28,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=631884.3333333334, ans=0.125 2024-09-17 16:54:38,548 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.919e+02 2.180e+02 2.294e+02 2.417e+02 3.007e+02, threshold=4.588e+02, percent-clipped=0.0 2024-09-17 16:54:38,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=631912.6666666666, ans=0.2 2024-09-17 16:54:47,646 INFO [train.py:1198] (1/2) Epoch 35, batch 5750, loss[loss=0.1914, ctc_loss=0.1272, cr_loss=0.3213, over 20985.00 frames. ], tot_loss[loss=0.2233, ctc_loss=0.1486, cr_loss=0.3738, over 4075536.66 frames. ], batch size: 52, lr: 2.35e-03, grad_scale: 16.0 2024-09-17 16:54:56,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=631941.0, ans=0.02 2024-09-17 16:55:01,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=631969.3333333334, ans=0.2 2024-09-17 16:55:17,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=631997.6666666666, ans=0.125 2024-09-17 16:55:23,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=631997.6666666666, ans=0.0 2024-09-17 16:55:24,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=631997.6666666666, ans=0.0 2024-09-17 16:56:01,912 INFO [train.py:1198] (1/2) Epoch 35, batch 5800, loss[loss=0.2343, ctc_loss=0.1558, cr_loss=0.3928, over 20266.00 frames. ], tot_loss[loss=0.2236, ctc_loss=0.1487, cr_loss=0.3742, over 4068120.01 frames. ], batch size: 74, lr: 2.35e-03, grad_scale: 16.0 2024-09-17 16:56:21,399 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=632111.0, ans=0.2 2024-09-17 16:56:27,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=632111.0, ans=0.07 2024-09-17 16:57:07,303 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.828e+02 2.185e+02 2.320e+02 2.430e+02 3.087e+02, threshold=4.640e+02, percent-clipped=0.0 2024-09-17 16:57:16,070 INFO [train.py:1198] (1/2) Epoch 35, batch 5850, loss[loss=0.2126, ctc_loss=0.1397, cr_loss=0.3644, over 21048.00 frames. ], tot_loss[loss=0.2234, ctc_loss=0.1487, cr_loss=0.3736, over 4080233.36 frames. ], batch size: 56, lr: 2.35e-03, grad_scale: 16.0 2024-09-17 16:57:31,599 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.08 vs. 
limit=15.0 2024-09-17 16:57:43,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=632252.6666666666, ans=0.0 2024-09-17 16:58:22,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=632337.6666666666, ans=0.125 2024-09-17 16:58:30,150 INFO [train.py:1198] (1/2) Epoch 35, batch 5900, loss[loss=0.2639, ctc_loss=0.1794, cr_loss=0.4223, over 18177.00 frames. ], tot_loss[loss=0.2238, ctc_loss=0.149, cr_loss=0.3742, over 4084881.09 frames. ], batch size: 108, lr: 2.35e-03, grad_scale: 16.0 2024-09-17 16:58:52,227 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.01 vs. limit=10.0 2024-09-17 16:59:24,426 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 16:59:40,373 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.950e+02 2.193e+02 2.324e+02 2.486e+02 3.833e+02, threshold=4.649e+02, percent-clipped=0.0 2024-09-17 16:59:42,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=632479.3333333334, ans=0.0 2024-09-17 16:59:49,374 INFO [train.py:1198] (1/2) Epoch 35, batch 5950, loss[loss=0.2509, ctc_loss=0.1778, cr_loss=0.3658, over 14910.00 frames. ], tot_loss[loss=0.2233, ctc_loss=0.1485, cr_loss=0.3739, over 4093587.11 frames. ], batch size: 150, lr: 2.35e-03, grad_scale: 16.0 2024-09-17 17:00:18,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=632564.3333333334, ans=0.125 2024-09-17 17:00:21,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=632564.3333333334, ans=0.1 2024-09-17 17:00:23,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=632564.3333333334, ans=0.5 2024-09-17 17:00:37,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=632592.6666666666, ans=0.025 2024-09-17 17:00:39,072 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.98 vs. limit=22.5 2024-09-17 17:00:43,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=632592.6666666666, ans=0.1 2024-09-17 17:00:44,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=632592.6666666666, ans=0.0 2024-09-17 17:01:03,875 INFO [train.py:1198] (1/2) Epoch 35, batch 6000, loss[loss=0.2125, ctc_loss=0.1424, cr_loss=0.3508, over 19947.00 frames. ], tot_loss[loss=0.2232, ctc_loss=0.1485, cr_loss=0.3737, over 4093317.84 frames. ], batch size: 44, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 17:01:03,875 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 17:01:24,938 INFO [train.py:1230] (1/2) Epoch 35, validation: loss=0.04069, ctc_loss=0.04069, cr_loss=1.323e-14, over 944034.00 frames. 
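A note on reading the loss records above: the logged totals fit a fixed linear combination, loss = ctc_loss + 0.2 * cr_loss, to within rounding (e.g. 0.1485 + 0.2 * 0.3737 ≈ 0.2233 at epoch 35, batch 4450), and the validation records report cr_loss on the order of 1e-14, i.e. the consistency-regularization term effectively vanishes at validation, plausibly because no masking is applied there. A minimal sketch checking this reading against triples copied from the log — the 0.2 weight and the interpretation are inferred from these numbers, not read from the training code:

    # Sketch: check that logged totals match ctc_loss + w * cr_loss.
    # The weight w = 0.2 is an assumption inferred from the log itself.
    logged = [
        # (loss, ctc_loss, cr_loss) copied from entries above
        (0.2233, 0.1485, 0.3737),        # epoch 35, batch 4450, tot_loss
        (0.2155, 0.1399, 0.3780),        # epoch 35, batch 5000, single batch
        (0.2232, 0.1485, 0.3737),        # epoch 35, batch 6000, tot_loss
        (0.04069, 0.04069, 1.323e-14),   # epoch 35, validation
    ]
    w = 0.2
    for loss, ctc, cr in logged:
        assert abs(loss - (ctc + w * cr)) < 5e-4, (loss, ctc, cr)
    print("all logged totals are consistent with loss = ctc + 0.2 * cr")
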
2024-09-17 17:01:24,939 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-17 17:01:28,640 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.86 vs. limit=15.0 2024-09-17 17:02:09,297 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.47 vs. limit=15.0 2024-09-17 17:02:31,056 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.012e+02 2.221e+02 2.307e+02 2.507e+02 5.871e+02, threshold=4.614e+02, percent-clipped=1.0 2024-09-17 17:02:39,827 INFO [train.py:1198] (1/2) Epoch 35, batch 6050, loss[loss=0.2346, ctc_loss=0.1568, cr_loss=0.3892, over 19328.00 frames. ], tot_loss[loss=0.2235, ctc_loss=0.1487, cr_loss=0.3738, over 4085285.52 frames. ], batch size: 90, lr: 2.34e-03, grad_scale: 32.0 2024-09-17 17:02:50,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=632791.0, ans=0.1 2024-09-17 17:02:50,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=632791.0, ans=0.125 2024-09-17 17:03:22,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=632847.6666666666, ans=0.125 2024-09-17 17:03:24,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=632876.0, ans=0.2 2024-09-17 17:03:43,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=632904.3333333334, ans=0.1 2024-09-17 17:03:54,856 INFO [train.py:1198] (1/2) Epoch 35, batch 6100, loss[loss=0.252, ctc_loss=0.169, cr_loss=0.4148, over 20614.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1483, cr_loss=0.373, over 4091983.96 frames. ], batch size: 68, lr: 2.34e-03, grad_scale: 32.0 2024-09-17 17:04:29,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=632989.3333333334, ans=0.0 2024-09-17 17:04:34,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=632989.3333333334, ans=15.0 2024-09-17 17:05:00,447 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.948e+02 2.197e+02 2.290e+02 2.426e+02 3.058e+02, threshold=4.579e+02, percent-clipped=0.0 2024-09-17 17:05:09,292 INFO [train.py:1198] (1/2) Epoch 35, batch 6150, loss[loss=0.2349, ctc_loss=0.1557, cr_loss=0.396, over 18158.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1476, cr_loss=0.372, over 4091652.06 frames. ], batch size: 108, lr: 2.34e-03, grad_scale: 32.0 2024-09-17 17:05:14,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=633074.3333333334, ans=0.1 2024-09-17 17:05:35,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=633102.6666666666, ans=0.035 2024-09-17 17:05:35,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=633102.6666666666, ans=0.0 2024-09-17 17:06:22,247 INFO [train.py:1198] (1/2) Epoch 35, batch 6200, loss[loss=0.2458, ctc_loss=0.1605, cr_loss=0.4266, over 20005.00 frames. 
], tot_loss[loss=0.2209, ctc_loss=0.1469, cr_loss=0.3702, over 4084035.31 frames. ], batch size: 80, lr: 2.34e-03, grad_scale: 32.0 2024-09-17 17:06:24,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=633216.0, ans=0.1 2024-09-17 17:06:35,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=633244.3333333334, ans=0.125 2024-09-17 17:06:53,938 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=633272.6666666666, ans=0.0 2024-09-17 17:07:04,591 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.71 vs. limit=15.0 2024-09-17 17:07:07,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=633301.0, ans=0.1 2024-09-17 17:07:28,095 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.860e+02 2.164e+02 2.343e+02 2.543e+02 3.749e+02, threshold=4.686e+02, percent-clipped=0.0 2024-09-17 17:07:36,814 INFO [train.py:1198] (1/2) Epoch 35, batch 6250, loss[loss=0.2328, ctc_loss=0.1549, cr_loss=0.3892, over 20331.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1479, cr_loss=0.3704, over 4022910.74 frames. ], batch size: 74, lr: 2.34e-03, grad_scale: 32.0 2024-09-17 17:07:53,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=633386.0, ans=0.2 2024-09-17 17:07:58,492 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.95 vs. limit=15.0 2024-09-17 17:08:36,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=633471.0, ans=0.125 2024-09-17 17:08:46,643 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.98 vs. limit=15.0 2024-09-17 17:08:47,276 INFO [train.py:1198] (1/2) Epoch 35, batch 6300, loss[loss=0.2523, ctc_loss=0.1745, cr_loss=0.3891, over 14880.00 frames. ], tot_loss[loss=0.2234, ctc_loss=0.1493, cr_loss=0.3708, over 3970365.97 frames. ], batch size: 150, lr: 2.34e-03, grad_scale: 32.0 2024-09-17 17:08:53,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=633499.3333333334, ans=0.125 2024-09-17 17:08:57,909 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.30 vs. limit=15.0 2024-09-17 17:08:59,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=633499.3333333334, ans=0.1 2024-09-17 17:09:23,214 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0 2024-09-17 17:09:50,580 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.952e+02 2.337e+02 2.653e+02 2.907e+02 7.605e+02, threshold=5.307e+02, percent-clipped=2.0 2024-09-17 17:09:58,966 INFO [train.py:1198] (1/2) Epoch 35, batch 6350, loss[loss=0.2528, ctc_loss=0.1731, cr_loss=0.3982, over 14081.00 frames. 
], tot_loss[loss=0.2305, ctc_loss=0.1554, cr_loss=0.3756, over 3750436.08 frames. ], batch size: 149, lr: 2.34e-03, grad_scale: 32.0 2024-09-17 17:10:03,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=633641.0, ans=0.125 2024-09-17 17:10:06,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=633641.0, ans=0.0 2024-09-17 17:10:21,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=633669.3333333334, ans=0.125 2024-09-17 17:10:45,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=633726.0, ans=0.0 2024-09-17 17:11:45,697 INFO [train.py:1198] (1/2) Epoch 36, batch 0, loss[loss=0.219, ctc_loss=0.1458, cr_loss=0.3662, over 21021.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1458, cr_loss=0.3662, over 21021.00 frames. ], batch size: 63, lr: 2.31e-03, grad_scale: 32.0 2024-09-17 17:11:45,697 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 17:12:04,086 INFO [train.py:1230] (1/2) Epoch 36, validation: loss=0.03966, ctc_loss=0.03966, cr_loss=1.315e-14, over 944034.00 frames. 2024-09-17 17:12:04,086 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-17 17:12:04,488 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=633757.1666666666, ans=0.0 2024-09-17 17:12:22,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=633785.5, ans=0.04949747468305833 2024-09-17 17:12:28,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=633785.5, ans=0.2 2024-09-17 17:12:30,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=633785.5, ans=0.0 2024-09-17 17:13:00,600 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.57 vs. limit=10.0 2024-09-17 17:13:22,592 INFO [train.py:1198] (1/2) Epoch 36, batch 50, loss[loss=0.2367, ctc_loss=0.1589, cr_loss=0.3891, over 20855.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1442, cr_loss=0.3664, over 928253.63 frames. ], batch size: 65, lr: 2.31e-03, grad_scale: 32.0 2024-09-17 17:13:27,246 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.951e+02 2.249e+02 2.539e+02 2.784e+02 3.715e+02, threshold=5.077e+02, percent-clipped=0.0 2024-09-17 17:13:28,122 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.82 vs. 
limit=22.5 2024-09-17 17:13:47,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=633927.1666666666, ans=0.1 2024-09-17 17:13:57,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=633955.5, ans=0.125 2024-09-17 17:14:28,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=634012.1666666666, ans=0.125 2024-09-17 17:14:30,123 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.64 vs. limit=15.0 2024-09-17 17:14:31,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=634012.1666666666, ans=0.1 2024-09-17 17:14:41,116 INFO [train.py:1198] (1/2) Epoch 36, batch 100, loss[loss=0.2083, ctc_loss=0.1351, cr_loss=0.3662, over 21030.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.146, cr_loss=0.3696, over 1622062.76 frames. ], batch size: 62, lr: 2.31e-03, grad_scale: 32.0 2024-09-17 17:15:09,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=634068.8333333334, ans=0.125 2024-09-17 17:15:24,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=634097.1666666666, ans=0.0 2024-09-17 17:15:53,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=634153.8333333334, ans=0.125 2024-09-17 17:15:57,382 INFO [train.py:1198] (1/2) Epoch 36, batch 150, loss[loss=0.2109, ctc_loss=0.141, cr_loss=0.3497, over 20802.00 frames. ], tot_loss[loss=0.2225, ctc_loss=0.1479, cr_loss=0.3727, over 2165673.82 frames. ], batch size: 53, lr: 2.31e-03, grad_scale: 32.0 2024-09-17 17:16:01,793 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.893e+02 2.167e+02 2.323e+02 2.461e+02 3.967e+02, threshold=4.646e+02, percent-clipped=0.0 2024-09-17 17:16:09,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=634182.1666666666, ans=0.1 2024-09-17 17:16:12,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=634210.5, ans=0.125 2024-09-17 17:16:42,884 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.24 vs. limit=10.0 2024-09-17 17:16:46,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=634267.1666666666, ans=0.125 2024-09-17 17:17:12,034 INFO [train.py:1198] (1/2) Epoch 36, batch 200, loss[loss=0.2305, ctc_loss=0.152, cr_loss=0.3927, over 20834.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1482, cr_loss=0.3733, over 2596150.90 frames. 
], batch size: 59, lr: 2.31e-03, grad_scale: 32.0 2024-09-17 17:17:24,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=634323.8333333334, ans=0.0 2024-09-17 17:17:59,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=634408.8333333334, ans=0.125 2024-09-17 17:18:27,600 INFO [train.py:1198] (1/2) Epoch 36, batch 250, loss[loss=0.1981, ctc_loss=0.1263, cr_loss=0.3591, over 20987.00 frames. ], tot_loss[loss=0.2208, ctc_loss=0.1465, cr_loss=0.3715, over 2938940.76 frames. ], batch size: 51, lr: 2.31e-03, grad_scale: 32.0 2024-09-17 17:18:32,107 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.958e+02 2.213e+02 2.286e+02 2.476e+02 3.170e+02, threshold=4.571e+02, percent-clipped=0.0 2024-09-17 17:18:33,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=634465.5, ans=0.1 2024-09-17 17:19:24,900 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.24 vs. limit=22.5 2024-09-17 17:19:50,546 INFO [train.py:1198] (1/2) Epoch 36, batch 300, loss[loss=0.2345, ctc_loss=0.154, cr_loss=0.4026, over 20842.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1469, cr_loss=0.3726, over 3196989.79 frames. ], batch size: 65, lr: 2.31e-03, grad_scale: 32.0 2024-09-17 17:20:19,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=634663.8333333334, ans=0.125 2024-09-17 17:21:07,293 INFO [train.py:1198] (1/2) Epoch 36, batch 350, loss[loss=0.2388, ctc_loss=0.1602, cr_loss=0.3934, over 20671.00 frames. ], tot_loss[loss=0.2202, ctc_loss=0.146, cr_loss=0.3709, over 3398490.45 frames. ], batch size: 68, lr: 2.31e-03, grad_scale: 32.0 2024-09-17 17:21:11,717 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.836e+02 2.178e+02 2.288e+02 2.402e+02 5.671e+02, threshold=4.576e+02, percent-clipped=2.0 2024-09-17 17:21:32,439 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.10 vs. limit=12.0 2024-09-17 17:22:03,993 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.75 vs. limit=15.0 2024-09-17 17:22:09,110 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.32 vs. limit=15.0 2024-09-17 17:22:09,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=634862.1666666666, ans=0.2 2024-09-17 17:22:17,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=634862.1666666666, ans=0.1 2024-09-17 17:22:23,214 INFO [train.py:1198] (1/2) Epoch 36, batch 400, loss[loss=0.2243, ctc_loss=0.1475, cr_loss=0.384, over 20785.00 frames. ], tot_loss[loss=0.2209, ctc_loss=0.1466, cr_loss=0.3714, over 3544598.47 frames. 
], batch size: 56, lr: 2.31e-03, grad_scale: 32.0 2024-09-17 17:22:51,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=634947.1666666666, ans=0.0 2024-09-17 17:22:52,618 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.67 vs. limit=15.0 2024-09-17 17:22:56,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=634947.1666666666, ans=0.0 2024-09-17 17:23:23,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=635003.8333333334, ans=0.2 2024-09-17 17:23:38,086 INFO [train.py:1198] (1/2) Epoch 36, batch 450, loss[loss=0.2535, ctc_loss=0.1723, cr_loss=0.4057, over 20871.00 frames. ], tot_loss[loss=0.2212, ctc_loss=0.1469, cr_loss=0.3716, over 3669019.78 frames. ], batch size: 65, lr: 2.31e-03, grad_scale: 32.0 2024-09-17 17:23:42,570 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.805e+02 2.198e+02 2.332e+02 2.435e+02 3.393e+02, threshold=4.663e+02, percent-clipped=0.0 2024-09-17 17:24:01,222 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 17:24:30,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=635117.1666666666, ans=0.0 2024-09-17 17:24:33,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=635117.1666666666, ans=0.025 2024-09-17 17:24:45,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=635145.5, ans=0.025 2024-09-17 17:24:57,173 INFO [train.py:1198] (1/2) Epoch 36, batch 500, loss[loss=0.2601, ctc_loss=0.1859, cr_loss=0.3709, over 14539.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1471, cr_loss=0.3714, over 3752990.84 frames. ], batch size: 149, lr: 2.31e-03, grad_scale: 32.0 2024-09-17 17:25:13,147 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.69 vs. limit=15.0 2024-09-17 17:25:24,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=635202.1666666666, ans=0.125 2024-09-17 17:26:16,180 INFO [train.py:1198] (1/2) Epoch 36, batch 550, loss[loss=0.1978, ctc_loss=0.1309, cr_loss=0.3348, over 20972.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1477, cr_loss=0.3723, over 3823039.30 frames. ], batch size: 51, lr: 2.31e-03, grad_scale: 32.0 2024-09-17 17:26:20,345 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.22 vs. limit=15.0 2024-09-17 17:26:20,653 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.889e+02 2.232e+02 2.420e+02 2.696e+02 5.768e+02, threshold=4.841e+02, percent-clipped=1.0 2024-09-17 17:26:35,023 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.46 vs. 
limit=15.0 2024-09-17 17:26:51,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=635372.1666666666, ans=0.125 2024-09-17 17:27:09,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=635400.5, ans=0.025 2024-09-17 17:27:18,496 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.66 vs. limit=10.0 2024-09-17 17:27:22,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=635428.8333333334, ans=0.0 2024-09-17 17:27:31,069 INFO [train.py:1198] (1/2) Epoch 36, batch 600, loss[loss=0.2018, ctc_loss=0.1336, cr_loss=0.341, over 20949.00 frames. ], tot_loss[loss=0.2221, ctc_loss=0.1477, cr_loss=0.3721, over 3875480.44 frames. ], batch size: 49, lr: 2.31e-03, grad_scale: 32.0 2024-09-17 17:27:32,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=635457.1666666666, ans=0.125 2024-09-17 17:27:33,311 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.51 vs. limit=22.5 2024-09-17 17:27:53,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=635485.5, ans=0.1 2024-09-17 17:28:06,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=635513.8333333334, ans=0.125 2024-09-17 17:28:27,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=635542.1666666666, ans=0.0 2024-09-17 17:28:46,639 INFO [train.py:1198] (1/2) Epoch 36, batch 650, loss[loss=0.245, ctc_loss=0.1639, cr_loss=0.4055, over 19445.00 frames. ], tot_loss[loss=0.2225, ctc_loss=0.148, cr_loss=0.3729, over 3916686.78 frames. 
], batch size: 90, lr: 2.31e-03, grad_scale: 32.0 2024-09-17 17:28:48,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=635598.8333333334, ans=0.125 2024-09-17 17:28:51,194 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.828e+02 2.204e+02 2.299e+02 2.418e+02 3.202e+02, threshold=4.598e+02, percent-clipped=0.0 2024-09-17 17:28:54,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=635598.8333333334, ans=0.1 2024-09-17 17:29:05,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=635627.1666666666, ans=0.125 2024-09-17 17:29:23,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=635655.5, ans=0.1 2024-09-17 17:29:29,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=635655.5, ans=0.2 2024-09-17 17:29:40,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=635683.8333333334, ans=0.2 2024-09-17 17:29:54,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=635712.1666666666, ans=0.1 2024-09-17 17:29:56,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=635712.1666666666, ans=15.0 2024-09-17 17:29:59,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=635712.1666666666, ans=0.125 2024-09-17 17:30:02,054 INFO [train.py:1198] (1/2) Epoch 36, batch 700, loss[loss=0.1867, ctc_loss=0.1224, cr_loss=0.3214, over 20947.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1475, cr_loss=0.3722, over 3959223.80 frames. ], batch size: 49, lr: 2.31e-03, grad_scale: 32.0 2024-09-17 17:30:15,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=635740.5, ans=0.5 2024-09-17 17:30:37,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=635797.1666666666, ans=0.2 2024-09-17 17:31:24,110 INFO [train.py:1198] (1/2) Epoch 36, batch 750, loss[loss=0.2041, ctc_loss=0.1329, cr_loss=0.3562, over 20998.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1477, cr_loss=0.3725, over 3987989.50 frames. ], batch size: 63, lr: 2.31e-03, grad_scale: 32.0 2024-09-17 17:31:28,467 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.957e+02 2.158e+02 2.270e+02 2.492e+02 3.770e+02, threshold=4.540e+02, percent-clipped=0.0 2024-09-17 17:31:32,345 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.68 vs. limit=22.5 2024-09-17 17:32:07,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=635967.1666666666, ans=0.0 2024-09-17 17:32:39,658 INFO [train.py:1198] (1/2) Epoch 36, batch 800, loss[loss=0.2542, ctc_loss=0.1706, cr_loss=0.4183, over 20679.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1474, cr_loss=0.3723, over 4017269.67 frames. 
], batch size: 71, lr: 2.31e-03, grad_scale: 32.0 2024-09-17 17:32:40,338 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.97 vs. limit=15.0 2024-09-17 17:32:45,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=636023.8333333334, ans=0.05 2024-09-17 17:32:53,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=636052.1666666666, ans=0.0 2024-09-17 17:33:07,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=636052.1666666666, ans=0.125 2024-09-17 17:33:36,229 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=5.81 vs. limit=22.5 2024-09-17 17:33:55,073 INFO [train.py:1198] (1/2) Epoch 36, batch 850, loss[loss=0.25, ctc_loss=0.1682, cr_loss=0.4091, over 20886.00 frames. ], tot_loss[loss=0.2221, ctc_loss=0.1476, cr_loss=0.3726, over 4039664.64 frames. ], batch size: 57, lr: 2.31e-03, grad_scale: 32.0 2024-09-17 17:33:59,514 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.889e+02 2.282e+02 2.370e+02 2.547e+02 3.476e+02, threshold=4.740e+02, percent-clipped=0.0 2024-09-17 17:34:03,332 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.25 vs. limit=22.5 2024-09-17 17:34:10,243 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=636193.8333333334, ans=0.2 2024-09-17 17:34:13,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=636193.8333333334, ans=0.5 2024-09-17 17:34:16,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=636193.8333333334, ans=15.0 2024-09-17 17:35:09,563 INFO [train.py:1198] (1/2) Epoch 36, batch 900, loss[loss=0.2012, ctc_loss=0.1327, cr_loss=0.3426, over 20781.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1477, cr_loss=0.3728, over 4049756.14 frames. 
], batch size: 53, lr: 2.31e-03, grad_scale: 32.0 2024-09-17 17:35:13,148 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=636307.1666666666, ans=15.0 2024-09-17 17:35:53,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=636363.8333333334, ans=0.0 2024-09-17 17:35:56,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=636392.1666666666, ans=0.0 2024-09-17 17:35:58,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=636392.1666666666, ans=0.125 2024-09-17 17:36:16,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=636420.5, ans=0.035 2024-09-17 17:36:29,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=636448.8333333334, ans=0.1 2024-09-17 17:36:31,062 INFO [train.py:1198] (1/2) Epoch 36, batch 950, loss[loss=0.2416, ctc_loss=0.16, cr_loss=0.408, over 20321.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1474, cr_loss=0.3721, over 4065499.29 frames. ], batch size: 74, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 17:36:33,243 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.60 vs. limit=22.5 2024-09-17 17:36:35,609 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.958e+02 2.181e+02 2.297e+02 2.436e+02 3.143e+02, threshold=4.594e+02, percent-clipped=0.0 2024-09-17 17:36:37,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=636448.8333333334, ans=0.125 2024-09-17 17:37:19,351 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=636533.8333333334, ans=0.1 2024-09-17 17:37:46,169 INFO [train.py:1198] (1/2) Epoch 36, batch 1000, loss[loss=0.2059, ctc_loss=0.1351, cr_loss=0.3538, over 20784.00 frames. ], tot_loss[loss=0.2225, ctc_loss=0.1478, cr_loss=0.3733, over 4050016.02 frames. 
], batch size: 53, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 17:37:49,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=636590.5, ans=0.0 2024-09-17 17:38:08,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=636618.8333333334, ans=0.125 2024-09-17 17:38:08,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=636618.8333333334, ans=0.09899494936611666 2024-09-17 17:38:17,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=636647.1666666666, ans=0.125 2024-09-17 17:38:26,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=636647.1666666666, ans=0.2 2024-09-17 17:38:39,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=636675.5, ans=0.125 2024-09-17 17:38:55,167 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.29 vs. limit=6.0 2024-09-17 17:39:00,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=636732.1666666666, ans=0.125 2024-09-17 17:39:02,026 INFO [train.py:1198] (1/2) Epoch 36, batch 1050, loss[loss=0.191, ctc_loss=0.1241, cr_loss=0.3347, over 20943.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1476, cr_loss=0.373, over 4061731.60 frames. ], batch size: 49, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 17:39:05,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=636732.1666666666, ans=0.04949747468305833 2024-09-17 17:39:06,556 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.910e+02 2.166e+02 2.296e+02 2.442e+02 3.647e+02, threshold=4.592e+02, percent-clipped=0.0 2024-09-17 17:39:08,761 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.18 vs. limit=22.5 2024-09-17 17:39:58,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=636817.1666666666, ans=0.1 2024-09-17 17:40:17,851 INFO [train.py:1198] (1/2) Epoch 36, batch 1100, loss[loss=0.2057, ctc_loss=0.1342, cr_loss=0.3576, over 20798.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1464, cr_loss=0.3712, over 4080751.75 frames. ], batch size: 53, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 17:40:58,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=636930.5, ans=0.125 2024-09-17 17:41:14,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=636958.8333333334, ans=10.0 2024-09-17 17:41:36,398 INFO [train.py:1198] (1/2) Epoch 36, batch 1150, loss[loss=0.251, ctc_loss=0.1718, cr_loss=0.3962, over 20014.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1472, cr_loss=0.3726, over 4078780.57 frames. 
], batch size: 80, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 17:41:41,001 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.856e+02 2.186e+02 2.348e+02 2.544e+02 5.644e+02, threshold=4.697e+02, percent-clipped=1.0 2024-09-17 17:42:10,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=637072.1666666666, ans=0.025 2024-09-17 17:42:35,986 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.51 vs. limit=15.0 2024-09-17 17:42:54,661 INFO [train.py:1198] (1/2) Epoch 36, batch 1200, loss[loss=0.2174, ctc_loss=0.1425, cr_loss=0.3745, over 20717.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1466, cr_loss=0.372, over 4091159.48 frames. ], batch size: 68, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 17:43:15,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=637185.5, ans=0.0 2024-09-17 17:43:35,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=637213.8333333334, ans=0.125 2024-09-17 17:43:47,942 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.12 vs. limit=15.0 2024-09-17 17:44:10,022 INFO [train.py:1198] (1/2) Epoch 36, batch 1250, loss[loss=0.1913, ctc_loss=0.1236, cr_loss=0.3384, over 19965.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.1458, cr_loss=0.3703, over 4091139.23 frames. ], batch size: 44, lr: 2.30e-03, grad_scale: 16.0 2024-09-17 17:44:15,879 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.870e+02 2.209e+02 2.351e+02 2.569e+02 5.360e+02, threshold=4.702e+02, percent-clipped=1.0 2024-09-17 17:45:04,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=637383.8333333334, ans=0.0 2024-09-17 17:45:21,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=637412.1666666666, ans=0.0 2024-09-17 17:45:25,427 INFO [train.py:1198] (1/2) Epoch 36, batch 1300, loss[loss=0.2516, ctc_loss=0.1692, cr_loss=0.4124, over 20695.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1467, cr_loss=0.3716, over 4089531.08 frames. ], batch size: 68, lr: 2.30e-03, grad_scale: 16.0 2024-09-17 17:45:57,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=637497.1666666666, ans=0.2 2024-09-17 17:46:31,536 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.94 vs. limit=6.0 2024-09-17 17:46:34,239 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.87 vs. limit=15.0 2024-09-17 17:46:40,603 INFO [train.py:1198] (1/2) Epoch 36, batch 1350, loss[loss=0.2304, ctc_loss=0.1506, cr_loss=0.399, over 20763.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.147, cr_loss=0.3725, over 4082107.69 frames. 
], batch size: 56, lr: 2.30e-03, grad_scale: 16.0 2024-09-17 17:46:43,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=637582.1666666666, ans=0.125 2024-09-17 17:46:46,585 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.918e+02 2.218e+02 2.362e+02 2.528e+02 8.573e+02, threshold=4.725e+02, percent-clipped=2.0 2024-09-17 17:46:48,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=637582.1666666666, ans=0.1 2024-09-17 17:46:53,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=637582.1666666666, ans=0.0 2024-09-17 17:47:23,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=637638.8333333334, ans=0.0 2024-09-17 17:47:23,580 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.18 vs. limit=15.0 2024-09-17 17:47:55,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=637695.5, ans=0.2 2024-09-17 17:48:02,120 INFO [train.py:1198] (1/2) Epoch 36, batch 1400, loss[loss=0.2304, ctc_loss=0.1556, cr_loss=0.3739, over 20582.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1468, cr_loss=0.3711, over 4091287.12 frames. ], batch size: 71, lr: 2.30e-03, grad_scale: 16.0 2024-09-17 17:48:09,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=637723.8333333334, ans=0.125 2024-09-17 17:48:34,316 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 17:48:41,056 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.62 vs. limit=10.0 2024-09-17 17:48:44,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=637780.5, ans=0.125 2024-09-17 17:49:03,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=637837.1666666666, ans=0.0 2024-09-17 17:49:03,578 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.07 vs. limit=22.5 2024-09-17 17:49:10,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=637837.1666666666, ans=0.0 2024-09-17 17:49:18,009 INFO [train.py:1198] (1/2) Epoch 36, batch 1450, loss[loss=0.2306, ctc_loss=0.1505, cr_loss=0.4005, over 20957.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.1455, cr_loss=0.3698, over 4105395.99 frames. ], batch size: 58, lr: 2.30e-03, grad_scale: 16.0 2024-09-17 17:49:23,987 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.990e+02 2.197e+02 2.319e+02 2.514e+02 4.162e+02, threshold=4.638e+02, percent-clipped=0.0 2024-09-17 17:50:18,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=637978.8333333334, ans=0.0 2024-09-17 17:50:31,161 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.66 vs. 
limit=15.0 2024-09-17 17:50:32,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=638007.1666666666, ans=0.125 2024-09-17 17:50:33,610 INFO [train.py:1198] (1/2) Epoch 36, batch 1500, loss[loss=0.2199, ctc_loss=0.1486, cr_loss=0.3567, over 21043.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1465, cr_loss=0.3708, over 4087111.15 frames. ], batch size: 62, lr: 2.30e-03, grad_scale: 16.0 2024-09-17 17:50:58,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=638035.5, ans=0.2 2024-09-17 17:50:59,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=638035.5, ans=0.2 2024-09-17 17:51:13,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=638063.8333333334, ans=0.2 2024-09-17 17:51:33,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=638120.5, ans=0.015 2024-09-17 17:51:33,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=638120.5, ans=0.0 2024-09-17 17:51:36,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=638120.5, ans=0.125 2024-09-17 17:51:43,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=638120.5, ans=0.0 2024-09-17 17:51:49,377 INFO [train.py:1198] (1/2) Epoch 36, batch 1550, loss[loss=0.2724, ctc_loss=0.1936, cr_loss=0.3936, over 14161.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1474, cr_loss=0.3722, over 4092646.80 frames. ], batch size: 150, lr: 2.30e-03, grad_scale: 16.0 2024-09-17 17:51:55,321 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.036e+02 2.219e+02 2.367e+02 2.541e+02 4.127e+02, threshold=4.735e+02, percent-clipped=0.0 2024-09-17 17:51:55,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=638148.8333333334, ans=0.0 2024-09-17 17:52:07,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=638177.1666666666, ans=0.0 2024-09-17 17:52:13,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=638177.1666666666, ans=0.125 2024-09-17 17:53:08,201 INFO [train.py:1198] (1/2) Epoch 36, batch 1600, loss[loss=0.1988, ctc_loss=0.1288, cr_loss=0.3501, over 20962.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1476, cr_loss=0.3721, over 4080697.80 frames. ], batch size: 51, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 17:53:20,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=638290.5, ans=0.1 2024-09-17 17:53:27,444 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.94 vs. 
limit=15.0 2024-09-17 17:53:35,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=638318.8333333334, ans=0.125 2024-09-17 17:53:35,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=638318.8333333334, ans=0.125 2024-09-17 17:53:43,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=638347.1666666666, ans=0.1 2024-09-17 17:54:05,637 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 17:54:13,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=638403.8333333334, ans=0.1 2024-09-17 17:54:26,792 INFO [train.py:1198] (1/2) Epoch 36, batch 1650, loss[loss=0.2582, ctc_loss=0.1745, cr_loss=0.4186, over 20851.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1475, cr_loss=0.3724, over 4095102.45 frames. ], batch size: 65, lr: 2.30e-03, grad_scale: 16.0 2024-09-17 17:54:34,389 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.883e+02 2.176e+02 2.298e+02 2.478e+02 3.076e+02, threshold=4.596e+02, percent-clipped=0.0 2024-09-17 17:54:51,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=638460.5, ans=0.1 2024-09-17 17:55:02,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=638488.8333333334, ans=0.025 2024-09-17 17:55:12,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=638517.1666666666, ans=0.125 2024-09-17 17:55:18,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=638517.1666666666, ans=0.2 2024-09-17 17:55:26,721 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.80 vs. limit=22.5 2024-09-17 17:55:42,277 INFO [train.py:1198] (1/2) Epoch 36, batch 1700, loss[loss=0.2536, ctc_loss=0.1701, cr_loss=0.4176, over 20830.00 frames. ], tot_loss[loss=0.2223, ctc_loss=0.1477, cr_loss=0.3728, over 4097205.17 frames. ], batch size: 59, lr: 2.30e-03, grad_scale: 16.0 2024-09-17 17:56:01,153 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.13 vs. limit=10.0 2024-09-17 17:56:57,792 INFO [train.py:1198] (1/2) Epoch 36, batch 1750, loss[loss=0.2394, ctc_loss=0.1593, cr_loss=0.4006, over 21022.00 frames. ], tot_loss[loss=0.2223, ctc_loss=0.1479, cr_loss=0.3723, over 4080582.30 frames. ], batch size: 62, lr: 2.30e-03, grad_scale: 16.0 2024-09-17 17:56:58,377 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.07 vs. limit=15.0 2024-09-17 17:57:04,998 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.823e+02 2.182e+02 2.352e+02 2.498e+02 4.393e+02, threshold=4.704e+02, percent-clipped=0.0 2024-09-17 17:57:40,751 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=7.92 vs. 
limit=22.5 2024-09-17 17:57:49,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=638800.5, ans=0.125 2024-09-17 17:58:12,674 INFO [train.py:1198] (1/2) Epoch 36, batch 1800, loss[loss=0.2022, ctc_loss=0.1317, cr_loss=0.3524, over 21051.00 frames. ], tot_loss[loss=0.223, ctc_loss=0.1483, cr_loss=0.3732, over 4082630.70 frames. ], batch size: 56, lr: 2.30e-03, grad_scale: 16.0 2024-09-17 17:58:14,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=638857.1666666666, ans=0.125 2024-09-17 17:58:25,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=638857.1666666666, ans=0.125 2024-09-17 17:58:57,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=638913.8333333334, ans=0.125 2024-09-17 17:59:33,946 INFO [train.py:1198] (1/2) Epoch 36, batch 1850, loss[loss=0.1629, ctc_loss=0.1059, cr_loss=0.2846, over 20971.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1474, cr_loss=0.372, over 4098899.04 frames. ], batch size: 51, lr: 2.30e-03, grad_scale: 16.0 2024-09-17 17:59:36,077 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.76 vs. limit=15.0 2024-09-17 17:59:41,543 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.895e+02 2.201e+02 2.316e+02 2.445e+02 2.993e+02, threshold=4.631e+02, percent-clipped=0.0 2024-09-17 18:00:14,445 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.04 vs. limit=15.0 2024-09-17 18:00:18,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=639083.8333333334, ans=0.125 2024-09-17 18:00:31,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=639083.8333333334, ans=0.125 2024-09-17 18:00:49,626 INFO [train.py:1198] (1/2) Epoch 36, batch 1900, loss[loss=0.2408, ctc_loss=0.1582, cr_loss=0.4128, over 20251.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.148, cr_loss=0.3735, over 4104107.32 frames. 
], batch size: 74, lr: 2.30e-03, grad_scale: 16.0 2024-09-17 18:01:01,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=639140.5, ans=0.1 2024-09-17 18:01:03,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=639168.8333333334, ans=0.0 2024-09-17 18:01:06,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=639168.8333333334, ans=0.125 2024-09-17 18:01:15,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=639168.8333333334, ans=0.0 2024-09-17 18:01:21,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=639197.1666666666, ans=0.1 2024-09-17 18:01:36,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=639225.5, ans=0.125 2024-09-17 18:02:05,112 INFO [train.py:1198] (1/2) Epoch 36, batch 1950, loss[loss=0.2411, ctc_loss=0.1611, cr_loss=0.3996, over 20222.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1473, cr_loss=0.3726, over 4102347.23 frames. ], batch size: 80, lr: 2.30e-03, grad_scale: 16.0 2024-09-17 18:02:12,516 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.846e+02 2.217e+02 2.351e+02 2.530e+02 3.464e+02, threshold=4.702e+02, percent-clipped=0.0 2024-09-17 18:02:21,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=639310.5, ans=0.125 2024-09-17 18:02:56,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=639367.1666666666, ans=0.1 2024-09-17 18:02:56,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=639367.1666666666, ans=0.2 2024-09-17 18:03:20,128 INFO [train.py:1198] (1/2) Epoch 36, batch 2000, loss[loss=0.1946, ctc_loss=0.1272, cr_loss=0.3367, over 21069.00 frames. ], tot_loss[loss=0.2225, ctc_loss=0.1479, cr_loss=0.3728, over 4091459.82 frames. ], batch size: 62, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 18:03:35,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=639452.1666666666, ans=0.125 2024-09-17 18:04:05,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=639508.8333333334, ans=0.1 2024-09-17 18:04:25,646 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.90 vs. limit=15.0 2024-09-17 18:04:38,329 INFO [train.py:1198] (1/2) Epoch 36, batch 2050, loss[loss=0.2322, ctc_loss=0.155, cr_loss=0.3862, over 20971.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1482, cr_loss=0.3737, over 4100425.15 frames. 
], batch size: 58, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 18:04:45,951 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.917e+02 2.231e+02 2.362e+02 2.541e+02 3.290e+02, threshold=4.725e+02, percent-clipped=0.0 2024-09-17 18:04:59,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=639593.8333333334, ans=0.1 2024-09-17 18:05:05,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=639593.8333333334, ans=0.125 2024-09-17 18:05:21,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=639622.1666666666, ans=0.0 2024-09-17 18:05:36,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=639650.5, ans=15.0 2024-09-17 18:05:43,488 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=639678.8333333334, ans=0.125 2024-09-17 18:05:53,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=639678.8333333334, ans=0.125 2024-09-17 18:05:56,844 INFO [train.py:1198] (1/2) Epoch 36, batch 2100, loss[loss=0.2325, ctc_loss=0.1517, cr_loss=0.4038, over 20998.00 frames. ], tot_loss[loss=0.2237, ctc_loss=0.1487, cr_loss=0.3749, over 4092883.92 frames. ], batch size: 63, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 18:06:37,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=639763.8333333334, ans=0.0 2024-09-17 18:06:37,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=639763.8333333334, ans=0.125 2024-09-17 18:07:01,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=639820.5, ans=0.125 2024-09-17 18:07:13,013 INFO [train.py:1198] (1/2) Epoch 36, batch 2150, loss[loss=0.2403, ctc_loss=0.1624, cr_loss=0.3898, over 20065.00 frames. ], tot_loss[loss=0.2234, ctc_loss=0.1486, cr_loss=0.3744, over 4090322.48 frames. ], batch size: 80, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 18:07:19,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=639848.8333333334, ans=0.125 2024-09-17 18:07:20,664 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.000e+02 2.192e+02 2.324e+02 2.492e+02 3.311e+02, threshold=4.648e+02, percent-clipped=0.0 2024-09-17 18:07:22,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=639848.8333333334, ans=0.125 2024-09-17 18:07:32,030 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.26 vs. 
limit=22.5 2024-09-17 18:07:34,680 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 18:07:40,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=639877.1666666666, ans=0.125 2024-09-17 18:07:40,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=639877.1666666666, ans=0.025 2024-09-17 18:07:45,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=639905.5, ans=0.0 2024-09-17 18:08:16,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=639962.1666666666, ans=0.125 2024-09-17 18:08:28,381 INFO [train.py:1198] (1/2) Epoch 36, batch 2200, loss[loss=0.2023, ctc_loss=0.1321, cr_loss=0.3514, over 20983.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.148, cr_loss=0.3733, over 4086876.47 frames. ], batch size: 55, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 18:08:39,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=639990.5, ans=0.1 2024-09-17 18:09:13,824 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.00 vs. limit=15.0 2024-09-17 18:09:45,576 INFO [train.py:1198] (1/2) Epoch 36, batch 2250, loss[loss=0.2293, ctc_loss=0.1521, cr_loss=0.3863, over 21008.00 frames. ], tot_loss[loss=0.2238, ctc_loss=0.1487, cr_loss=0.3755, over 4092475.55 frames. ], batch size: 61, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 18:09:50,637 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=640132.1666666666, ans=0.2 2024-09-17 18:09:53,251 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.934e+02 2.185e+02 2.390e+02 2.556e+02 6.882e+02, threshold=4.780e+02, percent-clipped=1.0 2024-09-17 18:10:31,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=640217.1666666666, ans=0.1 2024-09-17 18:10:45,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=640217.1666666666, ans=0.1 2024-09-17 18:10:50,216 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.69 vs. limit=22.5 2024-09-17 18:11:03,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=640273.8333333334, ans=0.1 2024-09-17 18:11:04,373 INFO [train.py:1198] (1/2) Epoch 36, batch 2300, loss[loss=0.2415, ctc_loss=0.1645, cr_loss=0.3848, over 19382.00 frames. ], tot_loss[loss=0.2232, ctc_loss=0.1484, cr_loss=0.3741, over 4092940.26 frames. ], batch size: 90, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 18:12:20,032 INFO [train.py:1198] (1/2) Epoch 36, batch 2350, loss[loss=0.2509, ctc_loss=0.1721, cr_loss=0.3935, over 20971.00 frames. ], tot_loss[loss=0.2234, ctc_loss=0.1486, cr_loss=0.3741, over 4085231.03 frames. 
], batch size: 64, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 18:12:27,636 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.002e+02 2.254e+02 2.364e+02 2.535e+02 4.607e+02, threshold=4.728e+02, percent-clipped=0.0 2024-09-17 18:12:40,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=640443.8333333334, ans=0.125 2024-09-17 18:12:40,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=640443.8333333334, ans=0.025 2024-09-17 18:12:48,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=640443.8333333334, ans=0.0 2024-09-17 18:13:00,451 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2024-09-17 18:13:26,190 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.02 vs. limit=10.0 2024-09-17 18:13:30,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=640528.8333333334, ans=0.1 2024-09-17 18:13:33,447 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.33 vs. limit=22.5 2024-09-17 18:13:35,677 INFO [train.py:1198] (1/2) Epoch 36, batch 2400, loss[loss=0.195, ctc_loss=0.1276, cr_loss=0.3372, over 20880.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1483, cr_loss=0.3729, over 4061305.30 frames. ], batch size: 54, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 18:14:29,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=640642.1666666666, ans=0.2 2024-09-17 18:14:46,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=640670.5, ans=0.0 2024-09-17 18:14:51,155 INFO [train.py:1198] (1/2) Epoch 36, batch 2450, loss[loss=0.2054, ctc_loss=0.1335, cr_loss=0.3591, over 20826.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1471, cr_loss=0.371, over 4076886.92 frames. ], batch size: 59, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 18:14:58,573 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.933e+02 2.183e+02 2.280e+02 2.456e+02 3.188e+02, threshold=4.561e+02, percent-clipped=0.0 2024-09-17 18:16:09,482 INFO [train.py:1198] (1/2) Epoch 36, batch 2500, loss[loss=0.1796, ctc_loss=0.1158, cr_loss=0.3187, over 21009.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1472, cr_loss=0.3714, over 4078912.09 frames. ], batch size: 52, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 18:16:31,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=640868.8333333334, ans=0.1 2024-09-17 18:16:46,345 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=640897.1666666666, ans=0.125 2024-09-17 18:17:27,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=640982.1666666666, ans=0.0 2024-09-17 18:17:28,211 INFO [train.py:1198] (1/2) Epoch 36, batch 2550, loss[loss=0.1891, ctc_loss=0.1254, cr_loss=0.3186, over 20898.00 frames. 
], tot_loss[loss=0.2212, ctc_loss=0.147, cr_loss=0.3712, over 4088997.61 frames. ], batch size: 54, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 18:17:34,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=640982.1666666666, ans=0.125 2024-09-17 18:17:35,841 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.959e+02 2.184e+02 2.392e+02 2.564e+02 5.258e+02, threshold=4.785e+02, percent-clipped=1.0 2024-09-17 18:18:23,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=641067.1666666666, ans=0.025 2024-09-17 18:18:32,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=641095.5, ans=0.125 2024-09-17 18:18:43,484 INFO [train.py:1198] (1/2) Epoch 36, batch 2600, loss[loss=0.2364, ctc_loss=0.1546, cr_loss=0.4088, over 20991.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.1471, cr_loss=0.3713, over 4091545.65 frames. ], batch size: 63, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 18:18:43,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=641123.8333333334, ans=0.0 2024-09-17 18:18:55,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=641123.8333333334, ans=0.125 2024-09-17 18:19:03,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=641152.1666666666, ans=0.1 2024-09-17 18:19:53,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=641237.1666666666, ans=0.125 2024-09-17 18:19:59,241 INFO [train.py:1198] (1/2) Epoch 36, batch 2650, loss[loss=0.2265, ctc_loss=0.1527, cr_loss=0.3694, over 20870.00 frames. ], tot_loss[loss=0.2216, ctc_loss=0.1472, cr_loss=0.372, over 4102554.95 frames. ], batch size: 57, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 18:20:06,806 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.945e+02 2.150e+02 2.308e+02 2.486e+02 2.992e+02, threshold=4.617e+02, percent-clipped=0.0 2024-09-17 18:20:37,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=641322.1666666666, ans=0.0 2024-09-17 18:21:18,080 INFO [train.py:1198] (1/2) Epoch 36, batch 2700, loss[loss=0.2467, ctc_loss=0.1638, cr_loss=0.4145, over 21028.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1471, cr_loss=0.3713, over 4099064.63 frames. ], batch size: 63, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 18:21:21,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=641407.1666666666, ans=0.1 2024-09-17 18:21:21,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=641407.1666666666, ans=0.025 2024-09-17 18:22:12,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=641492.1666666666, ans=0.1 2024-09-17 18:22:36,185 INFO [train.py:1198] (1/2) Epoch 36, batch 2750, loss[loss=0.2168, ctc_loss=0.1447, cr_loss=0.3609, over 21043.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1471, cr_loss=0.3711, over 4087170.82 frames. 
], batch size: 62, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 18:22:38,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=641548.8333333334, ans=0.0 2024-09-17 18:22:41,213 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.04 vs. limit=15.0 2024-09-17 18:22:43,630 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.861e+02 2.214e+02 2.373e+02 2.514e+02 4.691e+02, threshold=4.747e+02, percent-clipped=1.0 2024-09-17 18:22:52,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=641577.1666666666, ans=0.1 2024-09-17 18:22:56,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=641577.1666666666, ans=0.0 2024-09-17 18:23:51,123 INFO [train.py:1198] (1/2) Epoch 36, batch 2800, loss[loss=0.2249, ctc_loss=0.1478, cr_loss=0.3854, over 20976.00 frames. ], tot_loss[loss=0.2223, ctc_loss=0.1477, cr_loss=0.3728, over 4087591.08 frames. ], batch size: 55, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 18:24:03,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=641690.5, ans=0.125 2024-09-17 18:24:38,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=641775.5, ans=0.0 2024-09-17 18:24:48,920 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.72 vs. limit=15.0 2024-09-17 18:25:06,323 INFO [train.py:1198] (1/2) Epoch 36, batch 2850, loss[loss=0.2487, ctc_loss=0.1613, cr_loss=0.4373, over 21016.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.1478, cr_loss=0.373, over 4078606.62 frames. ], batch size: 61, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 18:25:11,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=641832.1666666666, ans=0.2 2024-09-17 18:25:14,091 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.035e+02 2.193e+02 2.376e+02 2.552e+02 4.510e+02, threshold=4.752e+02, percent-clipped=0.0 2024-09-17 18:25:25,728 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.33 vs. limit=15.0 2024-09-17 18:25:42,069 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 18:25:44,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=641888.8333333334, ans=0.0 2024-09-17 18:25:45,464 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.73 vs. limit=15.0 2024-09-17 18:26:22,568 INFO [train.py:1198] (1/2) Epoch 36, batch 2900, loss[loss=0.198, ctc_loss=0.1307, cr_loss=0.3366, over 20969.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.1471, cr_loss=0.3712, over 4089055.27 frames. 
], batch size: 49, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 18:26:39,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=642002.1666666666, ans=0.025 2024-09-17 18:26:51,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=642002.1666666666, ans=0.2 2024-09-17 18:27:23,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=642058.8333333334, ans=0.0 2024-09-17 18:27:26,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=642087.1666666666, ans=0.0 2024-09-17 18:27:44,325 INFO [train.py:1198] (1/2) Epoch 36, batch 2950, loss[loss=0.1887, ctc_loss=0.1219, cr_loss=0.334, over 21003.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1471, cr_loss=0.3711, over 4082981.21 frames. ], batch size: 52, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 18:27:51,747 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.872e+02 2.169e+02 2.279e+02 2.477e+02 2.946e+02, threshold=4.558e+02, percent-clipped=0.0 2024-09-17 18:27:57,165 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.84 vs. limit=22.5 2024-09-17 18:28:14,901 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.11 vs. limit=22.5 2024-09-17 18:28:32,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=642200.5, ans=0.025 2024-09-17 18:28:37,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=642200.5, ans=0.0 2024-09-17 18:28:50,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=642228.8333333334, ans=0.125 2024-09-17 18:28:59,420 INFO [train.py:1198] (1/2) Epoch 36, batch 3000, loss[loss=0.2441, ctc_loss=0.1639, cr_loss=0.4007, over 20686.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.147, cr_loss=0.3713, over 4094435.41 frames. ], batch size: 71, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 18:28:59,420 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 18:29:19,013 INFO [train.py:1230] (1/2) Epoch 36, validation: loss=0.03997, ctc_loss=0.03997, cr_loss=1.39e-14, over 944034.00 frames. 2024-09-17 18:29:19,013 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-17 18:29:28,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=642257.1666666666, ans=0.0 2024-09-17 18:29:39,767 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.84 vs. limit=12.0 2024-09-17 18:29:59,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=642313.8333333334, ans=0.0 2024-09-17 18:30:06,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=642342.1666666666, ans=0.125 2024-09-17 18:30:35,392 INFO [train.py:1198] (1/2) Epoch 36, batch 3050, loss[loss=0.2647, ctc_loss=0.1775, cr_loss=0.4364, over 18412.00 frames. 
], tot_loss[loss=0.2214, ctc_loss=0.1471, cr_loss=0.3716, over 4091083.20 frames. ], batch size: 108, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 18:30:40,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=642398.8333333334, ans=0.125 2024-09-17 18:30:42,932 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.882e+02 2.224e+02 2.334e+02 2.457e+02 5.651e+02, threshold=4.668e+02, percent-clipped=1.0 2024-09-17 18:30:58,984 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.69 vs. limit=22.5 2024-09-17 18:31:25,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=642483.8333333334, ans=0.0 2024-09-17 18:31:30,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=642483.8333333334, ans=0.04949747468305833 2024-09-17 18:31:50,894 INFO [train.py:1198] (1/2) Epoch 36, batch 3100, loss[loss=0.2158, ctc_loss=0.1427, cr_loss=0.3654, over 21081.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1465, cr_loss=0.3704, over 4090891.71 frames. ], batch size: 59, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 18:31:54,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=642540.5, ans=0.1 2024-09-17 18:32:11,474 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.80 vs. limit=15.0 2024-09-17 18:32:24,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=642597.1666666666, ans=0.125 2024-09-17 18:33:12,844 INFO [train.py:1198] (1/2) Epoch 36, batch 3150, loss[loss=0.1857, ctc_loss=0.1188, cr_loss=0.3342, over 20959.00 frames. ], tot_loss[loss=0.22, ctc_loss=0.1461, cr_loss=0.3696, over 4091994.55 frames. ], batch size: 49, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 18:33:19,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=642682.1666666666, ans=0.0 2024-09-17 18:33:20,430 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.794e+02 2.192e+02 2.310e+02 2.473e+02 3.052e+02, threshold=4.621e+02, percent-clipped=0.0 2024-09-17 18:33:58,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=642767.1666666666, ans=0.1 2024-09-17 18:34:17,238 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.67 vs. limit=15.0 2024-09-17 18:34:28,459 INFO [train.py:1198] (1/2) Epoch 36, batch 3200, loss[loss=0.2385, ctc_loss=0.1593, cr_loss=0.396, over 20840.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.1465, cr_loss=0.3698, over 4099582.82 frames. ], batch size: 59, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 18:34:56,286 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.57 vs. 
limit=15.0 2024-09-17 18:35:04,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=642880.5, ans=0.125 2024-09-17 18:35:44,105 INFO [train.py:1198] (1/2) Epoch 36, batch 3250, loss[loss=0.2515, ctc_loss=0.173, cr_loss=0.3926, over 14412.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1464, cr_loss=0.3705, over 4101792.57 frames. ], batch size: 150, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 18:35:51,801 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.926e+02 2.200e+02 2.311e+02 2.459e+02 3.830e+02, threshold=4.623e+02, percent-clipped=0.0 2024-09-17 18:35:52,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=642965.5, ans=0.1 2024-09-17 18:36:40,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=643050.5, ans=0.0 2024-09-17 18:36:59,139 INFO [train.py:1198] (1/2) Epoch 36, batch 3300, loss[loss=0.2186, ctc_loss=0.1431, cr_loss=0.3775, over 20995.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1479, cr_loss=0.3737, over 4101808.48 frames. ], batch size: 55, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 18:36:59,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=643107.1666666666, ans=0.1 2024-09-17 18:36:59,888 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.91 vs. limit=10.0 2024-09-17 18:37:10,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=643107.1666666666, ans=0.0 2024-09-17 18:37:14,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=643135.5, ans=0.125 2024-09-17 18:37:19,942 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.28 vs. limit=15.0 2024-09-17 18:38:17,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=643248.8333333334, ans=0.125 2024-09-17 18:38:18,673 INFO [train.py:1198] (1/2) Epoch 36, batch 3350, loss[loss=0.2079, ctc_loss=0.1382, cr_loss=0.3485, over 21075.00 frames. ], tot_loss[loss=0.2235, ctc_loss=0.1486, cr_loss=0.3748, over 4089170.92 frames. 
], batch size: 53, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 18:38:26,253 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.850e+02 2.174e+02 2.314e+02 2.461e+02 3.888e+02, threshold=4.628e+02, percent-clipped=0.0 2024-09-17 18:38:35,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=643277.1666666666, ans=0.125 2024-09-17 18:38:47,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=643305.5, ans=0.0 2024-09-17 18:38:49,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=643305.5, ans=0.1 2024-09-17 18:39:06,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=643333.8333333334, ans=0.1 2024-09-17 18:39:11,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=643333.8333333334, ans=0.125 2024-09-17 18:39:13,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=643333.8333333334, ans=0.2 2024-09-17 18:39:33,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=643362.1666666666, ans=0.125 2024-09-17 18:39:37,385 INFO [train.py:1198] (1/2) Epoch 36, batch 3400, loss[loss=0.2691, ctc_loss=0.1786, cr_loss=0.4526, over 20863.00 frames. ], tot_loss[loss=0.2232, ctc_loss=0.1482, cr_loss=0.3748, over 4089634.39 frames. ], batch size: 65, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 18:39:41,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=643390.5, ans=0.0 2024-09-17 18:40:14,686 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.81 vs. limit=15.0 2024-09-17 18:40:25,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=643475.5, ans=0.0 2024-09-17 18:40:47,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=643503.8333333334, ans=0.125 2024-09-17 18:40:49,445 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.54 vs. limit=22.5 2024-09-17 18:40:52,968 INFO [train.py:1198] (1/2) Epoch 36, batch 3450, loss[loss=0.2181, ctc_loss=0.1439, cr_loss=0.3706, over 20776.00 frames. ], tot_loss[loss=0.2233, ctc_loss=0.1483, cr_loss=0.3749, over 4097942.03 frames. 
], batch size: 56, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 18:41:00,689 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.975e+02 2.180e+02 2.316e+02 2.487e+02 2.991e+02, threshold=4.632e+02, percent-clipped=0.0 2024-09-17 18:41:11,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=643560.5, ans=0.1 2024-09-17 18:41:18,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=643560.5, ans=0.125 2024-09-17 18:41:20,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=643560.5, ans=0.0 2024-09-17 18:41:29,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=643588.8333333334, ans=0.025 2024-09-17 18:41:48,651 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.72 vs. limit=12.0 2024-09-17 18:42:07,683 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 18:42:08,781 INFO [train.py:1198] (1/2) Epoch 36, batch 3500, loss[loss=0.2352, ctc_loss=0.1604, cr_loss=0.3739, over 20020.00 frames. ], tot_loss[loss=0.2233, ctc_loss=0.1485, cr_loss=0.3744, over 4088932.01 frames. ], batch size: 80, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 18:42:09,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=643673.8333333334, ans=0.07 2024-09-17 18:43:13,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=643787.1666666666, ans=0.125 2024-09-17 18:43:14,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=643787.1666666666, ans=0.0 2024-09-17 18:43:15,942 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.67 vs. limit=12.0 2024-09-17 18:43:24,384 INFO [train.py:1198] (1/2) Epoch 36, batch 3550, loss[loss=0.192, ctc_loss=0.1242, cr_loss=0.3389, over 19851.00 frames. ], tot_loss[loss=0.2232, ctc_loss=0.1483, cr_loss=0.3744, over 4101100.63 frames. ], batch size: 44, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 18:43:31,832 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.979e+02 2.190e+02 2.348e+02 2.508e+02 4.507e+02, threshold=4.696e+02, percent-clipped=0.0 2024-09-17 18:43:43,870 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 18:44:30,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=643928.8333333334, ans=0.2 2024-09-17 18:44:31,748 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.22 vs. limit=12.0 2024-09-17 18:44:45,425 INFO [train.py:1198] (1/2) Epoch 36, batch 3600, loss[loss=0.2085, ctc_loss=0.1352, cr_loss=0.3665, over 21095.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1474, cr_loss=0.3727, over 4110889.70 frames. 
], batch size: 59, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 18:44:57,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=643957.1666666666, ans=0.1 2024-09-17 18:45:26,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=644013.8333333334, ans=0.1 2024-09-17 18:46:00,823 INFO [train.py:1198] (1/2) Epoch 36, batch 3650, loss[loss=0.1983, ctc_loss=0.1309, cr_loss=0.3368, over 20984.00 frames. ], tot_loss[loss=0.2228, ctc_loss=0.1481, cr_loss=0.3737, over 4102970.58 frames. ], batch size: 55, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 18:46:09,976 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.849e+02 2.231e+02 2.394e+02 2.586e+02 5.344e+02, threshold=4.788e+02, percent-clipped=1.0 2024-09-17 18:46:11,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=644098.8333333334, ans=0.125 2024-09-17 18:46:22,931 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.88 vs. limit=12.0 2024-09-17 18:46:38,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=644155.5, ans=0.1 2024-09-17 18:46:46,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=644183.8333333334, ans=0.0 2024-09-17 18:47:10,056 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.56 vs. limit=15.0 2024-09-17 18:47:16,529 INFO [train.py:1198] (1/2) Epoch 36, batch 3700, loss[loss=0.2084, ctc_loss=0.1382, cr_loss=0.3509, over 20944.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1478, cr_loss=0.3721, over 4093122.19 frames. ], batch size: 50, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 18:47:48,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=644297.1666666666, ans=0.125 2024-09-17 18:48:18,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=644353.8333333334, ans=0.035 2024-09-17 18:48:24,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=644353.8333333334, ans=0.125 2024-09-17 18:48:29,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=644353.8333333334, ans=0.0 2024-09-17 18:48:29,553 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 18:48:29,872 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0 2024-09-17 18:48:32,127 INFO [train.py:1198] (1/2) Epoch 36, batch 3750, loss[loss=0.2416, ctc_loss=0.1642, cr_loss=0.387, over 20312.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1476, cr_loss=0.3716, over 4087441.61 frames. ], batch size: 74, lr: 2.29e-03, grad_scale: 16.0 2024-09-17 18:48:33,006 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.81 vs. 
limit=15.0 2024-09-17 18:48:36,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=644382.1666666666, ans=0.0 2024-09-17 18:48:42,431 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.867e+02 2.169e+02 2.328e+02 2.594e+02 5.231e+02, threshold=4.656e+02, percent-clipped=1.0 2024-09-17 18:48:51,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=644410.5, ans=0.2 2024-09-17 18:49:11,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=644438.8333333334, ans=0.0 2024-09-17 18:49:19,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=644467.1666666666, ans=0.125 2024-09-17 18:49:50,565 INFO [train.py:1198] (1/2) Epoch 36, batch 3800, loss[loss=0.1926, ctc_loss=0.1257, cr_loss=0.3345, over 20804.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1472, cr_loss=0.3705, over 4086088.56 frames. ], batch size: 53, lr: 2.29e-03, grad_scale: 16.0 2024-09-17 18:51:08,938 INFO [train.py:1198] (1/2) Epoch 36, batch 3850, loss[loss=0.1807, ctc_loss=0.1171, cr_loss=0.3178, over 21005.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1474, cr_loss=0.3717, over 4089923.66 frames. ], batch size: 52, lr: 2.29e-03, grad_scale: 16.0 2024-09-17 18:51:19,438 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.954e+02 2.172e+02 2.267e+02 2.420e+02 3.328e+02, threshold=4.535e+02, percent-clipped=0.0 2024-09-17 18:51:39,960 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.55 vs. limit=15.0 2024-09-17 18:51:50,775 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.61 vs. limit=22.5 2024-09-17 18:51:54,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=644750.5, ans=0.125 2024-09-17 18:52:24,394 INFO [train.py:1198] (1/2) Epoch 36, batch 3900, loss[loss=0.2055, ctc_loss=0.1353, cr_loss=0.3511, over 20820.00 frames. ], tot_loss[loss=0.2208, ctc_loss=0.1466, cr_loss=0.3712, over 4088198.57 frames. ], batch size: 59, lr: 2.29e-03, grad_scale: 16.0 2024-09-17 18:53:11,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=644892.1666666666, ans=0.125 2024-09-17 18:53:17,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=644892.1666666666, ans=0.0 2024-09-17 18:53:20,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=644892.1666666666, ans=0.2 2024-09-17 18:53:20,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=644892.1666666666, ans=0.04949747468305833 2024-09-17 18:53:23,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=644920.5, ans=0.125 2024-09-17 18:53:39,691 INFO [train.py:1198] (1/2) Epoch 36, batch 3950, loss[loss=0.2705, ctc_loss=0.1889, cr_loss=0.408, over 14536.00 frames. ], tot_loss[loss=0.2208, ctc_loss=0.1465, cr_loss=0.3713, over 4091041.58 frames. 
], batch size: 149, lr: 2.29e-03, grad_scale: 16.0 2024-09-17 18:53:42,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=644948.8333333334, ans=0.125 2024-09-17 18:53:45,304 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.68 vs. limit=22.5 2024-09-17 18:53:50,101 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.959e+02 2.216e+02 2.344e+02 2.513e+02 4.075e+02, threshold=4.689e+02, percent-clipped=0.0 2024-09-17 18:54:08,726 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 18:54:10,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=645005.5, ans=0.0 2024-09-17 18:54:39,107 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.72 vs. limit=15.0 2024-09-17 18:54:57,582 INFO [train.py:1198] (1/2) Epoch 36, batch 4000, loss[loss=0.2645, ctc_loss=0.1787, cr_loss=0.4289, over 18268.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1475, cr_loss=0.3724, over 4085848.57 frames. ], batch size: 108, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 18:55:11,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=645118.8333333334, ans=0.125 2024-09-17 18:55:15,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=645118.8333333334, ans=0.1 2024-09-17 18:55:49,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=645175.5, ans=0.025 2024-09-17 18:56:06,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=645203.8333333334, ans=0.025 2024-09-17 18:56:15,551 INFO [train.py:1198] (1/2) Epoch 36, batch 4050, loss[loss=0.2173, ctc_loss=0.1429, cr_loss=0.3719, over 20775.00 frames. ], tot_loss[loss=0.2207, ctc_loss=0.1465, cr_loss=0.3712, over 4099204.72 frames. ], batch size: 56, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 18:56:26,005 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.860e+02 2.152e+02 2.288e+02 2.439e+02 3.778e+02, threshold=4.576e+02, percent-clipped=0.0 2024-09-17 18:56:35,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=645260.5, ans=0.1 2024-09-17 18:57:18,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=645345.5, ans=0.0 2024-09-17 18:57:18,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=645345.5, ans=0.1 2024-09-17 18:57:19,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=645345.5, ans=0.2 2024-09-17 18:57:19,844 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.33 vs. 
limit=22.5 2024-09-17 18:57:25,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=645345.5, ans=0.025 2024-09-17 18:57:31,319 INFO [train.py:1198] (1/2) Epoch 36, batch 4100, loss[loss=0.2063, ctc_loss=0.1339, cr_loss=0.3619, over 20887.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1472, cr_loss=0.3724, over 4088889.04 frames. ], batch size: 54, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 18:57:45,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=645402.1666666666, ans=0.125 2024-09-17 18:57:56,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=645402.1666666666, ans=0.125 2024-09-17 18:58:10,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=645430.5, ans=0.125 2024-09-17 18:58:22,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=645458.8333333334, ans=0.1 2024-09-17 18:58:30,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=645487.1666666666, ans=0.05 2024-09-17 18:58:31,974 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.29 vs. limit=6.0 2024-09-17 18:58:33,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=645487.1666666666, ans=0.2 2024-09-17 18:58:46,252 INFO [train.py:1198] (1/2) Epoch 36, batch 4150, loss[loss=0.1869, ctc_loss=0.12, cr_loss=0.3348, over 20984.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1463, cr_loss=0.3708, over 4091950.82 frames. ], batch size: 51, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 18:58:52,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=645515.5, ans=0.2 2024-09-17 18:58:57,006 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.902e+02 2.186e+02 2.345e+02 2.474e+02 3.263e+02, threshold=4.689e+02, percent-clipped=0.0 2024-09-17 18:59:03,919 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.49 vs. limit=15.0 2024-09-17 18:59:20,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=645572.1666666666, ans=0.1 2024-09-17 18:59:56,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=645628.8333333334, ans=0.1 2024-09-17 19:00:02,515 INFO [train.py:1198] (1/2) Epoch 36, batch 4200, loss[loss=0.1722, ctc_loss=0.1113, cr_loss=0.3043, over 19879.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.1458, cr_loss=0.37, over 4094501.63 frames. 
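Each optim.py clipping warning reports five grad-norm statistics and a threshold, and in every record here the threshold equals Clipping_scale (2.0) times the middle statistic; e.g. the 18:46:09 record has median 2.394e+02 and threshold 4.788e+02. A plausible reading, sketched below; the (min, 25%, median, 75%, max) interpretation and the clipping rule are inferred from the log, not transcribed from optim.py:

import torch

def clipping_threshold(grad_norms, clipping_scale=2.0):
    # Assume the five logged numbers are (min, 25%, median, 75%, max) of
    # recent per-batch gradient norms, and that gradients are clipped when
    # their norm exceeds clipping_scale times the median.
    q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    return q, clipping_scale * q[2]

norms = torch.tensor([184.9, 223.1, 239.4, 258.6, 534.4])
q, thr = clipping_threshold(norms)  # thr = 478.8, matching the 18:46:09 record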
], batch size: 44, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 19:00:27,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=645685.5, ans=0.125 2024-09-17 19:00:36,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=645713.8333333334, ans=0.125 2024-09-17 19:01:12,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=645770.5, ans=0.125 2024-09-17 19:01:14,323 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.78 vs. limit=15.0 2024-09-17 19:01:20,893 INFO [train.py:1198] (1/2) Epoch 36, batch 4250, loss[loss=0.2057, ctc_loss=0.1355, cr_loss=0.3508, over 20988.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1452, cr_loss=0.3691, over 4097725.24 frames. ], batch size: 55, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 19:01:28,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=645798.8333333334, ans=0.125 2024-09-17 19:01:31,164 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.807e+02 2.186e+02 2.328e+02 2.483e+02 3.231e+02, threshold=4.655e+02, percent-clipped=0.0 2024-09-17 19:01:33,396 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.18 vs. limit=10.0 2024-09-17 19:01:34,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=645827.1666666666, ans=0.125 2024-09-17 19:02:05,704 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.69 vs. limit=8.0 2024-09-17 19:02:15,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=645883.8333333334, ans=0.0 2024-09-17 19:02:31,153 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.33 vs. limit=22.5 2024-09-17 19:02:39,493 INFO [train.py:1198] (1/2) Epoch 36, batch 4300, loss[loss=0.2319, ctc_loss=0.1535, cr_loss=0.3919, over 20968.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.1462, cr_loss=0.3709, over 4092931.63 frames. ], batch size: 58, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 19:02:51,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=645940.5, ans=0.125 2024-09-17 19:03:21,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=645997.1666666666, ans=0.02 2024-09-17 19:03:56,036 INFO [train.py:1198] (1/2) Epoch 36, batch 4350, loss[loss=0.1991, ctc_loss=0.1305, cr_loss=0.3429, over 20939.00 frames. ], tot_loss[loss=0.2203, ctc_loss=0.1461, cr_loss=0.3709, over 4087487.03 frames. 
], batch size: 49, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 19:03:56,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=646082.1666666666, ans=0.0 2024-09-17 19:04:06,750 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.951e+02 2.137e+02 2.304e+02 2.491e+02 3.547e+02, threshold=4.607e+02, percent-clipped=0.0 2024-09-17 19:04:11,988 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.35 vs. limit=15.0 2024-09-17 19:04:34,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=646138.8333333334, ans=0.035 2024-09-17 19:04:44,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=646167.1666666666, ans=0.125 2024-09-17 19:05:09,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=646195.5, ans=0.0 2024-09-17 19:05:09,786 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.88 vs. limit=6.0 2024-09-17 19:05:11,953 INFO [train.py:1198] (1/2) Epoch 36, batch 4400, loss[loss=0.1854, ctc_loss=0.1175, cr_loss=0.3396, over 21021.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.1454, cr_loss=0.3694, over 4102665.87 frames. ], batch size: 52, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 19:05:42,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=646280.5, ans=0.1 2024-09-17 19:06:22,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=646337.1666666666, ans=0.125 2024-09-17 19:06:29,935 INFO [train.py:1198] (1/2) Epoch 36, batch 4450, loss[loss=0.2223, ctc_loss=0.1456, cr_loss=0.3833, over 20946.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1465, cr_loss=0.3708, over 4090615.09 frames. ], batch size: 60, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 19:06:39,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=646365.5, ans=0.125 2024-09-17 19:06:40,490 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.945e+02 2.211e+02 2.356e+02 2.581e+02 3.449e+02, threshold=4.712e+02, percent-clipped=0.0 2024-09-17 19:06:53,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=646393.8333333334, ans=0.125 2024-09-17 19:07:13,479 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.25 vs. limit=15.0 2024-09-17 19:07:17,914 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.42 vs. limit=15.0 2024-09-17 19:07:48,784 INFO [train.py:1198] (1/2) Epoch 36, batch 4500, loss[loss=0.267, ctc_loss=0.1827, cr_loss=0.4216, over 18215.00 frames. ], tot_loss[loss=0.2221, ctc_loss=0.1475, cr_loss=0.3729, over 4077794.89 frames. 
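Each ScheduledFloat line pairs a batch_count with the value currently in effect (ans=...), which is consistent with hyperparameters interpolated along a schedule of (batch_count, value) breakpoints. A minimal sketch; the two breakpoints below are invented for illustration and are not taken from scaling.py:

def scheduled_float(batch_count, points=((0.0, 0.3), (20000.0, 0.1))):
    # Piecewise-linear interpolation over hypothetical breakpoints; past
    # the last breakpoint the value is a constant, which would explain the
    # long runs of identical ans values (0.1, 0.125, ...) for a given name.
    (x0, y0), (x1, y1) = points
    if batch_count <= x0:
        return y0
    if batch_count >= x1:
        return y1
    t = (batch_count - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)

assert scheduled_float(646082.17) == 0.1  # far past the final breakpoint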
], batch size: 108, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 19:08:25,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=646563.8333333334, ans=0.1 2024-09-17 19:08:49,811 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.12 vs. limit=10.0 2024-09-17 19:08:50,154 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.46 vs. limit=22.5 2024-09-17 19:08:55,805 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.49 vs. limit=15.0 2024-09-17 19:09:04,042 INFO [train.py:1198] (1/2) Epoch 36, batch 4550, loss[loss=0.2268, ctc_loss=0.1553, cr_loss=0.3578, over 21056.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.148, cr_loss=0.3737, over 4080449.34 frames. ], batch size: 56, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 19:09:14,517 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.761e+02 2.211e+02 2.333e+02 2.511e+02 5.023e+02, threshold=4.665e+02, percent-clipped=1.0 2024-09-17 19:09:20,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=646677.1666666666, ans=0.1 2024-09-17 19:09:27,800 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.58 vs. limit=10.0 2024-09-17 19:09:33,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=646705.5, ans=0.025 2024-09-17 19:09:54,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=646733.8333333334, ans=0.09899494936611666 2024-09-17 19:10:20,145 INFO [train.py:1198] (1/2) Epoch 36, batch 4600, loss[loss=0.2309, ctc_loss=0.1545, cr_loss=0.3819, over 21049.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.148, cr_loss=0.3737, over 4082184.59 frames. ], batch size: 62, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 19:10:28,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=646790.5, ans=0.025 2024-09-17 19:10:32,790 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 19:10:41,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=646818.8333333334, ans=0.0 2024-09-17 19:11:25,107 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.00 vs. limit=22.5 2024-09-17 19:11:36,158 INFO [train.py:1198] (1/2) Epoch 36, batch 4650, loss[loss=0.226, ctc_loss=0.1505, cr_loss=0.3771, over 20965.00 frames. ], tot_loss[loss=0.2231, ctc_loss=0.1482, cr_loss=0.3744, over 4091724.81 frames. 
], batch size: 64, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 19:11:49,527 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.848e+02 2.184e+02 2.321e+02 2.513e+02 4.292e+02, threshold=4.641e+02, percent-clipped=0.0 2024-09-17 19:11:54,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=646960.5, ans=0.1 2024-09-17 19:12:05,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=646960.5, ans=0.0 2024-09-17 19:12:53,774 INFO [train.py:1198] (1/2) Epoch 36, batch 4700, loss[loss=0.2236, ctc_loss=0.1462, cr_loss=0.3872, over 20885.00 frames. ], tot_loss[loss=0.2232, ctc_loss=0.1483, cr_loss=0.3745, over 4070865.34 frames. ], batch size: 57, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 19:12:54,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=647073.8333333334, ans=0.025 2024-09-17 19:13:14,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=647102.1666666666, ans=0.1 2024-09-17 19:13:24,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=647130.5, ans=0.0 2024-09-17 19:13:44,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=647158.8333333334, ans=0.125 2024-09-17 19:13:47,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=647158.8333333334, ans=0.0 2024-09-17 19:14:11,304 INFO [train.py:1198] (1/2) Epoch 36, batch 4750, loss[loss=0.2188, ctc_loss=0.1438, cr_loss=0.3749, over 20993.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.1478, cr_loss=0.3726, over 4067101.65 frames. ], batch size: 61, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 19:14:14,912 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.56 vs. limit=12.0 2024-09-17 19:14:21,894 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.852e+02 2.204e+02 2.343e+02 2.499e+02 3.449e+02, threshold=4.685e+02, percent-clipped=0.0 2024-09-17 19:14:43,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=647272.1666666666, ans=0.025 2024-09-17 19:14:48,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=647272.1666666666, ans=15.0 2024-09-17 19:15:26,934 INFO [train.py:1198] (1/2) Epoch 36, batch 4800, loss[loss=0.2195, ctc_loss=0.144, cr_loss=0.3776, over 20941.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1475, cr_loss=0.3723, over 4070939.05 frames. ], batch size: 64, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 19:15:43,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=647385.5, ans=0.0 2024-09-17 19:15:47,161 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.99 vs. 
limit=12.0 2024-09-17 19:16:37,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=647470.5, ans=0.0 2024-09-17 19:16:42,766 INFO [train.py:1198] (1/2) Epoch 36, batch 4850, loss[loss=0.2188, ctc_loss=0.1456, cr_loss=0.3657, over 20689.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1473, cr_loss=0.372, over 4077005.05 frames. ], batch size: 71, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 19:16:53,351 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.913e+02 2.188e+02 2.296e+02 2.498e+02 4.248e+02, threshold=4.593e+02, percent-clipped=0.0 2024-09-17 19:17:20,661 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 19:18:00,926 INFO [train.py:1198] (1/2) Epoch 36, batch 4900, loss[loss=0.1777, ctc_loss=0.1154, cr_loss=0.3116, over 20991.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1475, cr_loss=0.3728, over 4088791.95 frames. ], batch size: 51, lr: 2.28e-03, grad_scale: 32.0 2024-09-17 19:18:01,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=647640.5, ans=0.025 2024-09-17 19:18:02,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=647640.5, ans=0.1 2024-09-17 19:18:18,736 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 19:18:23,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=647668.8333333334, ans=0.125 2024-09-17 19:19:14,674 INFO [train.py:1198] (1/2) Epoch 36, batch 4950, loss[loss=0.19, ctc_loss=0.1243, cr_loss=0.3282, over 21059.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1472, cr_loss=0.3726, over 4091588.45 frames. ], batch size: 56, lr: 2.28e-03, grad_scale: 32.0 2024-09-17 19:19:15,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=647782.1666666666, ans=0.1 2024-09-17 19:19:25,020 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.891e+02 2.181e+02 2.310e+02 2.451e+02 3.277e+02, threshold=4.619e+02, percent-clipped=0.0 2024-09-17 19:19:33,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=647810.5, ans=0.125 2024-09-17 19:19:36,742 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.84 vs. 
limit=15.0 2024-09-17 19:19:40,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=647810.5, ans=0.125 2024-09-17 19:19:50,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=647838.8333333334, ans=0.125 2024-09-17 19:19:58,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=647867.1666666666, ans=0.0 2024-09-17 19:20:13,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=647895.5, ans=0.125 2024-09-17 19:20:24,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=647895.5, ans=0.125 2024-09-17 19:20:31,770 INFO [train.py:1198] (1/2) Epoch 36, batch 5000, loss[loss=0.216, ctc_loss=0.1411, cr_loss=0.3744, over 20837.00 frames. ], tot_loss[loss=0.2208, ctc_loss=0.1466, cr_loss=0.371, over 4094900.99 frames. ], batch size: 59, lr: 2.28e-03, grad_scale: 32.0 2024-09-17 19:20:45,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=647952.1666666666, ans=0.0 2024-09-17 19:20:46,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=647952.1666666666, ans=0.0 2024-09-17 19:20:51,846 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.11 vs. limit=15.0 2024-09-17 19:20:57,731 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.96 vs. limit=15.0 2024-09-17 19:21:24,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=648008.8333333334, ans=0.025 2024-09-17 19:21:46,204 INFO [train.py:1198] (1/2) Epoch 36, batch 5050, loss[loss=0.2373, ctc_loss=0.1552, cr_loss=0.4106, over 20970.00 frames. ], tot_loss[loss=0.2211, ctc_loss=0.1468, cr_loss=0.3716, over 4097859.88 frames. ], batch size: 52, lr: 2.28e-03, grad_scale: 32.0 2024-09-17 19:21:54,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=648065.5, ans=0.125 2024-09-17 19:21:56,653 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.959e+02 2.192e+02 2.288e+02 2.393e+02 3.006e+02, threshold=4.576e+02, percent-clipped=0.0 2024-09-17 19:22:04,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=648093.8333333334, ans=0.125 2024-09-17 19:22:55,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=648178.8333333334, ans=0.125 2024-09-17 19:23:00,823 INFO [train.py:1198] (1/2) Epoch 36, batch 5100, loss[loss=0.2015, ctc_loss=0.1321, cr_loss=0.347, over 20963.00 frames. ], tot_loss[loss=0.2211, ctc_loss=0.1468, cr_loss=0.3716, over 4100798.59 frames. 
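The Whitening lines compare a measured metric against a per-module limit (metric=9.11 vs. limit=15.0 above). One plausible formulation of such a metric, written as an assumption rather than a transcription of scaling.py: for each channel group, compare the feature covariance to a scaled identity, so a perfectly "white" covariance scores exactly 1.0 and the score grows with the eigenvalue spread:

import torch

def whitening_metric(x, num_groups):
    # Hypothetical formulation. x: (num_frames, num_channels); split the
    # channels into groups, form each group's covariance, and take the
    # ratio g * sum(eig^2) / sum(eig)^2, which is 1.0 iff cov = c * I.
    frames, channels = x.shape
    g = channels // num_groups
    xg = x.reshape(frames, num_groups, g).permute(1, 0, 2)  # (groups, frames, g)
    cov = xg.transpose(1, 2) @ xg / frames                  # (groups, g, g)
    num = (cov ** 2).mean(dim=(1, 2))                       # ~ sum(eig^2) / g^2
    diag_mean = torch.diagonal(cov, dim1=1, dim2=2).mean(dim=1)
    den = diag_mean ** 2 / g                                # ~ sum(eig)^2 / g^3
    return (num / den).mean()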
], batch size: 51, lr: 2.28e-03, grad_scale: 32.0 2024-09-17 19:24:11,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=648320.5, ans=0.1 2024-09-17 19:24:15,752 INFO [train.py:1198] (1/2) Epoch 36, batch 5150, loss[loss=0.2538, ctc_loss=0.168, cr_loss=0.4288, over 20671.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.1462, cr_loss=0.371, over 4104309.56 frames. ], batch size: 68, lr: 2.28e-03, grad_scale: 32.0 2024-09-17 19:24:16,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=648348.8333333334, ans=0.0 2024-09-17 19:24:26,053 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.982e+02 2.219e+02 2.328e+02 2.472e+02 3.334e+02, threshold=4.656e+02, percent-clipped=0.0 2024-09-17 19:24:43,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=648405.5, ans=0.125 2024-09-17 19:25:03,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=648433.8333333334, ans=0.2 2024-09-17 19:25:07,022 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.75 vs. limit=15.0 2024-09-17 19:25:30,025 INFO [train.py:1198] (1/2) Epoch 36, batch 5200, loss[loss=0.2376, ctc_loss=0.158, cr_loss=0.3982, over 20674.00 frames. ], tot_loss[loss=0.2209, ctc_loss=0.1466, cr_loss=0.3718, over 4107475.02 frames. ], batch size: 71, lr: 2.28e-03, grad_scale: 32.0 2024-09-17 19:26:07,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=648547.1666666666, ans=0.125 2024-09-17 19:26:35,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=648603.8333333334, ans=0.1 2024-09-17 19:26:44,298 INFO [train.py:1198] (1/2) Epoch 36, batch 5250, loss[loss=0.2593, ctc_loss=0.1848, cr_loss=0.3721, over 13598.00 frames. ], tot_loss[loss=0.2216, ctc_loss=0.1472, cr_loss=0.372, over 4085836.63 frames. ], batch size: 150, lr: 2.28e-03, grad_scale: 32.0 2024-09-17 19:26:54,613 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.966e+02 2.172e+02 2.334e+02 2.449e+02 4.552e+02, threshold=4.668e+02, percent-clipped=0.0 2024-09-17 19:26:58,471 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.18 vs. limit=6.0 2024-09-17 19:27:20,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=648688.8333333334, ans=0.0 2024-09-17 19:27:43,965 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.85 vs. limit=22.5 2024-09-17 19:27:45,166 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.29 vs. limit=22.5 2024-09-17 19:27:53,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=648745.5, ans=0.0 2024-09-17 19:28:01,147 INFO [train.py:1198] (1/2) Epoch 36, batch 5300, loss[loss=0.2251, ctc_loss=0.1495, cr_loss=0.3776, over 20703.00 frames. 
], tot_loss[loss=0.2205, ctc_loss=0.1463, cr_loss=0.3708, over 4099334.63 frames. ], batch size: 68, lr: 2.28e-03, grad_scale: 32.0 2024-09-17 19:28:13,835 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0 2024-09-17 19:28:17,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=648802.1666666666, ans=0.1 2024-09-17 19:28:22,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=648802.1666666666, ans=0.0 2024-09-17 19:28:25,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=648802.1666666666, ans=0.125 2024-09-17 19:28:29,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=648802.1666666666, ans=15.0 2024-09-17 19:28:52,408 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.09 vs. limit=22.5 2024-09-17 19:28:52,618 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.14 vs. limit=15.0 2024-09-17 19:28:53,858 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 19:29:15,278 INFO [train.py:1198] (1/2) Epoch 36, batch 5350, loss[loss=0.2392, ctc_loss=0.1611, cr_loss=0.3909, over 20046.00 frames. ], tot_loss[loss=0.2209, ctc_loss=0.1467, cr_loss=0.3712, over 4100076.31 frames. ], batch size: 80, lr: 2.28e-03, grad_scale: 32.0 2024-09-17 19:29:21,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=648915.5, ans=0.025 2024-09-17 19:29:25,757 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.972e+02 2.135e+02 2.318e+02 2.451e+02 3.354e+02, threshold=4.635e+02, percent-clipped=0.0 2024-09-17 19:29:29,522 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.52 vs. limit=12.0 2024-09-17 19:29:43,768 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=648972.1666666666, ans=0.0 2024-09-17 19:29:47,001 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.49 vs. limit=15.0 2024-09-17 19:30:04,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=649000.5, ans=0.125 2024-09-17 19:30:14,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=649000.5, ans=0.125 2024-09-17 19:30:31,934 INFO [train.py:1198] (1/2) Epoch 36, batch 5400, loss[loss=0.2217, ctc_loss=0.1465, cr_loss=0.3763, over 20879.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.1463, cr_loss=0.3708, over 4108929.76 frames. 
], batch size: 57, lr: 2.28e-03, grad_scale: 32.0 2024-09-17 19:30:57,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=649085.5, ans=0.125 2024-09-17 19:31:08,428 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.55 vs. limit=15.0 2024-09-17 19:31:11,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=649113.8333333334, ans=0.125 2024-09-17 19:31:17,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=649142.1666666666, ans=0.125 2024-09-17 19:31:33,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=649170.5, ans=0.1 2024-09-17 19:31:46,545 INFO [train.py:1198] (1/2) Epoch 36, batch 5450, loss[loss=0.2187, ctc_loss=0.1457, cr_loss=0.3646, over 20968.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.146, cr_loss=0.3701, over 4094674.49 frames. ], batch size: 58, lr: 2.28e-03, grad_scale: 32.0 2024-09-17 19:31:56,988 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.857e+02 2.180e+02 2.319e+02 2.454e+02 3.887e+02, threshold=4.638e+02, percent-clipped=0.0 2024-09-17 19:32:15,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=649255.5, ans=0.0 2024-09-17 19:32:17,226 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.76 vs. limit=15.0 2024-09-17 19:32:55,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=649312.1666666666, ans=0.125 2024-09-17 19:33:00,843 INFO [train.py:1198] (1/2) Epoch 36, batch 5500, loss[loss=0.2366, ctc_loss=0.1575, cr_loss=0.3959, over 21064.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.1454, cr_loss=0.3695, over 4104549.00 frames. ], batch size: 56, lr: 2.28e-03, grad_scale: 32.0 2024-09-17 19:33:14,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=649368.8333333334, ans=0.125 2024-09-17 19:33:41,914 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.76 vs. limit=15.0 2024-09-17 19:33:43,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=649397.1666666666, ans=0.1 2024-09-17 19:33:49,235 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.89 vs. limit=15.0 2024-09-17 19:34:15,394 INFO [train.py:1198] (1/2) Epoch 36, batch 5550, loss[loss=0.2692, ctc_loss=0.1908, cr_loss=0.3924, over 14092.00 frames. ], tot_loss[loss=0.2197, ctc_loss=0.1459, cr_loss=0.3692, over 4084232.27 frames. ], batch size: 151, lr: 2.28e-03, grad_scale: 32.0 2024-09-17 19:34:21,986 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.75 vs. 
limit=6.0 2024-09-17 19:34:22,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=649482.1666666666, ans=0.1 2024-09-17 19:34:25,610 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.950e+02 2.181e+02 2.300e+02 2.475e+02 3.632e+02, threshold=4.600e+02, percent-clipped=0.0 2024-09-17 19:34:27,915 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.51 vs. limit=15.0 2024-09-17 19:34:42,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=649510.5, ans=0.125 2024-09-17 19:35:29,358 INFO [train.py:1198] (1/2) Epoch 36, batch 5600, loss[loss=0.2126, ctc_loss=0.1394, cr_loss=0.366, over 20873.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1466, cr_loss=0.37, over 4070847.42 frames. ], batch size: 57, lr: 2.28e-03, grad_scale: 32.0 2024-09-17 19:35:41,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=649623.8333333334, ans=0.0 2024-09-17 19:35:53,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=649652.1666666666, ans=0.0 2024-09-17 19:36:14,798 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.21 vs. limit=15.0 2024-09-17 19:36:46,311 INFO [train.py:1198] (1/2) Epoch 36, batch 5650, loss[loss=0.2372, ctc_loss=0.1613, cr_loss=0.3794, over 20765.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1466, cr_loss=0.3699, over 4078561.86 frames. ], batch size: 71, lr: 2.28e-03, grad_scale: 32.0 2024-09-17 19:36:56,843 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.946e+02 2.184e+02 2.325e+02 2.525e+02 4.662e+02, threshold=4.649e+02, percent-clipped=1.0 2024-09-17 19:37:10,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=649793.8333333334, ans=0.2 2024-09-17 19:37:28,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=649822.1666666666, ans=0.2 2024-09-17 19:38:00,579 INFO [train.py:1198] (1/2) Epoch 36, batch 5700, loss[loss=0.2387, ctc_loss=0.1597, cr_loss=0.3948, over 21034.00 frames. ], tot_loss[loss=0.2212, ctc_loss=0.147, cr_loss=0.3706, over 4088142.80 frames. ], batch size: 61, lr: 2.28e-03, grad_scale: 32.0 2024-09-17 19:38:16,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=649935.5, ans=0.015 2024-09-17 19:38:32,626 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.93 vs. limit=22.5 2024-09-17 19:38:45,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=649992.1666666666, ans=0.0 2024-09-17 19:38:50,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=649992.1666666666, ans=0.125 2024-09-17 19:38:52,621 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.21 vs. 
limit=10.0 2024-09-17 19:39:17,068 INFO [train.py:1198] (1/2) Epoch 36, batch 5750, loss[loss=0.2248, ctc_loss=0.1506, cr_loss=0.3711, over 20819.00 frames. ], tot_loss[loss=0.2207, ctc_loss=0.1467, cr_loss=0.3698, over 4082010.73 frames. ], batch size: 59, lr: 2.28e-03, grad_scale: 64.0 2024-09-17 19:39:18,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=650048.8333333334, ans=0.0 2024-09-17 19:39:20,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=650048.8333333334, ans=0.1 2024-09-17 19:39:24,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=650048.8333333334, ans=0.125 2024-09-17 19:39:27,402 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.927e+02 2.195e+02 2.353e+02 2.538e+02 7.012e+02, threshold=4.707e+02, percent-clipped=1.0 2024-09-17 19:40:05,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=650133.8333333334, ans=0.125 2024-09-17 19:40:27,243 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=650162.1666666666, ans=0.0 2024-09-17 19:40:31,178 INFO [train.py:1198] (1/2) Epoch 36, batch 5800, loss[loss=0.2536, ctc_loss=0.1713, cr_loss=0.4117, over 18391.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.1466, cr_loss=0.3693, over 4072779.52 frames. ], batch size: 108, lr: 2.28e-03, grad_scale: 64.0 2024-09-17 19:40:33,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=650190.5, ans=0.125 2024-09-17 19:40:52,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=650218.8333333334, ans=0.125 2024-09-17 19:40:55,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=650218.8333333334, ans=0.125 2024-09-17 19:41:44,645 INFO [train.py:1198] (1/2) Epoch 36, batch 5850, loss[loss=0.2307, ctc_loss=0.1516, cr_loss=0.3957, over 20786.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1476, cr_loss=0.3719, over 4084672.06 frames. ], batch size: 56, lr: 2.28e-03, grad_scale: 64.0 2024-09-17 19:41:56,479 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.908e+02 2.230e+02 2.366e+02 2.519e+02 3.256e+02, threshold=4.732e+02, percent-clipped=0.0 2024-09-17 19:41:57,374 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.94 vs. limit=22.5 2024-09-17 19:42:02,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=650360.5, ans=0.125 2024-09-17 19:42:19,908 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.28 vs. limit=15.0 2024-09-17 19:42:59,122 INFO [train.py:1198] (1/2) Epoch 36, batch 5900, loss[loss=0.2465, ctc_loss=0.1659, cr_loss=0.4031, over 20975.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1475, cr_loss=0.3714, over 4086417.33 frames. 
], batch size: 64, lr: 2.28e-03, grad_scale: 32.0 2024-09-17 19:43:13,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=650502.1666666666, ans=0.0 2024-09-17 19:43:35,488 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=650530.5, ans=0.0 2024-09-17 19:43:36,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=650530.5, ans=0.125 2024-09-17 19:43:41,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=650530.5, ans=0.125 2024-09-17 19:43:50,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=650558.8333333334, ans=0.1 2024-09-17 19:43:50,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=650558.8333333334, ans=0.1 2024-09-17 19:44:03,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=650587.1666666666, ans=0.05 2024-09-17 19:44:13,533 INFO [train.py:1198] (1/2) Epoch 36, batch 5950, loss[loss=0.2044, ctc_loss=0.1348, cr_loss=0.348, over 20886.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1474, cr_loss=0.3713, over 4078367.87 frames. ], batch size: 54, lr: 2.28e-03, grad_scale: 32.0 2024-09-17 19:44:22,766 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 19:44:25,423 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.777e+02 2.183e+02 2.325e+02 2.502e+02 4.468e+02, threshold=4.649e+02, percent-clipped=0.0 2024-09-17 19:45:08,203 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 19:45:24,926 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.94 vs. limit=22.5 2024-09-17 19:45:30,118 INFO [train.py:1198] (1/2) Epoch 36, batch 6000, loss[loss=0.2223, ctc_loss=0.1465, cr_loss=0.379, over 20885.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1476, cr_loss=0.3712, over 4068442.45 frames. ], batch size: 57, lr: 2.28e-03, grad_scale: 32.0 2024-09-17 19:45:30,118 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 19:45:50,934 INFO [train.py:1230] (1/2) Epoch 36, validation: loss=0.04013, ctc_loss=0.04013, cr_loss=1.364e-14, over 944034.00 frames. 2024-09-17 19:45:50,935 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-17 19:46:18,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=650785.5, ans=0.04949747468305833 2024-09-17 19:46:25,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=650813.8333333334, ans=0.125 2024-09-17 19:46:46,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=650842.1666666666, ans=0.125 2024-09-17 19:46:59,071 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.35 vs. 
limit=22.5 2024-09-17 19:47:02,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=650870.5, ans=0.125 2024-09-17 19:47:08,563 INFO [train.py:1198] (1/2) Epoch 36, batch 6050, loss[loss=0.2424, ctc_loss=0.1603, cr_loss=0.41, over 20996.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1475, cr_loss=0.3716, over 4079994.70 frames. ], batch size: 64, lr: 2.28e-03, grad_scale: 32.0 2024-09-17 19:47:16,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=650898.8333333334, ans=0.125 2024-09-17 19:47:20,167 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.026e+02 2.238e+02 2.347e+02 2.486e+02 4.621e+02, threshold=4.694e+02, percent-clipped=0.0 2024-09-17 19:47:23,399 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=650927.1666666666, ans=0.035 2024-09-17 19:47:34,537 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.24 vs. limit=15.0 2024-09-17 19:47:47,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=650955.5, ans=0.125 2024-09-17 19:48:11,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=651012.1666666666, ans=0.0 2024-09-17 19:48:19,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=651012.1666666666, ans=0.1 2024-09-17 19:48:23,335 INFO [train.py:1198] (1/2) Epoch 36, batch 6100, loss[loss=0.1764, ctc_loss=0.1149, cr_loss=0.3074, over 20993.00 frames. ], tot_loss[loss=0.2216, ctc_loss=0.1474, cr_loss=0.3709, over 4088435.61 frames. ], batch size: 50, lr: 2.28e-03, grad_scale: 32.0 2024-09-17 19:48:44,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=651068.8333333334, ans=0.0 2024-09-17 19:48:54,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=651097.1666666666, ans=0.125 2024-09-17 19:49:03,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=651097.1666666666, ans=0.2 2024-09-17 19:49:10,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=651125.5, ans=0.1 2024-09-17 19:49:37,484 INFO [train.py:1198] (1/2) Epoch 36, batch 6150, loss[loss=0.1789, ctc_loss=0.1177, cr_loss=0.3061, over 20293.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1477, cr_loss=0.3711, over 4068823.54 frames. 
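Across this stretch grad_scale moves 32.0 -> 16.0 -> 32.0 -> 64.0 -> 32.0, always by factors of two: the signature of dynamic fp16 loss scaling, which halves the scale on overflow and grows it back after a run of clean steps. A sketch of the standard PyTorch mechanism; the init_scale and the helper names are chosen only to mirror these records, not taken from the training code:

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0)

def training_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = loss_fn(model, batch)   # loss_fn is a hypothetical helper
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # skipped internally if inf/nan grads are found
    scaler.update()          # halves the scale on overflow, grows it later
    return loss.detach()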
], batch size: 45, lr: 2.28e-03, grad_scale: 32.0 2024-09-17 19:49:49,435 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.855e+02 2.197e+02 2.368e+02 2.541e+02 4.603e+02, threshold=4.735e+02, percent-clipped=0.0 2024-09-17 19:49:51,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=651210.5, ans=0.0 2024-09-17 19:49:59,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=651210.5, ans=0.02 2024-09-17 19:50:08,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=651238.8333333334, ans=0.0 2024-09-17 19:50:19,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=651238.8333333334, ans=0.0 2024-09-17 19:50:21,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=651267.1666666666, ans=0.04949747468305833 2024-09-17 19:50:44,078 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.37 vs. limit=15.0 2024-09-17 19:50:52,248 INFO [train.py:1198] (1/2) Epoch 36, batch 6200, loss[loss=0.2167, ctc_loss=0.1475, cr_loss=0.346, over 20976.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.1465, cr_loss=0.3694, over 4063962.35 frames. ], batch size: 58, lr: 2.28e-03, grad_scale: 32.0 2024-09-17 19:51:55,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=651437.1666666666, ans=0.0 2024-09-17 19:52:05,562 INFO [train.py:1198] (1/2) Epoch 36, batch 6250, loss[loss=0.2599, ctc_loss=0.1836, cr_loss=0.3816, over 14316.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1455, cr_loss=0.3671, over 4052484.66 frames. ], batch size: 149, lr: 2.28e-03, grad_scale: 32.0 2024-09-17 19:52:17,284 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.895e+02 2.178e+02 2.351e+02 2.531e+02 3.776e+02, threshold=4.702e+02, percent-clipped=0.0 2024-09-17 19:52:58,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=651550.5, ans=0.5 2024-09-17 19:53:07,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=651578.8333333334, ans=0.025 2024-09-17 19:53:19,874 INFO [train.py:1198] (1/2) Epoch 36, batch 6300, loss[loss=0.2029, ctc_loss=0.1333, cr_loss=0.3481, over 20935.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1462, cr_loss=0.3669, over 3999720.96 frames. ], batch size: 48, lr: 2.28e-03, grad_scale: 32.0 2024-09-17 19:53:24,945 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.86 vs. limit=15.0 2024-09-17 19:53:44,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=651635.5, ans=0.125 2024-09-17 19:53:45,550 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.92 vs. limit=6.0 2024-09-17 19:54:32,231 INFO [train.py:1198] (1/2) Epoch 36, batch 6350, loss[loss=0.2725, ctc_loss=0.1903, cr_loss=0.4108, over 14512.00 frames. 
], tot_loss[loss=0.222, ctc_loss=0.1485, cr_loss=0.3675, over 3884039.74 frames. ], batch size: 149, lr: 2.28e-03, grad_scale: 32.0 2024-09-17 19:54:36,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=651748.8333333334, ans=0.2 2024-09-17 19:54:40,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=651748.8333333334, ans=0.2 2024-09-17 19:54:43,550 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.882e+02 2.174e+02 2.432e+02 2.753e+02 3.592e+02, threshold=4.864e+02, percent-clipped=0.0 2024-09-17 19:56:19,160 INFO [train.py:1198] (1/2) Epoch 37, batch 0, loss[loss=0.2832, ctc_loss=0.1894, cr_loss=0.469, over 18158.00 frames. ], tot_loss[loss=0.2832, ctc_loss=0.1894, cr_loss=0.469, over 18158.00 frames. ], batch size: 108, lr: 2.25e-03, grad_scale: 32.0 2024-09-17 19:56:19,161 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 19:56:31,743 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.7912, 4.4647, 3.3927, 3.9318], device='cuda:1') 2024-09-17 19:56:37,677 INFO [train.py:1230] (1/2) Epoch 37, validation: loss=0.04008, ctc_loss=0.04008, cr_loss=1.353e-14, over 944034.00 frames. 2024-09-17 19:56:37,678 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-17 19:56:41,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=651865.0, ans=0.1 2024-09-17 19:56:54,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=651893.3333333334, ans=0.04949747468305833 2024-09-17 19:57:14,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=651921.6666666666, ans=0.125 2024-09-17 19:57:43,945 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.35 vs. limit=15.0 2024-09-17 19:57:47,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=651978.3333333334, ans=0.0 2024-09-17 19:57:53,597 INFO [train.py:1198] (1/2) Epoch 37, batch 50, loss[loss=0.2581, ctc_loss=0.1819, cr_loss=0.3809, over 14538.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.147, cr_loss=0.3725, over 930666.60 frames. 
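In both validation records here (loss=0.04013 at batch 6000 and loss=0.04008 at the start of epoch 37) the cr_loss term is on the order of 1e-14, i.e. numerically zero while ctc_loss carries the whole loss. That is what one would expect if the consistency term compares the model's outputs on two differently augmented copies of each utterance and augmentation is disabled at eval time: with identical inputs the two output distributions coincide and their divergence vanishes up to rounding. A hedged sketch of that structure; the symmetric-KL choice and the names are illustrative, not copied from the CR-CTC implementation:

import torch
import torch.nn.functional as F

def cr_loss(logits_a, logits_b):
    # Symmetric KL between the output distributions of two augmented views;
    # one common consistency-regularization term, assumed here.
    pa, pb = logits_a.log_softmax(-1), logits_b.log_softmax(-1)
    return 0.5 * (F.kl_div(pa, pb, log_target=True, reduction="batchmean")
                  + F.kl_div(pb, pa, log_target=True, reduction="batchmean"))

x = torch.randn(4, 10, 500)
print(cr_loss(x, x))  # identical views -> ~0, like the validation records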
], batch size: 149, lr: 2.25e-03, grad_scale: 32.0 2024-09-17 19:58:07,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=652035.0, ans=0.125 2024-09-17 19:58:15,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=652035.0, ans=0.125 2024-09-17 19:58:19,621 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.818e+02 2.284e+02 2.559e+02 2.893e+02 3.893e+02, threshold=5.119e+02, percent-clipped=0.0 2024-09-17 19:58:19,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=652035.0, ans=0.0 2024-09-17 19:58:25,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=652063.3333333334, ans=0.025 2024-09-17 19:58:41,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=652091.6666666666, ans=0.1 2024-09-17 19:58:54,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=652120.0, ans=0.125 2024-09-17 19:59:09,298 INFO [train.py:1198] (1/2) Epoch 37, batch 100, loss[loss=0.2118, ctc_loss=0.1393, cr_loss=0.3629, over 20972.00 frames. ], tot_loss[loss=0.2242, ctc_loss=0.1488, cr_loss=0.3769, over 1639875.88 frames. ], batch size: 55, lr: 2.25e-03, grad_scale: 32.0 2024-09-17 19:59:42,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=652205.0, ans=0.0 2024-09-17 19:59:56,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=652233.3333333334, ans=0.125 2024-09-17 20:00:27,930 INFO [train.py:1198] (1/2) Epoch 37, batch 150, loss[loss=0.2404, ctc_loss=0.1612, cr_loss=0.3956, over 21024.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1471, cr_loss=0.374, over 2184287.85 frames. ], batch size: 63, lr: 2.25e-03, grad_scale: 32.0 2024-09-17 20:00:28,399 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=652290.0, ans=0.1 2024-09-17 20:00:56,605 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.893e+02 2.208e+02 2.307e+02 2.466e+02 3.531e+02, threshold=4.615e+02, percent-clipped=0.0 2024-09-17 20:01:46,637 INFO [train.py:1198] (1/2) Epoch 37, batch 200, loss[loss=0.2143, ctc_loss=0.1413, cr_loss=0.3651, over 20896.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.1468, cr_loss=0.3729, over 2609253.52 frames. ], batch size: 54, lr: 2.25e-03, grad_scale: 32.0 2024-09-17 20:03:01,066 INFO [train.py:1198] (1/2) Epoch 37, batch 250, loss[loss=0.21, ctc_loss=0.1358, cr_loss=0.3707, over 21056.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.1476, cr_loss=0.3742, over 2936162.87 frames. 
], batch size: 56, lr: 2.24e-03, grad_scale: 16.0 2024-09-17 20:03:27,911 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.005e+02 2.192e+02 2.327e+02 2.518e+02 3.615e+02, threshold=4.653e+02, percent-clipped=0.0 2024-09-17 20:03:52,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=652658.3333333334, ans=0.125 2024-09-17 20:03:59,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=652686.6666666666, ans=0.0 2024-09-17 20:04:16,333 INFO [train.py:1198] (1/2) Epoch 37, batch 300, loss[loss=0.2284, ctc_loss=0.151, cr_loss=0.3873, over 20656.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1473, cr_loss=0.3729, over 3185490.36 frames. ], batch size: 66, lr: 2.24e-03, grad_scale: 16.0 2024-09-17 20:05:02,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=652800.0, ans=0.1 2024-09-17 20:05:19,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=652828.3333333334, ans=0.125 2024-09-17 20:05:31,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=652828.3333333334, ans=0.05 2024-09-17 20:05:35,360 INFO [train.py:1198] (1/2) Epoch 37, batch 350, loss[loss=0.2538, ctc_loss=0.1699, cr_loss=0.4196, over 20945.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1464, cr_loss=0.3711, over 3393991.40 frames. ], batch size: 60, lr: 2.24e-03, grad_scale: 16.0 2024-09-17 20:05:38,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=652856.6666666666, ans=0.0 2024-09-17 20:05:49,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=652885.0, ans=0.125 2024-09-17 20:06:02,986 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.987e+02 2.180e+02 2.290e+02 2.473e+02 3.463e+02, threshold=4.580e+02, percent-clipped=0.0 2024-09-17 20:06:22,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=652941.6666666666, ans=0.025 2024-09-17 20:06:31,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=652941.6666666666, ans=0.125 2024-09-17 20:06:54,636 INFO [train.py:1198] (1/2) Epoch 37, batch 400, loss[loss=0.2261, ctc_loss=0.1509, cr_loss=0.376, over 20827.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.1461, cr_loss=0.3715, over 3560002.04 frames. 
], batch size: 65, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:07:08,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=653026.6666666666, ans=0.125 2024-09-17 20:07:14,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=653026.6666666666, ans=0.0 2024-09-17 20:07:21,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=653026.6666666666, ans=0.0 2024-09-17 20:07:26,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=653055.0, ans=15.0 2024-09-17 20:07:49,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=653083.3333333334, ans=0.125 2024-09-17 20:08:09,980 INFO [train.py:1198] (1/2) Epoch 37, batch 450, loss[loss=0.2055, ctc_loss=0.1362, cr_loss=0.3463, over 20951.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1472, cr_loss=0.3729, over 3673431.60 frames. ], batch size: 60, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:08:13,689 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.50 vs. limit=22.5 2024-09-17 20:08:21,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=653140.0, ans=0.05 2024-09-17 20:08:37,322 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.929e+02 2.130e+02 2.307e+02 2.474e+02 3.560e+02, threshold=4.614e+02, percent-clipped=0.0 2024-09-17 20:08:39,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=653196.6666666666, ans=0.0 2024-09-17 20:09:11,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=653253.3333333334, ans=0.0 2024-09-17 20:09:17,166 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.54 vs. limit=22.5 2024-09-17 20:09:17,377 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.93 vs. limit=15.0 2024-09-17 20:09:25,259 INFO [train.py:1198] (1/2) Epoch 37, batch 500, loss[loss=0.2447, ctc_loss=0.162, cr_loss=0.4137, over 20940.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1467, cr_loss=0.3718, over 3773208.19 frames. ], batch size: 64, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:09:34,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=653281.6666666666, ans=0.09899494936611666 2024-09-17 20:09:39,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=653310.0, ans=0.1 2024-09-17 20:09:59,445 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.89 vs. 
limit=15.0 2024-09-17 20:10:06,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=653338.3333333334, ans=0.0 2024-09-17 20:10:18,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=653366.6666666666, ans=0.0 2024-09-17 20:10:40,961 INFO [train.py:1198] (1/2) Epoch 37, batch 550, loss[loss=0.2265, ctc_loss=0.1504, cr_loss=0.3801, over 20874.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.1458, cr_loss=0.3703, over 3855755.13 frames. ], batch size: 57, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:10:56,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=653451.6666666666, ans=0.1 2024-09-17 20:11:11,745 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.825e+02 2.154e+02 2.258e+02 2.391e+02 5.312e+02, threshold=4.516e+02, percent-clipped=1.0 2024-09-17 20:11:13,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=653480.0, ans=0.0 2024-09-17 20:11:16,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=653480.0, ans=0.025 2024-09-17 20:11:40,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=653508.3333333334, ans=0.0 2024-09-17 20:11:55,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=653536.6666666666, ans=0.125 2024-09-17 20:11:59,948 INFO [train.py:1198] (1/2) Epoch 37, batch 600, loss[loss=0.239, ctc_loss=0.1593, cr_loss=0.3989, over 20825.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1445, cr_loss=0.3682, over 3918064.26 frames. ], batch size: 59, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:12:13,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=653593.3333333334, ans=0.125 2024-09-17 20:12:30,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=653593.3333333334, ans=0.0 2024-09-17 20:12:50,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=653650.0, ans=0.1 2024-09-17 20:13:18,710 INFO [train.py:1198] (1/2) Epoch 37, batch 650, loss[loss=0.2836, ctc_loss=0.1996, cr_loss=0.4196, over 14143.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.1455, cr_loss=0.3691, over 3946256.99 frames. ], batch size: 149, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:13:45,844 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.889e+02 2.215e+02 2.336e+02 2.471e+02 4.298e+02, threshold=4.672e+02, percent-clipped=0.0 2024-09-17 20:14:34,154 INFO [train.py:1198] (1/2) Epoch 37, batch 700, loss[loss=0.225, ctc_loss=0.15, cr_loss=0.3752, over 20874.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1454, cr_loss=0.3693, over 3978897.68 frames. 
], batch size: 57, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:14:56,982 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 20:15:04,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=653905.0, ans=0.07 2024-09-17 20:15:26,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=653933.3333333334, ans=0.0 2024-09-17 20:15:48,985 INFO [train.py:1198] (1/2) Epoch 37, batch 750, loss[loss=0.2723, ctc_loss=0.1918, cr_loss=0.4025, over 14540.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.1471, cr_loss=0.3718, over 3997114.64 frames. ], batch size: 149, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:15:55,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=653990.0, ans=0.0 2024-09-17 20:16:13,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=654018.3333333334, ans=0.125 2024-09-17 20:16:16,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=654018.3333333334, ans=0.125 2024-09-17 20:16:17,427 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.870e+02 2.205e+02 2.295e+02 2.454e+02 5.146e+02, threshold=4.589e+02, percent-clipped=1.0 2024-09-17 20:16:28,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=654046.6666666666, ans=0.125 2024-09-17 20:16:37,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=654075.0, ans=0.125 2024-09-17 20:17:07,412 INFO [train.py:1198] (1/2) Epoch 37, batch 800, loss[loss=0.2254, ctc_loss=0.1508, cr_loss=0.3731, over 20671.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1473, cr_loss=0.371, over 4004802.25 frames. ], batch size: 71, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:17:10,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=654131.6666666666, ans=0.025 2024-09-17 20:17:11,103 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0 2024-09-17 20:17:20,372 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.18 vs. limit=22.5 2024-09-17 20:17:57,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=654216.6666666666, ans=0.1 2024-09-17 20:18:13,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=654245.0, ans=0.125 2024-09-17 20:18:23,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=654245.0, ans=0.125 2024-09-17 20:18:23,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=654245.0, ans=0.1 2024-09-17 20:18:25,924 INFO [train.py:1198] (1/2) Epoch 37, batch 850, loss[loss=0.1943, ctc_loss=0.1251, cr_loss=0.3462, over 19574.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1472, cr_loss=0.3705, over 4013327.39 frames. 
], batch size: 43, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:18:29,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=654273.3333333334, ans=0.125 2024-09-17 20:18:54,122 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.904e+02 2.215e+02 2.340e+02 2.529e+02 4.240e+02, threshold=4.681e+02, percent-clipped=0.0 2024-09-17 20:19:41,330 INFO [train.py:1198] (1/2) Epoch 37, batch 900, loss[loss=0.2312, ctc_loss=0.1527, cr_loss=0.3927, over 20740.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.148, cr_loss=0.372, over 4015675.32 frames. ], batch size: 71, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:20:17,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=654471.6666666666, ans=10.0 2024-09-17 20:20:20,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=654471.6666666666, ans=0.1 2024-09-17 20:20:35,114 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.94 vs. limit=15.0 2024-09-17 20:20:43,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=654528.3333333334, ans=0.0 2024-09-17 20:20:56,698 INFO [train.py:1198] (1/2) Epoch 37, batch 950, loss[loss=0.2399, ctc_loss=0.1604, cr_loss=0.3977, over 20430.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.148, cr_loss=0.3722, over 4020941.01 frames. ], batch size: 74, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:21:25,521 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.873e+02 2.144e+02 2.286e+02 2.467e+02 2.816e+02, threshold=4.572e+02, percent-clipped=0.0 2024-09-17 20:21:30,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=654613.3333333334, ans=0.0 2024-09-17 20:21:40,255 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.65 vs. limit=12.0 2024-09-17 20:22:12,418 INFO [train.py:1198] (1/2) Epoch 37, batch 1000, loss[loss=0.2249, ctc_loss=0.1495, cr_loss=0.3769, over 20938.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1481, cr_loss=0.3731, over 4036988.74 frames. ], batch size: 60, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:22:23,487 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=654698.3333333334, ans=0.0 2024-09-17 20:22:23,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=654698.3333333334, ans=22.5 2024-09-17 20:22:46,985 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.95 vs. limit=5.0 2024-09-17 20:22:49,148 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=654755.0, ans=0.125 2024-09-17 20:23:34,154 INFO [train.py:1198] (1/2) Epoch 37, batch 1050, loss[loss=0.2086, ctc_loss=0.1369, cr_loss=0.3588, over 20837.00 frames. ], tot_loss[loss=0.2221, ctc_loss=0.1475, cr_loss=0.3729, over 4047636.29 frames. 
], batch size: 59, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:23:54,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=654868.3333333334, ans=0.0 2024-09-17 20:24:02,763 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.942e+02 2.234e+02 2.323e+02 2.517e+02 3.469e+02, threshold=4.646e+02, percent-clipped=0.0 2024-09-17 20:24:06,838 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.12 vs. limit=6.0 2024-09-17 20:24:42,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=654953.3333333334, ans=0.0 2024-09-17 20:24:49,676 INFO [train.py:1198] (1/2) Epoch 37, batch 1100, loss[loss=0.1902, ctc_loss=0.1235, cr_loss=0.3337, over 20215.00 frames. ], tot_loss[loss=0.2212, ctc_loss=0.1468, cr_loss=0.3722, over 4061990.69 frames. ], batch size: 45, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:24:53,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=654981.6666666666, ans=0.2 2024-09-17 20:24:54,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=654981.6666666666, ans=0.025 2024-09-17 20:24:56,756 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.27 vs. limit=15.0 2024-09-17 20:25:38,568 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.51 vs. limit=15.0 2024-09-17 20:25:59,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=655095.0, ans=0.125 2024-09-17 20:26:01,077 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.19 vs. limit=15.0 2024-09-17 20:26:04,719 INFO [train.py:1198] (1/2) Epoch 37, batch 1150, loss[loss=0.2594, ctc_loss=0.18, cr_loss=0.397, over 14336.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.1477, cr_loss=0.3735, over 4059191.52 frames. ], batch size: 149, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:26:06,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=655123.3333333334, ans=0.1 2024-09-17 20:26:25,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=655151.6666666666, ans=0.1 2024-09-17 20:26:33,159 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.906e+02 2.202e+02 2.384e+02 2.534e+02 4.350e+02, threshold=4.768e+02, percent-clipped=0.0 2024-09-17 20:26:58,551 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.80 vs. limit=15.0 2024-09-17 20:27:14,762 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.89 vs. limit=10.0 2024-09-17 20:27:19,458 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.84 vs. 
limit=15.0 2024-09-17 20:27:20,052 INFO [train.py:1198] (1/2) Epoch 37, batch 1200, loss[loss=0.2433, ctc_loss=0.1641, cr_loss=0.3961, over 18525.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.147, cr_loss=0.3722, over 4061637.15 frames. ], batch size: 108, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:27:31,095 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=655265.0, ans=0.125 2024-09-17 20:27:48,266 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.04 vs. limit=22.5 2024-09-17 20:27:49,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=655321.6666666666, ans=0.125 2024-09-17 20:28:10,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=655350.0, ans=0.04949747468305833 2024-09-17 20:28:13,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=655350.0, ans=0.05 2024-09-17 20:28:38,809 INFO [train.py:1198] (1/2) Epoch 37, batch 1250, loss[loss=0.1826, ctc_loss=0.1207, cr_loss=0.3095, over 20966.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1463, cr_loss=0.3707, over 4081912.25 frames. ], batch size: 49, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:29:03,318 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 20:29:07,469 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.939e+02 2.214e+02 2.300e+02 2.435e+02 5.575e+02, threshold=4.600e+02, percent-clipped=1.0 2024-09-17 20:29:45,994 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=8.472e-03 2024-09-17 20:29:57,252 INFO [train.py:1198] (1/2) Epoch 37, batch 1300, loss[loss=0.2141, ctc_loss=0.1399, cr_loss=0.3709, over 20883.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1464, cr_loss=0.3709, over 4086377.34 frames. ], batch size: 54, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:30:09,138 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.57 vs. limit=22.5 2024-09-17 20:30:45,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=655633.3333333334, ans=0.125 2024-09-17 20:31:12,403 INFO [train.py:1198] (1/2) Epoch 37, batch 1350, loss[loss=0.2099, ctc_loss=0.1419, cr_loss=0.3399, over 19464.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1466, cr_loss=0.3716, over 4080321.23 frames. ], batch size: 90, lr: 2.24e-03, grad_scale: 16.0 2024-09-17 20:31:12,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=655690.0, ans=0.125 2024-09-17 20:31:42,403 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.974e+02 2.207e+02 2.389e+02 2.536e+02 3.344e+02, threshold=4.779e+02, percent-clipped=0.0 2024-09-17 20:32:27,672 INFO [train.py:1198] (1/2) Epoch 37, batch 1400, loss[loss=0.1964, ctc_loss=0.1261, cr_loss=0.3517, over 20995.00 frames. ], tot_loss[loss=0.2211, ctc_loss=0.1467, cr_loss=0.3718, over 4093631.12 frames. 
], batch size: 52, lr: 2.24e-03, grad_scale: 16.0 2024-09-17 20:33:04,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=655888.3333333334, ans=0.0 2024-09-17 20:33:40,163 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.36 vs. limit=15.0 2024-09-17 20:33:46,944 INFO [train.py:1198] (1/2) Epoch 37, batch 1450, loss[loss=0.2395, ctc_loss=0.1595, cr_loss=0.3999, over 18602.00 frames. ], tot_loss[loss=0.2207, ctc_loss=0.1465, cr_loss=0.3711, over 4099575.01 frames. ], batch size: 108, lr: 2.24e-03, grad_scale: 16.0 2024-09-17 20:33:51,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=655973.3333333334, ans=0.125 2024-09-17 20:34:13,563 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.37 vs. limit=22.5 2024-09-17 20:34:17,329 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.912e+02 2.190e+02 2.304e+02 2.486e+02 3.171e+02, threshold=4.608e+02, percent-clipped=0.0 2024-09-17 20:34:20,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=656030.0, ans=0.0 2024-09-17 20:34:47,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=656086.6666666666, ans=0.1 2024-09-17 20:35:05,321 INFO [train.py:1198] (1/2) Epoch 37, batch 1500, loss[loss=0.2893, ctc_loss=0.2084, cr_loss=0.4042, over 14463.00 frames. ], tot_loss[loss=0.2207, ctc_loss=0.1465, cr_loss=0.3709, over 4090581.86 frames. ], batch size: 149, lr: 2.24e-03, grad_scale: 16.0 2024-09-17 20:35:07,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=656115.0, ans=0.2 2024-09-17 20:35:20,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=656143.3333333334, ans=0.125 2024-09-17 20:35:37,443 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=656171.6666666666, ans=0.1 2024-09-17 20:36:20,678 INFO [train.py:1198] (1/2) Epoch 37, batch 1550, loss[loss=0.2284, ctc_loss=0.1534, cr_loss=0.375, over 21025.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1462, cr_loss=0.3699, over 4082520.85 frames. ], batch size: 61, lr: 2.24e-03, grad_scale: 16.0 2024-09-17 20:36:25,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=656256.6666666666, ans=0.1 2024-09-17 20:36:41,485 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.41 vs. limit=15.0 2024-09-17 20:36:41,533 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.65 vs. 
limit=22.5 2024-09-17 20:36:45,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=656285.0, ans=0.125 2024-09-17 20:36:47,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=656285.0, ans=0.125 2024-09-17 20:36:51,134 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.857e+02 2.227e+02 2.432e+02 2.612e+02 3.752e+02, threshold=4.865e+02, percent-clipped=0.0 2024-09-17 20:37:01,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=656313.3333333334, ans=0.125 2024-09-17 20:37:36,384 INFO [train.py:1198] (1/2) Epoch 37, batch 1600, loss[loss=0.2143, ctc_loss=0.1403, cr_loss=0.3698, over 20957.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.1458, cr_loss=0.37, over 4091506.32 frames. ], batch size: 60, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:37:39,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=656398.3333333334, ans=0.0 2024-09-17 20:37:56,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=656426.6666666666, ans=0.125 2024-09-17 20:38:19,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=656455.0, ans=0.2 2024-09-17 20:38:23,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=656483.3333333334, ans=0.015 2024-09-17 20:38:26,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=656483.3333333334, ans=0.125 2024-09-17 20:38:45,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=656511.6666666666, ans=0.2 2024-09-17 20:38:52,167 INFO [train.py:1198] (1/2) Epoch 37, batch 1650, loss[loss=0.2941, ctc_loss=0.2099, cr_loss=0.4208, over 13807.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1461, cr_loss=0.3701, over 4073027.04 frames. ], batch size: 149, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:39:13,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=656568.3333333334, ans=0.2 2024-09-17 20:39:24,710 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.905e+02 2.218e+02 2.352e+02 2.526e+02 4.592e+02, threshold=4.705e+02, percent-clipped=0.0 2024-09-17 20:39:25,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=656596.6666666666, ans=0.125 2024-09-17 20:39:41,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=656625.0, ans=0.07 2024-09-17 20:39:41,890 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.33 vs. limit=15.0 2024-09-17 20:39:42,036 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.80 vs. limit=12.0 2024-09-17 20:40:10,000 INFO [train.py:1198] (1/2) Epoch 37, batch 1700, loss[loss=0.1819, ctc_loss=0.1183, cr_loss=0.3182, over 20985.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1474, cr_loss=0.3718, over 4067926.58 frames. 
], batch size: 48, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:40:11,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=656681.6666666666, ans=0.025 2024-09-17 20:40:12,205 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.23 vs. limit=22.5 2024-09-17 20:40:24,406 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.69 vs. limit=15.0 2024-09-17 20:40:52,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=656738.3333333334, ans=0.1 2024-09-17 20:41:28,582 INFO [train.py:1198] (1/2) Epoch 37, batch 1750, loss[loss=0.2099, ctc_loss=0.1387, cr_loss=0.3557, over 20970.00 frames. ], tot_loss[loss=0.2236, ctc_loss=0.1487, cr_loss=0.3742, over 4068643.55 frames. ], batch size: 55, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:41:51,754 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.52 vs. limit=22.5 2024-09-17 20:41:57,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=656880.0, ans=0.1 2024-09-17 20:41:58,607 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.923e+02 2.174e+02 2.346e+02 2.490e+02 4.246e+02, threshold=4.691e+02, percent-clipped=0.0 2024-09-17 20:42:12,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=656908.3333333334, ans=0.125 2024-09-17 20:42:29,070 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 20:42:33,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=656936.6666666666, ans=0.0 2024-09-17 20:42:43,876 INFO [train.py:1198] (1/2) Epoch 37, batch 1800, loss[loss=0.2171, ctc_loss=0.142, cr_loss=0.3753, over 20824.00 frames. ], tot_loss[loss=0.2225, ctc_loss=0.1479, cr_loss=0.3732, over 4085241.73 frames. ], batch size: 59, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:42:52,141 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.84 vs. limit=15.0 2024-09-17 20:42:57,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=656993.3333333334, ans=0.2 2024-09-17 20:43:04,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=656993.3333333334, ans=0.125 2024-09-17 20:43:33,053 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.78 vs. limit=15.0 2024-09-17 20:43:59,032 INFO [train.py:1198] (1/2) Epoch 37, batch 1850, loss[loss=0.2184, ctc_loss=0.1484, cr_loss=0.3502, over 19496.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1473, cr_loss=0.372, over 4079344.77 frames. 
], batch size: 90, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:44:06,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=657106.6666666666, ans=0.2 2024-09-17 20:44:11,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=657106.6666666666, ans=0.07 2024-09-17 20:44:17,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=657135.0, ans=0.125 2024-09-17 20:44:28,970 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.916e+02 2.214e+02 2.294e+02 2.446e+02 2.982e+02, threshold=4.588e+02, percent-clipped=0.0 2024-09-17 20:44:32,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=657163.3333333334, ans=0.0 2024-09-17 20:45:08,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=657220.0, ans=0.0 2024-09-17 20:45:17,143 INFO [train.py:1198] (1/2) Epoch 37, batch 1900, loss[loss=0.2197, ctc_loss=0.144, cr_loss=0.3785, over 20945.00 frames. ], tot_loss[loss=0.2225, ctc_loss=0.1479, cr_loss=0.3729, over 4075397.61 frames. ], batch size: 60, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:45:39,491 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.54 vs. limit=22.5 2024-09-17 20:45:44,878 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 20:45:55,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=657305.0, ans=0.0 2024-09-17 20:46:37,005 INFO [train.py:1198] (1/2) Epoch 37, batch 1950, loss[loss=0.199, ctc_loss=0.132, cr_loss=0.3349, over 20968.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1473, cr_loss=0.372, over 4081308.53 frames. ], batch size: 52, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:46:52,628 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.68 vs. limit=15.0 2024-09-17 20:46:53,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=657418.3333333334, ans=0.125 2024-09-17 20:47:06,879 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.872e+02 2.153e+02 2.309e+02 2.482e+02 3.673e+02, threshold=4.618e+02, percent-clipped=0.0 2024-09-17 20:47:23,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=657475.0, ans=0.125 2024-09-17 20:47:31,268 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.18 vs. limit=15.0 2024-09-17 20:47:51,814 INFO [train.py:1198] (1/2) Epoch 37, batch 2000, loss[loss=0.2251, ctc_loss=0.1505, cr_loss=0.3731, over 21018.00 frames. ], tot_loss[loss=0.2225, ctc_loss=0.1477, cr_loss=0.3738, over 4088619.64 frames. 
], batch size: 61, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:48:02,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=657531.6666666666, ans=0.125 2024-09-17 20:48:19,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=657560.0, ans=0.2 2024-09-17 20:48:36,507 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.05 vs. limit=15.0 2024-09-17 20:48:46,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=657616.6666666666, ans=0.125 2024-09-17 20:48:50,119 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.36 vs. limit=15.0 2024-09-17 20:48:52,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=657645.0, ans=0.04949747468305833 2024-09-17 20:48:59,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=657645.0, ans=0.125 2024-09-17 20:49:07,949 INFO [train.py:1198] (1/2) Epoch 37, batch 2050, loss[loss=0.2477, ctc_loss=0.1685, cr_loss=0.3958, over 19374.00 frames. ], tot_loss[loss=0.2223, ctc_loss=0.1476, cr_loss=0.3738, over 4092762.57 frames. ], batch size: 90, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:49:38,718 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.982e+02 2.185e+02 2.299e+02 2.491e+02 3.230e+02, threshold=4.598e+02, percent-clipped=0.0 2024-09-17 20:49:52,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=657758.3333333334, ans=0.125 2024-09-17 20:50:03,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=657758.3333333334, ans=0.125 2024-09-17 20:50:09,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=657786.6666666666, ans=0.0 2024-09-17 20:50:12,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=657786.6666666666, ans=0.0 2024-09-17 20:50:24,429 INFO [train.py:1198] (1/2) Epoch 37, batch 2100, loss[loss=0.2241, ctc_loss=0.1474, cr_loss=0.3834, over 21076.00 frames. ], tot_loss[loss=0.2207, ctc_loss=0.1464, cr_loss=0.3716, over 4101200.94 frames. ], batch size: 59, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:51:34,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=657928.3333333334, ans=0.125 2024-09-17 20:51:43,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=657956.6666666666, ans=0.05 2024-09-17 20:51:44,757 INFO [train.py:1198] (1/2) Epoch 37, batch 2150, loss[loss=0.2223, ctc_loss=0.1464, cr_loss=0.3797, over 20868.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1471, cr_loss=0.3731, over 4108515.19 frames. 
], batch size: 57, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:51:46,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=657956.6666666666, ans=0.5 2024-09-17 20:52:18,843 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.804e+02 2.204e+02 2.342e+02 2.507e+02 8.272e+02, threshold=4.684e+02, percent-clipped=1.0 2024-09-17 20:52:28,590 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.51 vs. limit=22.5 2024-09-17 20:53:03,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=658098.3333333334, ans=0.125 2024-09-17 20:53:04,904 INFO [train.py:1198] (1/2) Epoch 37, batch 2200, loss[loss=0.2146, ctc_loss=0.1422, cr_loss=0.3619, over 20840.00 frames. ], tot_loss[loss=0.2212, ctc_loss=0.1468, cr_loss=0.3722, over 4111945.10 frames. ], batch size: 59, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:53:05,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=658098.3333333334, ans=0.1 2024-09-17 20:53:29,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=658126.6666666666, ans=0.05 2024-09-17 20:53:33,288 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.16 vs. limit=15.0 2024-09-17 20:53:34,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=658155.0, ans=0.2 2024-09-17 20:54:10,843 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 20:54:21,131 INFO [train.py:1198] (1/2) Epoch 37, batch 2250, loss[loss=0.2435, ctc_loss=0.1633, cr_loss=0.4012, over 19500.00 frames. ], tot_loss[loss=0.2211, ctc_loss=0.1467, cr_loss=0.3721, over 4113413.87 frames. ], batch size: 90, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:54:51,912 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.887e+02 2.176e+02 2.296e+02 2.495e+02 3.496e+02, threshold=4.591e+02, percent-clipped=0.0 2024-09-17 20:55:28,952 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.50 vs. limit=15.0 2024-09-17 20:55:36,877 INFO [train.py:1198] (1/2) Epoch 37, batch 2300, loss[loss=0.1976, ctc_loss=0.1292, cr_loss=0.3421, over 20967.00 frames. ], tot_loss[loss=0.2195, ctc_loss=0.1455, cr_loss=0.3701, over 4115213.71 frames. ], batch size: 48, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 20:56:11,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=658438.3333333334, ans=0.125 2024-09-17 20:56:16,743 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.78 vs. limit=15.0 2024-09-17 20:56:21,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=658466.6666666666, ans=0.05 2024-09-17 20:56:55,770 INFO [train.py:1198] (1/2) Epoch 37, batch 2350, loss[loss=0.2164, ctc_loss=0.1433, cr_loss=0.3656, over 20996.00 frames. 
], tot_loss[loss=0.2191, ctc_loss=0.1452, cr_loss=0.3694, over 4107608.58 frames. ], batch size: 55, lr: 2.23e-03, grad_scale: 16.0 2024-09-17 20:57:27,387 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.847e+02 2.176e+02 2.299e+02 2.422e+02 3.074e+02, threshold=4.599e+02, percent-clipped=0.0 2024-09-17 20:57:32,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=658580.0, ans=0.2 2024-09-17 20:57:40,550 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.29 vs. limit=22.5 2024-09-17 20:58:14,450 INFO [train.py:1198] (1/2) Epoch 37, batch 2400, loss[loss=0.205, ctc_loss=0.1326, cr_loss=0.3623, over 20999.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.1452, cr_loss=0.3696, over 4103487.06 frames. ], batch size: 51, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 20:58:36,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=658693.3333333334, ans=0.1 2024-09-17 20:58:40,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=658693.3333333334, ans=0.125 2024-09-17 20:58:51,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=658721.6666666666, ans=0.2 2024-09-17 20:59:21,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=658778.3333333334, ans=0.125 2024-09-17 20:59:30,679 INFO [train.py:1198] (1/2) Epoch 37, batch 2450, loss[loss=0.2145, ctc_loss=0.145, cr_loss=0.3476, over 21009.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1452, cr_loss=0.3702, over 4110859.46 frames. ], batch size: 63, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 20:59:54,604 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.79 vs. limit=15.0 2024-09-17 20:59:57,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=658835.0, ans=0.125 2024-09-17 21:00:02,896 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.992e+02 2.183e+02 2.326e+02 2.502e+02 3.624e+02, threshold=4.651e+02, percent-clipped=0.0 2024-09-17 21:00:12,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=658863.3333333334, ans=0.0 2024-09-17 21:00:47,115 INFO [train.py:1198] (1/2) Epoch 37, batch 2500, loss[loss=0.2316, ctc_loss=0.1541, cr_loss=0.3875, over 19985.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1447, cr_loss=0.3693, over 4119622.16 frames. 
], batch size: 80, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:00:50,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=658948.3333333334, ans=0.025 2024-09-17 21:00:52,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=658948.3333333334, ans=0.125 2024-09-17 21:01:49,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=659061.6666666666, ans=0.125 2024-09-17 21:01:53,286 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.11 vs. limit=15.0 2024-09-17 21:01:54,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=659061.6666666666, ans=0.125 2024-09-17 21:01:58,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=659061.6666666666, ans=0.2 2024-09-17 21:02:03,051 INFO [train.py:1198] (1/2) Epoch 37, batch 2550, loss[loss=0.178, ctc_loss=0.1181, cr_loss=0.2992, over 20959.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1451, cr_loss=0.3695, over 4113886.28 frames. ], batch size: 48, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:02:09,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=659090.0, ans=0.125 2024-09-17 21:02:32,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=659118.3333333334, ans=0.125 2024-09-17 21:02:38,060 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.867e+02 2.191e+02 2.323e+02 2.469e+02 4.775e+02, threshold=4.647e+02, percent-clipped=1.0 2024-09-17 21:02:53,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=659175.0, ans=0.1 2024-09-17 21:03:05,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=659203.3333333334, ans=0.1 2024-09-17 21:03:15,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=659203.3333333334, ans=0.07 2024-09-17 21:03:21,493 INFO [train.py:1198] (1/2) Epoch 37, batch 2600, loss[loss=0.2111, ctc_loss=0.139, cr_loss=0.3609, over 20786.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.1453, cr_loss=0.3697, over 4113886.07 frames. 
], batch size: 53, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:03:38,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=659260.0, ans=0.0 2024-09-17 21:03:53,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=659288.3333333334, ans=0.0 2024-09-17 21:04:01,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=659288.3333333334, ans=0.0 2024-09-17 21:04:09,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=659316.6666666666, ans=0.1 2024-09-17 21:04:11,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=659316.6666666666, ans=0.1 2024-09-17 21:04:39,734 INFO [train.py:1198] (1/2) Epoch 37, batch 2650, loss[loss=0.2384, ctc_loss=0.1615, cr_loss=0.3844, over 19378.00 frames. ], tot_loss[loss=0.2202, ctc_loss=0.146, cr_loss=0.3708, over 4115721.07 frames. ], batch size: 90, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:04:52,399 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=659373.3333333334, ans=0.125 2024-09-17 21:05:00,552 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.64 vs. limit=15.0 2024-09-17 21:05:11,759 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.681e+02 2.203e+02 2.339e+02 2.518e+02 3.190e+02, threshold=4.677e+02, percent-clipped=0.0 2024-09-17 21:05:55,940 INFO [train.py:1198] (1/2) Epoch 37, batch 2700, loss[loss=0.2377, ctc_loss=0.1565, cr_loss=0.4058, over 20853.00 frames. ], tot_loss[loss=0.2197, ctc_loss=0.1456, cr_loss=0.3701, over 4114691.86 frames. ], batch size: 65, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:06:32,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=659571.6666666666, ans=0.0 2024-09-17 21:06:42,242 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.59 vs. limit=12.0 2024-09-17 21:06:47,884 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.60 vs. limit=15.0 2024-09-17 21:06:50,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=659600.0, ans=0.125 2024-09-17 21:07:10,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=659656.6666666666, ans=0.04949747468305833 2024-09-17 21:07:11,181 INFO [train.py:1198] (1/2) Epoch 37, batch 2750, loss[loss=0.1986, ctc_loss=0.1308, cr_loss=0.3393, over 20801.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.147, cr_loss=0.3725, over 4091439.04 frames. 
], batch size: 53, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:07:17,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=659656.6666666666, ans=0.125 2024-09-17 21:07:19,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=659656.6666666666, ans=0.0 2024-09-17 21:07:42,784 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.975e+02 2.226e+02 2.361e+02 2.574e+02 3.530e+02, threshold=4.722e+02, percent-clipped=0.0 2024-09-17 21:08:17,018 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. limit=6.0 2024-09-17 21:08:27,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=659770.0, ans=0.125 2024-09-17 21:08:29,836 INFO [train.py:1198] (1/2) Epoch 37, batch 2800, loss[loss=0.2362, ctc_loss=0.1547, cr_loss=0.4071, over 20667.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1471, cr_loss=0.3727, over 4094404.75 frames. ], batch size: 66, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:08:48,811 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.10 vs. limit=15.0 2024-09-17 21:08:54,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=659826.6666666666, ans=0.2 2024-09-17 21:09:25,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=659883.3333333334, ans=0.0 2024-09-17 21:09:35,405 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.27 vs. limit=15.0 2024-09-17 21:09:48,087 INFO [train.py:1198] (1/2) Epoch 37, batch 2850, loss[loss=0.2324, ctc_loss=0.1568, cr_loss=0.3782, over 19296.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1468, cr_loss=0.3724, over 4097242.41 frames. ], batch size: 90, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:09:57,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=659940.0, ans=0.0 2024-09-17 21:10:07,112 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.75 vs. limit=15.0 2024-09-17 21:10:20,128 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.819e+02 2.206e+02 2.347e+02 2.475e+02 3.246e+02, threshold=4.693e+02, percent-clipped=0.0 2024-09-17 21:10:22,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=659996.6666666666, ans=0.1 2024-09-17 21:10:50,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=660053.3333333334, ans=0.125 2024-09-17 21:11:04,114 INFO [train.py:1198] (1/2) Epoch 37, batch 2900, loss[loss=0.208, ctc_loss=0.1389, cr_loss=0.3452, over 20879.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1472, cr_loss=0.3725, over 4094367.35 frames. 
], batch size: 57, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:11:26,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=660110.0, ans=0.0 2024-09-17 21:11:29,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=660110.0, ans=0.125 2024-09-17 21:11:32,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=660138.3333333334, ans=0.2 2024-09-17 21:11:41,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=660138.3333333334, ans=0.0 2024-09-17 21:12:19,589 INFO [train.py:1198] (1/2) Epoch 37, batch 2950, loss[loss=0.189, ctc_loss=0.1237, cr_loss=0.3265, over 21035.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1473, cr_loss=0.3725, over 4096611.16 frames. ], batch size: 62, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:12:32,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=660223.3333333334, ans=0.1 2024-09-17 21:12:36,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=660251.6666666666, ans=0.035 2024-09-17 21:12:50,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=660280.0, ans=0.1 2024-09-17 21:12:51,606 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.965e+02 2.251e+02 2.373e+02 2.516e+02 7.516e+02, threshold=4.746e+02, percent-clipped=1.0 2024-09-17 21:13:21,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=660336.6666666666, ans=22.5 2024-09-17 21:13:36,215 INFO [train.py:1198] (1/2) Epoch 37, batch 3000, loss[loss=0.2127, ctc_loss=0.1388, cr_loss=0.3698, over 20785.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.147, cr_loss=0.3721, over 4096598.66 frames. ], batch size: 53, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:13:36,216 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 21:13:50,909 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.4390, 5.5425, 4.9195, 5.2199], device='cuda:1') 2024-09-17 21:13:59,220 INFO [train.py:1230] (1/2) Epoch 37, validation: loss=0.04025, ctc_loss=0.04025, cr_loss=1.378e-14, over 944034.00 frames. 
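The validation entry above reports cr_loss on the order of 1e-14 while the training batches report cr_loss around 0.37. This is consistent with how consistency-regularized CTC (CR-CTC) works: during training, each utterance is forwarded twice under different time-masking, and a consistency term penalizes disagreement between the two frame-level posterior distributions; with augmentation disabled at validation time the two views coincide, so the consistency term collapses to floating-point noise and the reported loss reduces to the pure CTC loss. The sketch below illustrates the general shape of such a combined objective in plain PyTorch. It is an illustrative reconstruction, not the icefall implementation: the model interface, the function and argument names (cr_ctc_loss, feats_a/feats_b, cr_loss_scale), and the reduction choices are all assumptions made for the example.

    # Minimal CR-CTC sketch (assumed interface, not icefall's code).
    # `model(feats, feat_lens)` is assumed to return per-frame log-probs
    # of shape (T, N, C), already log-softmaxed over the vocabulary.
    import torch
    import torch.nn.functional as F

    def cr_ctc_loss(model, feats_a, feats_b, feat_lens,
                    targets, target_lens, cr_loss_scale):
        # Two forward passes over differently masked views of the same batch.
        log_probs_a = model(feats_a, feat_lens)  # (T, N, C)
        log_probs_b = model(feats_b, feat_lens)  # (T, N, C)

        # Standard CTC loss, averaged over the two views.
        ctc = 0.5 * (
            F.ctc_loss(log_probs_a, targets, feat_lens, target_lens,
                       reduction="sum", zero_infinity=True)
            + F.ctc_loss(log_probs_b, targets, feat_lens, target_lens,
                         reduction="sum", zero_infinity=True)
        )

        # Consistency regularization: pull each view's per-frame
        # distribution toward the other view's (detached) distribution
        # via a symmetrized KL divergence. When feats_a == feats_b
        # (e.g. validation, no masking), this term is ~0 up to
        # floating-point noise.
        cr = 0.5 * (
            F.kl_div(log_probs_a, log_probs_b.detach(),
                     log_target=True, reduction="sum")
            + F.kl_div(log_probs_b, log_probs_a.detach(),
                       log_target=True, reduction="sum")
        )

        # The logged `loss` corresponds to a weighted sum of this form;
        # the weight on the CR term is a training hyperparameter.
        return ctc + cr_loss_scale * cr, ctc, cr

Under this reading, the per-batch log lines decompose as loss = ctc_loss + scale * cr_loss (each normalized by the frame count shown in "over N frames"), which matches the relative magnitudes reported throughout this epoch.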
2024-09-17 21:13:59,221 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-17 21:14:07,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=660365.0, ans=0.125 2024-09-17 21:14:17,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=660393.3333333334, ans=0.125 2024-09-17 21:14:20,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=660393.3333333334, ans=0.025 2024-09-17 21:14:31,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=660421.6666666666, ans=0.2 2024-09-17 21:14:40,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=660421.6666666666, ans=0.125 2024-09-17 21:14:49,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=660450.0, ans=0.125 2024-09-17 21:14:58,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=660450.0, ans=0.2 2024-09-17 21:15:15,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=660478.3333333334, ans=0.025 2024-09-17 21:15:17,864 INFO [train.py:1198] (1/2) Epoch 37, batch 3050, loss[loss=0.2508, ctc_loss=0.1757, cr_loss=0.3754, over 14365.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.147, cr_loss=0.3714, over 4088543.19 frames. ], batch size: 149, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:15:41,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=660535.0, ans=0.02 2024-09-17 21:15:49,144 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.958e+02 2.195e+02 2.292e+02 2.443e+02 3.169e+02, threshold=4.584e+02, percent-clipped=0.0 2024-09-17 21:15:55,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=660563.3333333334, ans=0.0 2024-09-17 21:16:30,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=660620.0, ans=0.125 2024-09-17 21:16:33,021 INFO [train.py:1198] (1/2) Epoch 37, batch 3100, loss[loss=0.2286, ctc_loss=0.1523, cr_loss=0.3816, over 21006.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1464, cr_loss=0.3707, over 4076999.88 frames. ], batch size: 61, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:16:47,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=660676.6666666666, ans=0.125 2024-09-17 21:16:47,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=660676.6666666666, ans=0.0 2024-09-17 21:16:53,366 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.49 vs. 
limit=15.0 2024-09-17 21:16:54,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=660676.6666666666, ans=0.125 2024-09-17 21:17:02,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=660705.0, ans=0.025 2024-09-17 21:17:29,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=660733.3333333334, ans=0.025 2024-09-17 21:17:47,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=660790.0, ans=0.125 2024-09-17 21:17:47,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=660790.0, ans=0.125 2024-09-17 21:17:48,598 INFO [train.py:1198] (1/2) Epoch 37, batch 3150, loss[loss=0.2336, ctc_loss=0.157, cr_loss=0.3835, over 21064.00 frames. ], tot_loss[loss=0.2208, ctc_loss=0.1466, cr_loss=0.371, over 4077874.11 frames. ], batch size: 56, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:17:56,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=660790.0, ans=0.125 2024-09-17 21:18:20,126 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.991e+02 2.248e+02 2.337e+02 2.522e+02 3.537e+02, threshold=4.674e+02, percent-clipped=0.0 2024-09-17 21:18:40,733 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.18 vs. limit=15.0 2024-09-17 21:18:53,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=660903.3333333334, ans=0.125 2024-09-17 21:19:03,924 INFO [train.py:1198] (1/2) Epoch 37, batch 3200, loss[loss=0.1911, ctc_loss=0.1253, cr_loss=0.3289, over 20981.00 frames. ], tot_loss[loss=0.2209, ctc_loss=0.1466, cr_loss=0.3712, over 4082353.96 frames. ], batch size: 49, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:19:18,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=660960.0, ans=0.125 2024-09-17 21:19:22,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=660960.0, ans=0.1 2024-09-17 21:19:50,960 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.87 vs. limit=15.0 2024-09-17 21:19:53,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=661016.6666666666, ans=0.2 2024-09-17 21:20:08,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=661016.6666666666, ans=0.125 2024-09-17 21:20:08,544 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.69 vs. limit=22.5 2024-09-17 21:20:12,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=661045.0, ans=0.2 2024-09-17 21:20:25,877 INFO [train.py:1198] (1/2) Epoch 37, batch 3250, loss[loss=0.219, ctc_loss=0.147, cr_loss=0.3598, over 21075.00 frames. 
], tot_loss[loss=0.2218, ctc_loss=0.1473, cr_loss=0.3727, over 4081861.57 frames. ], batch size: 59, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:20:26,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=661073.3333333334, ans=0.0 2024-09-17 21:20:57,083 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.954e+02 2.220e+02 2.360e+02 2.568e+02 3.393e+02, threshold=4.719e+02, percent-clipped=0.0 2024-09-17 21:21:27,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=661186.6666666666, ans=0.2 2024-09-17 21:21:40,926 INFO [train.py:1198] (1/2) Epoch 37, batch 3300, loss[loss=0.2291, ctc_loss=0.1512, cr_loss=0.3897, over 20883.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1467, cr_loss=0.3717, over 4085630.42 frames. ], batch size: 57, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:21:44,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=661215.0, ans=0.07 2024-09-17 21:21:57,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=661243.3333333334, ans=0.125 2024-09-17 21:22:09,113 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.48 vs. limit=15.0 2024-09-17 21:22:40,682 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 21:22:40,942 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.29 vs. limit=15.0 2024-09-17 21:22:57,083 INFO [train.py:1198] (1/2) Epoch 37, batch 3350, loss[loss=0.2381, ctc_loss=0.158, cr_loss=0.4007, over 20728.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1468, cr_loss=0.3724, over 4085320.62 frames. ], batch size: 71, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:23:28,440 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.942e+02 2.209e+02 2.367e+02 2.646e+02 4.774e+02, threshold=4.733e+02, percent-clipped=1.0 2024-09-17 21:23:28,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=661413.3333333334, ans=0.025 2024-09-17 21:23:50,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=661441.6666666666, ans=0.0 2024-09-17 21:24:08,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=661470.0, ans=0.0 2024-09-17 21:24:12,336 INFO [train.py:1198] (1/2) Epoch 37, batch 3400, loss[loss=0.1885, ctc_loss=0.1245, cr_loss=0.32, over 20986.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.1461, cr_loss=0.3711, over 4088391.49 frames. ], batch size: 48, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:25:31,122 INFO [train.py:1198] (1/2) Epoch 37, batch 3450, loss[loss=0.2177, ctc_loss=0.1457, cr_loss=0.3598, over 20993.00 frames. ], tot_loss[loss=0.2208, ctc_loss=0.1465, cr_loss=0.3717, over 4094915.06 frames. 
], batch size: 61, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:25:32,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=661640.0, ans=0.0 2024-09-17 21:25:50,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=661668.3333333334, ans=0.0 2024-09-17 21:26:05,560 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.892e+02 2.160e+02 2.319e+02 2.439e+02 5.227e+02, threshold=4.638e+02, percent-clipped=1.0 2024-09-17 21:26:45,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=661753.3333333334, ans=0.0 2024-09-17 21:26:49,572 INFO [train.py:1198] (1/2) Epoch 37, batch 3500, loss[loss=0.1918, ctc_loss=0.1242, cr_loss=0.338, over 20995.00 frames. ], tot_loss[loss=0.2211, ctc_loss=0.1466, cr_loss=0.3722, over 4101544.64 frames. ], batch size: 51, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:26:59,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=661781.6666666666, ans=0.125 2024-09-17 21:26:59,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=661781.6666666666, ans=0.025 2024-09-17 21:27:42,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=661866.6666666666, ans=0.04949747468305833 2024-09-17 21:28:05,443 INFO [train.py:1198] (1/2) Epoch 37, batch 3550, loss[loss=0.2234, ctc_loss=0.1484, cr_loss=0.375, over 21060.00 frames. ], tot_loss[loss=0.2202, ctc_loss=0.1461, cr_loss=0.3708, over 4101843.97 frames. ], batch size: 56, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:28:20,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=661951.6666666666, ans=0.125 2024-09-17 21:28:22,286 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.15 vs. limit=12.0 2024-09-17 21:28:36,662 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.959e+02 2.214e+02 2.311e+02 2.470e+02 4.177e+02, threshold=4.621e+02, percent-clipped=0.0 2024-09-17 21:28:50,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=662008.3333333334, ans=0.125 2024-09-17 21:28:55,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=662008.3333333334, ans=0.1 2024-09-17 21:29:02,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=662008.3333333334, ans=0.125 2024-09-17 21:29:20,394 INFO [train.py:1198] (1/2) Epoch 37, batch 3600, loss[loss=0.2371, ctc_loss=0.16, cr_loss=0.3856, over 20654.00 frames. ], tot_loss[loss=0.2212, ctc_loss=0.1469, cr_loss=0.3715, over 4085148.33 frames. 
], batch size: 68, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:29:45,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=662093.3333333334, ans=0.2 2024-09-17 21:29:52,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=662121.6666666666, ans=0.125 2024-09-17 21:30:28,308 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.44 vs. limit=22.5 2024-09-17 21:30:36,586 INFO [train.py:1198] (1/2) Epoch 37, batch 3650, loss[loss=0.2223, ctc_loss=0.1473, cr_loss=0.3748, over 20955.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1464, cr_loss=0.3706, over 4079600.90 frames. ], batch size: 58, lr: 2.23e-03, grad_scale: 16.0 2024-09-17 21:30:56,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=662235.0, ans=0.025 2024-09-17 21:31:07,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=662235.0, ans=0.125 2024-09-17 21:31:12,910 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.015e+02 2.218e+02 2.366e+02 2.463e+02 4.063e+02, threshold=4.733e+02, percent-clipped=0.0 2024-09-17 21:31:58,432 INFO [train.py:1198] (1/2) Epoch 37, batch 3700, loss[loss=0.2351, ctc_loss=0.1556, cr_loss=0.3977, over 20975.00 frames. ], tot_loss[loss=0.2207, ctc_loss=0.1465, cr_loss=0.371, over 4085411.95 frames. ], batch size: 55, lr: 2.23e-03, grad_scale: 8.0 2024-09-17 21:32:24,391 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=662376.6666666666, ans=0.1 2024-09-17 21:32:25,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=662376.6666666666, ans=0.125 2024-09-17 21:32:26,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=662376.6666666666, ans=0.125 2024-09-17 21:32:41,037 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=662405.0, ans=0.0 2024-09-17 21:33:14,502 INFO [train.py:1198] (1/2) Epoch 37, batch 3750, loss[loss=0.2287, ctc_loss=0.1538, cr_loss=0.3744, over 19421.00 frames. ], tot_loss[loss=0.22, ctc_loss=0.1459, cr_loss=0.3703, over 4098323.73 frames. ], batch size: 90, lr: 2.23e-03, grad_scale: 8.0 2024-09-17 21:33:16,759 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=7.23 vs. limit=22.5 2024-09-17 21:33:49,432 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.888e+02 2.215e+02 2.319e+02 2.460e+02 3.182e+02, threshold=4.637e+02, percent-clipped=0.0 2024-09-17 21:33:55,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=662546.6666666666, ans=0.1 2024-09-17 21:34:01,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=662575.0, ans=0.125 2024-09-17 21:34:10,871 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.45 vs. 
limit=15.0 2024-09-17 21:34:19,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=662603.3333333334, ans=0.125 2024-09-17 21:34:29,813 INFO [train.py:1198] (1/2) Epoch 37, batch 3800, loss[loss=0.1918, ctc_loss=0.1256, cr_loss=0.3309, over 21007.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1464, cr_loss=0.3709, over 4095159.07 frames. ], batch size: 50, lr: 2.23e-03, grad_scale: 8.0 2024-09-17 21:34:33,472 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. limit=6.0 2024-09-17 21:34:34,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=662631.6666666666, ans=0.125 2024-09-17 21:34:54,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=662660.0, ans=0.125 2024-09-17 21:35:07,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=662688.3333333334, ans=0.0 2024-09-17 21:35:16,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=662716.6666666666, ans=0.1 2024-09-17 21:35:21,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=662716.6666666666, ans=0.0 2024-09-17 21:35:24,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=662716.6666666666, ans=0.025 2024-09-17 21:35:26,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=662716.6666666666, ans=0.0 2024-09-17 21:35:29,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=662745.0, ans=0.0 2024-09-17 21:35:30,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=662745.0, ans=0.0 2024-09-17 21:35:34,978 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.77 vs. limit=15.0 2024-09-17 21:35:35,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=662745.0, ans=0.1 2024-09-17 21:35:44,516 INFO [train.py:1198] (1/2) Epoch 37, batch 3850, loss[loss=0.2208, ctc_loss=0.1459, cr_loss=0.3745, over 20786.00 frames. ], tot_loss[loss=0.2202, ctc_loss=0.146, cr_loss=0.3709, over 4095837.91 frames. ], batch size: 56, lr: 2.23e-03, grad_scale: 8.0 2024-09-17 21:35:50,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=662773.3333333334, ans=0.125 2024-09-17 21:36:12,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=662801.6666666666, ans=0.125 2024-09-17 21:36:14,221 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.10 vs. 
limit=15.0 2024-09-17 21:36:22,355 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.001e+02 2.195e+02 2.352e+02 2.487e+02 3.809e+02, threshold=4.703e+02, percent-clipped=0.0 2024-09-17 21:37:03,977 INFO [train.py:1198] (1/2) Epoch 37, batch 3900, loss[loss=0.2385, ctc_loss=0.1594, cr_loss=0.3955, over 20830.00 frames. ], tot_loss[loss=0.2197, ctc_loss=0.1457, cr_loss=0.3703, over 4102493.08 frames. ], batch size: 65, lr: 2.23e-03, grad_scale: 8.0 2024-09-17 21:37:15,375 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.36 vs. limit=15.0 2024-09-17 21:37:39,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=662971.6666666666, ans=10.0 2024-09-17 21:37:43,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=662971.6666666666, ans=0.2 2024-09-17 21:38:03,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=663000.0, ans=0.125 2024-09-17 21:38:04,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=663000.0, ans=0.025 2024-09-17 21:38:22,841 INFO [train.py:1198] (1/2) Epoch 37, batch 3950, loss[loss=0.207, ctc_loss=0.1379, cr_loss=0.3455, over 20927.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.1457, cr_loss=0.3707, over 4115598.53 frames. ], batch size: 60, lr: 2.23e-03, grad_scale: 8.0 2024-09-17 21:38:39,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=663085.0, ans=0.2 2024-09-17 21:38:39,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=663085.0, ans=0.2 2024-09-17 21:38:53,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=663113.3333333334, ans=0.1 2024-09-17 21:38:57,534 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.930e+02 2.207e+02 2.311e+02 2.528e+02 5.049e+02, threshold=4.622e+02, percent-clipped=1.0 2024-09-17 21:39:35,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=663170.0, ans=0.1 2024-09-17 21:39:38,194 INFO [train.py:1198] (1/2) Epoch 37, batch 4000, loss[loss=0.217, ctc_loss=0.1441, cr_loss=0.3648, over 21073.00 frames. ], tot_loss[loss=0.22, ctc_loss=0.1459, cr_loss=0.3704, over 4110746.56 frames. ], batch size: 59, lr: 2.23e-03, grad_scale: 16.0 2024-09-17 21:40:07,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=663255.0, ans=0.0 2024-09-17 21:40:36,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=663283.3333333334, ans=0.0 2024-09-17 21:40:49,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=663311.6666666666, ans=0.0 2024-09-17 21:40:53,750 INFO [train.py:1198] (1/2) Epoch 37, batch 4050, loss[loss=0.217, ctc_loss=0.1463, cr_loss=0.3534, over 20926.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.1462, cr_loss=0.3713, over 4102972.80 frames. 
], batch size: 60, lr: 2.23e-03, grad_scale: 16.0 2024-09-17 21:40:55,999 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.06 vs. limit=15.0 2024-09-17 21:41:16,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=663368.3333333334, ans=0.2 2024-09-17 21:41:22,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=663396.6666666666, ans=0.04949747468305833 2024-09-17 21:41:28,175 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.931e+02 2.220e+02 2.429e+02 2.674e+02 7.764e+02, threshold=4.858e+02, percent-clipped=1.0 2024-09-17 21:41:30,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=663396.6666666666, ans=0.2 2024-09-17 21:41:51,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=663425.0, ans=0.0 2024-09-17 21:41:54,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=663453.3333333334, ans=0.125 2024-09-17 21:42:11,853 INFO [train.py:1198] (1/2) Epoch 37, batch 4100, loss[loss=0.2152, ctc_loss=0.1418, cr_loss=0.367, over 20963.00 frames. ], tot_loss[loss=0.2216, ctc_loss=0.147, cr_loss=0.373, over 4101439.30 frames. ], batch size: 52, lr: 2.23e-03, grad_scale: 16.0 2024-09-17 21:42:36,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=663510.0, ans=0.2 2024-09-17 21:43:30,513 INFO [train.py:1198] (1/2) Epoch 37, batch 4150, loss[loss=0.2499, ctc_loss=0.1682, cr_loss=0.4084, over 18368.00 frames. ], tot_loss[loss=0.2216, ctc_loss=0.147, cr_loss=0.3728, over 4106085.81 frames. ], batch size: 108, lr: 2.23e-03, grad_scale: 16.0 2024-09-17 21:44:05,608 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.911e+02 2.187e+02 2.367e+02 2.509e+02 3.633e+02, threshold=4.734e+02, percent-clipped=0.0 2024-09-17 21:44:07,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=663680.0, ans=0.125 2024-09-17 21:44:08,086 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.71 vs. limit=22.5 2024-09-17 21:44:46,558 INFO [train.py:1198] (1/2) Epoch 37, batch 4200, loss[loss=0.1621, ctc_loss=0.103, cr_loss=0.2954, over 20957.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1463, cr_loss=0.3716, over 4116155.26 frames. ], batch size: 51, lr: 2.23e-03, grad_scale: 16.0 2024-09-17 21:44:57,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=663765.0, ans=0.04949747468305833 2024-09-17 21:45:06,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=663793.3333333334, ans=0.125 2024-09-17 21:45:33,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=663850.0, ans=0.1 2024-09-17 21:45:34,249 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.46 vs. 
limit=15.0 2024-09-17 21:46:02,042 INFO [train.py:1198] (1/2) Epoch 37, batch 4250, loss[loss=0.2154, ctc_loss=0.1403, cr_loss=0.3754, over 21044.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.1456, cr_loss=0.3713, over 4124079.42 frames. ], batch size: 63, lr: 2.23e-03, grad_scale: 16.0 2024-09-17 21:46:18,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=663935.0, ans=0.0 2024-09-17 21:46:20,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=663935.0, ans=0.125 2024-09-17 21:46:20,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.whiten.whitening_limit, batch_count=663935.0, ans=12.0 2024-09-17 21:46:23,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=663935.0, ans=0.125 2024-09-17 21:46:29,585 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.59 vs. limit=15.0 2024-09-17 21:46:36,433 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.960e+02 2.178e+02 2.281e+02 2.405e+02 4.909e+02, threshold=4.561e+02, percent-clipped=1.0 2024-09-17 21:47:06,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=664020.0, ans=0.125 2024-09-17 21:47:12,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=664020.0, ans=0.0 2024-09-17 21:47:16,616 INFO [train.py:1198] (1/2) Epoch 37, batch 4300, loss[loss=0.2212, ctc_loss=0.1425, cr_loss=0.3934, over 21083.00 frames. ], tot_loss[loss=0.2216, ctc_loss=0.1469, cr_loss=0.373, over 4116834.92 frames. ], batch size: 56, lr: 2.23e-03, grad_scale: 16.0 2024-09-17 21:47:32,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=664076.6666666666, ans=0.07 2024-09-17 21:47:33,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=664076.6666666666, ans=0.2 2024-09-17 21:47:40,219 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.67 vs. limit=15.0 2024-09-17 21:47:50,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=664105.0, ans=0.025 2024-09-17 21:48:03,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=664133.3333333334, ans=0.1 2024-09-17 21:48:12,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=664133.3333333334, ans=0.0 2024-09-17 21:48:37,351 INFO [train.py:1198] (1/2) Epoch 37, batch 4350, loss[loss=0.2272, ctc_loss=0.1488, cr_loss=0.3921, over 20755.00 frames. ], tot_loss[loss=0.2221, ctc_loss=0.1474, cr_loss=0.3734, over 4110692.93 frames. 
], batch size: 56, lr: 2.23e-03, grad_scale: 16.0 2024-09-17 21:49:03,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=664218.3333333334, ans=0.125 2024-09-17 21:49:11,859 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.003e+02 2.242e+02 2.354e+02 2.561e+02 8.205e+02, threshold=4.709e+02, percent-clipped=1.0 2024-09-17 21:49:52,836 INFO [train.py:1198] (1/2) Epoch 37, batch 4400, loss[loss=0.2142, ctc_loss=0.1433, cr_loss=0.3545, over 20603.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1472, cr_loss=0.3729, over 4113135.13 frames. ], batch size: 71, lr: 2.22e-03, grad_scale: 32.0 2024-09-17 21:50:24,042 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.64 vs. limit=15.0 2024-09-17 21:50:31,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=664388.3333333334, ans=0.1 2024-09-17 21:50:58,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=664445.0, ans=0.0 2024-09-17 21:51:09,067 INFO [train.py:1198] (1/2) Epoch 37, batch 4450, loss[loss=0.2444, ctc_loss=0.1623, cr_loss=0.4103, over 20886.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1472, cr_loss=0.3729, over 4105727.76 frames. ], batch size: 54, lr: 2.22e-03, grad_scale: 32.0 2024-09-17 21:51:27,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=664501.6666666666, ans=0.125 2024-09-17 21:51:33,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=664501.6666666666, ans=0.2 2024-09-17 21:51:43,503 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.970e+02 2.217e+02 2.386e+02 2.583e+02 3.614e+02, threshold=4.772e+02, percent-clipped=0.0 2024-09-17 21:52:06,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=664558.3333333334, ans=0.2 2024-09-17 21:52:24,296 INFO [train.py:1198] (1/2) Epoch 37, batch 4500, loss[loss=0.2578, ctc_loss=0.1712, cr_loss=0.433, over 19473.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.1477, cr_loss=0.3737, over 4108169.90 frames. ], batch size: 90, lr: 2.22e-03, grad_scale: 32.0 2024-09-17 21:53:03,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=664671.6666666666, ans=0.125 2024-09-17 21:53:42,692 INFO [train.py:1198] (1/2) Epoch 37, batch 4550, loss[loss=0.2414, ctc_loss=0.1631, cr_loss=0.3915, over 20951.00 frames. ], tot_loss[loss=0.2225, ctc_loss=0.1477, cr_loss=0.3737, over 4102993.99 frames. 
], batch size: 64, lr: 2.22e-03, grad_scale: 32.0 2024-09-17 21:53:49,252 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 21:54:16,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=664813.3333333334, ans=0.125 2024-09-17 21:54:20,113 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.908e+02 2.176e+02 2.287e+02 2.438e+02 2.934e+02, threshold=4.574e+02, percent-clipped=0.0 2024-09-17 21:54:31,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=664841.6666666666, ans=0.2 2024-09-17 21:54:38,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=664841.6666666666, ans=0.125 2024-09-17 21:54:53,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=664870.0, ans=0.0 2024-09-17 21:55:01,073 INFO [train.py:1198] (1/2) Epoch 37, batch 4600, loss[loss=0.1763, ctc_loss=0.1143, cr_loss=0.3098, over 20942.00 frames. ], tot_loss[loss=0.2211, ctc_loss=0.1467, cr_loss=0.3717, over 4106224.55 frames. ], batch size: 50, lr: 2.22e-03, grad_scale: 32.0 2024-09-17 21:55:33,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=664955.0, ans=0.125 2024-09-17 21:55:38,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=664955.0, ans=0.025 2024-09-17 21:55:52,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=664983.3333333334, ans=0.125 2024-09-17 21:56:10,910 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.98 vs. limit=15.0 2024-09-17 21:56:16,000 INFO [train.py:1198] (1/2) Epoch 37, batch 4650, loss[loss=0.226, ctc_loss=0.1463, cr_loss=0.3981, over 20682.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1466, cr_loss=0.3721, over 4106123.40 frames. ], batch size: 68, lr: 2.22e-03, grad_scale: 32.0 2024-09-17 21:56:51,258 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.923e+02 2.161e+02 2.292e+02 2.525e+02 5.728e+02, threshold=4.584e+02, percent-clipped=1.0 2024-09-17 21:57:32,639 INFO [train.py:1198] (1/2) Epoch 37, batch 4700, loss[loss=0.1874, ctc_loss=0.1215, cr_loss=0.3294, over 20956.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1459, cr_loss=0.3707, over 4114009.25 frames. ], batch size: 49, lr: 2.22e-03, grad_scale: 32.0 2024-09-17 21:57:51,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=665210.0, ans=0.0 2024-09-17 21:58:01,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=665238.3333333334, ans=0.1 2024-09-17 21:58:20,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=665266.6666666666, ans=0.125 2024-09-17 21:58:48,810 INFO [train.py:1198] (1/2) Epoch 37, batch 4750, loss[loss=0.2312, ctc_loss=0.1506, cr_loss=0.4032, over 20967.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.1461, cr_loss=0.3715, over 4113940.88 frames. 
], batch size: 64, lr: 2.22e-03, grad_scale: 32.0 2024-09-17 21:58:49,484 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.18 vs. limit=22.5 2024-09-17 21:58:56,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=665323.3333333334, ans=0.125 2024-09-17 21:59:14,241 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=18.34 vs. limit=22.5 2024-09-17 21:59:26,974 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.865e+02 2.152e+02 2.270e+02 2.499e+02 4.006e+02, threshold=4.539e+02, percent-clipped=0.0 2024-09-17 21:59:37,075 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.43 vs. limit=10.0 2024-09-17 21:59:42,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=665408.3333333334, ans=0.1 2024-09-17 22:00:00,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=665436.6666666666, ans=0.07 2024-09-17 22:00:11,169 INFO [train.py:1198] (1/2) Epoch 37, batch 4800, loss[loss=0.2423, ctc_loss=0.1634, cr_loss=0.3944, over 19463.00 frames. ], tot_loss[loss=0.2212, ctc_loss=0.1468, cr_loss=0.3721, over 4109809.22 frames. ], batch size: 90, lr: 2.22e-03, grad_scale: 32.0 2024-09-17 22:00:11,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=665465.0, ans=0.125 2024-09-17 22:00:11,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=665465.0, ans=0.125 2024-09-17 22:00:20,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=665465.0, ans=0.5 2024-09-17 22:00:48,072 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 22:01:16,139 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.28 vs. limit=15.0 2024-09-17 22:01:21,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=665578.3333333334, ans=0.125 2024-09-17 22:01:27,384 INFO [train.py:1198] (1/2) Epoch 37, batch 4850, loss[loss=0.2134, ctc_loss=0.1426, cr_loss=0.3538, over 21041.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1474, cr_loss=0.3731, over 4096395.61 frames. 
], batch size: 56, lr: 2.22e-03, grad_scale: 32.0 2024-09-17 22:01:29,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=665606.6666666666, ans=0.2 2024-09-17 22:01:39,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=665606.6666666666, ans=0.2 2024-09-17 22:02:01,708 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.955e+02 2.203e+02 2.325e+02 2.450e+02 3.800e+02, threshold=4.650e+02, percent-clipped=0.0 2024-09-17 22:02:21,865 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.77 vs. limit=10.0 2024-09-17 22:02:41,862 INFO [train.py:1198] (1/2) Epoch 37, batch 4900, loss[loss=0.2495, ctc_loss=0.169, cr_loss=0.4024, over 19951.00 frames. ], tot_loss[loss=0.2236, ctc_loss=0.1486, cr_loss=0.3751, over 4093376.03 frames. ], batch size: 80, lr: 2.22e-03, grad_scale: 32.0 2024-09-17 22:02:54,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=665748.3333333334, ans=0.0 2024-09-17 22:02:57,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=665776.6666666666, ans=0.2 2024-09-17 22:03:17,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=665805.0, ans=0.125 2024-09-17 22:03:55,956 INFO [train.py:1198] (1/2) Epoch 37, batch 4950, loss[loss=0.2465, ctc_loss=0.1663, cr_loss=0.4006, over 20642.00 frames. ], tot_loss[loss=0.2226, ctc_loss=0.1479, cr_loss=0.3735, over 4092426.49 frames. ], batch size: 68, lr: 2.22e-03, grad_scale: 32.0 2024-09-17 22:04:30,625 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.884e+02 2.177e+02 2.285e+02 2.547e+02 3.070e+02, threshold=4.569e+02, percent-clipped=0.0 2024-09-17 22:04:41,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=665975.0, ans=0.125 2024-09-17 22:04:58,383 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.09 vs. limit=10.0 2024-09-17 22:05:02,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=666003.3333333334, ans=0.0 2024-09-17 22:05:08,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=666003.3333333334, ans=0.125 2024-09-17 22:05:11,174 INFO [train.py:1198] (1/2) Epoch 37, batch 5000, loss[loss=0.2322, ctc_loss=0.1543, cr_loss=0.3897, over 20830.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.1469, cr_loss=0.3724, over 4103189.92 frames. 
], batch size: 59, lr: 2.22e-03, grad_scale: 32.0 2024-09-17 22:05:17,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=666031.6666666666, ans=0.125 2024-09-17 22:05:42,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=666088.3333333334, ans=0.1 2024-09-17 22:05:44,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=666088.3333333334, ans=0.0 2024-09-17 22:06:05,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=666116.6666666666, ans=15.0 2024-09-17 22:06:25,057 INFO [train.py:1198] (1/2) Epoch 37, batch 5050, loss[loss=0.2173, ctc_loss=0.1433, cr_loss=0.3701, over 20973.00 frames. ], tot_loss[loss=0.2212, ctc_loss=0.1468, cr_loss=0.3721, over 4099817.60 frames. ], batch size: 55, lr: 2.22e-03, grad_scale: 32.0 2024-09-17 22:06:53,912 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 22:06:59,477 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.950e+02 2.161e+02 2.332e+02 2.471e+02 3.064e+02, threshold=4.665e+02, percent-clipped=0.0 2024-09-17 22:07:09,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=666230.0, ans=0.125 2024-09-17 22:07:23,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=666258.3333333334, ans=0.09899494936611666 2024-09-17 22:07:42,250 INFO [train.py:1198] (1/2) Epoch 37, batch 5100, loss[loss=0.219, ctc_loss=0.1439, cr_loss=0.3756, over 20985.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.1458, cr_loss=0.3703, over 4106234.29 frames. ], batch size: 64, lr: 2.22e-03, grad_scale: 32.0 2024-09-17 22:07:43,251 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.00 vs. limit=12.0 2024-09-17 22:08:30,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=666400.0, ans=0.1 2024-09-17 22:08:58,272 INFO [train.py:1198] (1/2) Epoch 37, batch 5150, loss[loss=0.2523, ctc_loss=0.1715, cr_loss=0.4042, over 18505.00 frames. ], tot_loss[loss=0.2203, ctc_loss=0.1462, cr_loss=0.3708, over 4088393.70 frames. 
], batch size: 108, lr: 2.22e-03, grad_scale: 16.0 2024-09-17 22:09:01,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=666456.6666666666, ans=0.05 2024-09-17 22:09:17,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=666485.0, ans=0.125 2024-09-17 22:09:26,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=666513.3333333334, ans=0.125 2024-09-17 22:09:33,861 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.956e+02 2.160e+02 2.290e+02 2.458e+02 3.166e+02, threshold=4.580e+02, percent-clipped=0.0 2024-09-17 22:09:36,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=666513.3333333334, ans=0.0 2024-09-17 22:09:48,043 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.95 vs. limit=15.0 2024-09-17 22:09:59,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=666570.0, ans=0.125 2024-09-17 22:10:09,235 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.41 vs. limit=15.0 2024-09-17 22:10:12,628 INFO [train.py:1198] (1/2) Epoch 37, batch 5200, loss[loss=0.1866, ctc_loss=0.123, cr_loss=0.3181, over 20989.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.1458, cr_loss=0.3703, over 4086038.74 frames. ], batch size: 51, lr: 2.22e-03, grad_scale: 32.0 2024-09-17 22:10:50,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=666655.0, ans=0.125 2024-09-17 22:10:54,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=666655.0, ans=0.0 2024-09-17 22:11:27,348 INFO [train.py:1198] (1/2) Epoch 37, batch 5250, loss[loss=0.2696, ctc_loss=0.1935, cr_loss=0.3807, over 14401.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1456, cr_loss=0.3698, over 4093276.62 frames. ], batch size: 150, lr: 2.22e-03, grad_scale: 32.0 2024-09-17 22:11:48,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=666768.3333333334, ans=0.2 2024-09-17 22:11:54,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=666768.3333333334, ans=0.125 2024-09-17 22:12:02,765 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.886e+02 2.131e+02 2.273e+02 2.440e+02 3.297e+02, threshold=4.545e+02, percent-clipped=0.0 2024-09-17 22:12:26,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=666853.3333333334, ans=0.1 2024-09-17 22:12:28,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=666853.3333333334, ans=0.0 2024-09-17 22:12:28,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=666853.3333333334, ans=0.0 2024-09-17 22:12:41,256 INFO [train.py:1198] (1/2) Epoch 37, batch 5300, loss[loss=0.1942, ctc_loss=0.127, cr_loss=0.3356, over 20956.00 frames. 
], tot_loss[loss=0.2201, ctc_loss=0.1461, cr_loss=0.37, over 4079435.00 frames. ], batch size: 58, lr: 2.22e-03, grad_scale: 32.0 2024-09-17 22:13:19,843 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.11 vs. limit=15.0 2024-09-17 22:13:56,346 INFO [train.py:1198] (1/2) Epoch 37, batch 5350, loss[loss=0.2529, ctc_loss=0.1729, cr_loss=0.4001, over 20671.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1442, cr_loss=0.3673, over 4089327.74 frames. ], batch size: 71, lr: 2.22e-03, grad_scale: 32.0 2024-09-17 22:14:32,459 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.923e+02 2.196e+02 2.345e+02 2.554e+02 4.902e+02, threshold=4.690e+02, percent-clipped=1.0 2024-09-17 22:15:11,268 INFO [train.py:1198] (1/2) Epoch 37, batch 5400, loss[loss=0.2441, ctc_loss=0.1672, cr_loss=0.3846, over 19508.00 frames. ], tot_loss[loss=0.2183, ctc_loss=0.1447, cr_loss=0.3676, over 4076707.30 frames. ], batch size: 90, lr: 2.22e-03, grad_scale: 32.0 2024-09-17 22:15:16,719 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.97 vs. limit=15.0 2024-09-17 22:15:23,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=667165.0, ans=0.0 2024-09-17 22:15:30,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=667193.3333333334, ans=0.2 2024-09-17 22:15:39,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=667221.6666666666, ans=0.0 2024-09-17 22:15:40,579 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.33 vs. limit=15.0 2024-09-17 22:16:00,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=667250.0, ans=0.0 2024-09-17 22:16:06,337 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 22:16:12,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=667278.3333333334, ans=0.125 2024-09-17 22:16:22,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=667278.3333333334, ans=0.2 2024-09-17 22:16:27,630 INFO [train.py:1198] (1/2) Epoch 37, batch 5450, loss[loss=0.2204, ctc_loss=0.1452, cr_loss=0.3764, over 20772.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1461, cr_loss=0.37, over 4080593.38 frames. 
], batch size: 56, lr: 2.22e-03, grad_scale: 32.0 2024-09-17 22:16:30,934 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 22:16:46,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=667335.0, ans=0.1 2024-09-17 22:17:02,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=667363.3333333334, ans=0.125 2024-09-17 22:17:03,471 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.953e+02 2.225e+02 2.339e+02 2.539e+02 5.737e+02, threshold=4.679e+02, percent-clipped=1.0 2024-09-17 22:17:13,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=667391.6666666666, ans=0.0 2024-09-17 22:17:19,786 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.77 vs. limit=15.0 2024-09-17 22:17:40,676 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.78 vs. limit=15.0 2024-09-17 22:17:42,470 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.66 vs. limit=15.0 2024-09-17 22:17:44,292 INFO [train.py:1198] (1/2) Epoch 37, batch 5500, loss[loss=0.2488, ctc_loss=0.1672, cr_loss=0.4081, over 21039.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.1459, cr_loss=0.3697, over 4086996.16 frames. ], batch size: 62, lr: 2.22e-03, grad_scale: 32.0 2024-09-17 22:18:27,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=667533.3333333334, ans=0.125 2024-09-17 22:18:58,544 INFO [train.py:1198] (1/2) Epoch 37, batch 5550, loss[loss=0.2187, ctc_loss=0.1426, cr_loss=0.3807, over 20989.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1468, cr_loss=0.3711, over 4091157.47 frames. ], batch size: 55, lr: 2.22e-03, grad_scale: 32.0 2024-09-17 22:19:07,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=667590.0, ans=0.04949747468305833 2024-09-17 22:19:16,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=667618.3333333334, ans=0.0 2024-09-17 22:19:34,189 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.943e+02 2.193e+02 2.325e+02 2.504e+02 3.181e+02, threshold=4.650e+02, percent-clipped=0.0 2024-09-17 22:20:08,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=667703.3333333334, ans=0.0 2024-09-17 22:20:13,120 INFO [train.py:1198] (1/2) Epoch 37, batch 5600, loss[loss=0.2243, ctc_loss=0.1479, cr_loss=0.3817, over 21060.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1461, cr_loss=0.37, over 4103755.97 frames. ], batch size: 62, lr: 2.22e-03, grad_scale: 32.0 2024-09-17 22:20:18,302 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.92 vs. 
limit=22.5 2024-09-17 22:20:32,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=667760.0, ans=0.1 2024-09-17 22:20:42,367 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.73 vs. limit=10.0 2024-09-17 22:20:46,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=667788.3333333334, ans=0.125 2024-09-17 22:21:07,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=667816.6666666666, ans=0.0 2024-09-17 22:21:25,232 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.53 vs. limit=15.0 2024-09-17 22:21:25,361 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.93 vs. limit=15.0 2024-09-17 22:21:27,417 INFO [train.py:1198] (1/2) Epoch 37, batch 5650, loss[loss=0.2411, ctc_loss=0.1624, cr_loss=0.3932, over 19964.00 frames. ], tot_loss[loss=0.2207, ctc_loss=0.1465, cr_loss=0.3709, over 4094927.78 frames. ], batch size: 80, lr: 2.22e-03, grad_scale: 32.0 2024-09-17 22:21:30,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=667873.3333333334, ans=0.0 2024-09-17 22:21:36,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=667873.3333333334, ans=0.125 2024-09-17 22:21:46,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=667901.6666666666, ans=0.2 2024-09-17 22:22:02,975 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.888e+02 2.187e+02 2.295e+02 2.506e+02 3.366e+02, threshold=4.589e+02, percent-clipped=0.0 2024-09-17 22:22:28,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=667986.6666666666, ans=0.0 2024-09-17 22:22:33,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=667986.6666666666, ans=0.125 2024-09-17 22:22:42,200 INFO [train.py:1198] (1/2) Epoch 37, batch 5700, loss[loss=0.2662, ctc_loss=0.1877, cr_loss=0.3927, over 15048.00 frames. ], tot_loss[loss=0.2209, ctc_loss=0.1468, cr_loss=0.3706, over 4084267.75 frames. ], batch size: 150, lr: 2.22e-03, grad_scale: 32.0 2024-09-17 22:22:57,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=668043.3333333334, ans=0.2 2024-09-17 22:23:12,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=668071.6666666666, ans=0.2 2024-09-17 22:23:54,929 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.04 vs. limit=22.5 2024-09-17 22:23:57,262 INFO [train.py:1198] (1/2) Epoch 37, batch 5750, loss[loss=0.2384, ctc_loss=0.1597, cr_loss=0.3938, over 20294.00 frames. ], tot_loss[loss=0.2208, ctc_loss=0.1466, cr_loss=0.3708, over 4093764.06 frames. 
], batch size: 74, lr: 2.22e-03, grad_scale: 16.0 2024-09-17 22:24:31,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=668213.3333333334, ans=0.125 2024-09-17 22:24:34,423 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.951e+02 2.198e+02 2.336e+02 2.545e+02 7.095e+02, threshold=4.671e+02, percent-clipped=1.0 2024-09-17 22:24:42,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=668241.6666666666, ans=0.2 2024-09-17 22:24:53,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=668241.6666666666, ans=0.125 2024-09-17 22:24:56,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=668241.6666666666, ans=0.0 2024-09-17 22:25:06,872 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0 2024-09-17 22:25:12,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=668298.3333333334, ans=0.125 2024-09-17 22:25:13,538 INFO [train.py:1198] (1/2) Epoch 37, batch 5800, loss[loss=0.2103, ctc_loss=0.1395, cr_loss=0.3539, over 21075.00 frames. ], tot_loss[loss=0.2202, ctc_loss=0.1462, cr_loss=0.3698, over 4094394.67 frames. ], batch size: 53, lr: 2.22e-03, grad_scale: 16.0 2024-09-17 22:25:13,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=668298.3333333334, ans=0.0 2024-09-17 22:25:18,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=668298.3333333334, ans=0.125 2024-09-17 22:25:24,582 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=668298.3333333334, ans=0.0 2024-09-17 22:26:16,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=668411.6666666666, ans=0.0 2024-09-17 22:26:30,766 INFO [train.py:1198] (1/2) Epoch 37, batch 5850, loss[loss=0.2312, ctc_loss=0.1527, cr_loss=0.3925, over 21067.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1465, cr_loss=0.3706, over 4090184.61 frames. 
], batch size: 56, lr: 2.22e-03, grad_scale: 16.0 2024-09-17 22:26:34,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=668440.0, ans=0.0 2024-09-17 22:26:35,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=668440.0, ans=0.125 2024-09-17 22:26:46,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=668468.3333333334, ans=0.125 2024-09-17 22:26:52,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=668468.3333333334, ans=0.125 2024-09-17 22:27:08,091 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.920e+02 2.226e+02 2.364e+02 2.504e+02 4.614e+02, threshold=4.727e+02, percent-clipped=0.0 2024-09-17 22:27:11,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=668496.6666666666, ans=0.125 2024-09-17 22:27:15,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=668525.0, ans=0.125 2024-09-17 22:27:27,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=668525.0, ans=0.125 2024-09-17 22:27:31,481 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.87 vs. limit=15.0 2024-09-17 22:27:33,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=668553.3333333334, ans=0.0 2024-09-17 22:27:39,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=668553.3333333334, ans=0.025 2024-09-17 22:27:45,345 INFO [train.py:1198] (1/2) Epoch 37, batch 5900, loss[loss=0.2128, ctc_loss=0.1418, cr_loss=0.3549, over 21046.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1475, cr_loss=0.3713, over 4077879.47 frames. ], batch size: 56, lr: 2.22e-03, grad_scale: 16.0 2024-09-17 22:28:02,043 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 22:28:31,011 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=87.72 vs. limit=15.0 2024-09-17 22:28:41,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=668666.6666666666, ans=0.1 2024-09-17 22:28:42,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=668666.6666666666, ans=0.125 2024-09-17 22:28:44,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=668695.0, ans=0.0 2024-09-17 22:28:54,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=668695.0, ans=0.125 2024-09-17 22:29:00,184 INFO [train.py:1198] (1/2) Epoch 37, batch 5950, loss[loss=0.233, ctc_loss=0.1568, cr_loss=0.3811, over 20866.00 frames. ], tot_loss[loss=0.223, ctc_loss=0.1483, cr_loss=0.3732, over 4083916.06 frames. 
], batch size: 57, lr: 2.22e-03, grad_scale: 16.0 2024-09-17 22:29:04,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=668723.3333333334, ans=0.07 2024-09-17 22:29:12,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=668723.3333333334, ans=0.2 2024-09-17 22:29:37,462 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.941e+02 2.245e+02 2.386e+02 2.631e+02 5.409e+02, threshold=4.771e+02, percent-clipped=1.0 2024-09-17 22:30:04,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=668836.6666666666, ans=0.05 2024-09-17 22:30:15,177 INFO [train.py:1198] (1/2) Epoch 37, batch 6000, loss[loss=0.2137, ctc_loss=0.1425, cr_loss=0.3563, over 21015.00 frames. ], tot_loss[loss=0.2221, ctc_loss=0.1477, cr_loss=0.372, over 4085256.74 frames. ], batch size: 61, lr: 2.22e-03, grad_scale: 32.0 2024-09-17 22:30:15,177 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 22:30:38,358 INFO [train.py:1230] (1/2) Epoch 37, validation: loss=0.03929, ctc_loss=0.03929, cr_loss=1.435e-14, over 944034.00 frames. 2024-09-17 22:30:38,359 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-17 22:30:54,969 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=7.18 vs. limit=22.5 2024-09-17 22:31:41,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=668978.3333333334, ans=0.2 2024-09-17 22:31:53,371 INFO [train.py:1198] (1/2) Epoch 37, batch 6050, loss[loss=0.2364, ctc_loss=0.1603, cr_loss=0.3805, over 20691.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1475, cr_loss=0.3719, over 4087976.62 frames. ], batch size: 66, lr: 2.22e-03, grad_scale: 32.0 2024-09-17 22:31:55,317 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 22:32:13,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=669035.0, ans=10.0 2024-09-17 22:32:32,115 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.857e+02 2.234e+02 2.378e+02 2.546e+02 3.378e+02, threshold=4.756e+02, percent-clipped=0.0 2024-09-17 22:32:32,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=669063.3333333334, ans=0.125 2024-09-17 22:32:34,187 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.68 vs. limit=10.0 2024-09-17 22:33:09,903 INFO [train.py:1198] (1/2) Epoch 37, batch 6100, loss[loss=0.1977, ctc_loss=0.1317, cr_loss=0.3296, over 20975.00 frames. ], tot_loss[loss=0.2208, ctc_loss=0.1466, cr_loss=0.3707, over 4092971.49 frames. 
], batch size: 52, lr: 2.22e-03, grad_scale: 32.0 2024-09-17 22:33:13,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=669148.3333333334, ans=15.0 2024-09-17 22:33:21,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=669148.3333333334, ans=0.125 2024-09-17 22:33:30,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=669176.6666666666, ans=0.025 2024-09-17 22:33:55,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=669233.3333333334, ans=0.125 2024-09-17 22:33:55,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=669233.3333333334, ans=0.07 2024-09-17 22:34:25,358 INFO [train.py:1198] (1/2) Epoch 37, batch 6150, loss[loss=0.2349, ctc_loss=0.1554, cr_loss=0.3974, over 20866.00 frames. ], tot_loss[loss=0.2212, ctc_loss=0.1469, cr_loss=0.3715, over 4089242.48 frames. ], batch size: 57, lr: 2.22e-03, grad_scale: 32.0 2024-09-17 22:34:30,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=669290.0, ans=0.125 2024-09-17 22:34:43,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=669318.3333333334, ans=0.125 2024-09-17 22:34:51,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=669318.3333333334, ans=0.125 2024-09-17 22:35:02,918 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.849e+02 2.191e+02 2.355e+02 2.514e+02 3.195e+02, threshold=4.711e+02, percent-clipped=0.0 2024-09-17 22:35:06,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=669346.6666666666, ans=0.0 2024-09-17 22:35:19,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=669375.0, ans=0.0 2024-09-17 22:35:24,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=669403.3333333334, ans=0.125 2024-09-17 22:35:40,040 INFO [train.py:1198] (1/2) Epoch 37, batch 6200, loss[loss=0.2679, ctc_loss=0.1825, cr_loss=0.4267, over 18651.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1465, cr_loss=0.3705, over 4066794.53 frames. ], batch size: 108, lr: 2.22e-03, grad_scale: 32.0 2024-09-17 22:36:32,702 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.18 vs. limit=15.0 2024-09-17 22:36:44,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=669545.0, ans=0.0 2024-09-17 22:36:45,190 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.01 vs. 
limit=15.0 2024-09-17 22:36:46,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=669545.0, ans=0.125 2024-09-17 22:36:55,267 INFO [train.py:1198] (1/2) Epoch 37, batch 6250, loss[loss=0.2462, ctc_loss=0.1658, cr_loss=0.4024, over 18256.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.1472, cr_loss=0.3711, over 4025256.17 frames. ], batch size: 109, lr: 2.22e-03, grad_scale: 32.0 2024-09-17 22:37:21,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=669601.6666666666, ans=0.09899494936611666 2024-09-17 22:37:25,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=669630.0, ans=0.05 2024-09-17 22:37:33,624 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.948e+02 2.166e+02 2.359e+02 2.612e+02 4.310e+02, threshold=4.719e+02, percent-clipped=0.0 2024-09-17 22:38:10,721 INFO [train.py:1198] (1/2) Epoch 37, batch 6300, loss[loss=0.2726, ctc_loss=0.1866, cr_loss=0.4301, over 14302.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1467, cr_loss=0.3692, over 3987694.34 frames. ], batch size: 149, lr: 2.22e-03, grad_scale: 32.0 2024-09-17 22:39:01,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=669800.0, ans=0.2 2024-09-17 22:39:23,804 INFO [train.py:1198] (1/2) Epoch 37, batch 6350, loss[loss=0.2545, ctc_loss=0.175, cr_loss=0.3974, over 14713.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1479, cr_loss=0.3702, over 3926917.93 frames. ], batch size: 149, lr: 2.22e-03, grad_scale: 32.0 2024-09-17 22:39:49,693 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.53 vs. limit=22.5 2024-09-17 22:39:53,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=669913.3333333334, ans=0.0 2024-09-17 22:40:00,743 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.829e+02 2.350e+02 2.481e+02 2.801e+02 3.678e+02, threshold=4.962e+02, percent-clipped=0.0 2024-09-17 22:40:12,805 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.68 vs. limit=15.0 2024-09-17 22:40:13,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=669941.6666666666, ans=0.125 2024-09-17 22:40:16,864 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.20 vs. limit=15.0 2024-09-17 22:41:13,632 INFO [train.py:1198] (1/2) Epoch 38, batch 0, loss[loss=0.2227, ctc_loss=0.1442, cr_loss=0.3925, over 21056.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1442, cr_loss=0.3925, over 21056.00 frames. ], batch size: 56, lr: 2.19e-03, grad_scale: 32.0 2024-09-17 22:41:13,632 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 22:41:32,178 INFO [train.py:1230] (1/2) Epoch 38, validation: loss=0.03921, ctc_loss=0.03921, cr_loss=1.412e-14, over 944034.00 frames. 
2024-09-17 22:41:32,179 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-17 22:41:32,714 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.16 vs. limit=22.5 2024-09-17 22:41:44,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=669972.8333333334, ans=0.1 2024-09-17 22:41:46,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=670001.1666666666, ans=0.0 2024-09-17 22:41:55,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=670001.1666666666, ans=0.2 2024-09-17 22:42:01,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=670029.5, ans=0.0 2024-09-17 22:42:10,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=670029.5, ans=0.125 2024-09-17 22:42:47,764 INFO [train.py:1198] (1/2) Epoch 38, batch 50, loss[loss=0.2623, ctc_loss=0.1768, cr_loss=0.4275, over 18268.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1437, cr_loss=0.3658, over 913228.50 frames. ], batch size: 108, lr: 2.19e-03, grad_scale: 32.0 2024-09-17 22:42:58,098 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.20 vs. limit=15.0 2024-09-17 22:43:06,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=670142.8333333334, ans=0.2 2024-09-17 22:43:07,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=670142.8333333334, ans=0.2 2024-09-17 22:43:15,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=670142.8333333334, ans=0.1 2024-09-17 22:43:38,568 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.862e+02 2.211e+02 2.385e+02 2.550e+02 3.140e+02, threshold=4.771e+02, percent-clipped=0.0 2024-09-17 22:43:47,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=670227.8333333334, ans=0.035 2024-09-17 22:44:00,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=670227.8333333334, ans=0.125 2024-09-17 22:44:02,853 INFO [train.py:1198] (1/2) Epoch 38, batch 100, loss[loss=0.2265, ctc_loss=0.1493, cr_loss=0.3863, over 20653.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1472, cr_loss=0.3724, over 1602014.67 frames. ], batch size: 66, lr: 2.19e-03, grad_scale: 32.0 2024-09-17 22:44:25,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=670284.5, ans=0.1 2024-09-17 22:44:33,584 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 22:44:35,534 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.88 vs. 
limit=15.0 2024-09-17 22:44:51,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=670341.1666666666, ans=0.0 2024-09-17 22:45:21,397 INFO [train.py:1198] (1/2) Epoch 38, batch 150, loss[loss=0.2011, ctc_loss=0.1323, cr_loss=0.3437, over 20781.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1468, cr_loss=0.3726, over 2166936.16 frames. ], batch size: 56, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 22:45:21,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=670397.8333333334, ans=0.05 2024-09-17 22:46:01,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=670454.5, ans=0.125 2024-09-17 22:46:10,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=670482.8333333334, ans=0.0 2024-09-17 22:46:16,433 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.843e+02 2.222e+02 2.379e+02 2.547e+02 3.122e+02, threshold=4.758e+02, percent-clipped=0.0 2024-09-17 22:46:26,259 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.75 vs. limit=15.0 2024-09-17 22:46:30,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=670511.1666666666, ans=0.0 2024-09-17 22:46:40,675 INFO [train.py:1198] (1/2) Epoch 38, batch 200, loss[loss=0.2128, ctc_loss=0.1388, cr_loss=0.3702, over 20996.00 frames. ], tot_loss[loss=0.2207, ctc_loss=0.1463, cr_loss=0.3721, over 2602084.49 frames. ], batch size: 55, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 22:47:26,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=670624.5, ans=0.1 2024-09-17 22:47:32,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=670624.5, ans=0.0 2024-09-17 22:47:56,783 INFO [train.py:1198] (1/2) Epoch 38, batch 250, loss[loss=0.2077, ctc_loss=0.1377, cr_loss=0.3504, over 21011.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1446, cr_loss=0.3693, over 2947125.03 frames. ], batch size: 61, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 22:48:25,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=670737.8333333334, ans=0.1 2024-09-17 22:48:36,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=670737.8333333334, ans=0.125 2024-09-17 22:48:40,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=670766.1666666666, ans=0.125 2024-09-17 22:48:48,001 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.902e+02 2.159e+02 2.295e+02 2.480e+02 4.048e+02, threshold=4.591e+02, percent-clipped=0.0 2024-09-17 22:49:12,043 INFO [train.py:1198] (1/2) Epoch 38, batch 300, loss[loss=0.2235, ctc_loss=0.1454, cr_loss=0.3906, over 20971.00 frames. ], tot_loss[loss=0.2203, ctc_loss=0.146, cr_loss=0.3714, over 3204090.06 frames. 
], batch size: 58, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 22:49:15,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=670822.8333333334, ans=0.025 2024-09-17 22:49:39,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=670851.1666666666, ans=0.125 2024-09-17 22:49:42,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=670879.5, ans=0.125 2024-09-17 22:49:53,177 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.55 vs. limit=10.0 2024-09-17 22:50:00,634 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.94 vs. limit=10.0 2024-09-17 22:50:09,784 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.84 vs. limit=15.0 2024-09-17 22:50:26,989 INFO [train.py:1198] (1/2) Epoch 38, batch 350, loss[loss=0.2213, ctc_loss=0.1445, cr_loss=0.3838, over 20790.00 frames. ], tot_loss[loss=0.2209, ctc_loss=0.1465, cr_loss=0.3719, over 3397203.78 frames. ], batch size: 56, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 22:50:36,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=670964.5, ans=0.125 2024-09-17 22:50:38,346 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.66 vs. limit=15.0 2024-09-17 22:51:09,885 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 22:51:24,826 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.955e+02 2.251e+02 2.372e+02 2.515e+02 3.125e+02, threshold=4.745e+02, percent-clipped=0.0 2024-09-17 22:51:26,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=671049.5, ans=0.1 2024-09-17 22:51:49,265 INFO [train.py:1198] (1/2) Epoch 38, batch 400, loss[loss=0.2568, ctc_loss=0.1732, cr_loss=0.4181, over 20863.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.1469, cr_loss=0.3721, over 3559667.54 frames. 
], batch size: 65, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 22:52:00,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=671106.1666666666, ans=0.0 2024-09-17 22:52:04,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=671134.5, ans=0.125 2024-09-17 22:52:12,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=671134.5, ans=0.2 2024-09-17 22:52:48,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=671219.5, ans=0.1 2024-09-17 22:52:50,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=671219.5, ans=0.125 2024-09-17 22:52:54,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=671219.5, ans=0.2 2024-09-17 22:53:02,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=671219.5, ans=0.125 2024-09-17 22:53:04,956 INFO [train.py:1198] (1/2) Epoch 38, batch 450, loss[loss=0.2328, ctc_loss=0.1587, cr_loss=0.3709, over 20026.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1469, cr_loss=0.372, over 3673633.01 frames. ], batch size: 80, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 22:53:22,235 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.46 vs. limit=15.0 2024-09-17 22:53:26,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=671276.1666666666, ans=0.0 2024-09-17 22:53:29,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=671276.1666666666, ans=0.2 2024-09-17 22:53:31,309 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.90 vs. limit=15.0 2024-09-17 22:53:56,314 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.984e+02 2.208e+02 2.337e+02 2.516e+02 3.232e+02, threshold=4.673e+02, percent-clipped=0.0 2024-09-17 22:53:59,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=671332.8333333334, ans=0.125 2024-09-17 22:54:20,278 INFO [train.py:1198] (1/2) Epoch 38, batch 500, loss[loss=0.2142, ctc_loss=0.139, cr_loss=0.3756, over 20783.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1469, cr_loss=0.3729, over 3774397.46 frames. 
], batch size: 56, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 22:54:28,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=671389.5, ans=0.07 2024-09-17 22:54:55,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=671446.1666666666, ans=0.1 2024-09-17 22:55:04,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=671474.5, ans=0.2 2024-09-17 22:55:19,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=671502.8333333334, ans=0.125 2024-09-17 22:55:35,785 INFO [train.py:1198] (1/2) Epoch 38, batch 550, loss[loss=0.225, ctc_loss=0.1511, cr_loss=0.3693, over 19452.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.148, cr_loss=0.3747, over 3852056.42 frames. ], batch size: 90, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 22:56:04,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=671587.8333333334, ans=0.1 2024-09-17 22:56:26,733 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.930e+02 2.208e+02 2.346e+02 2.515e+02 4.379e+02, threshold=4.691e+02, percent-clipped=0.0 2024-09-17 22:56:48,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=671644.5, ans=0.0 2024-09-17 22:56:53,721 INFO [train.py:1198] (1/2) Epoch 38, batch 600, loss[loss=0.1798, ctc_loss=0.1187, cr_loss=0.3056, over 20971.00 frames. ], tot_loss[loss=0.2232, ctc_loss=0.1482, cr_loss=0.3752, over 3900542.90 frames. ], batch size: 51, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 22:57:19,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=671701.1666666666, ans=0.125 2024-09-17 22:57:22,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=671701.1666666666, ans=0.0 2024-09-17 22:58:12,047 INFO [train.py:1198] (1/2) Epoch 38, batch 650, loss[loss=0.2321, ctc_loss=0.1529, cr_loss=0.3959, over 20686.00 frames. ], tot_loss[loss=0.223, ctc_loss=0.148, cr_loss=0.3752, over 3954721.02 frames. ], batch size: 68, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 22:58:12,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=671814.5, ans=0.125 2024-09-17 22:58:28,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=671842.8333333334, ans=0.125 2024-09-17 22:58:52,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=671871.1666666666, ans=0.2 2024-09-17 22:58:57,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=671899.5, ans=0.125 2024-09-17 22:59:02,894 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.980e+02 2.175e+02 2.346e+02 2.642e+02 3.414e+02, threshold=4.692e+02, percent-clipped=0.0 2024-09-17 22:59:27,137 INFO [train.py:1198] (1/2) Epoch 38, batch 700, loss[loss=0.2278, ctc_loss=0.1502, cr_loss=0.388, over 20798.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1474, cr_loss=0.3738, over 3981494.78 frames. 
], batch size: 53, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 22:59:44,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=671984.5, ans=0.04949747468305833 2024-09-17 22:59:59,623 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.43 vs. limit=22.5 2024-09-17 23:00:26,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=672069.5, ans=0.2 2024-09-17 23:00:43,059 INFO [train.py:1198] (1/2) Epoch 38, batch 750, loss[loss=0.2197, ctc_loss=0.1446, cr_loss=0.3754, over 20836.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1475, cr_loss=0.3735, over 4016444.49 frames. ], batch size: 59, lr: 2.18e-03, grad_scale: 16.0 2024-09-17 23:01:10,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=672126.1666666666, ans=10.0 2024-09-17 23:01:16,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=672154.5, ans=0.0 2024-09-17 23:01:35,916 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.877e+02 2.174e+02 2.297e+02 2.482e+02 4.051e+02, threshold=4.593e+02, percent-clipped=0.0 2024-09-17 23:01:39,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=672182.8333333334, ans=0.125 2024-09-17 23:01:43,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=672211.1666666666, ans=0.0 2024-09-17 23:01:58,541 INFO [train.py:1198] (1/2) Epoch 38, batch 800, loss[loss=0.2279, ctc_loss=0.1515, cr_loss=0.3819, over 20652.00 frames. ], tot_loss[loss=0.2221, ctc_loss=0.1474, cr_loss=0.3735, over 4035883.49 frames. ], batch size: 66, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:03:03,544 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.30 vs. limit=22.5 2024-09-17 23:03:06,432 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.22 vs. limit=15.0 2024-09-17 23:03:17,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=672352.8333333334, ans=0.2 2024-09-17 23:03:20,746 INFO [train.py:1198] (1/2) Epoch 38, batch 850, loss[loss=0.234, ctc_loss=0.1535, cr_loss=0.4025, over 20856.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1467, cr_loss=0.373, over 4052866.45 frames. ], batch size: 65, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:03:24,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=672381.1666666666, ans=0.0 2024-09-17 23:03:27,798 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.77 vs. 
limit=12.0 2024-09-17 23:03:34,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=672409.5, ans=0.0 2024-09-17 23:03:36,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=672409.5, ans=0.125 2024-09-17 23:03:49,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=672437.8333333334, ans=0.125 2024-09-17 23:04:14,026 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.938e+02 2.187e+02 2.328e+02 2.452e+02 3.011e+02, threshold=4.657e+02, percent-clipped=0.0 2024-09-17 23:04:20,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=672494.5, ans=0.125 2024-09-17 23:04:23,948 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.62 vs. limit=12.0 2024-09-17 23:04:36,903 INFO [train.py:1198] (1/2) Epoch 38, batch 900, loss[loss=0.2177, ctc_loss=0.1417, cr_loss=0.3797, over 20826.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.1461, cr_loss=0.3718, over 4053297.59 frames. ], batch size: 59, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:04:40,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=672522.8333333334, ans=0.0 2024-09-17 23:05:05,867 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.09 vs. limit=22.5 2024-09-17 23:05:52,250 INFO [train.py:1198] (1/2) Epoch 38, batch 950, loss[loss=0.1883, ctc_loss=0.1211, cr_loss=0.3361, over 21015.00 frames. ], tot_loss[loss=0.22, ctc_loss=0.1458, cr_loss=0.3709, over 4071101.30 frames. ], batch size: 51, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:05:54,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=672664.5, ans=0.1 2024-09-17 23:06:21,905 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.67 vs. limit=22.5 2024-09-17 23:06:40,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=672749.5, ans=0.0 2024-09-17 23:06:45,142 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.914e+02 2.170e+02 2.334e+02 2.474e+02 2.835e+02, threshold=4.668e+02, percent-clipped=0.0 2024-09-17 23:07:07,981 INFO [train.py:1198] (1/2) Epoch 38, batch 1000, loss[loss=0.1782, ctc_loss=0.1157, cr_loss=0.3124, over 20953.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1459, cr_loss=0.371, over 4081148.01 frames. 
], batch size: 49, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:07:12,597 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=672806.1666666666, ans=0.125 2024-09-17 23:07:30,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=672834.5, ans=0.125 2024-09-17 23:07:42,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=672862.8333333334, ans=0.0 2024-09-17 23:07:55,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=672891.1666666666, ans=0.125 2024-09-17 23:08:00,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=672891.1666666666, ans=0.125 2024-09-17 23:08:03,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=672891.1666666666, ans=0.125 2024-09-17 23:08:28,689 INFO [train.py:1198] (1/2) Epoch 38, batch 1050, loss[loss=0.2342, ctc_loss=0.1552, cr_loss=0.3952, over 20948.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.1469, cr_loss=0.3725, over 4075908.72 frames. ], batch size: 60, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:08:32,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=672947.8333333334, ans=0.07 2024-09-17 23:08:50,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=672976.1666666666, ans=0.125 2024-09-17 23:08:50,755 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.82 vs. limit=15.0 2024-09-17 23:09:20,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=673032.8333333334, ans=0.2 2024-09-17 23:09:21,434 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.954e+02 2.197e+02 2.300e+02 2.459e+02 4.453e+02, threshold=4.600e+02, percent-clipped=0.0 2024-09-17 23:09:32,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=673061.1666666666, ans=0.1 2024-09-17 23:09:38,270 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.84 vs. limit=15.0 2024-09-17 23:09:43,975 INFO [train.py:1198] (1/2) Epoch 38, batch 1100, loss[loss=0.2128, ctc_loss=0.1412, cr_loss=0.358, over 21006.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.1476, cr_loss=0.374, over 4083301.64 frames. ], batch size: 63, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:09:46,199 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.89 vs. limit=15.0 2024-09-17 23:10:07,329 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.30 vs. 
limit=15.0 2024-09-17 23:10:20,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=673146.1666666666, ans=0.0 2024-09-17 23:10:23,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=673146.1666666666, ans=0.0 2024-09-17 23:10:38,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=673174.5, ans=0.2 2024-09-17 23:10:49,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=673202.8333333334, ans=0.125 2024-09-17 23:10:55,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=673202.8333333334, ans=0.0 2024-09-17 23:10:59,317 INFO [train.py:1198] (1/2) Epoch 38, batch 1150, loss[loss=0.1994, ctc_loss=0.1306, cr_loss=0.3439, over 20973.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1475, cr_loss=0.3737, over 4086521.34 frames. ], batch size: 48, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:11:12,541 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.69 vs. limit=22.5 2024-09-17 23:11:17,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=673259.5, ans=0.1 2024-09-17 23:11:52,584 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.879e+02 2.203e+02 2.311e+02 2.439e+02 3.422e+02, threshold=4.621e+02, percent-clipped=0.0 2024-09-17 23:12:05,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=673344.5, ans=0.04949747468305833 2024-09-17 23:12:08,209 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 23:12:15,313 INFO [train.py:1198] (1/2) Epoch 38, batch 1200, loss[loss=0.1885, ctc_loss=0.1237, cr_loss=0.324, over 20968.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1471, cr_loss=0.3733, over 4095316.66 frames. ], batch size: 51, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:12:28,098 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.53 vs. limit=15.0 2024-09-17 23:12:47,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=673429.5, ans=0.025 2024-09-17 23:13:03,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=673457.8333333334, ans=0.1 2024-09-17 23:13:29,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=673514.5, ans=0.0 2024-09-17 23:13:30,918 INFO [train.py:1198] (1/2) Epoch 38, batch 1250, loss[loss=0.2247, ctc_loss=0.1492, cr_loss=0.3776, over 20960.00 frames. ], tot_loss[loss=0.2212, ctc_loss=0.1466, cr_loss=0.3728, over 4107171.17 frames. 
], batch size: 58, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:13:31,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=673514.5, ans=0.0 2024-09-17 23:13:50,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=673542.8333333334, ans=0.125 2024-09-17 23:13:56,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=673542.8333333334, ans=0.125 2024-09-17 23:14:04,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=673542.8333333334, ans=0.1 2024-09-17 23:14:07,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=673571.1666666666, ans=0.2 2024-09-17 23:14:10,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=673571.1666666666, ans=0.0 2024-09-17 23:14:28,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=673599.5, ans=0.02 2024-09-17 23:14:29,619 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.967e+02 2.193e+02 2.361e+02 2.546e+02 3.304e+02, threshold=4.722e+02, percent-clipped=0.0 2024-09-17 23:14:52,187 INFO [train.py:1198] (1/2) Epoch 38, batch 1300, loss[loss=0.1703, ctc_loss=0.1113, cr_loss=0.2947, over 20972.00 frames. ], tot_loss[loss=0.2212, ctc_loss=0.1467, cr_loss=0.3724, over 4095542.80 frames. ], batch size: 48, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:15:04,985 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 23:15:56,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=673769.5, ans=0.2 2024-09-17 23:16:08,415 INFO [train.py:1198] (1/2) Epoch 38, batch 1350, loss[loss=0.224, ctc_loss=0.1488, cr_loss=0.3758, over 20970.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1462, cr_loss=0.3716, over 4096364.01 frames. ], batch size: 58, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:16:14,915 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.88 vs. limit=15.0 2024-09-17 23:16:44,029 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.41 vs. limit=15.0 2024-09-17 23:16:46,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=673854.5, ans=0.1 2024-09-17 23:16:52,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=673882.8333333334, ans=0.1 2024-09-17 23:16:58,819 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.58 vs. limit=15.0 2024-09-17 23:17:00,531 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.16 vs. 
limit=15.0 2024-09-17 23:17:01,204 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.837e+02 2.188e+02 2.301e+02 2.421e+02 3.432e+02, threshold=4.601e+02, percent-clipped=0.0 2024-09-17 23:17:03,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=673882.8333333334, ans=0.0 2024-09-17 23:17:09,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=673911.1666666666, ans=0.125 2024-09-17 23:17:24,003 INFO [train.py:1198] (1/2) Epoch 38, batch 1400, loss[loss=0.2239, ctc_loss=0.1493, cr_loss=0.3728, over 20865.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1465, cr_loss=0.3727, over 4103052.36 frames. ], batch size: 57, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:17:30,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=673939.5, ans=0.125 2024-09-17 23:17:50,364 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.28 vs. limit=15.0 2024-09-17 23:17:54,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=673996.1666666666, ans=0.125 2024-09-17 23:17:59,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=673996.1666666666, ans=0.0 2024-09-17 23:18:38,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=674081.1666666666, ans=0.0 2024-09-17 23:18:39,998 INFO [train.py:1198] (1/2) Epoch 38, batch 1450, loss[loss=0.216, ctc_loss=0.1427, cr_loss=0.3668, over 20935.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1461, cr_loss=0.3719, over 4113980.22 frames. ], batch size: 60, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:18:49,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=674081.1666666666, ans=0.025 2024-09-17 23:19:31,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=674166.1666666666, ans=0.0 2024-09-17 23:19:35,858 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.986e+02 2.223e+02 2.375e+02 2.504e+02 3.299e+02, threshold=4.750e+02, percent-clipped=0.0 2024-09-17 23:20:01,249 INFO [train.py:1198] (1/2) Epoch 38, batch 1500, loss[loss=0.2329, ctc_loss=0.1544, cr_loss=0.3923, over 21022.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1467, cr_loss=0.373, over 4116947.39 frames. ], batch size: 63, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:20:15,721 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.49 vs. limit=15.0 2024-09-17 23:20:18,748 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.74 vs. limit=15.0 2024-09-17 23:20:34,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=674279.5, ans=0.0 2024-09-17 23:20:35,068 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.46 vs. 
limit=15.0 2024-09-17 23:20:42,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=674279.5, ans=0.125 2024-09-17 23:20:46,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=674307.8333333334, ans=0.125 2024-09-17 23:21:16,790 INFO [train.py:1198] (1/2) Epoch 38, batch 1550, loss[loss=0.239, ctc_loss=0.1599, cr_loss=0.3953, over 20027.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1472, cr_loss=0.3734, over 4109938.33 frames. ], batch size: 80, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:21:52,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=674421.1666666666, ans=0.0 2024-09-17 23:21:54,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=674421.1666666666, ans=0.125 2024-09-17 23:22:08,950 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.922e+02 2.216e+02 2.344e+02 2.544e+02 4.001e+02, threshold=4.687e+02, percent-clipped=0.0 2024-09-17 23:22:31,594 INFO [train.py:1198] (1/2) Epoch 38, batch 1600, loss[loss=0.256, ctc_loss=0.1705, cr_loss=0.4275, over 20988.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.1476, cr_loss=0.3737, over 4111542.81 frames. ], batch size: 67, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:23:45,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=674619.5, ans=0.2 2024-09-17 23:23:47,694 INFO [train.py:1198] (1/2) Epoch 38, batch 1650, loss[loss=0.2227, ctc_loss=0.1472, cr_loss=0.3771, over 20992.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1472, cr_loss=0.3734, over 4109220.26 frames. ], batch size: 63, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:24:07,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=674676.1666666666, ans=0.0 2024-09-17 23:24:40,537 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.943e+02 2.138e+02 2.273e+02 2.414e+02 3.350e+02, threshold=4.546e+02, percent-clipped=0.0 2024-09-17 23:24:45,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=674732.8333333334, ans=0.2 2024-09-17 23:25:05,859 INFO [train.py:1198] (1/2) Epoch 38, batch 1700, loss[loss=0.2716, ctc_loss=0.1913, cr_loss=0.4013, over 14040.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1473, cr_loss=0.3734, over 4110984.56 frames. ], batch size: 149, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:25:30,586 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 23:25:33,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=674817.8333333334, ans=0.0 2024-09-17 23:26:25,386 INFO [train.py:1198] (1/2) Epoch 38, batch 1750, loss[loss=0.2215, ctc_loss=0.1469, cr_loss=0.3728, over 20984.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1472, cr_loss=0.3724, over 4118819.13 frames. 
], batch size: 55, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:26:30,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=674931.1666666666, ans=0.0 2024-09-17 23:26:57,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=674987.8333333334, ans=0.2 2024-09-17 23:27:18,006 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.935e+02 2.188e+02 2.341e+02 2.472e+02 5.035e+02, threshold=4.682e+02, percent-clipped=1.0 2024-09-17 23:27:40,712 INFO [train.py:1198] (1/2) Epoch 38, batch 1800, loss[loss=0.1784, ctc_loss=0.1157, cr_loss=0.3133, over 20981.00 frames. ], tot_loss[loss=0.2212, ctc_loss=0.1469, cr_loss=0.3715, over 4107584.28 frames. ], batch size: 51, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:27:45,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=675072.8333333334, ans=0.0 2024-09-17 23:27:48,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=675072.8333333334, ans=0.0 2024-09-17 23:27:59,455 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.11 vs. limit=22.5 2024-09-17 23:28:02,515 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.47 vs. limit=15.0 2024-09-17 23:28:06,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=675101.1666666666, ans=0.015 2024-09-17 23:28:10,000 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.57 vs. limit=15.0 2024-09-17 23:28:41,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=675186.1666666666, ans=0.0 2024-09-17 23:28:54,430 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.29 vs. limit=15.0 2024-09-17 23:28:55,174 INFO [train.py:1198] (1/2) Epoch 38, batch 1850, loss[loss=0.2199, ctc_loss=0.1455, cr_loss=0.3722, over 21098.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1457, cr_loss=0.3695, over 4111300.18 frames. ], batch size: 59, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:28:55,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=675214.5, ans=0.0 2024-09-17 23:28:58,827 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.50 vs. limit=15.0 2024-09-17 23:29:22,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=675242.8333333334, ans=0.125 2024-09-17 23:29:31,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=675271.1666666666, ans=0.025 2024-09-17 23:29:47,932 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.959e+02 2.209e+02 2.370e+02 2.537e+02 3.248e+02, threshold=4.741e+02, percent-clipped=0.0 2024-09-17 23:30:10,455 INFO [train.py:1198] (1/2) Epoch 38, batch 1900, loss[loss=0.2573, ctc_loss=0.1761, cr_loss=0.406, over 20676.00 frames. 
], tot_loss[loss=0.2206, ctc_loss=0.1464, cr_loss=0.3709, over 4094066.61 frames. ], batch size: 66, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:30:21,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=675356.1666666666, ans=0.125 2024-09-17 23:30:33,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=675384.5, ans=0.125 2024-09-17 23:31:08,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=675441.1666666666, ans=0.125 2024-09-17 23:31:29,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=675469.5, ans=0.125 2024-09-17 23:31:32,016 INFO [train.py:1198] (1/2) Epoch 38, batch 1950, loss[loss=0.2275, ctc_loss=0.1527, cr_loss=0.3743, over 20339.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1456, cr_loss=0.3702, over 4103646.93 frames. ], batch size: 74, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:31:52,429 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=7.94 vs. limit=22.5 2024-09-17 23:31:55,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=675526.1666666666, ans=0.125 2024-09-17 23:32:10,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=675554.5, ans=0.0 2024-09-17 23:32:11,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=675554.5, ans=0.125 2024-09-17 23:32:13,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=675554.5, ans=0.2 2024-09-17 23:32:25,009 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.927e+02 2.215e+02 2.295e+02 2.440e+02 6.100e+02, threshold=4.591e+02, percent-clipped=1.0 2024-09-17 23:32:47,622 INFO [train.py:1198] (1/2) Epoch 38, batch 2000, loss[loss=0.2282, ctc_loss=0.1522, cr_loss=0.3804, over 21045.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.146, cr_loss=0.3706, over 4100362.20 frames. ], batch size: 63, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:33:05,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=675667.8333333334, ans=0.1 2024-09-17 23:33:58,243 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.41 vs. limit=15.0 2024-09-17 23:34:03,522 INFO [train.py:1198] (1/2) Epoch 38, batch 2050, loss[loss=0.2186, ctc_loss=0.1435, cr_loss=0.3756, over 20973.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1457, cr_loss=0.3696, over 4086705.62 frames. 
], batch size: 55, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:34:14,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=675781.1666666666, ans=0.0 2024-09-17 23:34:32,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=675837.8333333334, ans=0.1 2024-09-17 23:34:36,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=675837.8333333334, ans=0.125 2024-09-17 23:34:56,592 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.988e+02 2.185e+02 2.316e+02 2.490e+02 4.004e+02, threshold=4.632e+02, percent-clipped=0.0 2024-09-17 23:35:15,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=675894.5, ans=0.125 2024-09-17 23:35:16,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=675894.5, ans=0.1 2024-09-17 23:35:18,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=675922.8333333334, ans=0.125 2024-09-17 23:35:19,511 INFO [train.py:1198] (1/2) Epoch 38, batch 2100, loss[loss=0.2115, ctc_loss=0.1385, cr_loss=0.3648, over 20782.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.146, cr_loss=0.3705, over 4088276.92 frames. ], batch size: 53, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:35:30,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=675922.8333333334, ans=0.1 2024-09-17 23:35:38,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=675951.1666666666, ans=0.125 2024-09-17 23:35:41,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=675951.1666666666, ans=0.125 2024-09-17 23:35:42,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=675951.1666666666, ans=0.0 2024-09-17 23:36:20,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=676036.1666666666, ans=0.0 2024-09-17 23:36:32,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=676036.1666666666, ans=0.125 2024-09-17 23:36:37,871 INFO [train.py:1198] (1/2) Epoch 38, batch 2150, loss[loss=0.2184, ctc_loss=0.1445, cr_loss=0.3696, over 20755.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.1458, cr_loss=0.3702, over 4093976.73 frames. ], batch size: 56, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:37:14,095 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=676121.1666666666, ans=0.125 2024-09-17 23:37:33,794 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.901e+02 2.250e+02 2.382e+02 2.570e+02 3.088e+02, threshold=4.763e+02, percent-clipped=0.0 2024-09-17 23:37:56,973 INFO [train.py:1198] (1/2) Epoch 38, batch 2200, loss[loss=0.2305, ctc_loss=0.152, cr_loss=0.3924, over 21008.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.1457, cr_loss=0.3704, over 4097861.54 frames. 
], batch size: 63, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:38:09,345 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=676206.1666666666, ans=0.1 2024-09-17 23:38:31,327 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.90 vs. limit=12.0 2024-09-17 23:38:39,768 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=676262.8333333334, ans=0.125 2024-09-17 23:39:11,971 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.88 vs. limit=12.0 2024-09-17 23:39:12,699 INFO [train.py:1198] (1/2) Epoch 38, batch 2250, loss[loss=0.2215, ctc_loss=0.1463, cr_loss=0.3759, over 20999.00 frames. ], tot_loss[loss=0.2195, ctc_loss=0.1454, cr_loss=0.37, over 4107730.95 frames. ], batch size: 63, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:39:13,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=676347.8333333334, ans=0.0 2024-09-17 23:39:26,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=676376.1666666666, ans=0.125 2024-09-17 23:39:57,282 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.08 vs. limit=15.0 2024-09-17 23:40:05,198 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.926e+02 2.204e+02 2.333e+02 2.495e+02 3.028e+02, threshold=4.665e+02, percent-clipped=0.0 2024-09-17 23:40:20,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=676461.1666666666, ans=0.0 2024-09-17 23:40:25,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=676461.1666666666, ans=0.0 2024-09-17 23:40:28,120 INFO [train.py:1198] (1/2) Epoch 38, batch 2300, loss[loss=0.2677, ctc_loss=0.1804, cr_loss=0.4362, over 18595.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1452, cr_loss=0.37, over 4110996.47 frames. ], batch size: 108, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:41:01,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=676546.1666666666, ans=0.0 2024-09-17 23:41:14,033 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.35 vs. limit=15.0 2024-09-17 23:41:16,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=676574.5, ans=0.0 2024-09-17 23:41:33,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=676602.8333333334, ans=0.125 2024-09-17 23:41:43,472 INFO [train.py:1198] (1/2) Epoch 38, batch 2350, loss[loss=0.188, ctc_loss=0.1221, cr_loss=0.3296, over 20970.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1444, cr_loss=0.3684, over 4105759.60 frames. 
], batch size: 48, lr: 2.17e-03, grad_scale: 32.0 2024-09-17 23:41:54,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=676631.1666666666, ans=0.125 2024-09-17 23:42:12,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=676687.8333333334, ans=0.125 2024-09-17 23:42:23,619 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.82 vs. limit=22.5 2024-09-17 23:42:29,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=676687.8333333334, ans=0.125 2024-09-17 23:42:35,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=676716.1666666666, ans=0.025 2024-09-17 23:42:42,533 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.883e+02 2.161e+02 2.302e+02 2.540e+02 3.831e+02, threshold=4.603e+02, percent-clipped=0.0 2024-09-17 23:42:42,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=676716.1666666666, ans=0.2 2024-09-17 23:42:56,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=676744.5, ans=0.0 2024-09-17 23:43:05,092 INFO [train.py:1198] (1/2) Epoch 38, batch 2400, loss[loss=0.1927, ctc_loss=0.1239, cr_loss=0.3439, over 21001.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.144, cr_loss=0.3678, over 4118795.09 frames. ], batch size: 48, lr: 2.17e-03, grad_scale: 32.0 2024-09-17 23:43:23,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=676801.1666666666, ans=0.125 2024-09-17 23:43:25,589 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.77 vs. limit=15.0 2024-09-17 23:44:20,322 INFO [train.py:1198] (1/2) Epoch 38, batch 2450, loss[loss=0.2252, ctc_loss=0.146, cr_loss=0.3962, over 21069.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1452, cr_loss=0.3689, over 4085898.65 frames. ], batch size: 59, lr: 2.17e-03, grad_scale: 32.0 2024-09-17 23:44:42,602 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.51 vs. limit=22.5 2024-09-17 23:44:51,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=676971.1666666666, ans=0.125 2024-09-17 23:45:14,056 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.823e+02 2.182e+02 2.372e+02 2.578e+02 3.656e+02, threshold=4.743e+02, percent-clipped=0.0 2024-09-17 23:45:25,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=677027.8333333334, ans=0.0 2024-09-17 23:45:26,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=677027.8333333334, ans=0.0 2024-09-17 23:45:36,781 INFO [train.py:1198] (1/2) Epoch 38, batch 2500, loss[loss=0.1883, ctc_loss=0.1209, cr_loss=0.3368, over 20984.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1447, cr_loss=0.3684, over 4095705.67 frames. 
], batch size: 52, lr: 2.17e-03, grad_scale: 32.0 2024-09-17 23:45:44,879 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.53 vs. limit=22.5 2024-09-17 23:46:22,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=677141.1666666666, ans=0.0 2024-09-17 23:46:28,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=677141.1666666666, ans=0.125 2024-09-17 23:46:30,155 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=7.99 vs. limit=22.5 2024-09-17 23:46:43,856 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.51 vs. limit=22.5 2024-09-17 23:46:47,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=677169.5, ans=0.125 2024-09-17 23:46:51,882 INFO [train.py:1198] (1/2) Epoch 38, batch 2550, loss[loss=0.2007, ctc_loss=0.1327, cr_loss=0.3402, over 20957.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.1443, cr_loss=0.3676, over 4104441.11 frames. ], batch size: 50, lr: 2.17e-03, grad_scale: 32.0 2024-09-17 23:47:25,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=677254.5, ans=0.2 2024-09-17 23:47:30,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=677254.5, ans=15.0 2024-09-17 23:47:44,926 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.966e+02 2.160e+02 2.286e+02 2.515e+02 3.049e+02, threshold=4.571e+02, percent-clipped=0.0 2024-09-17 23:48:10,389 INFO [train.py:1198] (1/2) Epoch 38, batch 2600, loss[loss=0.2311, ctc_loss=0.153, cr_loss=0.3906, over 19947.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1445, cr_loss=0.3681, over 4095693.79 frames. ], batch size: 80, lr: 2.17e-03, grad_scale: 32.0 2024-09-17 23:48:23,544 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.96 vs. limit=12.0 2024-09-17 23:48:38,485 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.68 vs. limit=15.0 2024-09-17 23:49:03,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=677424.5, ans=0.0 2024-09-17 23:49:20,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=677452.8333333334, ans=0.04949747468305833 2024-09-17 23:49:28,942 INFO [train.py:1198] (1/2) Epoch 38, batch 2650, loss[loss=0.2273, ctc_loss=0.1499, cr_loss=0.3869, over 20649.00 frames. ], tot_loss[loss=0.2183, ctc_loss=0.1446, cr_loss=0.3682, over 4089490.11 frames. 
], batch size: 66, lr: 2.17e-03, grad_scale: 32.0 2024-09-17 23:49:47,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=677509.5, ans=0.125 2024-09-17 23:50:21,632 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.888e+02 2.240e+02 2.383e+02 2.555e+02 4.331e+02, threshold=4.765e+02, percent-clipped=0.0 2024-09-17 23:50:44,625 INFO [train.py:1198] (1/2) Epoch 38, batch 2700, loss[loss=0.2341, ctc_loss=0.1538, cr_loss=0.4016, over 21060.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.1452, cr_loss=0.3701, over 4092565.52 frames. ], batch size: 56, lr: 2.17e-03, grad_scale: 32.0 2024-09-17 23:50:54,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=677622.8333333334, ans=0.125 2024-09-17 23:51:11,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=677651.1666666666, ans=0.0 2024-09-17 23:51:15,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=677679.5, ans=22.5 2024-09-17 23:51:41,074 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 23:51:45,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=677736.1666666666, ans=0.1 2024-09-17 23:52:00,365 INFO [train.py:1198] (1/2) Epoch 38, batch 2750, loss[loss=0.1821, ctc_loss=0.1186, cr_loss=0.3175, over 20959.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1446, cr_loss=0.3692, over 4099659.79 frames. ], batch size: 49, lr: 2.17e-03, grad_scale: 64.0 2024-09-17 23:52:03,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=677764.5, ans=0.125 2024-09-17 23:52:10,268 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.94 vs. limit=22.5 2024-09-17 23:52:35,184 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 23:52:45,578 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 23:52:49,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=677849.5, ans=0.125 2024-09-17 23:52:51,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=677849.5, ans=0.125 2024-09-17 23:52:52,709 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.849e+02 2.201e+02 2.301e+02 2.469e+02 3.405e+02, threshold=4.602e+02, percent-clipped=0.0 2024-09-17 23:53:15,301 INFO [train.py:1198] (1/2) Epoch 38, batch 2800, loss[loss=0.2173, ctc_loss=0.1426, cr_loss=0.3731, over 20929.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1452, cr_loss=0.37, over 4100219.72 frames. 
], batch size: 60, lr: 2.17e-03, grad_scale: 64.0 2024-09-17 23:53:52,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=677962.8333333334, ans=15.0 2024-09-17 23:54:13,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=677991.1666666666, ans=10.0 2024-09-17 23:54:22,324 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 23:54:31,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=678019.5, ans=0.125 2024-09-17 23:54:36,768 INFO [train.py:1198] (1/2) Epoch 38, batch 2850, loss[loss=0.1897, ctc_loss=0.1226, cr_loss=0.3353, over 20959.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1448, cr_loss=0.3692, over 4099439.29 frames. ], batch size: 51, lr: 2.17e-03, grad_scale: 32.0 2024-09-17 23:54:37,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=678047.8333333334, ans=0.1 2024-09-17 23:54:37,148 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=678047.8333333334, ans=0.125 2024-09-17 23:55:02,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=678076.1666666666, ans=0.125 2024-09-17 23:55:31,278 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.976e+02 2.169e+02 2.263e+02 2.441e+02 3.215e+02, threshold=4.527e+02, percent-clipped=0.0 2024-09-17 23:55:52,872 INFO [train.py:1198] (1/2) Epoch 38, batch 2900, loss[loss=0.192, ctc_loss=0.1266, cr_loss=0.3268, over 21002.00 frames. ], tot_loss[loss=0.2183, ctc_loss=0.1445, cr_loss=0.3689, over 4110753.10 frames. ], batch size: 52, lr: 2.17e-03, grad_scale: 32.0 2024-09-17 23:56:03,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=678189.5, ans=0.0 2024-09-17 23:56:17,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=678217.8333333334, ans=0.0 2024-09-17 23:57:04,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=678302.8333333334, ans=0.025 2024-09-17 23:57:07,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=678331.1666666666, ans=0.125 2024-09-17 23:57:08,364 INFO [train.py:1198] (1/2) Epoch 38, batch 2950, loss[loss=0.2013, ctc_loss=0.1301, cr_loss=0.3561, over 21055.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1452, cr_loss=0.3699, over 4109207.74 frames. ], batch size: 53, lr: 2.17e-03, grad_scale: 32.0 2024-09-17 23:57:21,655 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.82 vs. 
limit=15.0 2024-09-17 23:57:59,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=678416.1666666666, ans=0.0 2024-09-17 23:58:03,573 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.832e+02 2.235e+02 2.352e+02 2.536e+02 8.217e+02, threshold=4.704e+02, percent-clipped=1.0 2024-09-17 23:58:05,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=678416.1666666666, ans=0.1 2024-09-17 23:58:25,016 INFO [train.py:1198] (1/2) Epoch 38, batch 3000, loss[loss=0.2321, ctc_loss=0.1547, cr_loss=0.3871, over 20842.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.1458, cr_loss=0.3701, over 4087085.71 frames. ], batch size: 59, lr: 2.17e-03, grad_scale: 32.0 2024-09-17 23:58:25,016 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 23:58:48,113 INFO [train.py:1230] (1/2) Epoch 38, validation: loss=0.04012, ctc_loss=0.04012, cr_loss=1.433e-14, over 944034.00 frames. 2024-09-17 23:58:48,113 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-17 23:59:00,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=678472.8333333334, ans=0.05 2024-09-17 23:59:06,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=678501.1666666666, ans=0.1 2024-09-17 23:59:25,716 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.53 vs. limit=22.5 2024-09-17 23:59:58,890 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.76 vs. limit=6.0 2024-09-18 00:00:03,598 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.90 vs. limit=15.0 2024-09-18 00:00:10,400 INFO [train.py:1198] (1/2) Epoch 38, batch 3050, loss[loss=0.2356, ctc_loss=0.1565, cr_loss=0.3954, over 20860.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.1462, cr_loss=0.3711, over 4096198.80 frames. ], batch size: 65, lr: 2.17e-03, grad_scale: 32.0 2024-09-18 00:00:10,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=678614.5, ans=0.0 2024-09-18 00:00:19,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=678614.5, ans=0.2 2024-09-18 00:00:20,480 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.81 vs. limit=12.0 2024-09-18 00:00:50,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=678671.1666666666, ans=0.1 2024-09-18 00:00:53,613 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.11 vs. 
limit=22.5 2024-09-18 00:00:54,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=678699.5, ans=0.0 2024-09-18 00:00:57,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=678699.5, ans=0.0 2024-09-18 00:01:04,886 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.886e+02 2.170e+02 2.309e+02 2.471e+02 3.592e+02, threshold=4.618e+02, percent-clipped=0.0 2024-09-18 00:01:24,044 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.06 vs. limit=22.5 2024-09-18 00:01:26,252 INFO [train.py:1198] (1/2) Epoch 38, batch 3100, loss[loss=0.2177, ctc_loss=0.1395, cr_loss=0.3914, over 20785.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1463, cr_loss=0.3714, over 4101056.83 frames. ], batch size: 56, lr: 2.17e-03, grad_scale: 32.0 2024-09-18 00:01:38,983 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.60 vs. limit=15.0 2024-09-18 00:01:44,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=678784.5, ans=0.125 2024-09-18 00:01:52,243 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=678784.5, ans=0.05 2024-09-18 00:02:18,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=678841.1666666666, ans=0.1 2024-09-18 00:02:31,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=678869.5, ans=0.0 2024-09-18 00:02:42,108 INFO [train.py:1198] (1/2) Epoch 38, batch 3150, loss[loss=0.1976, ctc_loss=0.1287, cr_loss=0.3447, over 20779.00 frames. ], tot_loss[loss=0.2202, ctc_loss=0.146, cr_loss=0.3711, over 4104847.84 frames. ], batch size: 56, lr: 2.17e-03, grad_scale: 16.0 2024-09-18 00:02:54,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=678897.8333333334, ans=0.125 2024-09-18 00:03:00,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=678926.1666666666, ans=0.1 2024-09-18 00:03:03,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=678926.1666666666, ans=0.95 2024-09-18 00:03:37,835 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.935e+02 2.168e+02 2.310e+02 2.490e+02 3.894e+02, threshold=4.621e+02, percent-clipped=0.0 2024-09-18 00:03:44,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=679011.1666666666, ans=0.1 2024-09-18 00:03:48,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=679011.1666666666, ans=0.2 2024-09-18 00:03:50,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=679011.1666666666, ans=0.2 2024-09-18 00:03:57,606 INFO [train.py:1198] (1/2) Epoch 38, batch 3200, loss[loss=0.2242, ctc_loss=0.1489, cr_loss=0.3766, over 21019.00 frames. 
], tot_loss[loss=0.2199, ctc_loss=0.1458, cr_loss=0.3706, over 4101190.80 frames. ], batch size: 61, lr: 2.17e-03, grad_scale: 32.0 2024-09-18 00:05:06,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=679152.8333333334, ans=0.125 2024-09-18 00:05:16,840 INFO [train.py:1198] (1/2) Epoch 38, batch 3250, loss[loss=0.1925, ctc_loss=0.1286, cr_loss=0.3194, over 20961.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1458, cr_loss=0.3711, over 4102601.73 frames. ], batch size: 49, lr: 2.17e-03, grad_scale: 32.0 2024-09-18 00:05:53,483 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.80 vs. limit=15.0 2024-09-18 00:06:08,464 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.59 vs. limit=15.0 2024-09-18 00:06:15,264 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.976e+02 2.235e+02 2.338e+02 2.498e+02 4.233e+02, threshold=4.676e+02, percent-clipped=0.0 2024-09-18 00:06:15,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=679266.1666666666, ans=0.2 2024-09-18 00:06:35,036 INFO [train.py:1198] (1/2) Epoch 38, batch 3300, loss[loss=0.1933, ctc_loss=0.1248, cr_loss=0.3424, over 19449.00 frames. ], tot_loss[loss=0.2209, ctc_loss=0.1466, cr_loss=0.3718, over 4097628.41 frames. ], batch size: 43, lr: 2.17e-03, grad_scale: 32.0 2024-09-18 00:06:59,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=679351.1666666666, ans=0.1 2024-09-18 00:07:19,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=679407.8333333334, ans=0.2 2024-09-18 00:07:36,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=679436.1666666666, ans=0.125 2024-09-18 00:07:49,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=679464.5, ans=0.025 2024-09-18 00:07:50,578 INFO [train.py:1198] (1/2) Epoch 38, batch 3350, loss[loss=0.2092, ctc_loss=0.1377, cr_loss=0.3576, over 20665.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1463, cr_loss=0.3716, over 4097160.07 frames. ], batch size: 68, lr: 2.17e-03, grad_scale: 32.0 2024-09-18 00:07:56,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=679464.5, ans=0.035 2024-09-18 00:08:11,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=679492.8333333334, ans=0.0 2024-09-18 00:08:24,464 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.03 vs. limit=10.0 2024-09-18 00:08:25,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=679521.1666666666, ans=0.125 2024-09-18 00:08:46,574 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.932e+02 2.192e+02 2.281e+02 2.427e+02 3.859e+02, threshold=4.562e+02, percent-clipped=0.0 2024-09-18 00:09:06,272 INFO [train.py:1198] (1/2) Epoch 38, batch 3400, loss[loss=0.262, ctc_loss=0.1762, cr_loss=0.4286, over 20202.00 frames. 
], tot_loss[loss=0.2206, ctc_loss=0.1462, cr_loss=0.3718, over 4111465.31 frames. ], batch size: 80, lr: 2.17e-03, grad_scale: 32.0 2024-09-18 00:09:06,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=679606.1666666666, ans=0.125 2024-09-18 00:09:31,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=679634.5, ans=0.0 2024-09-18 00:09:40,618 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=3.98 vs. limit=15.0 2024-09-18 00:09:47,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=679662.8333333334, ans=0.1 2024-09-18 00:10:13,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=679719.5, ans=0.0 2024-09-18 00:10:20,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=679747.8333333334, ans=0.125 2024-09-18 00:10:22,115 INFO [train.py:1198] (1/2) Epoch 38, batch 3450, loss[loss=0.1805, ctc_loss=0.1184, cr_loss=0.3106, over 20951.00 frames. ], tot_loss[loss=0.2197, ctc_loss=0.1455, cr_loss=0.371, over 4123493.04 frames. ], batch size: 48, lr: 2.17e-03, grad_scale: 32.0 2024-09-18 00:11:25,026 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.911e+02 2.189e+02 2.319e+02 2.495e+02 3.342e+02, threshold=4.638e+02, percent-clipped=0.0 2024-09-18 00:11:43,240 INFO [train.py:1198] (1/2) Epoch 38, batch 3500, loss[loss=0.2643, ctc_loss=0.1865, cr_loss=0.3891, over 14732.00 frames. ], tot_loss[loss=0.2195, ctc_loss=0.1453, cr_loss=0.3709, over 4115658.57 frames. ], batch size: 149, lr: 2.17e-03, grad_scale: 16.0 2024-09-18 00:12:23,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=679946.1666666666, ans=0.0 2024-09-18 00:12:29,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=679974.5, ans=0.09899494936611666 2024-09-18 00:12:39,028 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.82 vs. limit=12.0 2024-09-18 00:13:00,313 INFO [train.py:1198] (1/2) Epoch 38, batch 3550, loss[loss=0.2583, ctc_loss=0.1749, cr_loss=0.4171, over 18418.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1454, cr_loss=0.3708, over 4112991.84 frames. 
], batch size: 108, lr: 2.17e-03, grad_scale: 16.0 2024-09-18 00:13:05,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=680031.1666666666, ans=0.2 2024-09-18 00:13:21,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=680059.5, ans=0.1 2024-09-18 00:13:47,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=680116.1666666666, ans=0.0 2024-09-18 00:13:50,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=680116.1666666666, ans=0.1 2024-09-18 00:13:56,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=680116.1666666666, ans=0.125 2024-09-18 00:13:57,291 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.972e+02 2.246e+02 2.376e+02 2.589e+02 3.239e+02, threshold=4.753e+02, percent-clipped=0.0 2024-09-18 00:14:02,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=680144.5, ans=0.0 2024-09-18 00:14:15,178 INFO [train.py:1198] (1/2) Epoch 38, batch 3600, loss[loss=0.2174, ctc_loss=0.1446, cr_loss=0.3639, over 20826.00 frames. ], tot_loss[loss=0.22, ctc_loss=0.1457, cr_loss=0.3715, over 4111148.72 frames. ], batch size: 59, lr: 2.17e-03, grad_scale: 32.0 2024-09-18 00:14:16,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=680172.8333333334, ans=0.125 2024-09-18 00:15:00,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=680257.8333333334, ans=0.0 2024-09-18 00:15:23,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=680286.1666666666, ans=0.125 2024-09-18 00:15:24,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=680286.1666666666, ans=0.125 2024-09-18 00:15:30,326 INFO [train.py:1198] (1/2) Epoch 38, batch 3650, loss[loss=0.2359, ctc_loss=0.1562, cr_loss=0.3982, over 19429.00 frames. ], tot_loss[loss=0.22, ctc_loss=0.1458, cr_loss=0.371, over 4113848.79 frames. ], batch size: 90, lr: 2.17e-03, grad_scale: 32.0 2024-09-18 00:15:45,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=680342.8333333334, ans=15.0 2024-09-18 00:15:47,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=680342.8333333334, ans=0.125 2024-09-18 00:16:27,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=680399.5, ans=0.125 2024-09-18 00:16:28,388 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.934e+02 2.174e+02 2.298e+02 2.482e+02 3.802e+02, threshold=4.597e+02, percent-clipped=0.0 2024-09-18 00:16:49,220 INFO [train.py:1198] (1/2) Epoch 38, batch 3700, loss[loss=0.2556, ctc_loss=0.1696, cr_loss=0.4302, over 20983.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1451, cr_loss=0.3698, over 4114150.50 frames. 
], batch size: 64, lr: 2.17e-03, grad_scale: 32.0 2024-09-18 00:16:49,952 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.31 vs. limit=15.0 2024-09-18 00:17:10,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=680484.5, ans=0.025 2024-09-18 00:17:11,434 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.18 vs. limit=8.0 2024-09-18 00:18:07,911 INFO [train.py:1198] (1/2) Epoch 38, batch 3750, loss[loss=0.2235, ctc_loss=0.1476, cr_loss=0.3793, over 21059.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1459, cr_loss=0.3709, over 4119152.50 frames. ], batch size: 56, lr: 2.17e-03, grad_scale: 32.0 2024-09-18 00:18:44,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=680654.5, ans=0.1 2024-09-18 00:18:57,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=680682.8333333334, ans=0.0 2024-09-18 00:18:58,361 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.97 vs. limit=15.0 2024-09-18 00:19:04,949 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.933e+02 2.188e+02 2.340e+02 2.478e+02 3.466e+02, threshold=4.681e+02, percent-clipped=0.0 2024-09-18 00:19:08,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=680711.1666666666, ans=0.1 2024-09-18 00:19:16,443 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.68 vs. limit=10.0 2024-09-18 00:19:22,888 INFO [train.py:1198] (1/2) Epoch 38, batch 3800, loss[loss=0.1893, ctc_loss=0.1257, cr_loss=0.3182, over 20996.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1453, cr_loss=0.3698, over 4110824.74 frames. ], batch size: 51, lr: 2.17e-03, grad_scale: 16.0 2024-09-18 00:19:53,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=680796.1666666666, ans=0.0 2024-09-18 00:20:06,131 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.86 vs. limit=15.0 2024-09-18 00:20:39,058 INFO [train.py:1198] (1/2) Epoch 38, batch 3850, loss[loss=0.1665, ctc_loss=0.105, cr_loss=0.3072, over 20969.00 frames. ], tot_loss[loss=0.22, ctc_loss=0.1457, cr_loss=0.3711, over 4112055.33 frames. ], batch size: 49, lr: 2.17e-03, grad_scale: 8.0 2024-09-18 00:21:15,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=680937.8333333334, ans=0.1 2024-09-18 00:21:39,729 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.938e+02 2.177e+02 2.335e+02 2.487e+02 1.316e+03, threshold=4.669e+02, percent-clipped=2.0 2024-09-18 00:21:54,740 INFO [train.py:1198] (1/2) Epoch 38, batch 3900, loss[loss=0.2292, ctc_loss=0.1531, cr_loss=0.3805, over 20674.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1462, cr_loss=0.3715, over 4110739.29 frames. 
], batch size: 68, lr: 2.17e-03, grad_scale: 8.0 2024-09-18 00:22:34,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=681079.5, ans=0.0 2024-09-18 00:23:09,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=681136.1666666666, ans=0.125 2024-09-18 00:23:15,988 INFO [train.py:1198] (1/2) Epoch 38, batch 3950, loss[loss=0.2146, ctc_loss=0.1448, cr_loss=0.349, over 20919.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.1459, cr_loss=0.3702, over 4110753.70 frames. ], batch size: 60, lr: 2.17e-03, grad_scale: 8.0 2024-09-18 00:23:39,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=681192.8333333334, ans=0.025 2024-09-18 00:24:10,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=681249.5, ans=0.125 2024-09-18 00:24:11,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=681249.5, ans=0.015 2024-09-18 00:24:11,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=681249.5, ans=0.5 2024-09-18 00:24:16,205 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.847e+02 2.172e+02 2.301e+02 2.467e+02 5.100e+02, threshold=4.602e+02, percent-clipped=1.0 2024-09-18 00:24:19,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=681277.8333333334, ans=0.0 2024-09-18 00:24:31,236 INFO [train.py:1198] (1/2) Epoch 38, batch 4000, loss[loss=0.2291, ctc_loss=0.1533, cr_loss=0.3788, over 20974.00 frames. ], tot_loss[loss=0.22, ctc_loss=0.146, cr_loss=0.3703, over 4103046.36 frames. ], batch size: 64, lr: 2.17e-03, grad_scale: 16.0 2024-09-18 00:24:37,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=681306.1666666666, ans=0.0 2024-09-18 00:24:52,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=681334.5, ans=0.125 2024-09-18 00:24:54,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=681334.5, ans=0.0 2024-09-18 00:25:12,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=681362.8333333334, ans=0.1 2024-09-18 00:25:46,910 INFO [train.py:1198] (1/2) Epoch 38, batch 4050, loss[loss=0.2388, ctc_loss=0.1595, cr_loss=0.3962, over 21042.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1467, cr_loss=0.3714, over 4102826.08 frames. 
], batch size: 62, lr: 2.17e-03, grad_scale: 16.0 2024-09-18 00:25:47,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=681447.8333333334, ans=0.125 2024-09-18 00:25:56,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=681447.8333333334, ans=0.125 2024-09-18 00:26:02,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=681476.1666666666, ans=0.125 2024-09-18 00:26:47,733 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.964e+02 2.171e+02 2.348e+02 2.511e+02 3.000e+02, threshold=4.696e+02, percent-clipped=0.0 2024-09-18 00:27:02,958 INFO [train.py:1198] (1/2) Epoch 38, batch 4100, loss[loss=0.2348, ctc_loss=0.159, cr_loss=0.3793, over 21014.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1459, cr_loss=0.3706, over 4110311.11 frames. ], batch size: 61, lr: 2.17e-03, grad_scale: 16.0 2024-09-18 00:27:07,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=681589.5, ans=0.0 2024-09-18 00:27:08,073 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.74 vs. limit=15.0 2024-09-18 00:27:09,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=681589.5, ans=0.2 2024-09-18 00:27:49,432 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.03 vs. limit=12.0 2024-09-18 00:27:58,254 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.21 vs. limit=15.0 2024-09-18 00:28:04,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=681702.8333333334, ans=0.2 2024-09-18 00:28:18,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=681702.8333333334, ans=0.025 2024-09-18 00:28:22,268 INFO [train.py:1198] (1/2) Epoch 38, batch 4150, loss[loss=0.2191, ctc_loss=0.1438, cr_loss=0.3765, over 20993.00 frames. ], tot_loss[loss=0.2195, ctc_loss=0.1456, cr_loss=0.3696, over 4114760.45 frames. ], batch size: 55, lr: 2.17e-03, grad_scale: 16.0 2024-09-18 00:28:33,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=681731.1666666666, ans=0.1 2024-09-18 00:29:04,000 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.10 vs. limit=6.0 2024-09-18 00:29:08,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=681787.8333333334, ans=0.125 2024-09-18 00:29:11,621 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.58 vs. 
limit=12.0 2024-09-18 00:29:25,983 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.912e+02 2.185e+02 2.324e+02 2.533e+02 7.205e+02, threshold=4.647e+02, percent-clipped=1.0 2024-09-18 00:29:40,901 INFO [train.py:1198] (1/2) Epoch 38, batch 4200, loss[loss=0.204, ctc_loss=0.135, cr_loss=0.345, over 19908.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1453, cr_loss=0.3684, over 4112339.10 frames. ], batch size: 44, lr: 2.17e-03, grad_scale: 16.0 2024-09-18 00:30:02,890 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.18 vs. limit=15.0 2024-09-18 00:30:07,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=681901.1666666666, ans=0.125 2024-09-18 00:30:48,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=681986.1666666666, ans=0.1 2024-09-18 00:30:56,682 INFO [train.py:1198] (1/2) Epoch 38, batch 4250, loss[loss=0.2116, ctc_loss=0.1391, cr_loss=0.3624, over 21091.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1449, cr_loss=0.3676, over 4114784.07 frames. ], batch size: 59, lr: 2.17e-03, grad_scale: 16.0 2024-09-18 00:30:58,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=682014.5, ans=0.0 2024-09-18 00:31:01,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=682014.5, ans=0.07 2024-09-18 00:31:30,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=682071.1666666666, ans=0.2 2024-09-18 00:31:58,138 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.937e+02 2.234e+02 2.385e+02 2.551e+02 6.288e+02, threshold=4.770e+02, percent-clipped=1.0 2024-09-18 00:32:13,282 INFO [train.py:1198] (1/2) Epoch 38, batch 4300, loss[loss=0.1828, ctc_loss=0.1187, cr_loss=0.3208, over 20950.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1448, cr_loss=0.368, over 4114001.95 frames. ], batch size: 50, lr: 2.17e-03, grad_scale: 16.0 2024-09-18 00:32:31,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=682184.5, ans=0.125 2024-09-18 00:32:31,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=682184.5, ans=0.125 2024-09-18 00:32:34,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=682184.5, ans=0.0 2024-09-18 00:33:15,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=682269.5, ans=0.125 2024-09-18 00:33:25,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=682269.5, ans=0.1 2024-09-18 00:33:28,522 INFO [train.py:1198] (1/2) Epoch 38, batch 4350, loss[loss=0.1832, ctc_loss=0.1199, cr_loss=0.3162, over 20962.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.1444, cr_loss=0.3675, over 4116525.33 frames. 
], batch size: 49, lr: 2.17e-03, grad_scale: 16.0 2024-09-18 00:33:43,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=682326.1666666666, ans=0.125 2024-09-18 00:33:51,877 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.62 vs. limit=15.0 2024-09-18 00:34:34,458 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.953e+02 2.234e+02 2.368e+02 2.527e+02 2.981e+02, threshold=4.735e+02, percent-clipped=0.0 2024-09-18 00:34:37,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=682411.1666666666, ans=0.0 2024-09-18 00:34:49,414 INFO [train.py:1198] (1/2) Epoch 38, batch 4400, loss[loss=0.2404, ctc_loss=0.1618, cr_loss=0.3931, over 21005.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1449, cr_loss=0.3683, over 4103605.29 frames. ], batch size: 63, lr: 2.17e-03, grad_scale: 32.0 2024-09-18 00:34:55,056 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.57 vs. limit=22.5 2024-09-18 00:34:55,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=682439.5, ans=0.125 2024-09-18 00:35:18,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=682496.1666666666, ans=0.125 2024-09-18 00:35:25,612 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.36 vs. limit=15.0 2024-09-18 00:36:05,401 INFO [train.py:1198] (1/2) Epoch 38, batch 4450, loss[loss=0.2308, ctc_loss=0.1532, cr_loss=0.3883, over 20690.00 frames. ], tot_loss[loss=0.2188, ctc_loss=0.1451, cr_loss=0.3685, over 4112582.70 frames. ], batch size: 66, lr: 2.17e-03, grad_scale: 32.0 2024-09-18 00:36:11,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=682581.1666666666, ans=0.0 2024-09-18 00:36:27,418 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.69 vs. limit=15.0 2024-09-18 00:36:28,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=682609.5, ans=0.125 2024-09-18 00:36:34,752 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.28 vs. 
limit=22.5 2024-09-18 00:36:58,419 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 00:37:04,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=682694.5, ans=0.2 2024-09-18 00:37:04,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=682694.5, ans=0.0 2024-09-18 00:37:04,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=682694.5, ans=0.0 2024-09-18 00:37:05,724 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.877e+02 2.178e+02 2.267e+02 2.475e+02 3.401e+02, threshold=4.534e+02, percent-clipped=0.0 2024-09-18 00:37:12,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=682694.5, ans=0.1 2024-09-18 00:37:21,052 INFO [train.py:1198] (1/2) Epoch 38, batch 4500, loss[loss=0.2392, ctc_loss=0.1594, cr_loss=0.399, over 20702.00 frames. ], tot_loss[loss=0.2183, ctc_loss=0.1448, cr_loss=0.3678, over 4114253.35 frames. ], batch size: 68, lr: 2.17e-03, grad_scale: 32.0 2024-09-18 00:37:51,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=682779.5, ans=0.125 2024-09-18 00:38:00,680 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.02 vs. limit=22.5 2024-09-18 00:38:32,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=682836.1666666666, ans=0.125 2024-09-18 00:38:33,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=682836.1666666666, ans=0.0 2024-09-18 00:38:36,240 INFO [train.py:1198] (1/2) Epoch 38, batch 4550, loss[loss=0.2395, ctc_loss=0.1592, cr_loss=0.4012, over 20967.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.1456, cr_loss=0.3691, over 4109057.69 frames. ], batch size: 67, lr: 2.16e-03, grad_scale: 32.0 2024-09-18 00:38:37,207 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.83 vs. limit=15.0 2024-09-18 00:39:00,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=682892.8333333334, ans=0.0 2024-09-18 00:39:07,421 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.27 vs. limit=15.0 2024-09-18 00:39:26,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=682949.5, ans=0.125 2024-09-18 00:39:36,919 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.009e+02 2.204e+02 2.316e+02 2.469e+02 9.471e+02, threshold=4.633e+02, percent-clipped=1.0 2024-09-18 00:39:54,845 INFO [train.py:1198] (1/2) Epoch 38, batch 4600, loss[loss=0.1931, ctc_loss=0.1268, cr_loss=0.3315, over 20977.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1451, cr_loss=0.3692, over 4114945.83 frames. 
], batch size: 49, lr: 2.16e-03, grad_scale: 32.0 2024-09-18 00:40:18,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=683034.5, ans=0.025 2024-09-18 00:40:54,673 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2024-09-18 00:40:58,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=683119.5, ans=0.2 2024-09-18 00:41:06,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=683119.5, ans=0.0 2024-09-18 00:41:13,660 INFO [train.py:1198] (1/2) Epoch 38, batch 4650, loss[loss=0.224, ctc_loss=0.1498, cr_loss=0.3712, over 20661.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1442, cr_loss=0.3675, over 4112492.64 frames. ], batch size: 66, lr: 2.16e-03, grad_scale: 32.0 2024-09-18 00:41:14,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=683147.8333333334, ans=0.125 2024-09-18 00:41:56,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=683204.5, ans=0.125 2024-09-18 00:41:59,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=683232.8333333334, ans=0.1 2024-09-18 00:41:59,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=683232.8333333334, ans=0.025 2024-09-18 00:42:02,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=683232.8333333334, ans=0.025 2024-09-18 00:42:14,435 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.876e+02 2.184e+02 2.332e+02 2.461e+02 3.395e+02, threshold=4.665e+02, percent-clipped=0.0 2024-09-18 00:42:29,504 INFO [train.py:1198] (1/2) Epoch 38, batch 4700, loss[loss=0.2138, ctc_loss=0.1379, cr_loss=0.3793, over 20967.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1439, cr_loss=0.3671, over 4113035.10 frames. ], batch size: 58, lr: 2.16e-03, grad_scale: 32.0 2024-09-18 00:42:47,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=683317.8333333334, ans=0.0 2024-09-18 00:43:45,165 INFO [train.py:1198] (1/2) Epoch 38, batch 4750, loss[loss=0.2012, ctc_loss=0.1313, cr_loss=0.3493, over 20992.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1448, cr_loss=0.3683, over 4099967.11 frames. ], batch size: 52, lr: 2.16e-03, grad_scale: 32.0 2024-09-18 00:43:50,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=683431.1666666666, ans=0.125 2024-09-18 00:44:32,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=683516.1666666666, ans=0.1 2024-09-18 00:44:44,168 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.77 vs. 
limit=6.0 2024-09-18 00:44:45,918 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.971e+02 2.203e+02 2.333e+02 2.487e+02 5.202e+02, threshold=4.666e+02, percent-clipped=1.0 2024-09-18 00:45:00,942 INFO [train.py:1198] (1/2) Epoch 38, batch 4800, loss[loss=0.1892, ctc_loss=0.1242, cr_loss=0.3248, over 19515.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1452, cr_loss=0.369, over 4093772.96 frames. ], batch size: 43, lr: 2.16e-03, grad_scale: 32.0 2024-09-18 00:45:21,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=683601.1666666666, ans=0.0 2024-09-18 00:45:38,996 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.60 vs. limit=10.0 2024-09-18 00:45:40,636 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.51 vs. limit=15.0 2024-09-18 00:45:52,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=683657.8333333334, ans=0.125 2024-09-18 00:45:59,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=683657.8333333334, ans=0.1 2024-09-18 00:46:11,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=683686.1666666666, ans=0.2 2024-09-18 00:46:22,373 INFO [train.py:1198] (1/2) Epoch 38, batch 4850, loss[loss=0.2491, ctc_loss=0.1682, cr_loss=0.4041, over 19288.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.1459, cr_loss=0.3697, over 4089472.34 frames. ], batch size: 90, lr: 2.16e-03, grad_scale: 32.0 2024-09-18 00:46:25,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=683714.5, ans=0.125 2024-09-18 00:46:28,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=683714.5, ans=0.125 2024-09-18 00:46:44,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=683742.8333333334, ans=0.125 2024-09-18 00:46:54,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=683771.1666666666, ans=0.2 2024-09-18 00:47:06,867 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 00:47:12,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=683799.5, ans=0.025 2024-09-18 00:47:12,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=683799.5, ans=0.02 2024-09-18 00:47:18,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=683799.5, ans=0.0 2024-09-18 00:47:22,855 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.947e+02 2.185e+02 2.288e+02 2.442e+02 3.685e+02, threshold=4.575e+02, percent-clipped=0.0 2024-09-18 00:47:38,105 INFO [train.py:1198] (1/2) Epoch 38, batch 4900, loss[loss=0.1938, ctc_loss=0.1263, cr_loss=0.3374, over 20988.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.1453, cr_loss=0.3687, over 4093568.00 frames. 
], batch size: 55, lr: 2.16e-03, grad_scale: 32.0 2024-09-18 00:47:55,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=683884.5, ans=0.125 2024-09-18 00:47:58,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=683884.5, ans=0.0 2024-09-18 00:48:53,076 INFO [train.py:1198] (1/2) Epoch 38, batch 4950, loss[loss=0.235, ctc_loss=0.1572, cr_loss=0.389, over 20072.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.1453, cr_loss=0.3688, over 4093040.91 frames. ], batch size: 80, lr: 2.16e-03, grad_scale: 32.0 2024-09-18 00:49:00,853 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 00:49:15,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=684026.1666666666, ans=10.0 2024-09-18 00:49:45,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=684082.8333333334, ans=0.125 2024-09-18 00:49:51,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=684111.1666666666, ans=0.2 2024-09-18 00:49:52,533 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.945e+02 2.221e+02 2.337e+02 2.503e+02 4.200e+02, threshold=4.673e+02, percent-clipped=0.0 2024-09-18 00:50:07,456 INFO [train.py:1198] (1/2) Epoch 38, batch 5000, loss[loss=0.2174, ctc_loss=0.1454, cr_loss=0.3601, over 21048.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1446, cr_loss=0.3679, over 4098823.96 frames. ], batch size: 62, lr: 2.16e-03, grad_scale: 32.0 2024-09-18 00:50:10,927 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.77 vs. limit=15.0 2024-09-18 00:50:21,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=684167.8333333334, ans=0.0 2024-09-18 00:50:25,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=684167.8333333334, ans=0.2 2024-09-18 00:50:35,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=684196.1666666666, ans=0.0 2024-09-18 00:50:48,095 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.01 vs. limit=15.0 2024-09-18 00:50:55,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=684224.5, ans=0.125 2024-09-18 00:51:05,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=684252.8333333334, ans=0.2 2024-09-18 00:51:21,986 INFO [train.py:1198] (1/2) Epoch 38, batch 5050, loss[loss=0.2269, ctc_loss=0.1507, cr_loss=0.381, over 20686.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1445, cr_loss=0.3678, over 4098941.58 frames. 
], batch size: 68, lr: 2.16e-03, grad_scale: 32.0 2024-09-18 00:51:22,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=684281.1666666666, ans=0.125 2024-09-18 00:51:31,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=684281.1666666666, ans=0.0 2024-09-18 00:51:53,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=684337.8333333334, ans=0.025 2024-09-18 00:52:01,497 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.00 vs. limit=15.0 2024-09-18 00:52:09,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=684366.1666666666, ans=0.04949747468305833 2024-09-18 00:52:20,571 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.943e+02 2.246e+02 2.374e+02 2.516e+02 6.354e+02, threshold=4.749e+02, percent-clipped=1.0 2024-09-18 00:52:28,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=684394.5, ans=0.125 2024-09-18 00:52:30,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=684394.5, ans=15.0 2024-09-18 00:52:35,418 INFO [train.py:1198] (1/2) Epoch 38, batch 5100, loss[loss=0.2224, ctc_loss=0.1488, cr_loss=0.368, over 19444.00 frames. ], tot_loss[loss=0.2187, ctc_loss=0.145, cr_loss=0.3685, over 4090463.47 frames. ], batch size: 90, lr: 2.16e-03, grad_scale: 32.0 2024-09-18 00:53:06,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=684479.5, ans=0.1 2024-09-18 00:53:18,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=684507.8333333334, ans=0.125 2024-09-18 00:53:52,938 INFO [train.py:1198] (1/2) Epoch 38, batch 5150, loss[loss=0.2352, ctc_loss=0.158, cr_loss=0.3859, over 21023.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1448, cr_loss=0.3686, over 4092399.51 frames. ], batch size: 63, lr: 2.16e-03, grad_scale: 32.0 2024-09-18 00:54:39,519 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.65 vs. limit=22.5 2024-09-18 00:54:54,780 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.982e+02 2.184e+02 2.308e+02 2.448e+02 4.655e+02, threshold=4.616e+02, percent-clipped=0.0 2024-09-18 00:55:04,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=684677.8333333334, ans=0.2 2024-09-18 00:55:09,715 INFO [train.py:1198] (1/2) Epoch 38, batch 5200, loss[loss=0.2222, ctc_loss=0.1474, cr_loss=0.3739, over 20837.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1449, cr_loss=0.3682, over 4089201.94 frames. ], batch size: 65, lr: 2.16e-03, grad_scale: 32.0 2024-09-18 00:56:00,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=684791.1666666666, ans=0.1 2024-09-18 00:56:24,428 INFO [train.py:1198] (1/2) Epoch 38, batch 5250, loss[loss=0.2506, ctc_loss=0.1674, cr_loss=0.4161, over 21004.00 frames. 
], tot_loss[loss=0.2191, ctc_loss=0.1452, cr_loss=0.3693, over 4093330.96 frames. ], batch size: 61, lr: 2.16e-03, grad_scale: 32.0 2024-09-18 00:56:27,862 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=684847.8333333334, ans=0.0 2024-09-18 00:56:29,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=684847.8333333334, ans=0.04949747468305833 2024-09-18 00:56:41,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=684876.1666666666, ans=0.0 2024-09-18 00:56:55,911 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=684904.5, ans=0.025 2024-09-18 00:57:10,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=684932.8333333334, ans=0.2 2024-09-18 00:57:23,806 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.973e+02 2.195e+02 2.290e+02 2.458e+02 4.073e+02, threshold=4.580e+02, percent-clipped=0.0 2024-09-18 00:57:37,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=684989.5, ans=0.0 2024-09-18 00:57:39,088 INFO [train.py:1198] (1/2) Epoch 38, batch 5300, loss[loss=0.2295, ctc_loss=0.1515, cr_loss=0.3901, over 21043.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.146, cr_loss=0.3706, over 4080430.32 frames. ], batch size: 56, lr: 2.16e-03, grad_scale: 32.0 2024-09-18 00:57:57,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=685017.8333333334, ans=0.125 2024-09-18 00:58:04,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=685017.8333333334, ans=0.2 2024-09-18 00:58:04,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=685017.8333333334, ans=0.1 2024-09-18 00:58:14,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=685046.1666666666, ans=0.2 2024-09-18 00:58:47,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=685102.8333333334, ans=0.0 2024-09-18 00:58:53,337 INFO [train.py:1198] (1/2) Epoch 38, batch 5350, loss[loss=0.2241, ctc_loss=0.1483, cr_loss=0.3788, over 20879.00 frames. ], tot_loss[loss=0.2216, ctc_loss=0.1471, cr_loss=0.3729, over 4088384.35 frames. ], batch size: 54, lr: 2.16e-03, grad_scale: 16.0 2024-09-18 00:58:55,887 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.52 vs. 
limit=10.0 2024-09-18 00:58:58,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=685131.1666666666, ans=0.1 2024-09-18 00:59:13,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=685159.5, ans=0.2 2024-09-18 00:59:32,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=685187.8333333334, ans=0.0 2024-09-18 00:59:54,584 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.912e+02 2.204e+02 2.357e+02 2.541e+02 3.874e+02, threshold=4.713e+02, percent-clipped=0.0 2024-09-18 00:59:56,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=685244.5, ans=0.025 2024-09-18 01:00:07,907 INFO [train.py:1198] (1/2) Epoch 38, batch 5400, loss[loss=0.2324, ctc_loss=0.1542, cr_loss=0.3912, over 20953.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1469, cr_loss=0.3726, over 4084135.44 frames. ], batch size: 64, lr: 2.16e-03, grad_scale: 16.0 2024-09-18 01:00:09,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=685272.8333333334, ans=0.5 2024-09-18 01:00:17,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=685272.8333333334, ans=0.125 2024-09-18 01:00:22,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=685301.1666666666, ans=0.125 2024-09-18 01:00:48,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=685329.5, ans=0.1 2024-09-18 01:00:49,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=685329.5, ans=0.0 2024-09-18 01:00:55,946 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 01:01:03,956 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.00 vs. limit=22.5 2024-09-18 01:01:09,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=685386.1666666666, ans=0.125 2024-09-18 01:01:18,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=685386.1666666666, ans=0.0 2024-09-18 01:01:22,628 INFO [train.py:1198] (1/2) Epoch 38, batch 5450, loss[loss=0.2349, ctc_loss=0.1558, cr_loss=0.3955, over 21047.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1471, cr_loss=0.3729, over 4087003.23 frames. 
], batch size: 63, lr: 2.16e-03, grad_scale: 16.0 2024-09-18 01:01:42,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=685442.8333333334, ans=0.0 2024-09-18 01:01:45,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=685442.8333333334, ans=0.125 2024-09-18 01:01:49,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=685442.8333333334, ans=0.2 2024-09-18 01:01:51,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=685471.1666666666, ans=0.125 2024-09-18 01:01:57,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=685471.1666666666, ans=0.125 2024-09-18 01:01:58,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=685471.1666666666, ans=0.125 2024-09-18 01:02:00,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=685471.1666666666, ans=0.1 2024-09-18 01:02:23,891 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.873e+02 2.229e+02 2.376e+02 2.564e+02 3.729e+02, threshold=4.752e+02, percent-clipped=0.0 2024-09-18 01:02:37,673 INFO [train.py:1198] (1/2) Epoch 38, batch 5500, loss[loss=0.1988, ctc_loss=0.1305, cr_loss=0.3417, over 21042.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1472, cr_loss=0.3734, over 4093812.93 frames. ], batch size: 53, lr: 2.16e-03, grad_scale: 16.0 2024-09-18 01:03:17,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=685612.8333333334, ans=0.2 2024-09-18 01:03:31,601 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0 2024-09-18 01:03:36,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=685641.1666666666, ans=0.0 2024-09-18 01:03:38,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=685669.5, ans=0.125 2024-09-18 01:03:38,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=685669.5, ans=0.2 2024-09-18 01:03:57,540 INFO [train.py:1198] (1/2) Epoch 38, batch 5550, loss[loss=0.2196, ctc_loss=0.1456, cr_loss=0.3699, over 21005.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1472, cr_loss=0.3738, over 4100117.76 frames. ], batch size: 61, lr: 2.16e-03, grad_scale: 16.0 2024-09-18 01:04:10,323 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.05 vs. limit=15.0 2024-09-18 01:04:29,218 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.90 vs. 
limit=15.0 2024-09-18 01:04:40,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=685782.8333333334, ans=0.125 2024-09-18 01:04:58,868 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.942e+02 2.165e+02 2.298e+02 2.438e+02 4.355e+02, threshold=4.597e+02, percent-clipped=0.0 2024-09-18 01:05:00,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=685811.1666666666, ans=0.025 2024-09-18 01:05:12,208 INFO [train.py:1198] (1/2) Epoch 38, batch 5600, loss[loss=0.2542, ctc_loss=0.1697, cr_loss=0.4226, over 20856.00 frames. ], tot_loss[loss=0.2221, ctc_loss=0.1474, cr_loss=0.3738, over 4097781.42 frames. ], batch size: 65, lr: 2.16e-03, grad_scale: 32.0 2024-09-18 01:05:24,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=685839.5, ans=0.125 2024-09-18 01:05:32,450 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.89 vs. limit=15.0 2024-09-18 01:05:45,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=685896.1666666666, ans=0.015 2024-09-18 01:06:26,290 INFO [train.py:1198] (1/2) Epoch 38, batch 5650, loss[loss=0.2177, ctc_loss=0.1471, cr_loss=0.3532, over 20769.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1471, cr_loss=0.3734, over 4101334.14 frames. ], batch size: 56, lr: 2.16e-03, grad_scale: 32.0 2024-09-18 01:06:34,202 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.18 vs. limit=15.0 2024-09-18 01:06:44,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=686009.5, ans=0.0 2024-09-18 01:06:47,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=686009.5, ans=0.2 2024-09-18 01:07:00,772 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 01:07:27,099 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.985e+02 2.236e+02 2.355e+02 2.585e+02 3.333e+02, threshold=4.710e+02, percent-clipped=0.0 2024-09-18 01:07:40,406 INFO [train.py:1198] (1/2) Epoch 38, batch 5700, loss[loss=0.2266, ctc_loss=0.1519, cr_loss=0.3735, over 20876.00 frames. ], tot_loss[loss=0.2228, ctc_loss=0.1479, cr_loss=0.3745, over 4090118.01 frames. 
], batch size: 57, lr: 2.16e-03, grad_scale: 32.0 2024-09-18 01:07:40,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=686122.8333333334, ans=0.2 2024-09-18 01:07:42,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=686122.8333333334, ans=0.1 2024-09-18 01:08:00,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=686151.1666666666, ans=0.125 2024-09-18 01:08:26,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=686207.8333333334, ans=0.1 2024-09-18 01:08:28,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=686207.8333333334, ans=0.0 2024-09-18 01:08:33,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=686207.8333333334, ans=0.0 2024-09-18 01:08:44,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=686236.1666666666, ans=0.0 2024-09-18 01:08:55,281 INFO [train.py:1198] (1/2) Epoch 38, batch 5750, loss[loss=0.2208, ctc_loss=0.1441, cr_loss=0.3835, over 20967.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1474, cr_loss=0.3737, over 4096574.01 frames. ], batch size: 55, lr: 2.16e-03, grad_scale: 32.0 2024-09-18 01:09:04,585 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 01:09:25,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=686321.1666666666, ans=0.2 2024-09-18 01:09:56,649 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.943e+02 2.234e+02 2.378e+02 2.540e+02 4.922e+02, threshold=4.756e+02, percent-clipped=1.0 2024-09-18 01:10:07,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=686377.8333333334, ans=0.2 2024-09-18 01:10:10,240 INFO [train.py:1198] (1/2) Epoch 38, batch 5800, loss[loss=0.2595, ctc_loss=0.1727, cr_loss=0.4338, over 20726.00 frames. ], tot_loss[loss=0.2228, ctc_loss=0.1479, cr_loss=0.3745, over 4089717.93 frames. ], batch size: 71, lr: 2.16e-03, grad_scale: 32.0 2024-09-18 01:10:15,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=686406.1666666666, ans=0.0 2024-09-18 01:10:15,550 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.48 vs. 
limit=22.5 2024-09-18 01:10:35,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=686434.5, ans=0.2 2024-09-18 01:10:38,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=686462.8333333334, ans=10.0 2024-09-18 01:11:02,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=686491.1666666666, ans=0.0 2024-09-18 01:11:04,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=686491.1666666666, ans=0.125 2024-09-18 01:11:27,192 INFO [train.py:1198] (1/2) Epoch 38, batch 5850, loss[loss=0.2034, ctc_loss=0.1349, cr_loss=0.3424, over 21089.00 frames. ], tot_loss[loss=0.2231, ctc_loss=0.1482, cr_loss=0.3746, over 4077826.76 frames. ], batch size: 59, lr: 2.16e-03, grad_scale: 32.0 2024-09-18 01:11:29,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=686547.8333333334, ans=0.0 2024-09-18 01:11:53,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=686576.1666666666, ans=0.2 2024-09-18 01:12:02,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=686604.5, ans=0.5 2024-09-18 01:12:08,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=686604.5, ans=0.125 2024-09-18 01:12:12,957 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.86 vs. limit=15.0 2024-09-18 01:12:22,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=686632.8333333334, ans=0.125 2024-09-18 01:12:28,577 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.985e+02 2.263e+02 2.383e+02 2.562e+02 4.441e+02, threshold=4.766e+02, percent-clipped=0.0 2024-09-18 01:12:44,244 INFO [train.py:1198] (1/2) Epoch 38, batch 5900, loss[loss=0.2287, ctc_loss=0.1549, cr_loss=0.3692, over 21031.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1475, cr_loss=0.3734, over 4094478.24 frames. ], batch size: 63, lr: 2.16e-03, grad_scale: 32.0 2024-09-18 01:13:02,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=686717.8333333334, ans=0.125 2024-09-18 01:13:06,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=686717.8333333334, ans=0.125 2024-09-18 01:13:09,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=686717.8333333334, ans=0.0 2024-09-18 01:13:17,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=686746.1666666666, ans=0.1 2024-09-18 01:13:30,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=686774.5, ans=0.2 2024-09-18 01:13:58,795 INFO [train.py:1198] (1/2) Epoch 38, batch 5950, loss[loss=0.2149, ctc_loss=0.1434, cr_loss=0.3571, over 20878.00 frames. 
], tot_loss[loss=0.2217, ctc_loss=0.1472, cr_loss=0.3725, over 4101735.68 frames. ], batch size: 54, lr: 2.16e-03, grad_scale: 32.0 2024-09-18 01:13:59,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=686831.1666666666, ans=0.125 2024-09-18 01:14:03,426 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 01:14:03,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=686831.1666666666, ans=0.125 2024-09-18 01:14:13,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=686859.5, ans=0.0 2024-09-18 01:14:19,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=686859.5, ans=0.125 2024-09-18 01:14:33,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=686887.8333333334, ans=0.125 2024-09-18 01:14:59,848 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.965e+02 2.184e+02 2.312e+02 2.587e+02 4.338e+02, threshold=4.624e+02, percent-clipped=0.0 2024-09-18 01:15:04,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=686944.5, ans=0.125 2024-09-18 01:15:13,354 INFO [train.py:1198] (1/2) Epoch 38, batch 6000, loss[loss=0.1934, ctc_loss=0.1248, cr_loss=0.3429, over 20984.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.1462, cr_loss=0.371, over 4100661.53 frames. ], batch size: 52, lr: 2.16e-03, grad_scale: 32.0 2024-09-18 01:15:13,355 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 01:15:34,939 INFO [train.py:1230] (1/2) Epoch 38, validation: loss=0.03984, ctc_loss=0.03984, cr_loss=1.403e-14, over 944034.00 frames. 2024-09-18 01:15:34,940 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-18 01:15:36,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=686972.8333333334, ans=0.125 2024-09-18 01:15:41,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=686972.8333333334, ans=0.2 2024-09-18 01:15:58,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=687001.1666666666, ans=0.025 2024-09-18 01:16:13,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=687029.5, ans=0.125 2024-09-18 01:16:17,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=687057.8333333334, ans=0.0 2024-09-18 01:16:19,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=687057.8333333334, ans=0.0 2024-09-18 01:16:48,140 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.72 vs. limit=10.0 2024-09-18 01:16:48,772 INFO [train.py:1198] (1/2) Epoch 38, batch 6050, loss[loss=0.2282, ctc_loss=0.1542, cr_loss=0.3698, over 21067.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.1454, cr_loss=0.3694, over 4110416.62 frames. 
], batch size: 59, lr: 2.16e-03, grad_scale: 32.0 2024-09-18 01:17:04,498 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.98 vs. limit=15.0 2024-09-18 01:17:36,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=687199.5, ans=0.05 2024-09-18 01:17:50,519 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.953e+02 2.121e+02 2.271e+02 2.489e+02 3.721e+02, threshold=4.541e+02, percent-clipped=0.0 2024-09-18 01:18:04,030 INFO [train.py:1198] (1/2) Epoch 38, batch 6100, loss[loss=0.2096, ctc_loss=0.1388, cr_loss=0.3541, over 21005.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1453, cr_loss=0.3695, over 4115321.04 frames. ], batch size: 55, lr: 2.16e-03, grad_scale: 32.0 2024-09-18 01:18:54,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=687341.1666666666, ans=0.0 2024-09-18 01:19:07,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=687369.5, ans=0.125 2024-09-18 01:19:17,677 INFO [train.py:1198] (1/2) Epoch 38, batch 6150, loss[loss=0.2129, ctc_loss=0.1411, cr_loss=0.3589, over 20973.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.1458, cr_loss=0.3703, over 4099365.61 frames. ], batch size: 55, lr: 2.16e-03, grad_scale: 32.0 2024-09-18 01:20:20,026 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.824e+02 2.208e+02 2.352e+02 2.559e+02 3.096e+02, threshold=4.704e+02, percent-clipped=0.0 2024-09-18 01:20:23,712 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.34 vs. limit=6.0 2024-09-18 01:20:33,204 INFO [train.py:1198] (1/2) Epoch 38, batch 6200, loss[loss=0.23, ctc_loss=0.1531, cr_loss=0.3842, over 20672.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1473, cr_loss=0.3722, over 4061938.21 frames. ], batch size: 71, lr: 2.16e-03, grad_scale: 32.0 2024-09-18 01:20:50,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=687567.8333333334, ans=0.0 2024-09-18 01:20:51,346 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.14 vs. limit=6.0 2024-09-18 01:21:48,059 INFO [train.py:1198] (1/2) Epoch 38, batch 6250, loss[loss=0.2598, ctc_loss=0.1751, cr_loss=0.4233, over 19991.00 frames. ], tot_loss[loss=0.2225, ctc_loss=0.1481, cr_loss=0.3721, over 4005724.78 frames. ], batch size: 80, lr: 2.16e-03, grad_scale: 32.0 2024-09-18 01:22:16,208 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.76 vs. limit=12.0 2024-09-18 01:22:31,916 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.30 vs. limit=15.0 2024-09-18 01:22:46,708 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.000e+02 2.267e+02 2.458e+02 2.648e+02 4.941e+02, threshold=4.916e+02, percent-clipped=1.0 2024-09-18 01:22:59,759 INFO [train.py:1198] (1/2) Epoch 38, batch 6300, loss[loss=0.2609, ctc_loss=0.1808, cr_loss=0.4002, over 14301.00 frames. 
], tot_loss[loss=0.2258, ctc_loss=0.151, cr_loss=0.3742, over 3905921.73 frames. ], batch size: 149, lr: 2.16e-03, grad_scale: 32.0 2024-09-18 01:23:01,478 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=687822.8333333334, ans=10.0 2024-09-18 01:23:11,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=687822.8333333334, ans=0.1 2024-09-18 01:23:23,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=687851.1666666666, ans=0.125 2024-09-18 01:23:23,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=687851.1666666666, ans=0.0 2024-09-18 01:23:33,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=687879.5, ans=0.125 2024-09-18 01:23:33,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=687879.5, ans=0.0 2024-09-18 01:23:46,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=687907.8333333334, ans=0.0 2024-09-18 01:24:10,117 INFO [train.py:1198] (1/2) Epoch 38, batch 6350, loss[loss=0.2742, ctc_loss=0.1917, cr_loss=0.4126, over 14142.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1566, cr_loss=0.3788, over 3709931.43 frames. ], batch size: 149, lr: 2.16e-03, grad_scale: 32.0 2024-09-18 01:24:33,910 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.45 vs. limit=15.0 2024-09-18 01:24:44,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=688021.1666666666, ans=0.025 2024-09-18 01:25:01,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=688049.5, ans=0.125 2024-09-18 01:25:01,321 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.72 vs. limit=15.0 2024-09-18 01:25:06,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=688077.8333333334, ans=0.125 2024-09-18 01:25:58,167 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.185e+02 2.573e+02 2.741e+02 2.937e+02 3.484e+02, threshold=5.482e+02, percent-clipped=0.0 2024-09-18 01:25:58,187 INFO [train.py:1198] (1/2) Epoch 39, batch 0, loss[loss=0.2168, ctc_loss=0.1403, cr_loss=0.3825, over 20921.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.1403, cr_loss=0.3825, over 20921.00 frames. ], batch size: 60, lr: 2.13e-03, grad_scale: 32.0 2024-09-18 01:25:58,188 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 01:26:12,485 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.5209, 4.6161, 4.5279, 4.5978], device='cuda:1') 2024-09-18 01:26:16,554 INFO [train.py:1230] (1/2) Epoch 39, validation: loss=0.03987, ctc_loss=0.03987, cr_loss=1.418e-14, over 944034.00 frames. 
2024-09-18 01:26:16,554 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-18 01:26:16,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=688080.6666666666, ans=0.1 2024-09-18 01:27:05,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=688165.6666666666, ans=0.025 2024-09-18 01:27:06,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=688165.6666666666, ans=0.0 2024-09-18 01:27:12,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=688165.6666666666, ans=0.0 2024-09-18 01:27:35,312 INFO [train.py:1198] (1/2) Epoch 39, batch 50, loss[loss=0.2573, ctc_loss=0.1822, cr_loss=0.3752, over 13792.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1455, cr_loss=0.3703, over 918468.94 frames. ], batch size: 149, lr: 2.13e-03, grad_scale: 32.0 2024-09-18 01:28:39,729 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0 2024-09-18 01:28:51,201 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.963e+02 2.207e+02 2.366e+02 2.486e+02 4.313e+02, threshold=4.732e+02, percent-clipped=0.0 2024-09-18 01:28:51,220 INFO [train.py:1198] (1/2) Epoch 39, batch 100, loss[loss=0.1895, ctc_loss=0.1199, cr_loss=0.348, over 20810.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.145, cr_loss=0.3704, over 1626244.34 frames. ], batch size: 53, lr: 2.13e-03, grad_scale: 32.0 2024-09-18 01:29:35,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=688449.0, ans=0.07 2024-09-18 01:30:06,556 INFO [train.py:1198] (1/2) Epoch 39, batch 150, loss[loss=0.2461, ctc_loss=0.1637, cr_loss=0.4123, over 20712.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.145, cr_loss=0.3697, over 2177451.23 frames. ], batch size: 71, lr: 2.13e-03, grad_scale: 32.0 2024-09-18 01:30:20,939 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.06 vs. limit=22.5 2024-09-18 01:30:24,186 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.04 vs. limit=15.0 2024-09-18 01:31:25,305 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.953e+02 2.205e+02 2.315e+02 2.455e+02 3.675e+02, threshold=4.631e+02, percent-clipped=0.0 2024-09-18 01:31:25,323 INFO [train.py:1198] (1/2) Epoch 39, batch 200, loss[loss=0.2239, ctc_loss=0.1496, cr_loss=0.3712, over 20279.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.145, cr_loss=0.3698, over 2601759.08 frames. 
], batch size: 74, lr: 2.13e-03, grad_scale: 32.0 2024-09-18 01:31:43,853 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 01:31:51,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=688675.6666666666, ans=0.035 2024-09-18 01:32:16,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=688732.3333333334, ans=0.0 2024-09-18 01:32:32,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=688760.6666666666, ans=0.125 2024-09-18 01:32:44,922 INFO [train.py:1198] (1/2) Epoch 39, batch 250, loss[loss=0.1943, ctc_loss=0.1279, cr_loss=0.3319, over 20956.00 frames. ], tot_loss[loss=0.2195, ctc_loss=0.1454, cr_loss=0.3707, over 2921877.04 frames. ], batch size: 50, lr: 2.13e-03, grad_scale: 32.0 2024-09-18 01:33:00,824 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.19 vs. limit=10.0 2024-09-18 01:33:07,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=688817.3333333334, ans=0.125 2024-09-18 01:33:12,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=688817.3333333334, ans=0.0 2024-09-18 01:33:34,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=688874.0, ans=0.015 2024-09-18 01:33:47,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=688902.3333333334, ans=0.1 2024-09-18 01:33:59,728 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.96 vs. limit=15.0 2024-09-18 01:34:00,652 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.906e+02 2.215e+02 2.353e+02 2.497e+02 4.786e+02, threshold=4.706e+02, percent-clipped=1.0 2024-09-18 01:34:00,670 INFO [train.py:1198] (1/2) Epoch 39, batch 300, loss[loss=0.1964, ctc_loss=0.1266, cr_loss=0.3489, over 20947.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1447, cr_loss=0.3689, over 3176605.63 frames. ], batch size: 51, lr: 2.13e-03, grad_scale: 32.0 2024-09-18 01:34:23,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=688959.0, ans=0.125 2024-09-18 01:34:57,103 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.45 vs. limit=22.5 2024-09-18 01:35:02,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=689044.0, ans=0.0 2024-09-18 01:35:16,119 INFO [train.py:1198] (1/2) Epoch 39, batch 350, loss[loss=0.195, ctc_loss=0.1255, cr_loss=0.3474, over 20940.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.1441, cr_loss=0.3686, over 3387787.10 frames. 
], batch size: 49, lr: 2.13e-03, grad_scale: 32.0 2024-09-18 01:35:19,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=689072.3333333334, ans=0.1 2024-09-18 01:35:19,885 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.26 vs. limit=10.0 2024-09-18 01:35:20,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=689072.3333333334, ans=0.125 2024-09-18 01:35:27,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=689072.3333333334, ans=0.1 2024-09-18 01:36:23,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=689185.6666666666, ans=0.04949747468305833 2024-09-18 01:36:26,153 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.95 vs. limit=22.5 2024-09-18 01:36:35,547 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.001e+02 2.171e+02 2.344e+02 2.456e+02 3.665e+02, threshold=4.687e+02, percent-clipped=0.0 2024-09-18 01:36:35,566 INFO [train.py:1198] (1/2) Epoch 39, batch 400, loss[loss=0.1956, ctc_loss=0.1273, cr_loss=0.3416, over 20942.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1445, cr_loss=0.3696, over 3550418.63 frames. ], batch size: 50, lr: 2.13e-03, grad_scale: 32.0 2024-09-18 01:36:42,502 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.95 vs. limit=15.0 2024-09-18 01:36:48,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=689214.0, ans=0.0 2024-09-18 01:37:04,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=689270.6666666666, ans=0.1 2024-09-18 01:37:15,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=689270.6666666666, ans=0.0 2024-09-18 01:37:15,695 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.16 vs. limit=15.0 2024-09-18 01:37:51,277 INFO [train.py:1198] (1/2) Epoch 39, batch 450, loss[loss=0.2274, ctc_loss=0.1522, cr_loss=0.3761, over 21004.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1438, cr_loss=0.3682, over 3674418.70 frames. ], batch size: 61, lr: 2.13e-03, grad_scale: 32.0 2024-09-18 01:38:43,985 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.95 vs. limit=15.0 2024-09-18 01:38:46,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=689440.6666666666, ans=0.0 2024-09-18 01:38:54,947 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.85 vs. 
limit=15.0 2024-09-18 01:39:04,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=689469.0, ans=0.025 2024-09-18 01:39:07,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=689469.0, ans=0.2 2024-09-18 01:39:10,206 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.982e+02 2.249e+02 2.367e+02 2.613e+02 3.957e+02, threshold=4.734e+02, percent-clipped=0.0 2024-09-18 01:39:10,224 INFO [train.py:1198] (1/2) Epoch 39, batch 500, loss[loss=0.2301, ctc_loss=0.1533, cr_loss=0.3839, over 20664.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1436, cr_loss=0.3681, over 3767599.04 frames. ], batch size: 71, lr: 2.13e-03, grad_scale: 32.0 2024-09-18 01:39:24,748 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.10 vs. limit=15.0 2024-09-18 01:39:38,399 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.27 vs. limit=6.0 2024-09-18 01:40:23,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=689610.6666666666, ans=0.125 2024-09-18 01:40:25,980 INFO [train.py:1198] (1/2) Epoch 39, batch 550, loss[loss=0.2205, ctc_loss=0.1471, cr_loss=0.367, over 21021.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1429, cr_loss=0.3662, over 3842859.00 frames. ], batch size: 61, lr: 2.13e-03, grad_scale: 32.0 2024-09-18 01:40:31,308 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.02 vs. limit=22.5 2024-09-18 01:40:32,685 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.21 vs. limit=15.0 2024-09-18 01:40:34,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=689639.0, ans=0.2 2024-09-18 01:41:27,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=689752.3333333334, ans=0.2 2024-09-18 01:41:30,810 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.41 vs. limit=15.0 2024-09-18 01:41:42,188 INFO [train.py:1198] (1/2) Epoch 39, batch 600, loss[loss=0.2248, ctc_loss=0.1485, cr_loss=0.3815, over 20863.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1432, cr_loss=0.3666, over 3898357.28 frames. ], batch size: 57, lr: 2.13e-03, grad_scale: 16.0 2024-09-18 01:41:43,758 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.851e+02 2.192e+02 2.325e+02 2.469e+02 3.248e+02, threshold=4.650e+02, percent-clipped=0.0 2024-09-18 01:41:49,459 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.44 vs. limit=22.5 2024-09-18 01:41:56,688 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.57 vs. 
limit=22.5 2024-09-18 01:42:02,535 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=689809.0, ans=0.95 2024-09-18 01:42:16,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=689837.3333333334, ans=0.125 2024-09-18 01:43:01,320 INFO [train.py:1198] (1/2) Epoch 39, batch 650, loss[loss=0.2325, ctc_loss=0.1547, cr_loss=0.389, over 20962.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1431, cr_loss=0.3667, over 3955032.90 frames. ], batch size: 58, lr: 2.13e-03, grad_scale: 16.0 2024-09-18 01:43:01,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=689922.3333333334, ans=0.125 2024-09-18 01:43:08,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=689922.3333333334, ans=0.2 2024-09-18 01:43:42,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=689979.0, ans=0.125 2024-09-18 01:43:52,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=690007.3333333334, ans=0.0 2024-09-18 01:44:17,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=690035.6666666666, ans=0.125 2024-09-18 01:44:20,005 INFO [train.py:1198] (1/2) Epoch 39, batch 700, loss[loss=0.2214, ctc_loss=0.1489, cr_loss=0.3629, over 21016.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1447, cr_loss=0.3689, over 3980087.52 frames. ], batch size: 63, lr: 2.13e-03, grad_scale: 16.0 2024-09-18 01:44:21,458 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.785e+02 2.187e+02 2.336e+02 2.483e+02 5.228e+02, threshold=4.671e+02, percent-clipped=1.0 2024-09-18 01:44:32,836 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.13 vs. limit=15.0 2024-09-18 01:44:49,429 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.95 vs. limit=22.5 2024-09-18 01:45:28,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=690177.3333333334, ans=0.125 2024-09-18 01:45:34,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=690205.6666666666, ans=0.125 2024-09-18 01:45:35,699 INFO [train.py:1198] (1/2) Epoch 39, batch 750, loss[loss=0.21, ctc_loss=0.1418, cr_loss=0.3411, over 20957.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1446, cr_loss=0.369, over 4001033.52 frames. ], batch size: 48, lr: 2.13e-03, grad_scale: 16.0 2024-09-18 01:45:54,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=690234.0, ans=0.0 2024-09-18 01:46:11,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=690262.3333333334, ans=0.125 2024-09-18 01:46:52,230 INFO [train.py:1198] (1/2) Epoch 39, batch 800, loss[loss=0.2092, ctc_loss=0.1377, cr_loss=0.3573, over 21005.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.1442, cr_loss=0.3682, over 4020830.33 frames. 
], batch size: 52, lr: 2.12e-03, grad_scale: 32.0 2024-09-18 01:46:53,740 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.968e+02 2.253e+02 2.414e+02 2.538e+02 3.314e+02, threshold=4.829e+02, percent-clipped=0.0 2024-09-18 01:47:18,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=690375.6666666666, ans=0.1 2024-09-18 01:47:45,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=690432.3333333334, ans=0.025 2024-09-18 01:47:55,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=690460.6666666666, ans=0.125 2024-09-18 01:48:02,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=690460.6666666666, ans=0.125 2024-09-18 01:48:10,958 INFO [train.py:1198] (1/2) Epoch 39, batch 850, loss[loss=0.2526, ctc_loss=0.1687, cr_loss=0.4192, over 20896.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1445, cr_loss=0.3688, over 4042056.00 frames. ], batch size: 65, lr: 2.12e-03, grad_scale: 32.0 2024-09-18 01:49:06,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=690574.0, ans=0.0 2024-09-18 01:49:20,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=690602.3333333334, ans=0.125 2024-09-18 01:49:23,300 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 01:49:27,446 INFO [train.py:1198] (1/2) Epoch 39, batch 900, loss[loss=0.2207, ctc_loss=0.1492, cr_loss=0.3572, over 20902.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.1454, cr_loss=0.37, over 4043539.55 frames. ], batch size: 57, lr: 2.12e-03, grad_scale: 32.0 2024-09-18 01:49:28,937 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.831e+02 2.209e+02 2.350e+02 2.525e+02 4.801e+02, threshold=4.701e+02, percent-clipped=0.0 2024-09-18 01:49:43,683 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.15 vs. limit=15.0 2024-09-18 01:49:44,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=690659.0, ans=0.0 2024-09-18 01:50:15,302 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.83 vs. limit=10.0 2024-09-18 01:50:19,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=690715.6666666666, ans=0.04949747468305833 2024-09-18 01:50:39,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=690744.0, ans=0.025 2024-09-18 01:50:46,502 INFO [train.py:1198] (1/2) Epoch 39, batch 950, loss[loss=0.198, ctc_loss=0.1277, cr_loss=0.3513, over 20962.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1455, cr_loss=0.3703, over 4059624.71 frames. ], batch size: 50, lr: 2.12e-03, grad_scale: 32.0 2024-09-18 01:51:18,913 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.11 vs. 
limit=6.0 2024-09-18 01:51:56,428 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.59 vs. limit=15.0 2024-09-18 01:52:01,725 INFO [train.py:1198] (1/2) Epoch 39, batch 1000, loss[loss=0.2273, ctc_loss=0.1525, cr_loss=0.3739, over 21028.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.1458, cr_loss=0.3696, over 4050141.27 frames. ], batch size: 61, lr: 2.12e-03, grad_scale: 16.0 2024-09-18 01:52:04,763 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.857e+02 2.207e+02 2.328e+02 2.485e+02 3.427e+02, threshold=4.657e+02, percent-clipped=0.0 2024-09-18 01:52:11,855 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.94 vs. limit=10.0 2024-09-18 01:53:08,148 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=691027.3333333334, ans=0.0 2024-09-18 01:53:17,089 INFO [train.py:1198] (1/2) Epoch 39, batch 1050, loss[loss=0.215, ctc_loss=0.1424, cr_loss=0.3634, over 20361.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.1462, cr_loss=0.3712, over 4065644.29 frames. ], batch size: 74, lr: 2.12e-03, grad_scale: 16.0 2024-09-18 01:53:32,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=691055.6666666666, ans=0.0 2024-09-18 01:54:04,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=691140.6666666666, ans=0.0 2024-09-18 01:54:16,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=691140.6666666666, ans=0.95 2024-09-18 01:54:16,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=691140.6666666666, ans=0.07 2024-09-18 01:54:25,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=691169.0, ans=0.0 2024-09-18 01:54:30,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=691169.0, ans=0.125 2024-09-18 01:54:31,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=691169.0, ans=0.05 2024-09-18 01:54:35,702 INFO [train.py:1198] (1/2) Epoch 39, batch 1100, loss[loss=0.2451, ctc_loss=0.1628, cr_loss=0.4114, over 20089.00 frames. ], tot_loss[loss=0.2209, ctc_loss=0.1467, cr_loss=0.3714, over 4062535.14 frames. ], batch size: 80, lr: 2.12e-03, grad_scale: 16.0 2024-09-18 01:54:38,802 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.883e+02 2.228e+02 2.356e+02 2.530e+02 2.946e+02, threshold=4.711e+02, percent-clipped=0.0 2024-09-18 01:54:45,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=691197.3333333334, ans=0.1 2024-09-18 01:55:03,793 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.38 vs. 
limit=15.0 2024-09-18 01:55:12,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=691254.0, ans=0.1 2024-09-18 01:55:16,052 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0 2024-09-18 01:55:30,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=691282.3333333334, ans=0.125 2024-09-18 01:55:56,169 INFO [train.py:1198] (1/2) Epoch 39, batch 1150, loss[loss=0.2857, ctc_loss=0.1968, cr_loss=0.4443, over 13688.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.1459, cr_loss=0.3703, over 4064351.04 frames. ], batch size: 149, lr: 2.12e-03, grad_scale: 16.0 2024-09-18 01:56:03,087 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.78 vs. limit=15.0 2024-09-18 01:56:04,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=691339.0, ans=0.2 2024-09-18 01:57:12,960 INFO [train.py:1198] (1/2) Epoch 39, batch 1200, loss[loss=0.2335, ctc_loss=0.1538, cr_loss=0.3989, over 20946.00 frames. ], tot_loss[loss=0.22, ctc_loss=0.1458, cr_loss=0.371, over 4081194.36 frames. ], batch size: 60, lr: 2.12e-03, grad_scale: 32.0 2024-09-18 01:57:15,978 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.962e+02 2.212e+02 2.345e+02 2.470e+02 3.222e+02, threshold=4.690e+02, percent-clipped=0.0 2024-09-18 01:57:25,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=691480.6666666666, ans=0.125 2024-09-18 01:57:42,740 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.01 vs. limit=15.0 2024-09-18 01:57:45,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=691537.3333333334, ans=0.125 2024-09-18 01:58:29,000 INFO [train.py:1198] (1/2) Epoch 39, batch 1250, loss[loss=0.2052, ctc_loss=0.1358, cr_loss=0.3468, over 20984.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1455, cr_loss=0.3701, over 4091471.02 frames. ], batch size: 55, lr: 2.12e-03, grad_scale: 32.0 2024-09-18 01:58:39,327 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.23 vs. limit=15.0 2024-09-18 01:58:50,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=691650.6666666666, ans=0.125 2024-09-18 01:58:52,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=691650.6666666666, ans=0.1 2024-09-18 01:58:55,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=691650.6666666666, ans=0.125 2024-09-18 01:59:48,443 INFO [train.py:1198] (1/2) Epoch 39, batch 1300, loss[loss=0.2082, ctc_loss=0.137, cr_loss=0.3559, over 21004.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.1454, cr_loss=0.3699, over 4093197.91 frames. 
], batch size: 63, lr: 2.12e-03, grad_scale: 32.0 2024-09-18 01:59:51,555 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.959e+02 2.209e+02 2.365e+02 2.635e+02 3.413e+02, threshold=4.730e+02, percent-clipped=0.0 2024-09-18 02:00:06,070 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.47 vs. limit=6.0 2024-09-18 02:00:08,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=691792.3333333334, ans=0.04949747468305833 2024-09-18 02:00:10,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=691792.3333333334, ans=0.125 2024-09-18 02:00:26,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=691820.6666666666, ans=0.125 2024-09-18 02:00:29,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=691820.6666666666, ans=0.025 2024-09-18 02:00:49,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=691877.3333333334, ans=0.125 2024-09-18 02:00:58,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=691877.3333333334, ans=0.1 2024-09-18 02:01:03,954 INFO [train.py:1198] (1/2) Epoch 39, batch 1350, loss[loss=0.2017, ctc_loss=0.1343, cr_loss=0.337, over 21054.00 frames. ], tot_loss[loss=0.2183, ctc_loss=0.1446, cr_loss=0.3687, over 4100192.74 frames. ], batch size: 56, lr: 2.12e-03, grad_scale: 32.0 2024-09-18 02:02:22,568 INFO [train.py:1198] (1/2) Epoch 39, batch 1400, loss[loss=0.21, ctc_loss=0.1385, cr_loss=0.3573, over 20799.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1453, cr_loss=0.3695, over 4097288.21 frames. ], batch size: 53, lr: 2.12e-03, grad_scale: 32.0 2024-09-18 02:02:25,572 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.948e+02 2.227e+02 2.358e+02 2.516e+02 4.226e+02, threshold=4.715e+02, percent-clipped=0.0 2024-09-18 02:02:44,731 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.90 vs. limit=15.0 2024-09-18 02:03:04,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=692104.0, ans=0.125 2024-09-18 02:03:27,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=692160.6666666666, ans=0.1 2024-09-18 02:03:28,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=692160.6666666666, ans=0.025 2024-09-18 02:03:38,790 INFO [train.py:1198] (1/2) Epoch 39, batch 1450, loss[loss=0.2235, ctc_loss=0.147, cr_loss=0.3823, over 21072.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.1457, cr_loss=0.3705, over 4103922.39 frames. 
], batch size: 59, lr: 2.12e-03, grad_scale: 32.0 2024-09-18 02:03:53,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=692217.3333333334, ans=0.025 2024-09-18 02:04:23,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=692274.0, ans=0.05 2024-09-18 02:04:24,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=692274.0, ans=0.1 2024-09-18 02:04:54,824 INFO [train.py:1198] (1/2) Epoch 39, batch 1500, loss[loss=0.2425, ctc_loss=0.1595, cr_loss=0.4149, over 20985.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.1461, cr_loss=0.3714, over 4107903.06 frames. ], batch size: 64, lr: 2.12e-03, grad_scale: 32.0 2024-09-18 02:04:57,709 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.909e+02 2.181e+02 2.284e+02 2.433e+02 3.092e+02, threshold=4.569e+02, percent-clipped=0.0 2024-09-18 02:06:13,550 INFO [train.py:1198] (1/2) Epoch 39, batch 1550, loss[loss=0.2628, ctc_loss=0.1784, cr_loss=0.4221, over 20659.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1466, cr_loss=0.3717, over 4095088.48 frames. ], batch size: 68, lr: 2.12e-03, grad_scale: 32.0 2024-09-18 02:06:24,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=692472.3333333334, ans=0.125 2024-09-18 02:06:30,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=692500.6666666666, ans=0.125 2024-09-18 02:07:06,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=692557.3333333334, ans=0.1 2024-09-18 02:07:33,095 INFO [train.py:1198] (1/2) Epoch 39, batch 1600, loss[loss=0.2351, ctc_loss=0.1588, cr_loss=0.3818, over 20962.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1466, cr_loss=0.3721, over 4099895.94 frames. ], batch size: 58, lr: 2.12e-03, grad_scale: 32.0 2024-09-18 02:07:36,032 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.838e+02 2.173e+02 2.336e+02 2.487e+02 3.153e+02, threshold=4.671e+02, percent-clipped=0.0 2024-09-18 02:08:27,805 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 02:08:47,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=692755.6666666666, ans=0.5 2024-09-18 02:08:48,611 INFO [train.py:1198] (1/2) Epoch 39, batch 1650, loss[loss=0.2068, ctc_loss=0.1363, cr_loss=0.3525, over 20977.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.1461, cr_loss=0.3715, over 4105713.16 frames. 
], batch size: 55, lr: 2.12e-03, grad_scale: 16.0 2024-09-18 02:09:08,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=692784.0, ans=0.5 2024-09-18 02:09:17,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=692812.3333333334, ans=0.125 2024-09-18 02:09:20,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=692812.3333333334, ans=0.2 2024-09-18 02:09:22,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=692812.3333333334, ans=0.0 2024-09-18 02:09:43,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=692840.6666666666, ans=0.025 2024-09-18 02:09:56,091 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.94 vs. limit=15.0 2024-09-18 02:10:03,136 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 02:10:04,187 INFO [train.py:1198] (1/2) Epoch 39, batch 1700, loss[loss=0.2155, ctc_loss=0.1415, cr_loss=0.3704, over 20797.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.1457, cr_loss=0.371, over 4105378.58 frames. ], batch size: 53, lr: 2.12e-03, grad_scale: 16.0 2024-09-18 02:10:07,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=692897.3333333334, ans=0.125 2024-09-18 02:10:08,612 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.811e+02 2.165e+02 2.297e+02 2.416e+02 6.730e+02, threshold=4.595e+02, percent-clipped=1.0 2024-09-18 02:10:50,225 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.35 vs. limit=15.0 2024-09-18 02:11:22,982 INFO [train.py:1198] (1/2) Epoch 39, batch 1750, loss[loss=0.2364, ctc_loss=0.1669, cr_loss=0.3472, over 14161.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1459, cr_loss=0.3709, over 4093932.24 frames. ], batch size: 149, lr: 2.12e-03, grad_scale: 16.0 2024-09-18 02:11:29,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=693039.0, ans=0.1 2024-09-18 02:11:39,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=693067.3333333334, ans=0.0 2024-09-18 02:12:07,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=693124.0, ans=0.125 2024-09-18 02:12:38,395 INFO [train.py:1198] (1/2) Epoch 39, batch 1800, loss[loss=0.1972, ctc_loss=0.129, cr_loss=0.3406, over 20948.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.1457, cr_loss=0.3705, over 4101543.17 frames. ], batch size: 49, lr: 2.12e-03, grad_scale: 16.0
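
Reading note: the WARNING [optim.py:487] entries that run through this log print a five-number summary of recent gradient norms, and in every entry here the threshold works out to Clipping_scale times the middle value (for the entry just below, 2.0 x 2.325e+02 ~ 4.649e+02), with percent-clipped reporting how often recent batches exceeded it. Below is a minimal sketch of that bookkeeping, assuming the five values are the min/25%/50%/75%/max of a sliding window of norms; GradNormTracker and its window size are hypothetical illustrations, not icefall's actual optim.py internals.

```python
import torch

class GradNormTracker:
    # Hypothetical sketch of the bookkeeping behind the
    # "Clipping_scale=2.0, grad-norm quartiles ..." WARNING lines;
    # not the actual optim.py implementation.
    def __init__(self, clipping_scale=2.0, window=128):
        self.clipping_scale = clipping_scale
        self.window = window      # assumed size of the norm history
        self.norms = []           # recent per-batch gradient norms
        self.num_clipped = 0
        self.num_seen = 0

    def update(self, model):
        grads = [p.grad.detach().flatten()
                 for p in model.parameters() if p.grad is not None]
        norm = torch.linalg.vector_norm(torch.cat(grads)).item()
        self.norms = (self.norms + [norm])[-self.window:]
        hist = torch.tensor(self.norms)
        # Five-number summary as printed in the log: min, 25%, 50%, 75%, max.
        qs = torch.quantile(hist, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        # The logged threshold tracks clipping_scale times the median norm.
        threshold = self.clipping_scale * qs[2].item()
        self.num_seen += 1
        self.num_clipped += int(norm > threshold)
        print("grad-norm quartiles "
              + " ".join(f"{q.item():.3e}" for q in qs)
              + f", threshold={threshold:.3e}"
              + f", percent-clipped={100.0 * self.num_clipped / self.num_seen:.1f}")
        return threshold
```

Under this reading, percent-clipped=1.0 in the 02:10:08 entry above is consistent with roughly one batch in a hundred recently exceeding the threshold (that window's max, 6.730e+02, is well above its 4.595e+02 threshold).
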
2024-09-18 02:12:42,804 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.943e+02 2.216e+02 2.325e+02 2.535e+02 3.394e+02, threshold=4.649e+02, percent-clipped=0.0 2024-09-18 02:12:44,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=693180.6666666666, ans=0.2 2024-09-18 02:13:28,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=693265.6666666666, ans=0.1 2024-09-18 02:13:31,654 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.82 vs. limit=15.0 2024-09-18 02:13:38,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=693265.6666666666, ans=0.0 2024-09-18 02:13:56,560 INFO [train.py:1198] (1/2) Epoch 39, batch 1850, loss[loss=0.2104, ctc_loss=0.1369, cr_loss=0.3675, over 20777.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1466, cr_loss=0.372, over 4092090.86 frames. ], batch size: 56, lr: 2.12e-03, grad_scale: 8.0 2024-09-18 02:13:58,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=693322.3333333334, ans=0.125 2024-09-18 02:14:35,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=693379.0, ans=0.0 2024-09-18 02:14:50,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=693407.3333333334, ans=0.1 2024-09-18 02:15:01,999 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.38 vs. limit=15.0 2024-09-18 02:15:13,214 INFO [train.py:1198] (1/2) Epoch 39, batch 1900, loss[loss=0.2307, ctc_loss=0.1529, cr_loss=0.389, over 20971.00 frames. ], tot_loss[loss=0.2197, ctc_loss=0.1457, cr_loss=0.37, over 4095195.74 frames. ], batch size: 58, lr: 2.12e-03, grad_scale: 8.0 2024-09-18 02:15:19,267 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.950e+02 2.166e+02 2.310e+02 2.460e+02 3.340e+02, threshold=4.621e+02, percent-clipped=0.0 2024-09-18 02:15:39,622 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.62 vs. limit=10.0 2024-09-18 02:16:12,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=693577.3333333334, ans=0.125 2024-09-18 02:16:26,496 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.77 vs. limit=15.0 2024-09-18 02:16:28,581 INFO [train.py:1198] (1/2) Epoch 39, batch 1950, loss[loss=0.2043, ctc_loss=0.1327, cr_loss=0.3581, over 20902.00 frames. ], tot_loss[loss=0.2207, ctc_loss=0.1465, cr_loss=0.3711, over 4101573.08 frames. ], batch size: 54, lr: 2.12e-03, grad_scale: 8.0 2024-09-18 02:16:44,241 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.14 vs. limit=15.0
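
Reading note: the grad_scale value in the batch entries is the AMP dynamic loss scale. It sits at 16.0 through batch 1800, drops to 8.0 at batch 1850, and doubles back to 16.0 at batch 2000 just below (reaching 32.0 again by batch 2400): the usual halve-on-overflow, grow-after-clean-steps pattern. A minimal sketch of one step with torch.cuda.amp.GradScaler follows; the init_scale and the exact wiring inside train.py are assumptions, not a transcription of the training script.

```python
import torch
from torch.cuda.amp import GradScaler, autocast

# Dynamic loss scaling for fp16 training: the scaler's current scale is what
# the log prints as grad_scale. init_scale here is illustrative only.
scaler = GradScaler(init_scale=32.0, growth_factor=2.0,
                    backoff_factor=0.5, growth_interval=2000)

def training_step(model, optimizer, features, targets, loss_fn):
    optimizer.zero_grad()
    with autocast():                  # forward and loss in float16
        loss = loss_fn(model(features), targets)
    scaler.scale(loss).backward()     # backward on the scaled loss
    scaler.step(optimizer)            # skipped if gradients hit inf/nan
    scaler.update()                   # halves the scale on overflow (16.0 -> 8.0),
                                      # doubles it after growth_interval clean steps
    return loss.detach(), scaler.get_scale()
```
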
2024-09-18 02:16:48,885 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.54 vs. limit=10.0 2024-09-18 02:16:52,018 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.05 vs. limit=15.0 2024-09-18 02:17:43,327 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.54 vs. limit=10.0 2024-09-18 02:17:43,760 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.21 vs. limit=8.0 2024-09-18 02:17:47,111 INFO [train.py:1198] (1/2) Epoch 39, batch 2000, loss[loss=0.2374, ctc_loss=0.159, cr_loss=0.3919, over 20936.00 frames. ], tot_loss[loss=0.2203, ctc_loss=0.1461, cr_loss=0.371, over 4109743.52 frames. ], batch size: 60, lr: 2.12e-03, grad_scale: 16.0 2024-09-18 02:17:53,402 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.865e+02 2.194e+02 2.329e+02 2.491e+02 3.480e+02, threshold=4.659e+02, percent-clipped=0.0 2024-09-18 02:17:54,011 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.45 vs. limit=15.0 2024-09-18 02:18:01,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=693775.6666666666, ans=0.125 2024-09-18 02:18:33,225 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.75 vs. limit=15.0 2024-09-18 02:19:06,066 INFO [train.py:1198] (1/2) Epoch 39, batch 2050, loss[loss=0.2131, ctc_loss=0.1377, cr_loss=0.377, over 20935.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1447, cr_loss=0.369, over 4108788.67 frames. ], batch size: 60, lr: 2.12e-03, grad_scale: 16.0 2024-09-18 02:19:17,639 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.20 vs. limit=15.0 2024-09-18 02:19:53,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=693974.0, ans=0.125 2024-09-18 02:20:00,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=693974.0, ans=0.2 2024-09-18 02:20:21,884 INFO [train.py:1198] (1/2) Epoch 39, batch 2100, loss[loss=0.2396, ctc_loss=0.1592, cr_loss=0.402, over 20658.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1448, cr_loss=0.369, over 4115951.90 frames. ], batch size: 66, lr: 2.12e-03, grad_scale: 16.0 2024-09-18 02:20:27,988 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.900e+02 2.207e+02 2.301e+02 2.550e+02 4.509e+02, threshold=4.602e+02, percent-clipped=0.0 2024-09-18 02:20:57,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=694087.3333333334, ans=0.125 2024-09-18 02:21:10,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=694115.6666666666, ans=0.0 2024-09-18 02:21:29,556 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.69 vs.
limit=15.0 2024-09-18 02:21:37,998 INFO [train.py:1198] (1/2) Epoch 39, batch 2150, loss[loss=0.2951, ctc_loss=0.2091, cr_loss=0.4302, over 14385.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.1452, cr_loss=0.3693, over 4103411.09 frames. ], batch size: 149, lr: 2.12e-03, grad_scale: 16.0 2024-09-18 02:22:04,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=694200.6666666666, ans=0.0 2024-09-18 02:22:04,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=694200.6666666666, ans=10.0 2024-09-18 02:22:57,062 INFO [train.py:1198] (1/2) Epoch 39, batch 2200, loss[loss=0.2429, ctc_loss=0.1601, cr_loss=0.4142, over 21017.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.1458, cr_loss=0.37, over 4099936.84 frames. ], batch size: 61, lr: 2.12e-03, grad_scale: 16.0 2024-09-18 02:23:02,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=694314.0, ans=0.025 2024-09-18 02:23:03,266 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.827e+02 2.219e+02 2.374e+02 2.554e+02 3.781e+02, threshold=4.748e+02, percent-clipped=0.0 2024-09-18 02:23:29,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=694370.6666666666, ans=0.2 2024-09-18 02:23:35,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=694370.6666666666, ans=0.0 2024-09-18 02:23:38,146 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 02:24:05,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=694427.3333333334, ans=10.0 2024-09-18 02:24:12,236 INFO [train.py:1198] (1/2) Epoch 39, batch 2250, loss[loss=0.2392, ctc_loss=0.1616, cr_loss=0.3879, over 19960.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.1454, cr_loss=0.3697, over 4094885.22 frames. ], batch size: 80, lr: 2.12e-03, grad_scale: 16.0 2024-09-18 02:24:41,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=694484.0, ans=0.0 2024-09-18 02:24:51,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=694512.3333333334, ans=0.0 2024-09-18 02:25:12,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=694540.6666666666, ans=0.1 2024-09-18 02:25:16,027 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.60 vs. limit=12.0 2024-09-18 02:25:19,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=694569.0, ans=0.0 2024-09-18 02:25:21,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=694569.0, ans=0.125 2024-09-18 02:25:30,110 INFO [train.py:1198] (1/2) Epoch 39, batch 2300, loss[loss=0.2429, ctc_loss=0.1641, cr_loss=0.3941, over 20023.00 frames. ], tot_loss[loss=0.22, ctc_loss=0.1459, cr_loss=0.3705, over 4082484.44 frames. 
], batch size: 80, lr: 2.12e-03, grad_scale: 16.0 2024-09-18 02:25:34,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=694597.3333333334, ans=0.0 2024-09-18 02:25:36,047 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.945e+02 2.214e+02 2.383e+02 2.580e+02 4.808e+02, threshold=4.766e+02, percent-clipped=1.0 2024-09-18 02:25:59,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=694654.0, ans=0.0 2024-09-18 02:26:36,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=694710.6666666666, ans=0.0 2024-09-18 02:26:42,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=694710.6666666666, ans=0.125 2024-09-18 02:26:46,175 INFO [train.py:1198] (1/2) Epoch 39, batch 2350, loss[loss=0.2122, ctc_loss=0.1407, cr_loss=0.3574, over 20976.00 frames. ], tot_loss[loss=0.2209, ctc_loss=0.1466, cr_loss=0.3715, over 4081108.40 frames. ], batch size: 58, lr: 2.12e-03, grad_scale: 16.0 2024-09-18 02:27:24,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=694795.6666666666, ans=0.0 2024-09-18 02:28:04,659 INFO [train.py:1198] (1/2) Epoch 39, batch 2400, loss[loss=0.208, ctc_loss=0.1378, cr_loss=0.3509, over 20792.00 frames. ], tot_loss[loss=0.2202, ctc_loss=0.1462, cr_loss=0.3703, over 4085050.04 frames. ], batch size: 53, lr: 2.12e-03, grad_scale: 32.0 2024-09-18 02:28:10,672 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.003e+02 2.205e+02 2.353e+02 2.565e+02 3.948e+02, threshold=4.707e+02, percent-clipped=0.0 2024-09-18 02:28:50,752 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.62 vs. limit=22.5 2024-09-18 02:28:53,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=694965.6666666666, ans=0.0 2024-09-18 02:28:56,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=694965.6666666666, ans=0.1 2024-09-18 02:28:57,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=694965.6666666666, ans=0.125 2024-09-18 02:29:10,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=694994.0, ans=0.025 2024-09-18 02:29:20,623 INFO [train.py:1198] (1/2) Epoch 39, batch 2450, loss[loss=0.2449, ctc_loss=0.1626, cr_loss=0.4111, over 21079.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1457, cr_loss=0.3699, over 4081868.56 frames. 
], batch size: 59, lr: 2.12e-03, grad_scale: 32.0 2024-09-18 02:29:20,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=695022.3333333334, ans=0.1 2024-09-18 02:29:27,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=695022.3333333334, ans=0.0 2024-09-18 02:30:27,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=695135.6666666666, ans=0.025 2024-09-18 02:30:37,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=695164.0, ans=0.125 2024-09-18 02:30:39,083 INFO [train.py:1198] (1/2) Epoch 39, batch 2500, loss[loss=0.1883, ctc_loss=0.1225, cr_loss=0.3289, over 20961.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.1458, cr_loss=0.3703, over 4094690.80 frames. ], batch size: 48, lr: 2.12e-03, grad_scale: 32.0 2024-09-18 02:30:45,164 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.924e+02 2.185e+02 2.340e+02 2.463e+02 3.257e+02, threshold=4.679e+02, percent-clipped=0.0 2024-09-18 02:31:00,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=695192.3333333334, ans=0.125 2024-09-18 02:31:17,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=695220.6666666666, ans=0.0 2024-09-18 02:31:30,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=695249.0, ans=0.2 2024-09-18 02:31:51,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=695277.3333333334, ans=0.125 2024-09-18 02:31:54,339 INFO [train.py:1198] (1/2) Epoch 39, batch 2550, loss[loss=0.1968, ctc_loss=0.1301, cr_loss=0.3339, over 20937.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1453, cr_loss=0.3693, over 4090399.16 frames. ], batch size: 50, lr: 2.12e-03, grad_scale: 32.0 2024-09-18 02:32:08,637 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.22 vs. limit=22.5 2024-09-18 02:32:09,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=695334.0, ans=0.125 2024-09-18 02:32:31,437 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.32 vs. limit=15.0 2024-09-18 02:32:53,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=695419.0, ans=0.2 2024-09-18 02:32:54,256 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.12 vs. limit=15.0 2024-09-18 02:32:59,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=695419.0, ans=0.125 2024-09-18 02:33:09,983 INFO [train.py:1198] (1/2) Epoch 39, batch 2600, loss[loss=0.2874, ctc_loss=0.195, cr_loss=0.4621, over 18488.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1448, cr_loss=0.3688, over 4101091.32 frames. 
], batch size: 108, lr: 2.12e-03, grad_scale: 32.0 2024-09-18 02:33:13,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=695447.3333333334, ans=0.0 2024-09-18 02:33:16,176 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.931e+02 2.210e+02 2.325e+02 2.513e+02 3.040e+02, threshold=4.651e+02, percent-clipped=0.0 2024-09-18 02:33:37,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=695475.6666666666, ans=0.125 2024-09-18 02:33:47,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=695504.0, ans=0.0 2024-09-18 02:34:28,563 INFO [train.py:1198] (1/2) Epoch 39, batch 2650, loss[loss=0.2099, ctc_loss=0.1409, cr_loss=0.3447, over 20956.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1451, cr_loss=0.3695, over 4109151.11 frames. ], batch size: 55, lr: 2.12e-03, grad_scale: 32.0 2024-09-18 02:34:31,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=695589.0, ans=0.125 2024-09-18 02:34:40,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=695589.0, ans=0.1 2024-09-18 02:34:45,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=695617.3333333334, ans=0.2 2024-09-18 02:34:58,433 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.45 vs. limit=15.0 2024-09-18 02:35:20,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=695674.0, ans=0.125 2024-09-18 02:35:26,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=695674.0, ans=0.125 2024-09-18 02:35:28,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=695702.3333333334, ans=0.125 2024-09-18 02:35:44,692 INFO [train.py:1198] (1/2) Epoch 39, batch 2700, loss[loss=0.1864, ctc_loss=0.1218, cr_loss=0.3228, over 21040.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1444, cr_loss=0.3686, over 4109239.08 frames. ], batch size: 56, lr: 2.12e-03, grad_scale: 32.0 2024-09-18 02:35:50,743 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.988e+02 2.202e+02 2.348e+02 2.533e+02 4.629e+02, threshold=4.697e+02, percent-clipped=0.0 2024-09-18 02:36:07,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=695759.0, ans=0.125 2024-09-18 02:36:19,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=695787.3333333334, ans=0.0 2024-09-18 02:36:32,225 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.76 vs. limit=15.0 2024-09-18 02:37:03,320 INFO [train.py:1198] (1/2) Epoch 39, batch 2750, loss[loss=0.2297, ctc_loss=0.1523, cr_loss=0.3873, over 20288.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1439, cr_loss=0.368, over 4109880.89 frames. 
], batch size: 74, lr: 2.12e-03, grad_scale: 32.0 2024-09-18 02:37:26,868 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.39 vs. limit=10.0 2024-09-18 02:37:41,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=695929.0, ans=0.125 2024-09-18 02:37:50,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=695957.3333333334, ans=0.0 2024-09-18 02:37:53,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=695957.3333333334, ans=0.1 2024-09-18 02:37:55,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=695957.3333333334, ans=0.2 2024-09-18 02:38:01,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=695957.3333333334, ans=0.125 2024-09-18 02:38:19,068 INFO [train.py:1198] (1/2) Epoch 39, batch 2800, loss[loss=0.2162, ctc_loss=0.1448, cr_loss=0.3569, over 20887.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.1452, cr_loss=0.3706, over 4101937.57 frames. ], batch size: 54, lr: 2.12e-03, grad_scale: 32.0 2024-09-18 02:38:25,169 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.939e+02 2.185e+02 2.290e+02 2.480e+02 5.351e+02, threshold=4.579e+02, percent-clipped=1.0 2024-09-18 02:39:04,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=696099.0, ans=0.125 2024-09-18 02:39:06,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=696099.0, ans=0.1 2024-09-18 02:39:06,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=696099.0, ans=0.0 2024-09-18 02:39:13,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=696099.0, ans=0.1 2024-09-18 02:39:18,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=696127.3333333334, ans=0.125 2024-09-18 02:39:38,057 INFO [train.py:1198] (1/2) Epoch 39, batch 2850, loss[loss=0.2062, ctc_loss=0.135, cr_loss=0.3558, over 21072.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.1457, cr_loss=0.371, over 4104142.72 frames. ], batch size: 53, lr: 2.12e-03, grad_scale: 32.0 2024-09-18 02:39:41,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=696155.6666666666, ans=0.125 2024-09-18 02:39:49,335 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.29 vs. 
limit=10.0 2024-09-18 02:40:02,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=696184.0, ans=0.125 2024-09-18 02:40:03,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=696184.0, ans=0.125 2024-09-18 02:40:22,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=696240.6666666666, ans=0.125 2024-09-18 02:40:53,630 INFO [train.py:1198] (1/2) Epoch 39, batch 2900, loss[loss=0.2318, ctc_loss=0.1518, cr_loss=0.4, over 20834.00 frames. ], tot_loss[loss=0.2203, ctc_loss=0.146, cr_loss=0.3712, over 4101467.09 frames. ], batch size: 59, lr: 2.12e-03, grad_scale: 32.0 2024-09-18 02:40:59,471 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.876e+02 2.170e+02 2.308e+02 2.451e+02 3.176e+02, threshold=4.617e+02, percent-clipped=0.0 2024-09-18 02:41:07,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=696325.6666666666, ans=0.0 2024-09-18 02:41:18,335 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.07 vs. limit=15.0 2024-09-18 02:41:34,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=696354.0, ans=0.2 2024-09-18 02:41:59,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=696410.6666666666, ans=0.125 2024-09-18 02:42:01,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=696410.6666666666, ans=0.0 2024-09-18 02:42:12,302 INFO [train.py:1198] (1/2) Epoch 39, batch 2950, loss[loss=0.228, ctc_loss=0.1534, cr_loss=0.3732, over 21044.00 frames. ], tot_loss[loss=0.2208, ctc_loss=0.1465, cr_loss=0.3713, over 4092973.87 frames. ], batch size: 56, lr: 2.12e-03, grad_scale: 32.0 2024-09-18 02:42:58,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=696524.0, ans=0.1 2024-09-18 02:42:59,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=696524.0, ans=0.2 2024-09-18 02:43:10,430 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0 2024-09-18 02:43:27,895 INFO [train.py:1198] (1/2) Epoch 39, batch 3000, loss[loss=0.2254, ctc_loss=0.1509, cr_loss=0.3725, over 21064.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1456, cr_loss=0.3703, over 4105499.08 frames. ], batch size: 56, lr: 2.12e-03, grad_scale: 32.0 2024-09-18 02:43:27,896 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 02:43:51,733 INFO [train.py:1230] (1/2) Epoch 39, validation: loss=0.03975, ctc_loss=0.03975, cr_loss=1.411e-14, over 944034.00 frames. 
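
Reading note: across this log the bracketed losses decompose consistently as loss = ctc_loss + 0.2 x cr_loss (batch 500 above: 0.1533 + 0.2 x 0.3839 ~ 0.2301), and in the validation entry just logged cr_loss is ~1.4e-14, so the validation loss reduces to the CTC term alone, consistent with the consistency-regularization branch only being exercised on augmented training views. A small check of that relation; the 0.2 weight is inferred from the logged numbers, not read out of train.py.

```python
def combine_losses(ctc_loss, cr_loss, cr_loss_scale=0.2):
    # Total loss as printed in the log: CTC term plus a scaled
    # consistency-regularization (CR) term. The 0.2 weight is inferred
    # from the logged values, not taken from the training code.
    return ctc_loss + cr_loss_scale * cr_loss

# Epoch 39, batch 500: loss=0.2301, ctc_loss=0.1533, cr_loss=0.3839
assert abs(combine_losses(0.1533, 0.3839) - 0.2301) < 5e-4
# Epoch 39 validation: cr_loss ~ 1.4e-14, so loss ~ ctc_loss = 0.03975
assert abs(combine_losses(0.03975, 1.411e-14) - 0.03975) < 1e-9
```
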
2024-09-18 02:43:51,733 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-18 02:43:57,935 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.981e+02 2.205e+02 2.340e+02 2.590e+02 3.976e+02, threshold=4.680e+02, percent-clipped=0.0 2024-09-18 02:44:02,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=696580.6666666666, ans=0.1 2024-09-18 02:44:08,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=696609.0, ans=0.0 2024-09-18 02:44:35,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=696665.6666666666, ans=0.125 2024-09-18 02:45:09,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=696722.3333333334, ans=0.125 2024-09-18 02:45:10,537 INFO [train.py:1198] (1/2) Epoch 39, batch 3050, loss[loss=0.2521, ctc_loss=0.1702, cr_loss=0.4093, over 18320.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1462, cr_loss=0.3719, over 4090494.23 frames. ], batch size: 108, lr: 2.12e-03, grad_scale: 32.0 2024-09-18 02:45:16,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=696722.3333333334, ans=0.125 2024-09-18 02:45:39,770 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.52 vs. limit=15.0 2024-09-18 02:45:44,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=696779.0, ans=0.125 2024-09-18 02:45:57,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=696807.3333333334, ans=0.0 2024-09-18 02:46:26,422 INFO [train.py:1198] (1/2) Epoch 39, batch 3100, loss[loss=0.1995, ctc_loss=0.1336, cr_loss=0.3293, over 20996.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1452, cr_loss=0.369, over 4100992.63 frames. ], batch size: 51, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 02:46:34,078 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.929e+02 2.214e+02 2.345e+02 2.505e+02 3.409e+02, threshold=4.689e+02, percent-clipped=0.0 2024-09-18 02:46:45,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=696892.3333333334, ans=0.125 2024-09-18 02:46:49,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=696892.3333333334, ans=0.2 2024-09-18 02:47:22,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=696949.0, ans=0.125 2024-09-18 02:47:44,725 INFO [train.py:1198] (1/2) Epoch 39, batch 3150, loss[loss=0.229, ctc_loss=0.1558, cr_loss=0.3661, over 20854.00 frames. ], tot_loss[loss=0.2197, ctc_loss=0.1456, cr_loss=0.3703, over 4105765.71 frames. ], batch size: 57, lr: 2.11e-03, grad_scale: 16.0 2024-09-18 02:49:00,472 INFO [train.py:1198] (1/2) Epoch 39, batch 3200, loss[loss=0.1871, ctc_loss=0.1224, cr_loss=0.3234, over 20980.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.1442, cr_loss=0.3678, over 4102765.66 frames. 
], batch size: 58, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 02:49:08,095 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.950e+02 2.154e+02 2.285e+02 2.445e+02 2.956e+02, threshold=4.570e+02, percent-clipped=0.0 2024-09-18 02:49:28,943 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. limit=6.0 2024-09-18 02:49:58,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=697232.3333333334, ans=0.035 2024-09-18 02:50:06,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=697260.6666666666, ans=0.1 2024-09-18 02:50:11,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=697260.6666666666, ans=0.125 2024-09-18 02:50:14,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=697260.6666666666, ans=0.0 2024-09-18 02:50:16,758 INFO [train.py:1198] (1/2) Epoch 39, batch 3250, loss[loss=0.2557, ctc_loss=0.1686, cr_loss=0.4354, over 20956.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.1442, cr_loss=0.3679, over 4107159.54 frames. ], batch size: 58, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 02:50:42,270 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.49 vs. limit=22.5 2024-09-18 02:51:10,596 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.01 vs. limit=12.0 2024-09-18 02:51:17,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=697374.0, ans=0.125 2024-09-18 02:51:36,064 INFO [train.py:1198] (1/2) Epoch 39, batch 3300, loss[loss=0.2263, ctc_loss=0.1489, cr_loss=0.3868, over 20956.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.1442, cr_loss=0.368, over 4103212.45 frames. ], batch size: 60, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 02:51:43,600 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.981e+02 2.211e+02 2.364e+02 2.512e+02 3.508e+02, threshold=4.729e+02, percent-clipped=0.0 2024-09-18 02:51:43,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=697430.6666666666, ans=0.0 2024-09-18 02:51:47,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=697430.6666666666, ans=0.1 2024-09-18 02:51:47,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=697430.6666666666, ans=0.0 2024-09-18 02:52:01,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=697459.0, ans=0.2 2024-09-18 02:52:07,061 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.27 vs. 
limit=22.5 2024-09-18 02:52:26,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=697515.6666666666, ans=0.0 2024-09-18 02:52:29,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=697515.6666666666, ans=0.1 2024-09-18 02:52:36,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=697544.0, ans=0.05 2024-09-18 02:52:41,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=697544.0, ans=0.07 2024-09-18 02:52:51,700 INFO [train.py:1198] (1/2) Epoch 39, batch 3350, loss[loss=0.1952, ctc_loss=0.1268, cr_loss=0.3423, over 20929.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1447, cr_loss=0.369, over 4110287.60 frames. ], batch size: 60, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 02:53:02,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=697572.3333333334, ans=0.125 2024-09-18 02:53:08,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=697600.6666666666, ans=0.125 2024-09-18 02:53:37,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=697629.0, ans=0.025 2024-09-18 02:54:10,190 INFO [train.py:1198] (1/2) Epoch 39, batch 3400, loss[loss=0.2325, ctc_loss=0.1563, cr_loss=0.3809, over 21050.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.145, cr_loss=0.3699, over 4107215.84 frames. ], batch size: 56, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 02:54:17,719 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.025e+02 2.216e+02 2.332e+02 2.482e+02 7.936e+02, threshold=4.663e+02, percent-clipped=1.0 2024-09-18 02:54:44,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=697770.6666666666, ans=0.0 2024-09-18 02:55:05,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=697799.0, ans=0.0 2024-09-18 02:55:08,829 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.93 vs. limit=15.0 2024-09-18 02:55:25,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=697855.6666666666, ans=0.0 2024-09-18 02:55:26,233 INFO [train.py:1198] (1/2) Epoch 39, batch 3450, loss[loss=0.2065, ctc_loss=0.1342, cr_loss=0.3616, over 20842.00 frames. ], tot_loss[loss=0.2195, ctc_loss=0.1455, cr_loss=0.3701, over 4098320.25 frames. 
], batch size: 65, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 02:55:31,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=697855.6666666666, ans=0.1 2024-09-18 02:55:56,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=697912.3333333334, ans=0.125 2024-09-18 02:56:31,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=697969.0, ans=0.125 2024-09-18 02:56:44,832 INFO [train.py:1198] (1/2) Epoch 39, batch 3500, loss[loss=0.2352, ctc_loss=0.1591, cr_loss=0.3809, over 20069.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1453, cr_loss=0.3694, over 4102355.65 frames. ], batch size: 80, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 02:56:48,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=697997.3333333334, ans=0.1 2024-09-18 02:56:51,493 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.05 vs. limit=15.0 2024-09-18 02:56:52,168 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.864e+02 2.181e+02 2.338e+02 2.494e+02 3.391e+02, threshold=4.676e+02, percent-clipped=0.0 2024-09-18 02:57:16,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=698054.0, ans=0.125 2024-09-18 02:57:24,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=698054.0, ans=0.125 2024-09-18 02:57:33,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=698082.3333333334, ans=0.125 2024-09-18 02:57:35,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=698082.3333333334, ans=15.0 2024-09-18 02:57:37,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=698082.3333333334, ans=0.125 2024-09-18 02:57:50,281 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.18 vs. limit=15.0 2024-09-18 02:58:00,207 INFO [train.py:1198] (1/2) Epoch 39, batch 3550, loss[loss=0.1976, ctc_loss=0.1298, cr_loss=0.3391, over 20970.00 frames. ], tot_loss[loss=0.2208, ctc_loss=0.1465, cr_loss=0.3716, over 4091561.05 frames. ], batch size: 49, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 02:59:19,568 INFO [train.py:1198] (1/2) Epoch 39, batch 3600, loss[loss=0.1953, ctc_loss=0.1273, cr_loss=0.3398, over 21062.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.1458, cr_loss=0.3699, over 4087584.14 frames. 
], batch size: 53, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 02:59:27,284 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.835e+02 2.233e+02 2.349e+02 2.443e+02 2.960e+02, threshold=4.697e+02, percent-clipped=0.0 2024-09-18 02:59:48,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=698337.3333333334, ans=0.125 2024-09-18 03:00:14,821 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.03 vs. limit=15.0 2024-09-18 03:00:35,287 INFO [train.py:1198] (1/2) Epoch 39, batch 3650, loss[loss=0.191, ctc_loss=0.1236, cr_loss=0.3367, over 20977.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1448, cr_loss=0.3687, over 4100682.30 frames. ], batch size: 50, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:00:40,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=698422.3333333334, ans=0.125 2024-09-18 03:01:50,958 INFO [train.py:1198] (1/2) Epoch 39, batch 3700, loss[loss=0.1761, ctc_loss=0.1158, cr_loss=0.3015, over 21077.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1448, cr_loss=0.369, over 4104757.64 frames. ], batch size: 53, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:01:58,423 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.920e+02 2.234e+02 2.357e+02 2.503e+02 2.991e+02, threshold=4.714e+02, percent-clipped=0.0 2024-09-18 03:02:00,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=698564.0, ans=0.05 2024-09-18 03:02:03,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=698564.0, ans=0.1 2024-09-18 03:02:03,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=698564.0, ans=0.0 2024-09-18 03:03:02,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=698677.3333333334, ans=0.5 2024-09-18 03:03:03,154 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.55 vs. limit=15.0 2024-09-18 03:03:09,958 INFO [train.py:1198] (1/2) Epoch 39, batch 3750, loss[loss=0.2462, ctc_loss=0.1712, cr_loss=0.3747, over 14484.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.1458, cr_loss=0.3708, over 4095879.68 frames. ], batch size: 149, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:03:24,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=698734.0, ans=0.0 2024-09-18 03:03:34,488 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=698734.0, ans=0.125 2024-09-18 03:04:15,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=698819.0, ans=0.0 2024-09-18 03:04:25,473 INFO [train.py:1198] (1/2) Epoch 39, batch 3800, loss[loss=0.1698, ctc_loss=0.1108, cr_loss=0.2954, over 19882.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1462, cr_loss=0.3712, over 4084017.17 frames. 
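Each ScheduledFloat line reports a hyperparameter (skip rates, balancer probabilities, dropout) whose current value `ans` is a function of batch_count. A sketch of one plausible mechanism, piecewise-linear interpolation over (batch_count, value) breakpoints; the breakpoints below are illustrative, not taken from any module in this run:

    import bisect

    def scheduled_float(points, batch_count):
        """points: sorted (batch_count, value) pairs; linear in between,
        clamped at both ends."""
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        if batch_count <= xs[0]:
            return ys[0]
        if batch_count >= xs[-1]:
            return ys[-1]
        i = bisect.bisect_right(xs, batch_count) - 1
        t = (batch_count - xs[i]) / (xs[i + 1] - xs[i])
        return ys[i] + t * (ys[i + 1] - ys[i])

    # e.g. a skip rate decaying from 0.5 to 0.0 over the first 20k batches
    # would read ans=0.25 mid-decay and ans=0.0 at the batch counts seen here:
    assert scheduled_float([(0.0, 0.5), (20000.0, 0.0)], 10000.0) == 0.25
    assert scheduled_float([(0.0, 0.5), (20000.0, 0.0)], 697544.0) == 0.0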
], batch size: 44, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:04:35,769 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.032e+02 2.202e+02 2.364e+02 2.543e+02 6.705e+02, threshold=4.729e+02, percent-clipped=2.0 2024-09-18 03:05:05,695 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.98 vs. limit=10.0 2024-09-18 03:05:06,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=698904.0, ans=0.0 2024-09-18 03:05:11,182 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 03:05:17,446 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 03:05:17,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=698932.3333333334, ans=0.125 2024-09-18 03:05:18,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=698932.3333333334, ans=0.125 2024-09-18 03:05:27,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=698960.6666666666, ans=0.1 2024-09-18 03:05:40,565 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.78 vs. limit=15.0 2024-09-18 03:05:44,286 INFO [train.py:1198] (1/2) Epoch 39, batch 3850, loss[loss=0.2404, ctc_loss=0.1615, cr_loss=0.3942, over 20318.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1459, cr_loss=0.3709, over 4090584.53 frames. ], batch size: 74, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:06:17,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=699045.6666666666, ans=0.125 2024-09-18 03:06:19,487 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=699045.6666666666, ans=0.125 2024-09-18 03:06:40,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=699074.0, ans=0.02 2024-09-18 03:06:42,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=699074.0, ans=0.025 2024-09-18 03:06:59,768 INFO [train.py:1198] (1/2) Epoch 39, batch 3900, loss[loss=0.1696, ctc_loss=0.1106, cr_loss=0.2948, over 20989.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1466, cr_loss=0.3719, over 4100210.70 frames. ], batch size: 48, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:07:07,104 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.967e+02 2.205e+02 2.341e+02 2.496e+02 6.315e+02, threshold=4.683e+02, percent-clipped=1.0 2024-09-18 03:07:57,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=699215.6666666666, ans=0.0 2024-09-18 03:08:18,346 INFO [train.py:1198] (1/2) Epoch 39, batch 3950, loss[loss=0.272, ctc_loss=0.1848, cr_loss=0.4362, over 18537.00 frames. ], tot_loss[loss=0.2203, ctc_loss=0.1461, cr_loss=0.3708, over 4094136.13 frames. 
], batch size: 108, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:08:50,744 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.58 vs. limit=10.0 2024-09-18 03:08:51,084 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. limit=6.0 2024-09-18 03:09:02,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=699357.3333333334, ans=0.125 2024-09-18 03:09:28,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=699385.6666666666, ans=0.0 2024-09-18 03:09:34,450 INFO [train.py:1198] (1/2) Epoch 39, batch 4000, loss[loss=0.2259, ctc_loss=0.1485, cr_loss=0.3869, over 21077.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.1459, cr_loss=0.3705, over 4084166.44 frames. ], batch size: 59, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:09:42,133 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.999e+02 2.248e+02 2.374e+02 2.559e+02 5.193e+02, threshold=4.749e+02, percent-clipped=1.0 2024-09-18 03:10:10,198 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.57 vs. limit=12.0 2024-09-18 03:10:52,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=699555.6666666666, ans=0.125 2024-09-18 03:10:53,662 INFO [train.py:1198] (1/2) Epoch 39, batch 4050, loss[loss=0.2315, ctc_loss=0.1531, cr_loss=0.3919, over 20819.00 frames. ], tot_loss[loss=0.22, ctc_loss=0.1459, cr_loss=0.3708, over 4084971.08 frames. ], batch size: 59, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:11:34,785 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.71 vs. limit=15.0 2024-09-18 03:12:08,941 INFO [train.py:1198] (1/2) Epoch 39, batch 4100, loss[loss=0.2191, ctc_loss=0.1457, cr_loss=0.3668, over 21010.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1462, cr_loss=0.3716, over 4073261.61 frames. ], batch size: 63, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:12:16,420 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.899e+02 2.262e+02 2.394e+02 2.590e+02 4.150e+02, threshold=4.789e+02, percent-clipped=0.0 2024-09-18 03:12:30,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=699725.6666666666, ans=0.125 2024-09-18 03:12:50,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=699754.0, ans=0.125 2024-09-18 03:12:51,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=699754.0, ans=0.1 2024-09-18 03:12:56,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=699782.3333333334, ans=0.025 2024-09-18 03:12:56,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=699782.3333333334, ans=0.0 2024-09-18 03:13:25,246 INFO [train.py:1198] (1/2) Epoch 39, batch 4150, loss[loss=0.1723, ctc_loss=0.1087, cr_loss=0.3176, over 20935.00 frames. 
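The scaling.py Whitening lines compare a per-module statistic against a scheduled limit, with a penalty active only while the metric exceeds the limit (so e.g. the 02:57:50 record, metric=13.17 vs. limit=15.0, is a module still under its limit). The log does not spell out the metric; the sketch below assumes an eigenvalue-spread form, mean(eig^2)/mean(eig)^2 of the feature covariance, which is 1.0 for perfectly whitened features and grows as a few directions dominate:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels), one whitening group
        x = x - x.mean(dim=0)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return (eigs ** 2).mean() / (eigs.mean() ** 2 + 1e-20)

    torch.manual_seed(0)
    white = torch.randn(1000, 256)
    assert whitening_metric(white) < 1.5      # near 1: under typical limits
    spiky = white.clone()
    spiky[:, 0] *= 30.0                       # one direction dominates
    assert whitening_metric(spiky) > 15.0     # would draw a whitening penalty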
], tot_loss[loss=0.2187, ctc_loss=0.1448, cr_loss=0.3696, over 4095075.76 frames. ], batch size: 49, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:13:33,828 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.41 vs. limit=15.0 2024-09-18 03:13:45,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=699867.3333333334, ans=0.2 2024-09-18 03:14:35,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=699952.3333333334, ans=0.0 2024-09-18 03:14:44,750 INFO [train.py:1198] (1/2) Epoch 39, batch 4200, loss[loss=0.1867, ctc_loss=0.1208, cr_loss=0.3295, over 21023.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1443, cr_loss=0.3689, over 4099250.32 frames. ], batch size: 52, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:14:47,157 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.08 vs. limit=15.0 2024-09-18 03:14:49,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=699980.6666666666, ans=0.2 2024-09-18 03:14:52,233 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.943e+02 2.240e+02 2.333e+02 2.497e+02 6.454e+02, threshold=4.666e+02, percent-clipped=1.0 2024-09-18 03:14:52,799 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.17 vs. limit=15.0 2024-09-18 03:15:19,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=700037.3333333334, ans=0.025 2024-09-18 03:15:20,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=700037.3333333334, ans=0.125 2024-09-18 03:15:22,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=700037.3333333334, ans=0.125 2024-09-18 03:15:23,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=700037.3333333334, ans=0.125 2024-09-18 03:15:38,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=700065.6666666666, ans=0.125 2024-09-18 03:16:03,697 INFO [train.py:1198] (1/2) Epoch 39, batch 4250, loss[loss=0.218, ctc_loss=0.1432, cr_loss=0.3738, over 20761.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1444, cr_loss=0.3688, over 4099589.89 frames. ], batch size: 56, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:16:23,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=700150.6666666666, ans=0.0 2024-09-18 03:16:34,659 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.39 vs. 
limit=6.0 2024-09-18 03:16:52,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=700207.3333333334, ans=0.125 2024-09-18 03:16:57,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=700207.3333333334, ans=0.0 2024-09-18 03:17:00,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=700207.3333333334, ans=0.1 2024-09-18 03:17:00,526 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.11 vs. limit=6.0 2024-09-18 03:17:19,402 INFO [train.py:1198] (1/2) Epoch 39, batch 4300, loss[loss=0.1886, ctc_loss=0.1226, cr_loss=0.3299, over 20982.00 frames. ], tot_loss[loss=0.2183, ctc_loss=0.1445, cr_loss=0.3689, over 4096195.44 frames. ], batch size: 49, lr: 2.11e-03, grad_scale: 16.0 2024-09-18 03:17:28,626 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.848e+02 2.207e+02 2.362e+02 2.524e+02 3.240e+02, threshold=4.724e+02, percent-clipped=0.0 2024-09-18 03:17:33,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=700292.3333333334, ans=0.0 2024-09-18 03:18:12,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=700349.0, ans=0.125 2024-09-18 03:18:20,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=700377.3333333334, ans=0.0 2024-09-18 03:18:31,207 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.64 vs. limit=12.0 2024-09-18 03:18:34,993 INFO [train.py:1198] (1/2) Epoch 39, batch 4350, loss[loss=0.2067, ctc_loss=0.1373, cr_loss=0.3468, over 20775.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1446, cr_loss=0.3691, over 4096424.09 frames. ], batch size: 56, lr: 2.11e-03, grad_scale: 16.0 2024-09-18 03:18:57,211 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.85 vs. limit=22.5 2024-09-18 03:19:07,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=700462.3333333334, ans=0.0 2024-09-18 03:19:08,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=700462.3333333334, ans=0.125 2024-09-18 03:19:53,960 INFO [train.py:1198] (1/2) Epoch 39, batch 4400, loss[loss=0.2284, ctc_loss=0.1495, cr_loss=0.3944, over 20829.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.1453, cr_loss=0.3702, over 4098233.48 frames. 
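The frame counts in the per-batch records sit near 21,000 for full batches, which matches the run's max_duration=850 s budget if "frames" counts encoder frames after the 4x subsampling of 100 fps fbank features; that interpretation is an assumption the log does not state outright, but the arithmetic is simple:

    frames_per_second = 100      # 10 ms fbank shift
    subsampling_factor = 4       # from the run config
    max_duration_s = 850.0       # from the run config
    print(max_duration_s * frames_per_second / subsampling_factor)  # 21250.0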
], batch size: 65, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:19:55,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=700547.3333333334, ans=0.0 2024-09-18 03:20:03,076 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.916e+02 2.195e+02 2.398e+02 2.587e+02 4.163e+02, threshold=4.796e+02, percent-clipped=0.0 2024-09-18 03:20:14,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=700575.6666666666, ans=0.2 2024-09-18 03:20:25,156 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.57 vs. limit=15.0 2024-09-18 03:20:47,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=700632.3333333334, ans=0.025 2024-09-18 03:20:55,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=700660.6666666666, ans=0.05 2024-09-18 03:21:08,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=700689.0, ans=0.0 2024-09-18 03:21:10,109 INFO [train.py:1198] (1/2) Epoch 39, batch 4450, loss[loss=0.2055, ctc_loss=0.1359, cr_loss=0.348, over 20979.00 frames. ], tot_loss[loss=0.2202, ctc_loss=0.1459, cr_loss=0.3718, over 4103800.29 frames. ], batch size: 51, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:21:25,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=700717.3333333334, ans=0.0 2024-09-18 03:22:01,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=700774.0, ans=0.05 2024-09-18 03:22:13,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=700802.3333333334, ans=0.0 2024-09-18 03:22:18,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=700802.3333333334, ans=0.125 2024-09-18 03:22:27,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=700830.6666666666, ans=0.125 2024-09-18 03:22:27,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=700830.6666666666, ans=0.2 2024-09-18 03:22:29,026 INFO [train.py:1198] (1/2) Epoch 39, batch 4500, loss[loss=0.2265, ctc_loss=0.1518, cr_loss=0.3734, over 21082.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1461, cr_loss=0.3725, over 4098973.41 frames. 
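grad_scale is the AMP loss scale, and its 32 -> 16 -> 32 swing around batches 4250-4400 above is the standard dynamic-scaling pattern: halve on an overflowing step, grow back after a run of clean steps. A sketch with PyTorch's stock GradScaler; init_scale and growth_interval below are chosen to mirror the observed cadence (recovery within ~100 batches implies a growth_interval far below PyTorch's default 2000), not read from the trainer:

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=32.0,      # matches the grad_scale logged here
        growth_factor=2.0,    # 16 -> 32 on recovery
        backoff_factor=0.5,   # 32 -> 16 on an inf/nan step
        growth_interval=100)  # illustrative, to match the quick recovery

    # Typical use inside the train loop:
    #   with torch.cuda.amp.autocast(dtype=torch.float16):
    #       loss = compute_loss(batch)
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)
    #   scaler.update()   # this is where grad_scale halves or doubles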
], batch size: 56, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:22:38,144 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.878e+02 2.206e+02 2.349e+02 2.484e+02 3.264e+02, threshold=4.698e+02, percent-clipped=0.0 2024-09-18 03:22:53,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=700859.0, ans=0.2 2024-09-18 03:23:04,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=700887.3333333334, ans=0.0 2024-09-18 03:23:13,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=700915.6666666666, ans=0.125 2024-09-18 03:23:33,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=700944.0, ans=10.0 2024-09-18 03:23:34,109 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.74 vs. limit=8.0 2024-09-18 03:23:36,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=700944.0, ans=0.0 2024-09-18 03:23:45,222 INFO [train.py:1198] (1/2) Epoch 39, batch 4550, loss[loss=0.2007, ctc_loss=0.1313, cr_loss=0.3473, over 20913.00 frames. ], tot_loss[loss=0.2208, ctc_loss=0.1464, cr_loss=0.3721, over 4092347.07 frames. ], batch size: 54, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:23:48,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=700972.3333333334, ans=0.125 2024-09-18 03:23:54,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=700972.3333333334, ans=0.0 2024-09-18 03:24:20,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=701029.0, ans=0.0 2024-09-18 03:24:23,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=701029.0, ans=0.05 2024-09-18 03:24:48,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=701085.6666666666, ans=0.125 2024-09-18 03:25:03,850 INFO [train.py:1198] (1/2) Epoch 39, batch 4600, loss[loss=0.2331, ctc_loss=0.1578, cr_loss=0.3769, over 20682.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1469, cr_loss=0.3729, over 4075647.37 frames. ], batch size: 71, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:25:12,744 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.885e+02 2.230e+02 2.365e+02 2.513e+02 5.047e+02, threshold=4.730e+02, percent-clipped=1.0 2024-09-18 03:25:15,485 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.32 vs. limit=15.0 2024-09-18 03:25:44,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=701170.6666666666, ans=0.125 2024-09-18 03:25:55,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=701199.0, ans=0.025 2024-09-18 03:25:58,287 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.60 vs. 
limit=10.0 2024-09-18 03:26:07,565 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.01 vs. limit=15.0 2024-09-18 03:26:20,505 INFO [train.py:1198] (1/2) Epoch 39, batch 4650, loss[loss=0.1949, ctc_loss=0.1273, cr_loss=0.3385, over 21065.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1465, cr_loss=0.3727, over 4093283.53 frames. ], batch size: 53, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:26:54,134 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 03:26:57,379 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.88 vs. limit=15.0 2024-09-18 03:27:09,819 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.83 vs. limit=12.0 2024-09-18 03:27:37,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=701397.3333333334, ans=0.1 2024-09-18 03:27:39,051 INFO [train.py:1198] (1/2) Epoch 39, batch 4700, loss[loss=0.2158, ctc_loss=0.1447, cr_loss=0.3556, over 20776.00 frames. ], tot_loss[loss=0.2223, ctc_loss=0.1475, cr_loss=0.374, over 4087228.58 frames. ], batch size: 56, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:27:48,321 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.971e+02 2.206e+02 2.297e+02 2.510e+02 5.197e+02, threshold=4.593e+02, percent-clipped=1.0 2024-09-18 03:28:04,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=701425.6666666666, ans=0.2 2024-09-18 03:28:33,800 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.34 vs. limit=15.0 2024-09-18 03:28:40,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=701510.6666666666, ans=0.125 2024-09-18 03:28:46,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=701510.6666666666, ans=0.07 2024-09-18 03:28:55,406 INFO [train.py:1198] (1/2) Epoch 39, batch 4750, loss[loss=0.1891, ctc_loss=0.1234, cr_loss=0.3285, over 21010.00 frames. ], tot_loss[loss=0.2216, ctc_loss=0.147, cr_loss=0.3732, over 4086463.98 frames. ], batch size: 48, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:29:29,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=701595.6666666666, ans=0.0 2024-09-18 03:29:50,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=701624.0, ans=0.125 2024-09-18 03:30:11,534 INFO [train.py:1198] (1/2) Epoch 39, batch 4800, loss[loss=0.249, ctc_loss=0.168, cr_loss=0.4053, over 20698.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1473, cr_loss=0.3735, over 4082248.67 frames. 
], batch size: 66, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:30:20,559 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.846e+02 2.185e+02 2.330e+02 2.510e+02 3.666e+02, threshold=4.660e+02, percent-clipped=0.0 2024-09-18 03:30:37,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=701709.0, ans=0.0 2024-09-18 03:30:48,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=701737.3333333334, ans=0.0 2024-09-18 03:31:15,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=701794.0, ans=0.125 2024-09-18 03:31:29,671 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.86 vs. limit=6.0 2024-09-18 03:31:30,347 INFO [train.py:1198] (1/2) Epoch 39, batch 4850, loss[loss=0.2402, ctc_loss=0.1657, cr_loss=0.3727, over 14675.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.1476, cr_loss=0.374, over 4072928.69 frames. ], batch size: 149, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:31:41,831 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.45 vs. limit=15.0 2024-09-18 03:32:25,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=701907.3333333334, ans=0.0 2024-09-18 03:32:46,083 INFO [train.py:1198] (1/2) Epoch 39, batch 4900, loss[loss=0.217, ctc_loss=0.143, cr_loss=0.3703, over 20931.00 frames. ], tot_loss[loss=0.2225, ctc_loss=0.1477, cr_loss=0.3744, over 4076369.78 frames. ], batch size: 60, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:32:46,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=701964.0, ans=0.125 2024-09-18 03:32:55,013 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.918e+02 2.209e+02 2.353e+02 2.553e+02 4.303e+02, threshold=4.706e+02, percent-clipped=0.0 2024-09-18 03:33:26,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=702020.6666666666, ans=0.125 2024-09-18 03:33:50,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=702077.3333333334, ans=0.0 2024-09-18 03:34:03,786 INFO [train.py:1198] (1/2) Epoch 39, batch 4950, loss[loss=0.2461, ctc_loss=0.1641, cr_loss=0.4099, over 20866.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1465, cr_loss=0.3722, over 4086467.40 frames. ], batch size: 65, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:34:46,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=702190.6666666666, ans=0.95 2024-09-18 03:35:13,181 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.46 vs. limit=15.0 2024-09-18 03:35:15,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=702219.0, ans=0.125 2024-09-18 03:35:17,868 INFO [train.py:1198] (1/2) Epoch 39, batch 5000, loss[loss=0.2608, ctc_loss=0.1755, cr_loss=0.4267, over 18334.00 frames. 
], tot_loss[loss=0.2209, ctc_loss=0.1464, cr_loss=0.3721, over 4086623.25 frames. ], batch size: 108, lr: 2.11e-03, grad_scale: 16.0 2024-09-18 03:35:28,256 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.031e+02 2.209e+02 2.333e+02 2.459e+02 3.408e+02, threshold=4.667e+02, percent-clipped=0.0 2024-09-18 03:35:36,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=702275.6666666666, ans=0.125 2024-09-18 03:36:01,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=702332.3333333334, ans=0.07 2024-09-18 03:36:10,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=702332.3333333334, ans=0.0 2024-09-18 03:36:16,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=702360.6666666666, ans=0.0 2024-09-18 03:36:32,970 INFO [train.py:1198] (1/2) Epoch 39, batch 5050, loss[loss=0.2011, ctc_loss=0.131, cr_loss=0.3506, over 21074.00 frames. ], tot_loss[loss=0.2207, ctc_loss=0.1462, cr_loss=0.3721, over 4086381.28 frames. ], batch size: 53, lr: 2.11e-03, grad_scale: 16.0 2024-09-18 03:36:35,098 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.81 vs. limit=15.0 2024-09-18 03:36:39,696 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.16 vs. limit=12.0 2024-09-18 03:37:34,240 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.01 vs. limit=6.0 2024-09-18 03:37:39,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=702502.3333333334, ans=0.1 2024-09-18 03:37:46,919 INFO [train.py:1198] (1/2) Epoch 39, batch 5100, loss[loss=0.1657, ctc_loss=0.1086, cr_loss=0.2854, over 20983.00 frames. ], tot_loss[loss=0.2208, ctc_loss=0.1463, cr_loss=0.3723, over 4095728.73 frames. ], batch size: 51, lr: 2.11e-03, grad_scale: 16.0 2024-09-18 03:37:50,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=702530.6666666666, ans=0.125 2024-09-18 03:37:57,196 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.898e+02 2.244e+02 2.345e+02 2.463e+02 3.614e+02, threshold=4.690e+02, percent-clipped=0.0 2024-09-18 03:38:21,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=702587.3333333334, ans=0.125 2024-09-18 03:38:30,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=702615.6666666666, ans=0.2 2024-09-18 03:38:45,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=702644.0, ans=0.2 2024-09-18 03:39:05,352 INFO [train.py:1198] (1/2) Epoch 39, batch 5150, loss[loss=0.1814, ctc_loss=0.1171, cr_loss=0.3214, over 21063.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1452, cr_loss=0.3702, over 4096815.81 frames. 
], batch size: 53, lr: 2.11e-03, grad_scale: 16.0 2024-09-18 03:39:20,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=702700.6666666666, ans=0.0 2024-09-18 03:40:01,399 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.29 vs. limit=15.0 2024-09-18 03:40:19,914 INFO [train.py:1198] (1/2) Epoch 39, batch 5200, loss[loss=0.2201, ctc_loss=0.1442, cr_loss=0.3794, over 20969.00 frames. ], tot_loss[loss=0.2195, ctc_loss=0.1453, cr_loss=0.3709, over 4098010.67 frames. ], batch size: 51, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:40:24,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=702814.0, ans=0.125 2024-09-18 03:40:27,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=702814.0, ans=0.0 2024-09-18 03:40:30,148 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.880e+02 2.182e+02 2.339e+02 2.493e+02 7.446e+02, threshold=4.678e+02, percent-clipped=1.0 2024-09-18 03:40:47,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=702842.3333333334, ans=15.0 2024-09-18 03:41:34,877 INFO [train.py:1198] (1/2) Epoch 39, batch 5250, loss[loss=0.2209, ctc_loss=0.1473, cr_loss=0.368, over 20116.00 frames. ], tot_loss[loss=0.2188, ctc_loss=0.1449, cr_loss=0.3696, over 4100323.69 frames. ], batch size: 80, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:42:08,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=703012.3333333334, ans=0.125 2024-09-18 03:42:49,577 INFO [train.py:1198] (1/2) Epoch 39, batch 5300, loss[loss=0.2206, ctc_loss=0.146, cr_loss=0.3734, over 21028.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1444, cr_loss=0.3692, over 4104799.09 frames. ], batch size: 63, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:42:59,857 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.915e+02 2.164e+02 2.284e+02 2.448e+02 4.598e+02, threshold=4.567e+02, percent-clipped=0.0 2024-09-18 03:43:20,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=703154.0, ans=0.125 2024-09-18 03:43:33,340 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.88 vs. limit=6.0 2024-09-18 03:44:02,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=703210.6666666666, ans=0.125 2024-09-18 03:44:07,167 INFO [train.py:1198] (1/2) Epoch 39, batch 5350, loss[loss=0.2234, ctc_loss=0.1446, cr_loss=0.3943, over 20779.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1447, cr_loss=0.3694, over 4101374.09 frames. 
], batch size: 53, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:44:22,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=703267.3333333334, ans=0.125 2024-09-18 03:44:28,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=703267.3333333334, ans=0.125 2024-09-18 03:44:43,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=703295.6666666666, ans=0.125 2024-09-18 03:44:43,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=703295.6666666666, ans=0.125 2024-09-18 03:44:59,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=703324.0, ans=0.125 2024-09-18 03:45:21,226 INFO [train.py:1198] (1/2) Epoch 39, batch 5400, loss[loss=0.2239, ctc_loss=0.152, cr_loss=0.3596, over 21076.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.145, cr_loss=0.3701, over 4106507.62 frames. ], batch size: 59, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:45:26,654 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.09 vs. limit=15.0 2024-09-18 03:45:31,320 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.948e+02 2.226e+02 2.356e+02 2.492e+02 2.774e+02, threshold=4.711e+02, percent-clipped=0.0 2024-09-18 03:45:33,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=703380.6666666666, ans=0.125 2024-09-18 03:45:49,658 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.84 vs. limit=10.0 2024-09-18 03:45:51,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=703437.3333333334, ans=0.125 2024-09-18 03:46:08,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=703465.6666666666, ans=0.125 2024-09-18 03:46:11,880 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.28 vs. limit=15.0 2024-09-18 03:46:17,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=703465.6666666666, ans=0.125 2024-09-18 03:46:34,976 INFO [train.py:1198] (1/2) Epoch 39, batch 5450, loss[loss=0.2162, ctc_loss=0.1402, cr_loss=0.3801, over 20965.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1455, cr_loss=0.3707, over 4094113.27 frames. 
], batch size: 58, lr: 2.10e-03, grad_scale: 32.0 2024-09-18 03:46:47,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=703522.3333333334, ans=0.125 2024-09-18 03:46:48,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=703550.6666666666, ans=0.125 2024-09-18 03:47:00,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=703550.6666666666, ans=0.125 2024-09-18 03:47:06,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=703579.0, ans=0.125 2024-09-18 03:47:33,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=703635.6666666666, ans=0.0 2024-09-18 03:47:49,285 INFO [train.py:1198] (1/2) Epoch 39, batch 5500, loss[loss=0.2492, ctc_loss=0.1655, cr_loss=0.4184, over 20845.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1458, cr_loss=0.3712, over 4096664.23 frames. ], batch size: 65, lr: 2.10e-03, grad_scale: 16.0 2024-09-18 03:48:01,467 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.915e+02 2.242e+02 2.361e+02 2.514e+02 4.210e+02, threshold=4.721e+02, percent-clipped=0.0 2024-09-18 03:48:01,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=703664.0, ans=0.2 2024-09-18 03:49:06,409 INFO [train.py:1198] (1/2) Epoch 39, batch 5550, loss[loss=0.2143, ctc_loss=0.1438, cr_loss=0.3522, over 20681.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1455, cr_loss=0.3705, over 4102524.62 frames. ], batch size: 71, lr: 2.10e-03, grad_scale: 16.0 2024-09-18 03:49:29,921 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 03:50:03,658 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.594e-03 2024-09-18 03:50:21,067 INFO [train.py:1198] (1/2) Epoch 39, batch 5600, loss[loss=0.2021, ctc_loss=0.1328, cr_loss=0.3466, over 20984.00 frames. ], tot_loss[loss=0.2203, ctc_loss=0.146, cr_loss=0.3717, over 4099059.50 frames. ], batch size: 55, lr: 2.10e-03, grad_scale: 32.0 2024-09-18 03:50:32,859 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.923e+02 2.226e+02 2.335e+02 2.461e+02 3.574e+02, threshold=4.670e+02, percent-clipped=0.0 2024-09-18 03:51:02,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=704004.0, ans=0.2 2024-09-18 03:51:22,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=704060.6666666666, ans=0.025 2024-09-18 03:51:24,094 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.93 vs. limit=15.0 2024-09-18 03:51:35,371 INFO [train.py:1198] (1/2) Epoch 39, batch 5650, loss[loss=0.2169, ctc_loss=0.1426, cr_loss=0.3715, over 21055.00 frames. ], tot_loss[loss=0.2203, ctc_loss=0.146, cr_loss=0.3717, over 4098605.07 frames. 
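The lr column decays smoothly within the epoch (2.11e-03 -> 2.10e-03 at batch 5450 above) and steps down at the epoch boundary (to 2.07e-03 at the start of epoch 40, further below). Both moves are consistent with an Eden-style schedule, lr proportional to ((b^2 + B^2) ** -0.25) * ((e^2 + E^2) ** -0.25), using B = lr_batches = 7500 and E = lr_epochs = 3.5 from the run config. The sketch checks ratios only, since the absolute scale involves factors this log does not pin down:

    def eden_factor(batch, epoch, B=7500.0, E=3.5):
        return ((batch**2 + B**2) ** -0.25) * ((epoch**2 + E**2) ** -0.25)

    # Batch-term drift across this section of epoch 39 (~697.5k -> ~706k):
    drift = eden_factor(706000, 39) / eden_factor(697500, 39)
    print(round(2.11e-3 * drift, 5))   # 0.0021  -- the logged 2.10e-03
    # Epoch 39 -> 40 step at roughly constant batch_count:
    step = eden_factor(706200, 40) / eden_factor(706000, 39)
    print(round(2.10e-3 * step, 5))    # 0.00207 -- the logged 2.07e-03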
], batch size: 62, lr: 2.10e-03, grad_scale: 32.0 2024-09-18 03:52:03,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=704117.3333333334, ans=0.015 2024-09-18 03:52:37,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=704202.3333333334, ans=0.125 2024-09-18 03:52:52,474 INFO [train.py:1198] (1/2) Epoch 39, batch 5700, loss[loss=0.2281, ctc_loss=0.1517, cr_loss=0.3823, over 20799.00 frames. ], tot_loss[loss=0.22, ctc_loss=0.1457, cr_loss=0.3715, over 4105577.20 frames. ], batch size: 53, lr: 2.10e-03, grad_scale: 32.0 2024-09-18 03:53:04,293 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.982e+02 2.246e+02 2.344e+02 2.590e+02 3.010e+02, threshold=4.689e+02, percent-clipped=0.0 2024-09-18 03:53:19,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=704259.0, ans=0.125 2024-09-18 03:53:23,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=704287.3333333334, ans=0.1 2024-09-18 03:53:23,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=704287.3333333334, ans=0.1 2024-09-18 03:54:05,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=704372.3333333334, ans=0.125 2024-09-18 03:54:06,435 INFO [train.py:1198] (1/2) Epoch 39, batch 5750, loss[loss=0.1566, ctc_loss=0.1014, cr_loss=0.2758, over 20962.00 frames. ], tot_loss[loss=0.2207, ctc_loss=0.1463, cr_loss=0.3718, over 4092168.85 frames. ], batch size: 48, lr: 2.10e-03, grad_scale: 32.0 2024-09-18 03:55:21,048 INFO [train.py:1198] (1/2) Epoch 39, batch 5800, loss[loss=0.2401, ctc_loss=0.1595, cr_loss=0.4031, over 20995.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1461, cr_loss=0.3718, over 4099593.48 frames. ], batch size: 61, lr: 2.10e-03, grad_scale: 32.0 2024-09-18 03:55:32,901 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.871e+02 2.213e+02 2.387e+02 2.552e+02 3.474e+02, threshold=4.774e+02, percent-clipped=0.0 2024-09-18 03:55:54,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=704570.6666666666, ans=0.1 2024-09-18 03:56:07,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=704599.0, ans=0.125 2024-09-18 03:56:34,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=704655.6666666666, ans=0.125 2024-09-18 03:56:35,805 INFO [train.py:1198] (1/2) Epoch 39, batch 5850, loss[loss=0.248, ctc_loss=0.1661, cr_loss=0.4095, over 20219.00 frames. ], tot_loss[loss=0.2211, ctc_loss=0.1465, cr_loss=0.3728, over 4101496.47 frames. 
], batch size: 80, lr: 2.10e-03, grad_scale: 32.0 2024-09-18 03:56:40,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=704655.6666666666, ans=0.2 2024-09-18 03:56:59,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=704684.0, ans=0.125 2024-09-18 03:57:15,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=704712.3333333334, ans=0.0 2024-09-18 03:57:35,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=704740.6666666666, ans=0.125 2024-09-18 03:57:52,789 INFO [train.py:1198] (1/2) Epoch 39, batch 5900, loss[loss=0.2195, ctc_loss=0.1446, cr_loss=0.3745, over 20992.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.1469, cr_loss=0.3728, over 4092293.23 frames. ], batch size: 58, lr: 2.10e-03, grad_scale: 32.0 2024-09-18 03:58:04,552 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.920e+02 2.199e+02 2.339e+02 2.534e+02 6.800e+02, threshold=4.678e+02, percent-clipped=1.0 2024-09-18 03:58:06,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=704825.6666666666, ans=0.07 2024-09-18 03:58:20,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=704825.6666666666, ans=0.125 2024-09-18 03:58:22,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=704854.0, ans=0.125 2024-09-18 03:58:54,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=704910.6666666666, ans=0.2 2024-09-18 03:58:57,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=704910.6666666666, ans=0.125 2024-09-18 03:59:07,433 INFO [train.py:1198] (1/2) Epoch 39, batch 5950, loss[loss=0.2271, ctc_loss=0.1483, cr_loss=0.3938, over 20782.00 frames. ], tot_loss[loss=0.2211, ctc_loss=0.1466, cr_loss=0.3724, over 4095897.66 frames. ], batch size: 56, lr: 2.10e-03, grad_scale: 32.0 2024-09-18 03:59:07,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=704939.0, ans=0.0 2024-09-18 03:59:11,063 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.59 vs. limit=10.0 2024-09-18 03:59:25,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=704967.3333333334, ans=0.125 2024-09-18 03:59:37,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=704995.6666666666, ans=0.125 2024-09-18 04:00:06,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=705052.3333333334, ans=0.1 2024-09-18 04:00:24,424 INFO [train.py:1198] (1/2) Epoch 39, batch 6000, loss[loss=0.2217, ctc_loss=0.1435, cr_loss=0.3907, over 21053.00 frames. ], tot_loss[loss=0.2207, ctc_loss=0.1463, cr_loss=0.3718, over 4103887.64 frames. 
], batch size: 56, lr: 2.10e-03, grad_scale: 32.0 2024-09-18 04:00:24,425 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 04:00:46,791 INFO [train.py:1230] (1/2) Epoch 39, validation: loss=0.03967, ctc_loss=0.03967, cr_loss=1.424e-14, over 944034.00 frames. 2024-09-18 04:00:46,792 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-18 04:00:58,684 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.995e+02 2.194e+02 2.300e+02 2.472e+02 3.542e+02, threshold=4.601e+02, percent-clipped=0.0 2024-09-18 04:01:00,538 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 04:01:24,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=705137.3333333334, ans=0.2 2024-09-18 04:01:49,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=705194.0, ans=0.125 2024-09-18 04:01:50,917 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.20 vs. limit=15.0 2024-09-18 04:02:02,081 INFO [train.py:1198] (1/2) Epoch 39, batch 6050, loss[loss=0.2051, ctc_loss=0.1355, cr_loss=0.3478, over 20887.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1461, cr_loss=0.3721, over 4104876.49 frames. ], batch size: 57, lr: 2.10e-03, grad_scale: 32.0 2024-09-18 04:02:08,907 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.35 vs. limit=22.5 2024-09-18 04:02:09,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=705222.3333333334, ans=0.125 2024-09-18 04:02:12,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=705222.3333333334, ans=0.2 2024-09-18 04:02:24,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=705250.6666666666, ans=0.0 2024-09-18 04:02:28,136 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 04:02:55,137 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.67 vs. limit=15.0 2024-09-18 04:03:16,786 INFO [train.py:1198] (1/2) Epoch 39, batch 6100, loss[loss=0.2449, ctc_loss=0.1633, cr_loss=0.4078, over 21003.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.146, cr_loss=0.3719, over 4096570.85 frames. ], batch size: 63, lr: 2.10e-03, grad_scale: 32.0 2024-09-18 04:03:28,472 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.985e+02 2.204e+02 2.305e+02 2.440e+02 4.328e+02, threshold=4.610e+02, percent-clipped=0.0 2024-09-18 04:04:30,161 INFO [train.py:1198] (1/2) Epoch 39, batch 6150, loss[loss=0.2462, ctc_loss=0.166, cr_loss=0.401, over 20674.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1464, cr_loss=0.3711, over 4074899.26 frames. 
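The validation record shows cr_loss=1.424e-14, i.e. zero up to float rounding, while ctc_loss carries the whole 0.03967. That is what a consistency-regularization term should do when augmentation is off at eval time: the two views of each utterance coincide, so any divergence between their posteriors vanishes. A sketch with a symmetric-KL consistency loss; the exact CR-CTC form is assumed here, not read from the log:

    import torch
    import torch.nn.functional as F

    def consistency_loss(logp_a, logp_b):
        # logp_*: (frames, vocab) log-posteriors of the two views
        kl = F.kl_div(logp_a, logp_b, log_target=True, reduction="sum")
        lk = F.kl_div(logp_b, logp_a, log_target=True, reduction="sum")
        return 0.5 * (kl + lk)

    logp = torch.randn(100, 500).log_softmax(dim=-1)
    print(consistency_loss(logp, logp))  # tensor(0.) -- identical views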
], batch size: 68, lr: 2.10e-03, grad_scale: 32.0 2024-09-18 04:05:35,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=705619.0, ans=0.2 2024-09-18 04:05:44,838 INFO [train.py:1198] (1/2) Epoch 39, batch 6200, loss[loss=0.2128, ctc_loss=0.1425, cr_loss=0.3513, over 20667.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1464, cr_loss=0.3708, over 4054731.24 frames. ], batch size: 71, lr: 2.10e-03, grad_scale: 32.0 2024-09-18 04:05:56,387 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.908e+02 2.246e+02 2.417e+02 2.528e+02 7.333e+02, threshold=4.835e+02, percent-clipped=2.0 2024-09-18 04:05:58,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=705675.6666666666, ans=0.1 2024-09-18 04:06:46,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=705760.6666666666, ans=0.2 2024-09-18 04:06:58,460 INFO [train.py:1198] (1/2) Epoch 39, batch 6250, loss[loss=0.2275, ctc_loss=0.149, cr_loss=0.3926, over 20672.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1451, cr_loss=0.369, over 4056484.79 frames. ], batch size: 68, lr: 2.10e-03, grad_scale: 32.0 2024-09-18 04:07:28,831 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.23 vs. limit=22.5 2024-09-18 04:08:00,883 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.02 vs. limit=22.5 2024-09-18 04:08:05,937 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.03 vs. limit=15.0 2024-09-18 04:08:12,584 INFO [train.py:1198] (1/2) Epoch 39, batch 6300, loss[loss=0.2569, ctc_loss=0.1809, cr_loss=0.3802, over 14867.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.1456, cr_loss=0.3686, over 4012975.39 frames. ], batch size: 150, lr: 2.10e-03, grad_scale: 32.0 2024-09-18 04:08:24,058 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.892e+02 2.183e+02 2.396e+02 2.589e+02 3.527e+02, threshold=4.792e+02, percent-clipped=0.0 2024-09-18 04:08:41,277 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.28 vs. limit=22.5 2024-09-18 04:08:46,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=705987.3333333334, ans=0.025 2024-09-18 04:08:56,171 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.59 vs. limit=15.0 2024-09-18 04:08:57,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=706015.6666666666, ans=0.125 2024-09-18 04:09:14,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=706044.0, ans=0.125 2024-09-18 04:09:19,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=706044.0, ans=0.1 2024-09-18 04:09:24,983 INFO [train.py:1198] (1/2) Epoch 39, batch 6350, loss[loss=0.2687, ctc_loss=0.1907, cr_loss=0.3902, over 14208.00 frames. 
], tot_loss[loss=0.2225, ctc_loss=0.1487, cr_loss=0.3689, over 3830453.53 frames. ], batch size: 150, lr: 2.10e-03, grad_scale: 32.0 2024-09-18 04:09:36,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=706072.3333333334, ans=0.0 2024-09-18 04:09:46,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=706100.6666666666, ans=0.2 2024-09-18 04:09:50,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=706100.6666666666, ans=0.04949747468305833 2024-09-18 04:09:55,628 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.94 vs. limit=12.0 2024-09-18 04:09:56,973 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.73 vs. limit=12.0 2024-09-18 04:10:11,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=706157.3333333334, ans=0.2 2024-09-18 04:11:13,987 INFO [train.py:1198] (1/2) Epoch 40, batch 0, loss[loss=0.2509, ctc_loss=0.166, cr_loss=0.4241, over 20839.00 frames. ], tot_loss[loss=0.2509, ctc_loss=0.166, cr_loss=0.4241, over 20839.00 frames. ], batch size: 65, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:11:13,987 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 04:11:32,440 INFO [train.py:1230] (1/2) Epoch 40, validation: loss=0.03941, ctc_loss=0.03941, cr_loss=1.447e-14, over 944034.00 frames. 2024-09-18 04:11:32,440 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-18 04:11:35,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=706188.5, ans=0.0 2024-09-18 04:11:40,567 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.21 vs. limit=22.5 2024-09-18 04:11:58,032 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.043e+02 2.572e+02 2.820e+02 3.038e+02 4.034e+02, threshold=5.640e+02, percent-clipped=0.0 2024-09-18 04:12:05,028 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.60 vs. limit=22.5 2024-09-18 04:12:06,653 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.27 vs. limit=15.0 2024-09-18 04:12:10,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=706245.1666666666, ans=0.125 2024-09-18 04:12:33,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=706301.8333333334, ans=0.0 2024-09-18 04:12:48,444 INFO [train.py:1198] (1/2) Epoch 40, batch 50, loss[loss=0.216, ctc_loss=0.1426, cr_loss=0.3667, over 20945.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1421, cr_loss=0.3648, over 909445.20 frames. ], batch size: 60, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:12:58,417 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.95 vs. 
limit=6.0 2024-09-18 04:13:03,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=706358.5, ans=0.0 2024-09-18 04:13:29,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=706386.8333333334, ans=0.05 2024-09-18 04:13:30,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=706386.8333333334, ans=0.125 2024-09-18 04:13:53,796 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.04 vs. limit=15.0 2024-09-18 04:14:03,778 INFO [train.py:1198] (1/2) Epoch 40, batch 100, loss[loss=0.2304, ctc_loss=0.1542, cr_loss=0.3814, over 20757.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1427, cr_loss=0.3659, over 1613891.37 frames. ], batch size: 71, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:14:14,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=706471.8333333334, ans=0.125 2024-09-18 04:14:14,966 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.01 vs. limit=22.5 2024-09-18 04:14:19,300 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 04:14:29,374 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.816e+02 2.168e+02 2.313e+02 2.529e+02 4.230e+02, threshold=4.625e+02, percent-clipped=0.0 2024-09-18 04:14:35,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=706528.5, ans=0.125 2024-09-18 04:15:08,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=706585.1666666666, ans=0.125 2024-09-18 04:15:13,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=706585.1666666666, ans=0.2 2024-09-18 04:15:13,535 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=706585.1666666666, ans=0.0 2024-09-18 04:15:16,858 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.14 vs. limit=15.0 2024-09-18 04:15:17,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=706613.5, ans=0.0 2024-09-18 04:15:19,129 INFO [train.py:1198] (1/2) Epoch 40, batch 150, loss[loss=0.2105, ctc_loss=0.1384, cr_loss=0.3607, over 20981.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.143, cr_loss=0.3663, over 2162713.28 frames. ], batch size: 55, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:16:17,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=706698.5, ans=0.025 2024-09-18 04:16:35,073 INFO [train.py:1198] (1/2) Epoch 40, batch 200, loss[loss=0.198, ctc_loss=0.1291, cr_loss=0.3441, over 21078.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.1434, cr_loss=0.3669, over 2581534.93 frames. ], batch size: 56, lr: 2.07e-03, grad_scale: 32.0
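
The loss[...] entries above report the per-batch objective, and tot_loss[...] a running average over all frames seen so far in the epoch. The figures are consistent with the combined loss being the CTC term plus the consistency-regularization (CR) term scaled by 0.2: for batch 200 just above, 0.1291 + 0.2 * 0.3441 = 0.1979, which matches the logged 0.198 up to rounding. A minimal sketch of that bookkeeping, with hypothetical helper names rather than the training script's actual API:

```python
# Minimal sketch: combine the CTC and consistency-regularization (CR) terms
# and keep a frame-weighted running average like the tot_loss[...] entries.
# The 0.2 CR weight and all helper names here are assumptions for illustration.

from dataclasses import dataclass

CR_LOSS_SCALE = 0.2  # assumed weight on the CR term, inferred from the figures


def combine_losses(ctc_loss: float, cr_loss: float) -> float:
    # e.g. batch 200 above: 0.1291 + 0.2 * 0.3441 ~= 0.198
    return ctc_loss + CR_LOSS_SCALE * cr_loss


@dataclass
class RunningLoss:
    """Frame-weighted average, analogous to tot_loss[... over N frames]."""
    loss_sum: float = 0.0
    frames: float = 0.0

    def update(self, loss: float, num_frames: float) -> None:
        self.loss_sum += loss * num_frames
        self.frames += num_frames

    @property
    def value(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)


if __name__ == "__main__":
    tot = RunningLoss()
    tot.update(combine_losses(0.1291, 0.3441), 21078.0)
    print(f"tot_loss[loss={tot.value:.4f}, over {tot.frames:.2f} frames.]")
```

Weighting by frame count rather than batch count is what makes the long utterances (batch sizes of 149-151 frames-per-second buckets) count proportionally more in the running average.
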
2024-09-18 04:16:39,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=706755.1666666666, ans=0.125 2024-09-18 04:16:43,716 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.21 vs. limit=15.0 2024-09-18 04:17:07,689 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.880e+02 2.212e+02 2.348e+02 2.520e+02 6.792e+02, threshold=4.697e+02, percent-clipped=1.0 2024-09-18 04:17:50,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=706868.5, ans=0.025 2024-09-18 04:17:55,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=706896.8333333334, ans=0.125 2024-09-18 04:17:57,067 INFO [train.py:1198] (1/2) Epoch 40, batch 250, loss[loss=0.2261, ctc_loss=0.148, cr_loss=0.3906, over 20963.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.1441, cr_loss=0.3692, over 2922710.03 frames. ], batch size: 64, lr: 2.07e-03, grad_scale: 16.0 2024-09-18 04:18:03,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=706896.8333333334, ans=0.04949747468305833 2024-09-18 04:18:21,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=706925.1666666666, ans=0.125 2024-09-18 04:18:23,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=706925.1666666666, ans=22.5 2024-09-18 04:18:44,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=706981.8333333334, ans=0.0 2024-09-18 04:18:59,990 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.81 vs. limit=12.0 2024-09-18 04:19:12,700 INFO [train.py:1198] (1/2) Epoch 40, batch 300, loss[loss=0.1984, ctc_loss=0.1295, cr_loss=0.3445, over 20871.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1434, cr_loss=0.3681, over 3185072.38 frames. ], batch size: 57, lr: 2.07e-03, grad_scale: 16.0 2024-09-18 04:19:19,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=707038.5, ans=0.125 2024-09-18 04:19:21,374 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.19 vs. limit=15.0 2024-09-18 04:19:39,692 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.984e+02 2.195e+02 2.334e+02 2.503e+02 3.189e+02, threshold=4.667e+02, percent-clipped=0.0 2024-09-18 04:19:45,292 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.80 vs. limit=15.0 2024-09-18 04:19:49,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=707095.1666666666, ans=0.025 2024-09-18 04:20:27,788 INFO [train.py:1198] (1/2) Epoch 40, batch 350, loss[loss=0.1811, ctc_loss=0.1178, cr_loss=0.3165, over 20972.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.144, cr_loss=0.3695, over 3395604.80 frames. ], batch size: 49, lr: 2.07e-03, grad_scale: 16.0
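
The WARNING [optim.py:487] entries track the distribution of recent gradient norms, presumably as min/25%/50%/75%/max. In the first warning above, threshold=4.697e+02 equals Clipping_scale=2.0 times the median quartile 2.348e+02 up to rounding, and percent-clipped reads as the share of recent batches whose norm exceeded that threshold. A rough, illustrative reconstruction of that bookkeeping; the class name, window size, and reporting format are assumptions, not the optimizer's real code:

```python
# Illustrative reconstruction of the grad-norm bookkeeping behind the
# optim.py WARNING lines: quantiles over a window of recent norms, a
# threshold of clipping_scale * median, and the share of clipped batches.

from collections import deque

import torch


class GradNormClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 500):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)   # recent global gradient norms
        self.clipped = 0
        self.seen = 0

    def step(self, parameters) -> None:
        params = [p for p in parameters if p.grad is not None]
        if not params:
            return
        norm = torch.linalg.vector_norm(
            torch.cat([p.grad.detach().flatten() for p in params])).item()
        self.norms.append(norm)
        threshold = self.clipping_scale * float(
            torch.tensor(list(self.norms)).quantile(0.5))
        self.seen += 1
        if norm > threshold:  # rescale this batch's gradients in place
            self.clipped += 1
            for p in params:
                p.grad.mul_(threshold / norm)

    def report(self) -> str:
        t = torch.tensor(list(self.norms))
        q = [float(t.quantile(x)) for x in (0.0, 0.25, 0.5, 0.75, 1.0)]
        pct = 100.0 * self.clipped / max(self.seen, 1)
        return (f"Clipping_scale={self.clipping_scale}, grad-norm quartiles "
                + " ".join(f"{v:.3e}" for v in q)
                + f", threshold={self.clipping_scale * q[2]:.3e}"
                + f", percent-clipped={pct:.1f}")
```

Tying the threshold to the running median rather than a fixed constant is what lets the reported threshold drift (4.625e+02, 4.697e+02, 4.667e+02, ...) as the norm distribution evolves.
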
2024-09-18 04:21:06,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=707236.8333333334, ans=0.0 2024-09-18 04:21:15,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=707265.1666666666, ans=0.0 2024-09-18 04:21:15,376 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 04:21:28,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=707293.5, ans=0.1 2024-09-18 04:21:33,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=707293.5, ans=0.025 2024-09-18 04:21:43,725 INFO [train.py:1198] (1/2) Epoch 40, batch 400, loss[loss=0.2484, ctc_loss=0.166, cr_loss=0.4121, over 20242.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1445, cr_loss=0.3701, over 3543137.28 frames. ], batch size: 74, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:21:50,361 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.02 vs. limit=22.5 2024-09-18 04:22:11,167 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.907e+02 2.212e+02 2.334e+02 2.468e+02 5.090e+02, threshold=4.668e+02, percent-clipped=1.0 2024-09-18 04:22:23,749 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 04:22:34,711 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 04:22:50,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=707435.1666666666, ans=0.125 2024-09-18 04:23:05,519 INFO [train.py:1198] (1/2) Epoch 40, batch 450, loss[loss=0.2356, ctc_loss=0.1554, cr_loss=0.4012, over 21033.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1448, cr_loss=0.3706, over 3671796.96 frames. ], batch size: 61, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:23:49,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=707548.5, ans=0.0 2024-09-18 04:24:19,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=707605.1666666666, ans=0.125 2024-09-18 04:24:20,689 INFO [train.py:1198] (1/2) Epoch 40, batch 500, loss[loss=0.2478, ctc_loss=0.1621, cr_loss=0.4282, over 20946.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.1455, cr_loss=0.3719, over 3760498.26 frames. ], batch size: 64, lr: 2.07e-03, grad_scale: 32.0
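
The grad_scale field in the loss entries (16.0 above, 32.0 again here, 64.0 later in the epoch) is the dynamic loss scale used for float16 mixed-precision training: it is cut when scaled gradients overflow and grown back after a run of clean steps. A generic PyTorch AMP step showing where such a value comes from; this is stock torch.cuda.amp usage with stand-in model and optimizer on an assumed CUDA device, not the training script itself:

```python
# Generic float16 AMP training step with a dynamic gradient scaler.
# scaler.get_scale() is the quantity reported as grad_scale in the loss lines.

import torch
from torch.cuda.amp import GradScaler, autocast

model = torch.nn.Linear(80, 500).cuda()     # stand-in for the real network
optimizer = torch.optim.AdamW(model.parameters())
scaler = GradScaler(init_scale=32.0)        # start at the logged grad_scale


def train_step(feats: torch.Tensor, targets: torch.Tensor) -> float:
    optimizer.zero_grad(set_to_none=True)
    with autocast(dtype=torch.float16):
        loss = torch.nn.functional.cross_entropy(model(feats), targets)
    scaler.scale(loss).backward()  # backward through the scaled loss
    scaler.step(optimizer)         # unscales; skips the step on inf/nan grads
    scaler.update()                # halves the scale on overflow, grows it later
    return scaler.get_scale()      # e.g. 32.0 -> 16.0 after an overflow
```
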
2024-09-18 04:24:30,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=707605.1666666666, ans=0.0 2024-09-18 04:24:47,899 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.025e+02 2.249e+02 2.379e+02 2.495e+02 3.005e+02, threshold=4.758e+02, percent-clipped=0.0 2024-09-18 04:24:48,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=707633.5, ans=0.2 2024-09-18 04:24:54,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=707661.8333333334, ans=0.125 2024-09-18 04:24:54,773 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.05 vs. limit=10.0 2024-09-18 04:25:00,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=707661.8333333334, ans=0.125 2024-09-18 04:25:03,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=707661.8333333334, ans=0.1 2024-09-18 04:25:15,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=707690.1666666666, ans=0.0 2024-09-18 04:25:21,422 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.45 vs. limit=15.0 2024-09-18 04:25:27,624 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.89 vs. limit=22.5 2024-09-18 04:25:28,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=707718.5, ans=10.0 2024-09-18 04:25:35,980 INFO [train.py:1198] (1/2) Epoch 40, batch 550, loss[loss=0.2146, ctc_loss=0.1373, cr_loss=0.3863, over 21060.00 frames. ], tot_loss[loss=0.22, ctc_loss=0.1457, cr_loss=0.3718, over 3816945.77 frames. ], batch size: 56, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:25:41,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=707746.8333333334, ans=0.125 2024-09-18 04:25:55,526 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.88 vs.
limit=6.0 2024-09-18 04:25:57,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=707775.1666666666, ans=0.1 2024-09-18 04:26:05,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=707803.5, ans=0.125 2024-09-18 04:26:12,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=707803.5, ans=0.1 2024-09-18 04:26:20,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.whiten.whitening_limit, batch_count=707831.8333333334, ans=12.0 2024-09-18 04:26:32,478 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=707831.8333333334, ans=0.1 2024-09-18 04:26:40,273 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.24 vs. limit=15.0 2024-09-18 04:26:44,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=707860.1666666666, ans=0.125 2024-09-18 04:26:51,346 INFO [train.py:1198] (1/2) Epoch 40, batch 600, loss[loss=0.2042, ctc_loss=0.1348, cr_loss=0.3472, over 21039.00 frames. ], tot_loss[loss=0.2209, ctc_loss=0.1464, cr_loss=0.3726, over 3876947.64 frames. ], batch size: 62, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:27:18,452 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.915e+02 2.210e+02 2.343e+02 2.499e+02 4.528e+02, threshold=4.686e+02, percent-clipped=0.0 2024-09-18 04:27:54,949 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.79 vs. limit=10.0 2024-09-18 04:28:05,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=708030.1666666666, ans=0.125 2024-09-18 04:28:09,079 INFO [train.py:1198] (1/2) Epoch 40, batch 650, loss[loss=0.2531, ctc_loss=0.1781, cr_loss=0.3754, over 13712.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1461, cr_loss=0.372, over 3919892.29 frames. ], batch size: 150, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:28:10,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=708030.1666666666, ans=0.1 2024-09-18 04:28:15,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=708030.1666666666, ans=0.025 2024-09-18 04:28:30,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=708058.5, ans=0.1 2024-09-18 04:28:32,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=708058.5, ans=0.04949747468305833 2024-09-18 04:28:55,303 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.68 vs. 
limit=15.0 2024-09-18 04:29:09,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=708115.1666666666, ans=0.125 2024-09-18 04:29:25,563 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.89 vs. limit=15.0 2024-09-18 04:29:27,615 INFO [train.py:1198] (1/2) Epoch 40, batch 700, loss[loss=0.2563, ctc_loss=0.1745, cr_loss=0.409, over 20694.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1461, cr_loss=0.3722, over 3959465.80 frames. ], batch size: 71, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:29:55,121 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.040e+02 2.228e+02 2.397e+02 2.537e+02 4.140e+02, threshold=4.794e+02, percent-clipped=0.0 2024-09-18 04:30:15,266 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.38 vs. limit=15.0 2024-09-18 04:30:19,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=708256.8333333334, ans=0.1 2024-09-18 04:30:43,123 INFO [train.py:1198] (1/2) Epoch 40, batch 750, loss[loss=0.2213, ctc_loss=0.1477, cr_loss=0.3678, over 20946.00 frames. ], tot_loss[loss=0.2188, ctc_loss=0.1449, cr_loss=0.3699, over 3996581.91 frames. ], batch size: 60, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:30:45,144 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 04:30:58,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=708341.8333333334, ans=0.125 2024-09-18 04:31:30,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=708398.5, ans=0.125 2024-09-18 04:31:46,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=708426.8333333334, ans=0.125 2024-09-18 04:31:59,325 INFO [train.py:1198] (1/2) Epoch 40, batch 800, loss[loss=0.2485, ctc_loss=0.1667, cr_loss=0.409, over 18617.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1444, cr_loss=0.3687, over 4024379.65 frames. ], batch size: 108, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:32:25,971 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.865e+02 2.189e+02 2.320e+02 2.492e+02 3.200e+02, threshold=4.639e+02, percent-clipped=0.0 2024-09-18 04:32:35,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=708511.8333333334, ans=0.0 2024-09-18 04:32:47,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=708540.1666666666, ans=0.125 2024-09-18 04:33:08,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=708568.5, ans=0.125 2024-09-18 04:33:13,981 INFO [train.py:1198] (1/2) Epoch 40, batch 850, loss[loss=0.2522, ctc_loss=0.1709, cr_loss=0.4064, over 19942.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1444, cr_loss=0.3685, over 4021699.66 frames. 
], batch size: 80, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:33:18,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=708596.8333333334, ans=0.1 2024-09-18 04:33:26,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=708596.8333333334, ans=0.0 2024-09-18 04:33:30,377 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.62 vs. limit=8.0 2024-09-18 04:33:32,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=708625.1666666666, ans=0.2 2024-09-18 04:33:56,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=708653.5, ans=0.125 2024-09-18 04:34:18,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=708681.8333333334, ans=0.025 2024-09-18 04:34:21,389 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.06 vs. limit=15.0 2024-09-18 04:34:35,864 INFO [train.py:1198] (1/2) Epoch 40, batch 900, loss[loss=0.2195, ctc_loss=0.1449, cr_loss=0.3733, over 21011.00 frames. ], tot_loss[loss=0.2188, ctc_loss=0.145, cr_loss=0.3693, over 4037352.53 frames. ], batch size: 63, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:34:53,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=708766.8333333334, ans=0.1 2024-09-18 04:35:00,953 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.62 vs. limit=12.0 2024-09-18 04:35:03,387 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.014e+02 2.244e+02 2.376e+02 2.600e+02 3.926e+02, threshold=4.752e+02, percent-clipped=0.0 2024-09-18 04:35:05,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=708795.1666666666, ans=0.1 2024-09-18 04:35:11,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=708795.1666666666, ans=0.125 2024-09-18 04:35:14,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=708795.1666666666, ans=0.125 2024-09-18 04:35:41,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=708851.8333333334, ans=0.0 2024-09-18 04:35:50,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=708880.1666666666, ans=0.0 2024-09-18 04:35:52,004 INFO [train.py:1198] (1/2) Epoch 40, batch 950, loss[loss=0.2155, ctc_loss=0.1406, cr_loss=0.3746, over 21011.00 frames. ], tot_loss[loss=0.2188, ctc_loss=0.1448, cr_loss=0.3701, over 4063894.00 frames. 
], batch size: 61, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:36:13,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=708908.5, ans=0.0 2024-09-18 04:36:20,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=708936.8333333334, ans=0.0 2024-09-18 04:36:31,929 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.67 vs. limit=12.0 2024-09-18 04:36:39,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=708965.1666666666, ans=0.09899494936611666 2024-09-18 04:36:56,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=708993.5, ans=0.025 2024-09-18 04:37:03,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=708993.5, ans=0.125 2024-09-18 04:37:07,758 INFO [train.py:1198] (1/2) Epoch 40, batch 1000, loss[loss=0.2643, ctc_loss=0.176, cr_loss=0.4411, over 20020.00 frames. ], tot_loss[loss=0.2187, ctc_loss=0.1447, cr_loss=0.3699, over 4070923.39 frames. ], batch size: 80, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:37:34,672 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.843e+02 2.169e+02 2.364e+02 2.541e+02 3.218e+02, threshold=4.727e+02, percent-clipped=0.0 2024-09-18 04:37:36,919 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.11 vs. limit=15.0 2024-09-18 04:38:19,226 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.81 vs. limit=22.5 2024-09-18 04:38:23,240 INFO [train.py:1198] (1/2) Epoch 40, batch 1050, loss[loss=0.2185, ctc_loss=0.1448, cr_loss=0.3686, over 21002.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1443, cr_loss=0.3691, over 4083127.44 frames. ], batch size: 52, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:38:47,082 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.65 vs. limit=15.0 2024-09-18 04:39:04,647 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 04:39:38,835 INFO [train.py:1198] (1/2) Epoch 40, batch 1100, loss[loss=0.2313, ctc_loss=0.1533, cr_loss=0.3898, over 20771.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1445, cr_loss=0.3695, over 4093436.39 frames. ], batch size: 56, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:39:40,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=709305.1666666666, ans=0.0 2024-09-18 04:40:12,191 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.821e+02 2.194e+02 2.309e+02 2.464e+02 3.691e+02, threshold=4.618e+02, percent-clipped=0.0 2024-09-18 04:40:18,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=709361.8333333334, ans=0.0 2024-09-18 04:41:00,939 INFO [train.py:1198] (1/2) Epoch 40, batch 1150, loss[loss=0.2656, ctc_loss=0.1847, cr_loss=0.4042, over 14456.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1454, cr_loss=0.3708, over 4089700.59 frames. 
], batch size: 150, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:41:48,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=709531.8333333334, ans=0.125 2024-09-18 04:42:06,234 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.89 vs. limit=15.0 2024-09-18 04:42:10,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=709560.1666666666, ans=0.0 2024-09-18 04:42:17,544 INFO [train.py:1198] (1/2) Epoch 40, batch 1200, loss[loss=0.1915, ctc_loss=0.1255, cr_loss=0.3299, over 20969.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1445, cr_loss=0.3694, over 4104318.65 frames. ], batch size: 50, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:42:44,627 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.013e+02 2.192e+02 2.327e+02 2.524e+02 2.926e+02, threshold=4.654e+02, percent-clipped=0.0 2024-09-18 04:42:46,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=709645.1666666666, ans=0.0 2024-09-18 04:42:52,617 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 04:42:55,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=709645.1666666666, ans=0.1 2024-09-18 04:43:09,392 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.17 vs. limit=15.0 2024-09-18 04:43:13,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=709673.5, ans=0.125 2024-09-18 04:43:20,862 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=709701.8333333334, ans=0.1 2024-09-18 04:43:30,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=709701.8333333334, ans=0.1 2024-09-18 04:43:33,022 INFO [train.py:1198] (1/2) Epoch 40, batch 1250, loss[loss=0.2268, ctc_loss=0.1495, cr_loss=0.3867, over 20830.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1448, cr_loss=0.3688, over 4089655.63 frames. ], batch size: 59, lr: 2.07e-03, grad_scale: 32.0
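
The ScheduledFloat entries above log hyperparameters (dropout probabilities, skip rates, balancer bounds) whose values are scheduled against batch_count, with ans giving the value currently in effect. A toy piecewise-linear schedule capturing the idea; the class is a simplification of what scaling.py does, and the breakpoints below are purely hypothetical:

```python
# Toy piecewise-linear schedule keyed on batch_count, in the spirit of the
# ScheduledFloat entries (name=..., batch_count=..., ans=...).

import bisect


class PiecewiseLinear:
    def __init__(self, *points):
        """points: (batch_count, value) pairs, sorted by batch_count."""
        self.xs = [x for x, _ in points]
        self.ys = [y for _, y in points]

    def __call__(self, batch_count):
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)


# A hypothetical dropout that decays from 0.3 to 0.1 over the first 20k
# batches, then stays flat; 700k batches in, it simply reports 0.1, which is
# why so many dropout_p entries above read ans=0.1.
dropout_p = PiecewiseLinear((0.0, 0.3), (20000.0, 0.1))
print(f"batch_count=709701.8, ans={dropout_p(709701.8333333334):g}")
```
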
2024-09-18 04:43:43,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=709730.1666666666, ans=0.125 2024-09-18 04:44:06,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=709786.8333333334, ans=0.125 2024-09-18 04:44:24,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=709815.1666666666, ans=0.125 2024-09-18 04:44:25,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=709815.1666666666, ans=0.0 2024-09-18 04:44:36,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=709843.5, ans=0.125 2024-09-18 04:44:45,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=709843.5, ans=0.2 2024-09-18 04:44:47,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=709871.8333333334, ans=0.07 2024-09-18 04:44:48,543 INFO [train.py:1198] (1/2) Epoch 40, batch 1300, loss[loss=0.1848, ctc_loss=0.1211, cr_loss=0.3184, over 20971.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1447, cr_loss=0.369, over 4093597.63 frames. ], batch size: 48, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:45:00,862 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=709871.8333333334, ans=0.125 2024-09-18 04:45:15,611 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.079e+02 2.223e+02 2.329e+02 2.563e+02 4.312e+02, threshold=4.657e+02, percent-clipped=0.0 2024-09-18 04:46:10,296 INFO [train.py:1198] (1/2) Epoch 40, batch 1350, loss[loss=0.2586, ctc_loss=0.173, cr_loss=0.4283, over 20644.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1451, cr_loss=0.3695, over 4105895.68 frames. ], batch size: 68, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:46:12,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=710013.5, ans=0.125 2024-09-18 04:46:12,773 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=3.91 vs. limit=15.0 2024-09-18 04:46:31,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=710041.8333333334, ans=0.125 2024-09-18 04:47:08,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=710098.5, ans=0.125 2024-09-18 04:47:09,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=710126.8333333334, ans=0.0 2024-09-18 04:47:25,607 INFO [train.py:1198] (1/2) Epoch 40, batch 1400, loss[loss=0.2417, ctc_loss=0.1637, cr_loss=0.3898, over 20979.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.1458, cr_loss=0.3704, over 4097460.81 frames.
], batch size: 64, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:47:52,760 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.992e+02 2.249e+02 2.383e+02 2.555e+02 2.990e+02, threshold=4.765e+02, percent-clipped=0.0 2024-09-18 04:48:41,422 INFO [train.py:1198] (1/2) Epoch 40, batch 1450, loss[loss=0.2069, ctc_loss=0.1374, cr_loss=0.3472, over 21042.00 frames. ], tot_loss[loss=0.22, ctc_loss=0.1458, cr_loss=0.3706, over 4093343.90 frames. ], batch size: 62, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:49:32,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=710381.8333333334, ans=0.125 2024-09-18 04:49:44,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=710410.1666666666, ans=0.125 2024-09-18 04:49:56,512 INFO [train.py:1198] (1/2) Epoch 40, batch 1500, loss[loss=0.2097, ctc_loss=0.1372, cr_loss=0.3625, over 21055.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1466, cr_loss=0.3718, over 4074657.70 frames. ], batch size: 53, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:50:07,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=710438.5, ans=0.125 2024-09-18 04:50:18,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=710466.8333333334, ans=0.07 2024-09-18 04:50:19,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=710466.8333333334, ans=0.125 2024-09-18 04:50:19,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=710466.8333333334, ans=0.2 2024-09-18 04:50:23,743 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.752e+02 2.185e+02 2.334e+02 2.503e+02 4.735e+02, threshold=4.668e+02, percent-clipped=0.0 2024-09-18 04:50:30,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=710495.1666666666, ans=0.125 2024-09-18 04:50:39,980 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.13 vs. limit=15.0 2024-09-18 04:51:12,691 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=710551.8333333334, ans=0.0 2024-09-18 04:51:14,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=710551.8333333334, ans=0.125 2024-09-18 04:51:18,371 INFO [train.py:1198] (1/2) Epoch 40, batch 1550, loss[loss=0.2731, ctc_loss=0.1833, cr_loss=0.449, over 18498.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.147, cr_loss=0.3722, over 4063784.51 frames. 
], batch size: 108, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:51:30,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=710580.1666666666, ans=0.125 2024-09-18 04:51:55,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=710636.8333333334, ans=0.125 2024-09-18 04:52:31,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=710693.5, ans=0.125 2024-09-18 04:52:34,368 INFO [train.py:1198] (1/2) Epoch 40, batch 1600, loss[loss=0.2256, ctc_loss=0.1476, cr_loss=0.3903, over 21080.00 frames. ], tot_loss[loss=0.2203, ctc_loss=0.1461, cr_loss=0.3712, over 4084713.82 frames. ], batch size: 59, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:52:47,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=710721.8333333334, ans=0.09899494936611666 2024-09-18 04:52:48,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=710750.1666666666, ans=0.125 2024-09-18 04:53:01,563 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.998e+02 2.237e+02 2.362e+02 2.527e+02 3.476e+02, threshold=4.725e+02, percent-clipped=0.0 2024-09-18 04:53:22,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=710806.8333333334, ans=0.04949747468305833 2024-09-18 04:53:25,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=710806.8333333334, ans=0.125 2024-09-18 04:53:29,737 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.75 vs. limit=15.0 2024-09-18 04:53:32,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=710806.8333333334, ans=0.125 2024-09-18 04:53:48,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=710863.5, ans=0.0 2024-09-18 04:53:50,214 INFO [train.py:1198] (1/2) Epoch 40, batch 1650, loss[loss=0.2396, ctc_loss=0.1574, cr_loss=0.4106, over 20869.00 frames. ], tot_loss[loss=0.2216, ctc_loss=0.1469, cr_loss=0.3733, over 4080656.00 frames. ], batch size: 65, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:53:57,279 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.38 vs. limit=22.5 2024-09-18 04:54:45,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=710948.5, ans=0.2 2024-09-18 04:55:03,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=710976.8333333334, ans=0.125 2024-09-18 04:55:04,244 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.80 vs. limit=12.0 2024-09-18 04:55:06,558 INFO [train.py:1198] (1/2) Epoch 40, batch 1700, loss[loss=0.2191, ctc_loss=0.1451, cr_loss=0.3697, over 20810.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1462, cr_loss=0.3721, over 4088050.79 frames. 
], batch size: 59, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:55:29,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=711033.5, ans=0.0 2024-09-18 04:55:33,250 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.876e+02 2.182e+02 2.337e+02 2.536e+02 4.066e+02, threshold=4.674e+02, percent-clipped=0.0 2024-09-18 04:56:12,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=711118.5, ans=0.125 2024-09-18 04:56:14,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=711118.5, ans=0.125 2024-09-18 04:56:21,750 INFO [train.py:1198] (1/2) Epoch 40, batch 1750, loss[loss=0.1935, ctc_loss=0.1243, cr_loss=0.346, over 20928.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1465, cr_loss=0.3725, over 4087120.94 frames. ], batch size: 49, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:56:40,515 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.96 vs. limit=22.5 2024-09-18 04:56:47,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=711175.1666666666, ans=0.0 2024-09-18 04:56:53,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=711175.1666666666, ans=0.2 2024-09-18 04:56:59,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=711203.5, ans=0.1 2024-09-18 04:57:01,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=711203.5, ans=0.125 2024-09-18 04:57:04,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=711203.5, ans=0.0 2024-09-18 04:57:17,301 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.99 vs. limit=15.0 2024-09-18 04:57:18,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=711231.8333333334, ans=0.1 2024-09-18 04:57:21,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=711231.8333333334, ans=0.0 2024-09-18 04:57:24,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=711231.8333333334, ans=0.125 2024-09-18 04:57:43,901 INFO [train.py:1198] (1/2) Epoch 40, batch 1800, loss[loss=0.237, ctc_loss=0.1573, cr_loss=0.3989, over 20841.00 frames. ], tot_loss[loss=0.2197, ctc_loss=0.1456, cr_loss=0.3709, over 4084103.27 frames. ], batch size: 59, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:58:06,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=711316.8333333334, ans=0.0 2024-09-18 04:58:07,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=711316.8333333334, ans=0.0 2024-09-18 04:58:10,272 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.99 vs. 
limit=15.0 2024-09-18 04:58:11,259 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.996e+02 2.231e+02 2.387e+02 2.606e+02 3.849e+02, threshold=4.773e+02, percent-clipped=0.0 2024-09-18 04:58:23,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=711345.1666666666, ans=0.125 2024-09-18 04:58:34,481 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 04:58:43,637 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=711401.8333333334, ans=0.2 2024-09-18 04:58:54,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=711401.8333333334, ans=0.1 2024-09-18 04:58:59,971 INFO [train.py:1198] (1/2) Epoch 40, batch 1850, loss[loss=0.2334, ctc_loss=0.1564, cr_loss=0.3848, over 21002.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1462, cr_loss=0.3723, over 4091998.09 frames. ], batch size: 61, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:59:11,374 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.37 vs. limit=15.0 2024-09-18 04:59:27,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=711458.5, ans=15.0 2024-09-18 04:59:39,248 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 05:00:00,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=711543.5, ans=0.2 2024-09-18 05:00:14,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=711571.8333333334, ans=0.025 2024-09-18 05:00:15,715 INFO [train.py:1198] (1/2) Epoch 40, batch 1900, loss[loss=0.2207, ctc_loss=0.1467, cr_loss=0.3698, over 20409.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1458, cr_loss=0.3716, over 4100099.35 frames. ], batch size: 74, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 05:00:19,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=711571.8333333334, ans=0.125 2024-09-18 05:00:24,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=711571.8333333334, ans=15.0 2024-09-18 05:00:43,118 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.924e+02 2.243e+02 2.329e+02 2.533e+02 7.885e+02, threshold=4.658e+02, percent-clipped=1.0 2024-09-18 05:01:07,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=711656.8333333334, ans=0.125 2024-09-18 05:01:15,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=711685.1666666666, ans=0.1 2024-09-18 05:01:25,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=711685.1666666666, ans=0.2 2024-09-18 05:01:31,418 INFO [train.py:1198] (1/2) Epoch 40, batch 1950, loss[loss=0.2285, ctc_loss=0.1542, cr_loss=0.3712, over 21017.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.1452, cr_loss=0.3707, over 4101961.77 frames. 
], batch size: 63, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 05:01:46,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=711741.8333333334, ans=0.125 2024-09-18 05:02:52,287 INFO [train.py:1198] (1/2) Epoch 40, batch 2000, loss[loss=0.1698, ctc_loss=0.1083, cr_loss=0.3076, over 20966.00 frames. ], tot_loss[loss=0.2195, ctc_loss=0.1454, cr_loss=0.3702, over 4069601.93 frames. ], batch size: 50, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 05:02:52,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=711855.1666666666, ans=0.0 2024-09-18 05:03:19,713 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.922e+02 2.217e+02 2.386e+02 2.605e+02 3.145e+02, threshold=4.772e+02, percent-clipped=0.0 2024-09-18 05:03:50,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=711940.1666666666, ans=0.2 2024-09-18 05:04:00,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=711968.5, ans=0.0 2024-09-18 05:04:08,710 INFO [train.py:1198] (1/2) Epoch 40, batch 2050, loss[loss=0.2253, ctc_loss=0.1506, cr_loss=0.3737, over 21032.00 frames. ], tot_loss[loss=0.2197, ctc_loss=0.1455, cr_loss=0.3709, over 4080719.78 frames. ], batch size: 62, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 05:04:12,571 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.87 vs. limit=10.0 2024-09-18 05:04:48,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=712053.5, ans=0.0 2024-09-18 05:04:54,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=712081.8333333334, ans=0.125 2024-09-18 05:04:59,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=712081.8333333334, ans=0.0 2024-09-18 05:05:24,379 INFO [train.py:1198] (1/2) Epoch 40, batch 2100, loss[loss=0.2151, ctc_loss=0.1402, cr_loss=0.3743, over 20940.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.1453, cr_loss=0.3708, over 4091326.21 frames. ], batch size: 60, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 05:05:52,082 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.960e+02 2.197e+02 2.316e+02 2.443e+02 3.588e+02, threshold=4.633e+02, percent-clipped=0.0 2024-09-18 05:05:52,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=712166.8333333334, ans=15.0 2024-09-18 05:05:56,289 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.15 vs. limit=8.0 2024-09-18 05:06:28,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=712251.8333333334, ans=0.125 2024-09-18 05:06:40,450 INFO [train.py:1198] (1/2) Epoch 40, batch 2150, loss[loss=0.2795, ctc_loss=0.1947, cr_loss=0.4236, over 14579.00 frames. ], tot_loss[loss=0.2195, ctc_loss=0.1454, cr_loss=0.3708, over 4089959.55 frames. 
], batch size: 151, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 05:06:49,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=712280.1666666666, ans=0.125 2024-09-18 05:07:02,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=712308.5, ans=0.0 2024-09-18 05:07:11,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=712336.8333333334, ans=0.0 2024-09-18 05:07:15,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=712336.8333333334, ans=0.04949747468305833 2024-09-18 05:07:56,576 INFO [train.py:1198] (1/2) Epoch 40, batch 2200, loss[loss=0.2335, ctc_loss=0.1561, cr_loss=0.3869, over 20298.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1446, cr_loss=0.3694, over 4095033.49 frames. ], batch size: 74, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:08:16,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=712450.1666666666, ans=0.1 2024-09-18 05:08:21,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=712450.1666666666, ans=0.0 2024-09-18 05:08:29,673 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.971e+02 2.189e+02 2.341e+02 2.464e+02 3.912e+02, threshold=4.682e+02, percent-clipped=0.0 2024-09-18 05:08:45,816 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.77 vs. limit=15.0 2024-09-18 05:09:18,263 INFO [train.py:1198] (1/2) Epoch 40, batch 2250, loss[loss=0.1933, ctc_loss=0.1241, cr_loss=0.3457, over 20963.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.1452, cr_loss=0.3707, over 4104289.44 frames. ], batch size: 50, lr: 2.06e-03, grad_scale: 64.0 2024-09-18 05:09:18,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=712563.5, ans=0.1 2024-09-18 05:09:26,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=712563.5, ans=0.125 2024-09-18 05:09:45,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=712591.8333333334, ans=10.0 2024-09-18 05:09:58,006 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 05:10:05,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=712648.5, ans=0.2 2024-09-18 05:10:07,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=712648.5, ans=0.125 2024-09-18 05:10:34,066 INFO [train.py:1198] (1/2) Epoch 40, batch 2300, loss[loss=0.2709, ctc_loss=0.1896, cr_loss=0.4068, over 14557.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.1455, cr_loss=0.3714, over 4094542.87 frames. 
], batch size: 149, lr: 2.06e-03, grad_scale: 64.0 2024-09-18 05:10:52,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=712733.5, ans=0.0 2024-09-18 05:10:56,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=712733.5, ans=0.125 2024-09-18 05:11:00,325 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.40 vs. limit=22.5 2024-09-18 05:11:01,034 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.959e+02 2.194e+02 2.340e+02 2.474e+02 4.176e+02, threshold=4.680e+02, percent-clipped=0.0 2024-09-18 05:11:22,873 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2024-09-18 05:11:28,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=712790.1666666666, ans=0.125 2024-09-18 05:11:30,194 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 05:11:49,329 INFO [train.py:1198] (1/2) Epoch 40, batch 2350, loss[loss=0.2512, ctc_loss=0.1726, cr_loss=0.3933, over 20656.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.1456, cr_loss=0.3713, over 4096459.17 frames. ], batch size: 71, lr: 2.06e-03, grad_scale: 64.0 2024-09-18 05:12:08,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=712875.1666666666, ans=0.125 2024-09-18 05:12:29,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=712903.5, ans=0.125 2024-09-18 05:12:35,751 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.20 vs. limit=10.0 2024-09-18 05:13:05,327 INFO [train.py:1198] (1/2) Epoch 40, batch 2400, loss[loss=0.2506, ctc_loss=0.1645, cr_loss=0.4302, over 20966.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.1461, cr_loss=0.3716, over 4087892.32 frames. ], batch size: 64, lr: 2.06e-03, grad_scale: 64.0 2024-09-18 05:13:05,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=712988.5, ans=0.1 2024-09-18 05:13:18,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=712988.5, ans=0.125 2024-09-18 05:13:21,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=713016.8333333334, ans=0.2 2024-09-18 05:13:30,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=713016.8333333334, ans=0.2 2024-09-18 05:13:32,719 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.939e+02 2.281e+02 2.452e+02 2.758e+02 4.358e+02, threshold=4.904e+02, percent-clipped=0.0 2024-09-18 05:14:05,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=713073.5, ans=0.07 2024-09-18 05:14:26,529 INFO [train.py:1198] (1/2) Epoch 40, batch 2450, loss[loss=0.2241, ctc_loss=0.1494, cr_loss=0.3736, over 20944.00 frames. 
], tot_loss[loss=0.2203, ctc_loss=0.146, cr_loss=0.3714, over 4092570.73 frames. ], batch size: 60, lr: 2.06e-03, grad_scale: 64.0 2024-09-18 05:14:34,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=713130.1666666666, ans=0.1 2024-09-18 05:14:54,208 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.48 vs. limit=22.5 2024-09-18 05:15:11,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=713215.1666666666, ans=0.2 2024-09-18 05:15:15,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=713215.1666666666, ans=0.125 2024-09-18 05:15:21,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=713215.1666666666, ans=0.0 2024-09-18 05:15:41,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=713271.8333333334, ans=0.1 2024-09-18 05:15:42,449 INFO [train.py:1198] (1/2) Epoch 40, batch 2500, loss[loss=0.2139, ctc_loss=0.1408, cr_loss=0.3653, over 21025.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1461, cr_loss=0.3718, over 4095326.79 frames. ], batch size: 61, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:15:56,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=713300.1666666666, ans=0.1 2024-09-18 05:16:09,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=713300.1666666666, ans=0.125 2024-09-18 05:16:10,951 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.945e+02 2.211e+02 2.344e+02 2.471e+02 3.301e+02, threshold=4.688e+02, percent-clipped=0.0 2024-09-18 05:16:17,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=713328.5, ans=0.04949747468305833 2024-09-18 05:16:45,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=713385.1666666666, ans=0.125 2024-09-18 05:16:53,962 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.96 vs. limit=10.0 2024-09-18 05:16:57,596 INFO [train.py:1198] (1/2) Epoch 40, batch 2550, loss[loss=0.2752, ctc_loss=0.1948, cr_loss=0.402, over 14452.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1471, cr_loss=0.3729, over 4071318.21 frames. ], batch size: 149, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:16:59,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=713413.5, ans=0.1 2024-09-18 05:17:08,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=713413.5, ans=10.0 2024-09-18 05:17:10,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=713413.5, ans=0.2 2024-09-18 05:17:42,159 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.74 vs. limit=12.0
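
The Whitening entries measure how far a module's activations are from having a white (identity-like) covariance per channel group; the entry just above reads metric=4.74 against limit=12.0 for a single 256-channel group. One plausible formulation of such a metric is the mean squared eigenvalue of the group covariance divided by the squared mean eigenvalue, which equals 1.0 for perfectly whitened features and grows as the spectrum spreads. The function below is a hedged reconstruction of that idea, not the actual scaling.py implementation:

```python
# Hedged reconstruction of a whitening diagnostic: per channel group,
# compare the eigenvalue spread of the feature covariance to the ideal
# all-equal spectrum. metric == 1.0 means perfectly whitened features.

import torch


def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
    """x: (num_frames, num_channels); channels are split into num_groups."""
    n, c = x.shape
    assert c % num_groups == 0
    x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
    x = x - x.mean(dim=1, keepdim=True)      # center each group
    cov = x.transpose(1, 2) @ x / n          # (num_groups, d, d) covariance
    eigs = torch.linalg.eigvalsh(cov)        # real eigenvalues, ascending
    # mean(lambda^2) / mean(lambda)^2, averaged over groups: 1.0 when white
    return float((eigs.pow(2).mean(dim=1) / eigs.mean(dim=1).pow(2)).mean())


if __name__ == "__main__":
    feats = torch.randn(10000, 256)          # approximately white features
    print(f"metric={whitening_metric(feats, 1):.2f} vs. limit=12.0")  # ~1.0
```
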
limit=12.0 2024-09-18 05:17:57,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=713526.8333333334, ans=0.0 2024-09-18 05:18:13,580 INFO [train.py:1198] (1/2) Epoch 40, batch 2600, loss[loss=0.2062, ctc_loss=0.1347, cr_loss=0.3574, over 20890.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1467, cr_loss=0.3728, over 4085843.11 frames. ], batch size: 54, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:18:20,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=713555.1666666666, ans=0.125 2024-09-18 05:18:42,330 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.843e+02 2.166e+02 2.288e+02 2.468e+02 5.699e+02, threshold=4.576e+02, percent-clipped=1.0 2024-09-18 05:18:53,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=713611.8333333334, ans=0.125 2024-09-18 05:19:35,022 INFO [train.py:1198] (1/2) Epoch 40, batch 2650, loss[loss=0.1925, ctc_loss=0.1253, cr_loss=0.3362, over 20957.00 frames. ], tot_loss[loss=0.2209, ctc_loss=0.1465, cr_loss=0.372, over 4071065.25 frames. ], batch size: 50, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:20:22,412 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.65 vs. limit=15.0 2024-09-18 05:20:50,420 INFO [train.py:1198] (1/2) Epoch 40, batch 2700, loss[loss=0.203, ctc_loss=0.1337, cr_loss=0.3464, over 20946.00 frames. ], tot_loss[loss=0.2207, ctc_loss=0.1464, cr_loss=0.3715, over 4056481.53 frames. ], batch size: 51, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:20:51,038 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.09 vs. limit=15.0 2024-09-18 05:20:53,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=713838.5, ans=0.025 2024-09-18 05:21:19,307 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.741e+02 2.197e+02 2.311e+02 2.496e+02 3.574e+02, threshold=4.623e+02, percent-clipped=0.0 2024-09-18 05:21:25,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=713895.1666666666, ans=0.125 2024-09-18 05:21:56,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=713951.8333333334, ans=0.2 2024-09-18 05:22:06,507 INFO [train.py:1198] (1/2) Epoch 40, batch 2750, loss[loss=0.2387, ctc_loss=0.1589, cr_loss=0.399, over 19316.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.1454, cr_loss=0.3703, over 4058379.12 frames. ], batch size: 90, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:22:46,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=714036.8333333334, ans=0.1 2024-09-18 05:22:52,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=714065.1666666666, ans=0.125 2024-09-18 05:23:06,004 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.39 vs. 
limit=15.0 2024-09-18 05:23:08,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=714093.5, ans=0.125 2024-09-18 05:23:23,746 INFO [train.py:1198] (1/2) Epoch 40, batch 2800, loss[loss=0.232, ctc_loss=0.1508, cr_loss=0.4059, over 20975.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.1452, cr_loss=0.3704, over 4059114.68 frames. ], batch size: 64, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:23:36,804 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.54 vs. limit=10.0 2024-09-18 05:23:38,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=714150.1666666666, ans=0.0 2024-09-18 05:23:38,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=714150.1666666666, ans=10.0 2024-09-18 05:23:52,777 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.815e+02 2.199e+02 2.307e+02 2.456e+02 5.088e+02, threshold=4.613e+02, percent-clipped=1.0 2024-09-18 05:24:39,726 INFO [train.py:1198] (1/2) Epoch 40, batch 2850, loss[loss=0.2063, ctc_loss=0.137, cr_loss=0.3468, over 20774.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1439, cr_loss=0.3678, over 4063210.24 frames. ], batch size: 53, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:25:06,511 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.24 vs. limit=10.0 2024-09-18 05:25:21,527 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.33 vs. limit=15.0 2024-09-18 05:26:01,650 INFO [train.py:1198] (1/2) Epoch 40, batch 2900, loss[loss=0.2126, ctc_loss=0.1417, cr_loss=0.3542, over 20895.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.143, cr_loss=0.3666, over 4080397.60 frames. ], batch size: 54, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:26:30,677 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.977e+02 2.193e+02 2.354e+02 2.517e+02 5.695e+02, threshold=4.708e+02, percent-clipped=1.0 2024-09-18 05:26:47,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=714490.1666666666, ans=0.0 2024-09-18 05:26:53,029 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.14 vs. limit=22.5 2024-09-18 05:27:17,446 INFO [train.py:1198] (1/2) Epoch 40, batch 2950, loss[loss=0.2285, ctc_loss=0.1502, cr_loss=0.3916, over 20685.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1432, cr_loss=0.3675, over 4090496.04 frames. ], batch size: 66, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:27:23,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=714546.8333333334, ans=0.025 2024-09-18 05:27:30,813 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.99 vs. 
limit=22.5 2024-09-18 05:27:42,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=714575.1666666666, ans=0.125 2024-09-18 05:28:03,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=714631.8333333334, ans=0.0 2024-09-18 05:28:22,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=714660.1666666666, ans=0.1 2024-09-18 05:28:31,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=714688.5, ans=0.125 2024-09-18 05:28:32,966 INFO [train.py:1198] (1/2) Epoch 40, batch 3000, loss[loss=0.1854, ctc_loss=0.1209, cr_loss=0.3224, over 20986.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1438, cr_loss=0.3684, over 4083706.09 frames. ], batch size: 50, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:28:32,967 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 05:28:50,373 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.6617, 4.1567, 3.1800, 3.6277], device='cuda:1') 2024-09-18 05:28:52,105 INFO [train.py:1230] (1/2) Epoch 40, validation: loss=0.03997, ctc_loss=0.03997, cr_loss=1.401e-14, over 944034.00 frames. 2024-09-18 05:28:52,105 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-18 05:29:21,133 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.973e+02 2.205e+02 2.349e+02 2.498e+02 3.902e+02, threshold=4.698e+02, percent-clipped=0.0 2024-09-18 05:29:21,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=714745.1666666666, ans=0.2 2024-09-18 05:29:41,292 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.67 vs. limit=15.0 2024-09-18 05:29:44,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=714773.5, ans=0.125 2024-09-18 05:30:05,688 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.06 vs. limit=6.0 2024-09-18 05:30:07,831 INFO [train.py:1198] (1/2) Epoch 40, batch 3050, loss[loss=0.2212, ctc_loss=0.1462, cr_loss=0.3753, over 20938.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1446, cr_loss=0.3698, over 4092890.67 frames. ], batch size: 60, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:30:18,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=714830.1666666666, ans=0.025 2024-09-18 05:30:23,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=714858.5, ans=0.125 2024-09-18 05:31:21,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=714943.5, ans=0.0 2024-09-18 05:31:27,011 INFO [train.py:1198] (1/2) Epoch 40, batch 3100, loss[loss=0.2262, ctc_loss=0.1513, cr_loss=0.3742, over 20874.00 frames. ], tot_loss[loss=0.2187, ctc_loss=0.1447, cr_loss=0.3698, over 4095787.46 frames. 
], batch size: 57, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:31:55,446 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.904e+02 2.196e+02 2.363e+02 2.569e+02 5.054e+02, threshold=4.725e+02, percent-clipped=1.0 2024-09-18 05:32:14,077 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 05:32:34,415 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.24 vs. limit=10.0 2024-09-18 05:32:43,023 INFO [train.py:1198] (1/2) Epoch 40, batch 3150, loss[loss=0.2417, ctc_loss=0.1575, cr_loss=0.4208, over 20634.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1446, cr_loss=0.3698, over 4100651.88 frames. ], batch size: 68, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:33:00,966 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.47 vs. limit=15.0 2024-09-18 05:33:12,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=715170.1666666666, ans=0.0 2024-09-18 05:33:17,252 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.96 vs. limit=12.0 2024-09-18 05:33:34,820 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.11 vs. limit=15.0 2024-09-18 05:33:36,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=715198.5, ans=0.125 2024-09-18 05:33:37,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=715198.5, ans=0.1 2024-09-18 05:33:58,450 INFO [train.py:1198] (1/2) Epoch 40, batch 3200, loss[loss=0.2203, ctc_loss=0.147, cr_loss=0.3664, over 20995.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.145, cr_loss=0.3714, over 4093751.07 frames. ], batch size: 55, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:34:00,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=715255.1666666666, ans=0.1 2024-09-18 05:34:26,875 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.41 vs. limit=15.0 2024-09-18 05:34:27,353 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.837e+02 2.211e+02 2.329e+02 2.499e+02 3.959e+02, threshold=4.658e+02, percent-clipped=0.0 2024-09-18 05:34:42,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=715340.1666666666, ans=0.2 2024-09-18 05:35:14,001 INFO [train.py:1198] (1/2) Epoch 40, batch 3250, loss[loss=0.2271, ctc_loss=0.1515, cr_loss=0.378, over 20971.00 frames. ], tot_loss[loss=0.2188, ctc_loss=0.1446, cr_loss=0.3706, over 4098270.56 frames. 
], batch size: 58, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:35:14,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=715396.8333333334, ans=0.0 2024-09-18 05:35:15,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=715396.8333333334, ans=0.2 2024-09-18 05:36:24,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=715510.1666666666, ans=0.025 2024-09-18 05:36:35,997 INFO [train.py:1198] (1/2) Epoch 40, batch 3300, loss[loss=0.1831, ctc_loss=0.1215, cr_loss=0.3082, over 20992.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.1455, cr_loss=0.3715, over 4083290.03 frames. ], batch size: 52, lr: 2.06e-03, grad_scale: 16.0 2024-09-18 05:36:42,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=715538.5, ans=0.2 2024-09-18 05:37:06,375 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.958e+02 2.240e+02 2.360e+02 2.548e+02 4.061e+02, threshold=4.720e+02, percent-clipped=0.0 2024-09-18 05:37:49,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=715651.8333333334, ans=0.2 2024-09-18 05:37:52,072 INFO [train.py:1198] (1/2) Epoch 40, batch 3350, loss[loss=0.2281, ctc_loss=0.1501, cr_loss=0.3905, over 20945.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1457, cr_loss=0.3721, over 4092902.40 frames. ], batch size: 64, lr: 2.06e-03, grad_scale: 16.0 2024-09-18 05:38:18,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=715708.5, ans=0.1 2024-09-18 05:38:21,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=715736.8333333334, ans=0.1 2024-09-18 05:39:07,749 INFO [train.py:1198] (1/2) Epoch 40, batch 3400, loss[loss=0.2229, ctc_loss=0.146, cr_loss=0.3842, over 21030.00 frames. ], tot_loss[loss=0.2197, ctc_loss=0.1454, cr_loss=0.3713, over 4081379.95 frames. ], batch size: 62, lr: 2.06e-03, grad_scale: 16.0 2024-09-18 05:39:37,991 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.857e+02 2.268e+02 2.415e+02 2.555e+02 3.340e+02, threshold=4.831e+02, percent-clipped=0.0 2024-09-18 05:39:38,847 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.37 vs. limit=15.0 2024-09-18 05:39:48,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=715878.5, ans=0.125 2024-09-18 05:39:56,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=715906.8333333334, ans=0.125 2024-09-18 05:40:23,427 INFO [train.py:1198] (1/2) Epoch 40, batch 3450, loss[loss=0.2629, ctc_loss=0.1792, cr_loss=0.4188, over 19517.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1449, cr_loss=0.3702, over 4080461.65 frames. ], batch size: 90, lr: 2.06e-03, grad_scale: 16.0 2024-09-18 05:41:03,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=716020.1666666666, ans=0.125 2024-09-18 05:41:06,695 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.80 vs. 
limit=15.0 2024-09-18 05:41:06,782 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.39 vs. limit=15.0 2024-09-18 05:41:10,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=716048.5, ans=0.125 2024-09-18 05:41:13,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=716048.5, ans=0.0 2024-09-18 05:41:18,664 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.03 vs. limit=22.5 2024-09-18 05:41:23,275 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.38 vs. limit=22.5 2024-09-18 05:41:36,946 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.14 vs. limit=15.0 2024-09-18 05:41:39,184 INFO [train.py:1198] (1/2) Epoch 40, batch 3500, loss[loss=0.2068, ctc_loss=0.1329, cr_loss=0.3695, over 20981.00 frames. ], tot_loss[loss=0.2187, ctc_loss=0.1447, cr_loss=0.37, over 4079470.98 frames. ], batch size: 58, lr: 2.06e-03, grad_scale: 16.0 2024-09-18 05:41:56,778 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.36 vs. limit=15.0 2024-09-18 05:42:04,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=716133.5, ans=0.125 2024-09-18 05:42:12,724 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.871e+02 2.250e+02 2.387e+02 2.515e+02 3.300e+02, threshold=4.773e+02, percent-clipped=0.0 2024-09-18 05:42:41,886 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.27 vs. limit=15.0 2024-09-18 05:42:50,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=716218.5, ans=0.125 2024-09-18 05:43:01,003 INFO [train.py:1198] (1/2) Epoch 40, batch 3550, loss[loss=0.2523, ctc_loss=0.171, cr_loss=0.4065, over 17924.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1448, cr_loss=0.3707, over 4077969.43 frames. ], batch size: 108, lr: 2.06e-03, grad_scale: 16.0 2024-09-18 05:43:03,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=716246.8333333334, ans=0.1 2024-09-18 05:43:22,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=716275.1666666666, ans=0.0 2024-09-18 05:44:17,830 INFO [train.py:1198] (1/2) Epoch 40, batch 3600, loss[loss=0.2625, ctc_loss=0.1871, cr_loss=0.3768, over 14045.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.145, cr_loss=0.3703, over 4076924.93 frames. 
], batch size: 150, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:44:48,331 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.966e+02 2.192e+02 2.361e+02 2.576e+02 6.666e+02, threshold=4.721e+02, percent-clipped=1.0 2024-09-18 05:45:17,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=716501.8333333334, ans=0.0 2024-09-18 05:45:34,280 INFO [train.py:1198] (1/2) Epoch 40, batch 3650, loss[loss=0.2581, ctc_loss=0.1753, cr_loss=0.4143, over 20944.00 frames. ], tot_loss[loss=0.22, ctc_loss=0.1457, cr_loss=0.3714, over 4080455.00 frames. ], batch size: 60, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:46:13,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=716586.8333333334, ans=0.0 2024-09-18 05:46:38,197 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.62 vs. limit=10.0 2024-09-18 05:46:43,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=716643.5, ans=0.125 2024-09-18 05:46:46,080 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.48 vs. limit=5.0 2024-09-18 05:46:49,429 INFO [train.py:1198] (1/2) Epoch 40, batch 3700, loss[loss=0.1689, ctc_loss=0.1099, cr_loss=0.2949, over 20974.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1468, cr_loss=0.3727, over 4078122.50 frames. ], batch size: 49, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:47:19,457 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.989e+02 2.252e+02 2.389e+02 2.518e+02 3.350e+02, threshold=4.779e+02, percent-clipped=0.0 2024-09-18 05:47:27,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=716728.5, ans=0.0 2024-09-18 05:47:27,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=716728.5, ans=0.125 2024-09-18 05:47:40,182 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.66 vs. limit=22.5 2024-09-18 05:47:54,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=716785.1666666666, ans=0.2 2024-09-18 05:48:10,916 INFO [train.py:1198] (1/2) Epoch 40, batch 3750, loss[loss=0.1982, ctc_loss=0.13, cr_loss=0.3409, over 20978.00 frames. ], tot_loss[loss=0.2203, ctc_loss=0.146, cr_loss=0.3717, over 4084927.99 frames. ], batch size: 55, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:48:52,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=716870.1666666666, ans=0.0 2024-09-18 05:49:26,832 INFO [train.py:1198] (1/2) Epoch 40, batch 3800, loss[loss=0.2148, ctc_loss=0.1421, cr_loss=0.3635, over 21010.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1462, cr_loss=0.3715, over 4078776.98 frames. 
], batch size: 61, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:49:43,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=716983.5, ans=0.125 2024-09-18 05:49:56,936 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.899e+02 2.174e+02 2.344e+02 2.485e+02 3.489e+02, threshold=4.689e+02, percent-clipped=0.0 2024-09-18 05:50:05,075 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 05:50:17,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=717040.1666666666, ans=0.1 2024-09-18 05:50:27,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=717068.5, ans=0.125 2024-09-18 05:50:36,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=717068.5, ans=0.5 2024-09-18 05:50:38,757 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.96 vs. limit=12.0 2024-09-18 05:50:42,677 INFO [train.py:1198] (1/2) Epoch 40, batch 3850, loss[loss=0.2122, ctc_loss=0.1427, cr_loss=0.3476, over 21017.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1462, cr_loss=0.3717, over 4094055.28 frames. ], batch size: 61, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:51:07,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=717125.1666666666, ans=0.1 2024-09-18 05:51:26,977 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. limit=6.0 2024-09-18 05:51:58,337 INFO [train.py:1198] (1/2) Epoch 40, batch 3900, loss[loss=0.2033, ctc_loss=0.1326, cr_loss=0.3537, over 20949.00 frames. ], tot_loss[loss=0.2197, ctc_loss=0.1455, cr_loss=0.3711, over 4098175.74 frames. ], batch size: 50, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:52:28,802 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.931e+02 2.204e+02 2.334e+02 2.486e+02 5.419e+02, threshold=4.667e+02, percent-clipped=2.0 2024-09-18 05:52:40,252 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.22 vs. limit=22.5 2024-09-18 05:52:50,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=717323.5, ans=0.1 2024-09-18 05:53:14,172 INFO [train.py:1198] (1/2) Epoch 40, batch 3950, loss[loss=0.1958, ctc_loss=0.1287, cr_loss=0.3355, over 20781.00 frames. ], tot_loss[loss=0.2195, ctc_loss=0.1454, cr_loss=0.3709, over 4109179.40 frames. ], batch size: 53, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:54:01,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=717436.8333333334, ans=0.1 2024-09-18 05:54:24,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=717493.5, ans=0.2 2024-09-18 05:54:36,533 INFO [train.py:1198] (1/2) Epoch 40, batch 4000, loss[loss=0.2315, ctc_loss=0.1547, cr_loss=0.3842, over 21017.00 frames. ], tot_loss[loss=0.2195, ctc_loss=0.1454, cr_loss=0.3709, over 4114991.77 frames. 
], batch size: 63, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:55:01,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=717550.1666666666, ans=0.125 2024-09-18 05:55:06,574 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.945e+02 2.236e+02 2.373e+02 2.510e+02 5.322e+02, threshold=4.745e+02, percent-clipped=1.0 2024-09-18 05:55:34,375 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.97 vs. limit=15.0 2024-09-18 05:55:51,639 INFO [train.py:1198] (1/2) Epoch 40, batch 4050, loss[loss=0.2415, ctc_loss=0.1684, cr_loss=0.3654, over 13972.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1444, cr_loss=0.369, over 4102002.18 frames. ], batch size: 149, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:56:23,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=717720.1666666666, ans=0.0 2024-09-18 05:56:26,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=717720.1666666666, ans=0.2 2024-09-18 05:56:27,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=717720.1666666666, ans=0.125 2024-09-18 05:56:31,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=717720.1666666666, ans=0.0 2024-09-18 05:56:57,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=717776.8333333334, ans=0.1 2024-09-18 05:57:07,724 INFO [train.py:1198] (1/2) Epoch 40, batch 4100, loss[loss=0.2102, ctc_loss=0.1375, cr_loss=0.3635, over 20780.00 frames. ], tot_loss[loss=0.2188, ctc_loss=0.1448, cr_loss=0.3697, over 4101256.43 frames. ], batch size: 53, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:57:38,170 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.951e+02 2.146e+02 2.266e+02 2.399e+02 2.941e+02, threshold=4.532e+02, percent-clipped=0.0 2024-09-18 05:57:59,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=717890.1666666666, ans=0.125 2024-09-18 05:58:17,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=717918.5, ans=0.025 2024-09-18 05:58:17,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=717918.5, ans=0.025 2024-09-18 05:58:23,600 INFO [train.py:1198] (1/2) Epoch 40, batch 4150, loss[loss=0.2023, ctc_loss=0.1302, cr_loss=0.3604, over 20883.00 frames. ], tot_loss[loss=0.2187, ctc_loss=0.1447, cr_loss=0.37, over 4106886.84 frames. 
], batch size: 54, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:58:27,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=717946.8333333334, ans=0.125 2024-09-18 05:58:31,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=717946.8333333334, ans=0.125 2024-09-18 05:58:55,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=718003.5, ans=0.2 2024-09-18 05:59:24,537 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.29 vs. limit=15.0 2024-09-18 05:59:31,992 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.83 vs. limit=15.0 2024-09-18 05:59:43,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=718088.5, ans=0.125 2024-09-18 05:59:45,141 INFO [train.py:1198] (1/2) Epoch 40, batch 4200, loss[loss=0.2607, ctc_loss=0.1741, cr_loss=0.4332, over 19999.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.145, cr_loss=0.3711, over 4100109.63 frames. ], batch size: 80, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:59:53,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=718088.5, ans=0.2 2024-09-18 05:59:59,355 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=718116.8333333334, ans=0.0 2024-09-18 06:00:14,435 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 06:00:15,472 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.928e+02 2.208e+02 2.321e+02 2.458e+02 3.626e+02, threshold=4.642e+02, percent-clipped=0.0 2024-09-18 06:00:19,428 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.76 vs. limit=15.0 2024-09-18 06:01:00,870 INFO [train.py:1198] (1/2) Epoch 40, batch 4250, loss[loss=0.2272, ctc_loss=0.1531, cr_loss=0.3703, over 21052.00 frames. ], tot_loss[loss=0.2188, ctc_loss=0.1446, cr_loss=0.3707, over 4099337.13 frames. ], batch size: 56, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 06:01:14,948 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 06:01:33,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=718286.8333333334, ans=15.0 2024-09-18 06:02:00,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=718343.5, ans=0.125 2024-09-18 06:02:05,128 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.52 vs. limit=22.5 2024-09-18 06:02:16,525 INFO [train.py:1198] (1/2) Epoch 40, batch 4300, loss[loss=0.2301, ctc_loss=0.1503, cr_loss=0.399, over 20776.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.1441, cr_loss=0.3696, over 4111359.96 frames. 
], batch size: 56, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 06:02:16,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=718371.8333333334, ans=0.2 2024-09-18 06:02:23,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=718371.8333333334, ans=0.0 2024-09-18 06:02:37,476 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.66 vs. limit=22.5 2024-09-18 06:02:44,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=718400.1666666666, ans=0.1 2024-09-18 06:02:47,031 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.964e+02 2.232e+02 2.350e+02 2.494e+02 3.008e+02, threshold=4.700e+02, percent-clipped=0.0 2024-09-18 06:02:56,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=718428.5, ans=0.2 2024-09-18 06:03:02,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=718456.8333333334, ans=0.125 2024-09-18 06:03:16,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=718485.1666666666, ans=0.125 2024-09-18 06:03:31,967 INFO [train.py:1198] (1/2) Epoch 40, batch 4350, loss[loss=0.228, ctc_loss=0.1489, cr_loss=0.3952, over 21018.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1448, cr_loss=0.3709, over 4091727.80 frames. ], batch size: 63, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 06:03:53,617 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 06:04:04,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=718570.1666666666, ans=0.0 2024-09-18 06:04:23,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=718598.5, ans=0.0 2024-09-18 06:04:47,483 INFO [train.py:1198] (1/2) Epoch 40, batch 4400, loss[loss=0.2436, ctc_loss=0.1632, cr_loss=0.4022, over 21076.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1446, cr_loss=0.3701, over 4098469.11 frames. ], batch size: 59, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 06:05:20,454 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.992e+02 2.237e+02 2.348e+02 2.483e+02 3.134e+02, threshold=4.695e+02, percent-clipped=0.0 2024-09-18 06:05:55,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=718768.5, ans=0.2 2024-09-18 06:06:08,667 INFO [train.py:1198] (1/2) Epoch 40, batch 4450, loss[loss=0.1953, ctc_loss=0.1268, cr_loss=0.3425, over 19946.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1442, cr_loss=0.3698, over 4098926.15 frames. ], batch size: 44, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 06:06:18,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=718796.8333333334, ans=0.07 2024-09-18 06:07:12,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.whiten.whitening_limit, batch_count=718910.1666666666, ans=15.0 2024-09-18 06:07:24,041 INFO [train.py:1198] (1/2) Epoch 40, batch 4500, loss[loss=0.1719, ctc_loss=0.1111, cr_loss=0.3041, over 20988.00 frames. 
], tot_loss[loss=0.2176, ctc_loss=0.1439, cr_loss=0.3684, over 4095659.13 frames. ], batch size: 48, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 06:07:24,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=718938.5, ans=0.0 2024-09-18 06:07:32,181 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.32 vs. limit=15.0 2024-09-18 06:07:53,996 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.911e+02 2.213e+02 2.343e+02 2.475e+02 4.058e+02, threshold=4.686e+02, percent-clipped=0.0 2024-09-18 06:08:03,779 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.04 vs. limit=15.0 2024-09-18 06:08:27,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=719051.8333333334, ans=0.2 2024-09-18 06:08:39,445 INFO [train.py:1198] (1/2) Epoch 40, batch 4550, loss[loss=0.2037, ctc_loss=0.1347, cr_loss=0.3447, over 20876.00 frames. ], tot_loss[loss=0.2187, ctc_loss=0.1448, cr_loss=0.3697, over 4091432.72 frames. ], batch size: 57, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 06:09:20,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=719136.8333333334, ans=0.2 2024-09-18 06:09:28,633 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.79 vs. limit=15.0 2024-09-18 06:09:40,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=719193.5, ans=0.125 2024-09-18 06:09:55,086 INFO [train.py:1198] (1/2) Epoch 40, batch 4600, loss[loss=0.2271, ctc_loss=0.1527, cr_loss=0.3724, over 21032.00 frames. ], tot_loss[loss=0.2197, ctc_loss=0.1456, cr_loss=0.3704, over 4081173.10 frames. ], batch size: 62, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 06:09:55,804 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.49 vs. limit=15.0 2024-09-18 06:10:25,685 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.839e+02 2.191e+02 2.323e+02 2.509e+02 4.931e+02, threshold=4.647e+02, percent-clipped=1.0 2024-09-18 06:10:30,672 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 06:11:17,097 INFO [train.py:1198] (1/2) Epoch 40, batch 4650, loss[loss=0.2508, ctc_loss=0.1718, cr_loss=0.395, over 18192.00 frames. ], tot_loss[loss=0.2197, ctc_loss=0.1456, cr_loss=0.3702, over 4073419.96 frames. ], batch size: 108, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:11:18,164 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.61 vs. limit=22.5 2024-09-18 06:11:30,155 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.73 vs. 
limit=22.5 2024-09-18 06:11:32,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=719391.8333333334, ans=0.2 2024-09-18 06:11:38,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=719391.8333333334, ans=0.125 2024-09-18 06:11:43,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=719391.8333333334, ans=0.07 2024-09-18 06:11:52,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=719420.1666666666, ans=0.125 2024-09-18 06:12:15,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=719448.5, ans=0.125 2024-09-18 06:12:15,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=719448.5, ans=0.125 2024-09-18 06:12:32,186 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.21 vs. limit=22.5 2024-09-18 06:12:32,933 INFO [train.py:1198] (1/2) Epoch 40, batch 4700, loss[loss=0.2174, ctc_loss=0.144, cr_loss=0.3669, over 21028.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1453, cr_loss=0.3697, over 4080731.66 frames. ], batch size: 62, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:12:39,852 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.99 vs. limit=6.0 2024-09-18 06:12:57,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=719533.5, ans=0.125 2024-09-18 06:13:03,120 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.847e+02 2.254e+02 2.379e+02 2.509e+02 3.494e+02, threshold=4.758e+02, percent-clipped=0.0 2024-09-18 06:13:04,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=719561.8333333334, ans=0.025 2024-09-18 06:13:06,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=719561.8333333334, ans=0.5 2024-09-18 06:13:30,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=719590.1666666666, ans=0.125 2024-09-18 06:13:32,438 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.19 vs. limit=15.0 2024-09-18 06:13:48,328 INFO [train.py:1198] (1/2) Epoch 40, batch 4750, loss[loss=0.2398, ctc_loss=0.1617, cr_loss=0.3901, over 20036.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1457, cr_loss=0.3698, over 4091838.41 frames. 
], batch size: 80, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:13:57,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=719646.8333333334, ans=0.1 2024-09-18 06:13:59,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=719646.8333333334, ans=0.125 2024-09-18 06:14:06,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=719675.1666666666, ans=0.1 2024-09-18 06:14:14,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=719675.1666666666, ans=0.125 2024-09-18 06:14:35,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=719731.8333333334, ans=0.0 2024-09-18 06:14:41,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=719731.8333333334, ans=0.0 2024-09-18 06:14:45,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=719731.8333333334, ans=0.1 2024-09-18 06:15:03,566 INFO [train.py:1198] (1/2) Epoch 40, batch 4800, loss[loss=0.254, ctc_loss=0.1699, cr_loss=0.4205, over 20970.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.1453, cr_loss=0.3699, over 4088119.67 frames. ], batch size: 67, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:15:08,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=719788.5, ans=0.1 2024-09-18 06:15:34,962 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.042e+02 2.277e+02 2.412e+02 2.562e+02 5.856e+02, threshold=4.825e+02, percent-clipped=1.0 2024-09-18 06:15:35,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=719845.1666666666, ans=0.0 2024-09-18 06:15:59,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=719873.5, ans=0.1 2024-09-18 06:16:02,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=719901.8333333334, ans=0.0 2024-09-18 06:16:03,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=719901.8333333334, ans=0.0 2024-09-18 06:16:22,061 INFO [train.py:1198] (1/2) Epoch 40, batch 4850, loss[loss=0.1822, ctc_loss=0.1153, cr_loss=0.3345, over 20955.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1452, cr_loss=0.3698, over 4089380.53 frames. ], batch size: 49, lr: 2.05e-03, grad_scale: 16.0 2024-09-18 06:16:52,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=719958.5, ans=0.0 2024-09-18 06:16:55,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=719986.8333333334, ans=0.125 2024-09-18 06:17:22,934 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.18 vs. 
limit=15.0 2024-09-18 06:17:23,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=720043.5, ans=0.0 2024-09-18 06:17:39,955 INFO [train.py:1198] (1/2) Epoch 40, batch 4900, loss[loss=0.2015, ctc_loss=0.1333, cr_loss=0.341, over 21050.00 frames. ], tot_loss[loss=0.2195, ctc_loss=0.1455, cr_loss=0.3699, over 4088366.48 frames. ], batch size: 62, lr: 2.05e-03, grad_scale: 16.0 2024-09-18 06:17:58,574 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.68 vs. limit=10.0 2024-09-18 06:18:12,904 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.991e+02 2.223e+02 2.324e+02 2.522e+02 5.395e+02, threshold=4.649e+02, percent-clipped=1.0 2024-09-18 06:18:13,748 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.63 vs. limit=15.0 2024-09-18 06:18:54,428 INFO [train.py:1198] (1/2) Epoch 40, batch 4950, loss[loss=0.2169, ctc_loss=0.1465, cr_loss=0.3517, over 19428.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.1453, cr_loss=0.3704, over 4093476.45 frames. ], batch size: 90, lr: 2.05e-03, grad_scale: 16.0 2024-09-18 06:19:01,718 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.02 vs. limit=8.0 2024-09-18 06:19:30,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=720270.1666666666, ans=0.125 2024-09-18 06:20:08,677 INFO [train.py:1198] (1/2) Epoch 40, batch 5000, loss[loss=0.2261, ctc_loss=0.1482, cr_loss=0.3894, over 21065.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.1456, cr_loss=0.3711, over 4085774.03 frames. ], batch size: 56, lr: 2.05e-03, grad_scale: 16.0 2024-09-18 06:20:25,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=720383.5, ans=0.125 2024-09-18 06:20:41,592 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.968e+02 2.206e+02 2.341e+02 2.465e+02 6.433e+02, threshold=4.683e+02, percent-clipped=1.0 2024-09-18 06:20:51,014 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.47 vs. limit=15.0 2024-09-18 06:20:51,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=720440.1666666666, ans=0.1 2024-09-18 06:21:23,126 INFO [train.py:1198] (1/2) Epoch 40, batch 5050, loss[loss=0.2654, ctc_loss=0.1784, cr_loss=0.4353, over 19791.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1462, cr_loss=0.3718, over 4082742.49 frames. ], batch size: 80, lr: 2.05e-03, grad_scale: 16.0 2024-09-18 06:21:25,377 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.71 vs. 
limit=15.0 2024-09-18 06:21:39,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=720525.1666666666, ans=0.125 2024-09-18 06:22:03,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=720553.5, ans=0.125 2024-09-18 06:22:27,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=720610.1666666666, ans=0.0 2024-09-18 06:22:37,921 INFO [train.py:1198] (1/2) Epoch 40, batch 5100, loss[loss=0.2295, ctc_loss=0.1522, cr_loss=0.3869, over 20877.00 frames. ], tot_loss[loss=0.2202, ctc_loss=0.1458, cr_loss=0.3716, over 4094361.25 frames. ], batch size: 65, lr: 2.05e-03, grad_scale: 16.0 2024-09-18 06:22:44,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=720638.5, ans=0.1 2024-09-18 06:23:01,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=720666.8333333334, ans=0.1 2024-09-18 06:23:10,439 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.014e+02 2.204e+02 2.346e+02 2.481e+02 2.923e+02, threshold=4.693e+02, percent-clipped=0.0 2024-09-18 06:23:52,532 INFO [train.py:1198] (1/2) Epoch 40, batch 5150, loss[loss=0.2386, ctc_loss=0.1588, cr_loss=0.399, over 20082.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.1456, cr_loss=0.3714, over 4086153.29 frames. ], batch size: 80, lr: 2.05e-03, grad_scale: 16.0 2024-09-18 06:23:54,425 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.13 vs. limit=15.0 2024-09-18 06:23:57,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=720780.1666666666, ans=0.2 2024-09-18 06:24:56,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=720893.5, ans=0.0 2024-09-18 06:25:07,349 INFO [train.py:1198] (1/2) Epoch 40, batch 5200, loss[loss=0.2357, ctc_loss=0.1583, cr_loss=0.387, over 20838.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.1459, cr_loss=0.3724, over 4090301.76 frames. ], batch size: 65, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:25:41,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=720978.5, ans=0.0 2024-09-18 06:25:42,092 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.26 vs. limit=22.5 2024-09-18 06:25:42,769 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.841e+02 2.162e+02 2.284e+02 2.423e+02 6.480e+02, threshold=4.569e+02, percent-clipped=1.0 2024-09-18 06:25:46,879 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.36 vs. 
limit=10.0 2024-09-18 06:25:57,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=721006.8333333334, ans=0.0 2024-09-18 06:26:00,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=721006.8333333334, ans=0.025 2024-09-18 06:26:04,137 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.34 vs. limit=15.0 2024-09-18 06:26:26,545 INFO [train.py:1198] (1/2) Epoch 40, batch 5250, loss[loss=0.229, ctc_loss=0.1534, cr_loss=0.3782, over 20969.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.1455, cr_loss=0.3715, over 4089696.91 frames. ], batch size: 58, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:26:29,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=721063.5, ans=0.125 2024-09-18 06:27:20,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=721148.5, ans=0.1 2024-09-18 06:27:41,052 INFO [train.py:1198] (1/2) Epoch 40, batch 5300, loss[loss=0.2392, ctc_loss=0.1623, cr_loss=0.3847, over 20677.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.1469, cr_loss=0.3728, over 4072690.43 frames. ], batch size: 68, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:27:58,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=721233.5, ans=0.125 2024-09-18 06:28:06,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=721233.5, ans=0.025 2024-09-18 06:28:13,287 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.814e+02 2.189e+02 2.348e+02 2.490e+02 3.604e+02, threshold=4.696e+02, percent-clipped=0.0 2024-09-18 06:28:22,937 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.73 vs. limit=15.0 2024-09-18 06:28:46,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=721318.5, ans=0.125 2024-09-18 06:28:55,279 INFO [train.py:1198] (1/2) Epoch 40, batch 5350, loss[loss=0.2398, ctc_loss=0.159, cr_loss=0.4041, over 20650.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1459, cr_loss=0.3712, over 4075801.56 frames. ], batch size: 66, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:28:58,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=721346.8333333334, ans=0.125 2024-09-18 06:29:30,276 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.10 vs. limit=15.0 2024-09-18 06:29:33,016 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.22 vs. limit=15.0 2024-09-18 06:30:09,033 INFO [train.py:1198] (1/2) Epoch 40, batch 5400, loss[loss=0.2172, ctc_loss=0.1436, cr_loss=0.3682, over 20282.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.1461, cr_loss=0.3714, over 4065539.08 frames. 
], batch size: 74, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:30:28,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=721516.8333333334, ans=0.1 2024-09-18 06:30:32,057 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.82 vs. limit=12.0 2024-09-18 06:30:41,800 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.941e+02 2.239e+02 2.325e+02 2.499e+02 3.457e+02, threshold=4.651e+02, percent-clipped=0.0 2024-09-18 06:30:54,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=721573.5, ans=0.125 2024-09-18 06:31:19,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=721601.8333333334, ans=0.04949747468305833 2024-09-18 06:31:23,730 INFO [train.py:1198] (1/2) Epoch 40, batch 5450, loss[loss=0.2147, ctc_loss=0.144, cr_loss=0.3534, over 20977.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.1456, cr_loss=0.371, over 4081703.76 frames. ], batch size: 67, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:31:28,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=721630.1666666666, ans=10.0 2024-09-18 06:31:37,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=721658.5, ans=0.2 2024-09-18 06:31:37,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=721658.5, ans=0.0 2024-09-18 06:31:45,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=721658.5, ans=0.1 2024-09-18 06:32:38,335 INFO [train.py:1198] (1/2) Epoch 40, batch 5500, loss[loss=0.2397, ctc_loss=0.1616, cr_loss=0.3902, over 19593.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1454, cr_loss=0.371, over 4087806.15 frames. ], batch size: 90, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:32:51,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=721800.1666666666, ans=0.1 2024-09-18 06:32:55,445 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.11 vs. limit=15.0 2024-09-18 06:33:10,896 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.886e+02 2.195e+02 2.287e+02 2.514e+02 4.006e+02, threshold=4.573e+02, percent-clipped=0.0 2024-09-18 06:33:15,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=721828.5, ans=0.125 2024-09-18 06:33:23,742 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.91 vs. limit=22.5 2024-09-18 06:33:52,924 INFO [train.py:1198] (1/2) Epoch 40, batch 5550, loss[loss=0.2008, ctc_loss=0.1337, cr_loss=0.3355, over 20989.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1446, cr_loss=0.3698, over 4097399.04 frames. 
], batch size: 51, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:33:56,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=721913.5, ans=0.125 2024-09-18 06:34:29,979 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.35 vs. limit=15.0 2024-09-18 06:35:09,676 INFO [train.py:1198] (1/2) Epoch 40, batch 5600, loss[loss=0.2149, ctc_loss=0.1425, cr_loss=0.3618, over 21008.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.145, cr_loss=0.3709, over 4090215.34 frames. ], batch size: 63, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:35:31,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=722083.5, ans=0.0 2024-09-18 06:35:44,651 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.933e+02 2.273e+02 2.378e+02 2.596e+02 8.158e+02, threshold=4.756e+02, percent-clipped=1.0 2024-09-18 06:35:51,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=722111.8333333334, ans=0.2 2024-09-18 06:36:26,339 INFO [train.py:1198] (1/2) Epoch 40, batch 5650, loss[loss=0.2102, ctc_loss=0.1383, cr_loss=0.3598, over 20963.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1444, cr_loss=0.3699, over 4094842.40 frames. ], batch size: 58, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:36:38,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=722196.8333333334, ans=0.125 2024-09-18 06:36:43,341 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.43 vs. limit=12.0 2024-09-18 06:37:08,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=722253.5, ans=0.5 2024-09-18 06:37:14,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=722281.8333333334, ans=0.125 2024-09-18 06:37:37,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=722310.1666666666, ans=0.0 2024-09-18 06:37:40,329 INFO [train.py:1198] (1/2) Epoch 40, batch 5700, loss[loss=0.2288, ctc_loss=0.1613, cr_loss=0.3374, over 19321.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1445, cr_loss=0.3697, over 4082861.47 frames. ], batch size: 90, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:38:11,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=722395.1666666666, ans=0.1 2024-09-18 06:38:12,618 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.963e+02 2.198e+02 2.297e+02 2.427e+02 4.374e+02, threshold=4.594e+02, percent-clipped=0.0 2024-09-18 06:38:12,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=722395.1666666666, ans=0.2 2024-09-18 06:38:54,034 INFO [train.py:1198] (1/2) Epoch 40, batch 5750, loss[loss=0.2036, ctc_loss=0.1335, cr_loss=0.3503, over 20973.00 frames. ], tot_loss[loss=0.2187, ctc_loss=0.1447, cr_loss=0.3698, over 4085388.47 frames. 
], batch size: 55, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:39:00,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=722480.1666666666, ans=0.1 2024-09-18 06:39:01,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=722480.1666666666, ans=0.125 2024-09-18 06:39:32,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=722536.8333333334, ans=0.025 2024-09-18 06:39:53,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=722593.5, ans=0.125 2024-09-18 06:40:08,298 INFO [train.py:1198] (1/2) Epoch 40, batch 5800, loss[loss=0.2664, ctc_loss=0.1834, cr_loss=0.415, over 19942.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1445, cr_loss=0.3695, over 4085845.79 frames. ], batch size: 80, lr: 2.05e-03, grad_scale: 16.0 2024-09-18 06:40:09,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=722621.8333333334, ans=0.125 2024-09-18 06:40:23,553 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.10 vs. limit=22.5 2024-09-18 06:40:28,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=722650.1666666666, ans=0.125 2024-09-18 06:40:28,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=722650.1666666666, ans=0.1 2024-09-18 06:40:28,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=722650.1666666666, ans=0.2 2024-09-18 06:40:42,163 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.951e+02 2.228e+02 2.322e+02 2.457e+02 3.664e+02, threshold=4.644e+02, percent-clipped=0.0 2024-09-18 06:40:48,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=722678.5, ans=0.025 2024-09-18 06:41:22,011 INFO [train.py:1198] (1/2) Epoch 40, batch 5850, loss[loss=0.2292, ctc_loss=0.1511, cr_loss=0.3901, over 20878.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.1452, cr_loss=0.3705, over 4096497.88 frames. ], batch size: 65, lr: 2.05e-03, grad_scale: 16.0 2024-09-18 06:41:38,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=722791.8333333334, ans=0.125 2024-09-18 06:42:02,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=722820.1666666666, ans=0.125 2024-09-18 06:42:38,142 INFO [train.py:1198] (1/2) Epoch 40, batch 5900, loss[loss=0.2162, ctc_loss=0.1431, cr_loss=0.3658, over 19298.00 frames. ], tot_loss[loss=0.2207, ctc_loss=0.1462, cr_loss=0.3723, over 4091773.38 frames. 
], batch size: 90, lr: 2.05e-03, grad_scale: 16.0 2024-09-18 06:42:44,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=722905.1666666666, ans=0.125 2024-09-18 06:43:12,564 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.966e+02 2.213e+02 2.406e+02 2.573e+02 4.343e+02, threshold=4.812e+02, percent-clipped=0.0 2024-09-18 06:43:45,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=723018.5, ans=0.0 2024-09-18 06:43:53,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=723046.8333333334, ans=0.0 2024-09-18 06:43:55,145 INFO [train.py:1198] (1/2) Epoch 40, batch 5950, loss[loss=0.2398, ctc_loss=0.1596, cr_loss=0.4007, over 20656.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1455, cr_loss=0.3708, over 4094700.03 frames. ], batch size: 68, lr: 2.05e-03, grad_scale: 16.0 2024-09-18 06:44:04,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=723046.8333333334, ans=0.07 2024-09-18 06:44:38,888 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.81 vs. limit=15.0 2024-09-18 06:45:07,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=723160.1666666666, ans=0.2 2024-09-18 06:45:09,801 INFO [train.py:1198] (1/2) Epoch 40, batch 6000, loss[loss=0.2425, ctc_loss=0.1601, cr_loss=0.4119, over 20829.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1454, cr_loss=0.3708, over 4100680.77 frames. ], batch size: 65, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:45:09,801 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 06:45:31,399 INFO [train.py:1230] (1/2) Epoch 40, validation: loss=0.03931, ctc_loss=0.03931, cr_loss=1.423e-14, over 944034.00 frames. 2024-09-18 06:45:31,400 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-18 06:45:34,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=723188.5, ans=0.125 2024-09-18 06:45:39,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=723188.5, ans=0.125 2024-09-18 06:45:45,315 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.58 vs. limit=10.0 2024-09-18 06:46:05,324 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.903e+02 2.242e+02 2.381e+02 2.556e+02 3.426e+02, threshold=4.762e+02, percent-clipped=0.0 2024-09-18 06:46:05,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=723245.1666666666, ans=0.2 2024-09-18 06:46:26,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=723273.5, ans=0.125 2024-09-18 06:46:32,332 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0 2024-09-18 06:46:46,686 INFO [train.py:1198] (1/2) Epoch 40, batch 6050, loss[loss=0.2509, ctc_loss=0.1735, cr_loss=0.387, over 14078.00 frames. 
], tot_loss[loss=0.2199, ctc_loss=0.1456, cr_loss=0.3714, over 4086531.03 frames. ], batch size: 150, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:47:11,223 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 06:47:17,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=723386.8333333334, ans=0.125 2024-09-18 06:47:22,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=723386.8333333334, ans=0.125 2024-09-18 06:47:32,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=723415.1666666666, ans=0.0 2024-09-18 06:47:32,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=723415.1666666666, ans=0.2 2024-09-18 06:47:44,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=723415.1666666666, ans=0.1 2024-09-18 06:48:01,233 INFO [train.py:1198] (1/2) Epoch 40, batch 6100, loss[loss=0.2165, ctc_loss=0.1434, cr_loss=0.3656, over 20642.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1449, cr_loss=0.3696, over 4087149.78 frames. ], batch size: 71, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:48:11,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=723471.8333333334, ans=0.125 2024-09-18 06:48:17,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=723500.1666666666, ans=0.125 2024-09-18 06:48:35,026 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.954e+02 2.232e+02 2.364e+02 2.512e+02 4.651e+02, threshold=4.728e+02, percent-clipped=0.0 2024-09-18 06:49:14,560 INFO [train.py:1198] (1/2) Epoch 40, batch 6150, loss[loss=0.1827, ctc_loss=0.1174, cr_loss=0.3264, over 19851.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.1461, cr_loss=0.3713, over 4062687.07 frames. ], batch size: 44, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:49:39,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=723641.8333333334, ans=0.0 2024-09-18 06:49:47,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=723670.1666666666, ans=0.125 2024-09-18 06:50:28,990 INFO [train.py:1198] (1/2) Epoch 40, batch 6200, loss[loss=0.2336, ctc_loss=0.1557, cr_loss=0.3899, over 20592.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1474, cr_loss=0.373, over 4029520.56 frames. ], batch size: 71, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:50:29,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=723755.1666666666, ans=0.0 2024-09-18 06:50:31,137 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.15 vs. 
limit=6.0 2024-09-18 06:50:35,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=723755.1666666666, ans=0.2 2024-09-18 06:50:43,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=723783.5, ans=0.0 2024-09-18 06:51:03,580 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.870e+02 2.199e+02 2.310e+02 2.501e+02 4.148e+02, threshold=4.620e+02, percent-clipped=0.0 2024-09-18 06:51:06,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=723811.8333333334, ans=0.125 2024-09-18 06:51:29,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=723868.5, ans=0.1 2024-09-18 06:51:43,924 INFO [train.py:1198] (1/2) Epoch 40, batch 6250, loss[loss=0.2565, ctc_loss=0.1806, cr_loss=0.3792, over 14214.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.1463, cr_loss=0.3707, over 4019820.45 frames. ], batch size: 149, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:51:57,758 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.09 vs. limit=22.5 2024-09-18 06:52:01,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=723925.1666666666, ans=0.125 2024-09-18 06:52:12,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=723953.5, ans=0.125 2024-09-18 06:52:26,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=723981.8333333334, ans=0.125 2024-09-18 06:52:26,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=723981.8333333334, ans=0.125 2024-09-18 06:52:56,850 INFO [train.py:1198] (1/2) Epoch 40, batch 6300, loss[loss=0.2558, ctc_loss=0.1762, cr_loss=0.3983, over 14187.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1465, cr_loss=0.37, over 3983889.70 frames. ], batch size: 149, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:53:10,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=724066.8333333334, ans=0.0 2024-09-18 06:53:28,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=724095.1666666666, ans=0.125 2024-09-18 06:53:30,981 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.990e+02 2.285e+02 2.505e+02 2.763e+02 3.528e+02, threshold=5.009e+02, percent-clipped=0.0 2024-09-18 06:53:32,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=724095.1666666666, ans=0.1 2024-09-18 06:53:49,042 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 06:54:03,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=724151.8333333334, ans=0.025 2024-09-18 06:54:05,850 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.26 vs. 
limit=15.0 2024-09-18 06:54:10,919 INFO [train.py:1198] (1/2) Epoch 40, batch 6350, loss[loss=0.2085, ctc_loss=0.1374, cr_loss=0.3556, over 19966.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.1447, cr_loss=0.3654, over 3920949.96 frames. ], batch size: 44, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:54:32,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=724208.5, ans=0.125 2024-09-18 06:55:58,403 INFO [train.py:1198] (1/2) Epoch 41, batch 0, loss[loss=0.2174, ctc_loss=0.1427, cr_loss=0.3732, over 20789.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1427, cr_loss=0.3732, over 20789.00 frames. ], batch size: 56, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 06:55:58,403 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 06:56:18,010 INFO [train.py:1230] (1/2) Epoch 41, validation: loss=0.0391, ctc_loss=0.0391, cr_loss=1.436e-14, over 944034.00 frames. 2024-09-18 06:56:18,011 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-18 06:57:06,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=724381.3333333334, ans=0.125 2024-09-18 06:57:07,587 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.981e+02 2.339e+02 2.665e+02 2.902e+02 5.017e+02, threshold=5.331e+02, percent-clipped=1.0 2024-09-18 06:57:18,794 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.36 vs. limit=15.0 2024-09-18 06:57:19,041 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.88 vs. limit=15.0 2024-09-18 06:57:34,692 INFO [train.py:1198] (1/2) Epoch 41, batch 50, loss[loss=0.2283, ctc_loss=0.1493, cr_loss=0.3947, over 20961.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1423, cr_loss=0.3679, over 927192.78 frames. ], batch size: 64, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 06:58:15,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=724494.6666666666, ans=0.2 2024-09-18 06:58:33,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=724551.3333333334, ans=0.0 2024-09-18 06:58:50,007 INFO [train.py:1198] (1/2) Epoch 41, batch 100, loss[loss=0.2171, ctc_loss=0.1411, cr_loss=0.38, over 20974.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1451, cr_loss=0.3726, over 1635950.48 frames. ], batch size: 55, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 06:59:07,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=724608.0, ans=0.125 2024-09-18 06:59:15,263 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.92 vs. 
limit=15.0 2024-09-18 06:59:19,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=724608.0, ans=0.2 2024-09-18 06:59:28,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=724636.3333333334, ans=0.0 2024-09-18 06:59:39,853 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.909e+02 2.224e+02 2.374e+02 2.549e+02 3.947e+02, threshold=4.748e+02, percent-clipped=0.0 2024-09-18 07:00:02,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=724693.0, ans=0.0 2024-09-18 07:00:07,303 INFO [train.py:1198] (1/2) Epoch 41, batch 150, loss[loss=0.1993, ctc_loss=0.1303, cr_loss=0.3448, over 21084.00 frames. ], tot_loss[loss=0.2187, ctc_loss=0.1447, cr_loss=0.3701, over 2155968.42 frames. ], batch size: 53, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 07:00:07,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=724721.3333333334, ans=0.1 2024-09-18 07:00:51,197 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.05 vs. limit=15.0 2024-09-18 07:00:52,621 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=7.91 vs. limit=22.5 2024-09-18 07:01:21,944 INFO [train.py:1198] (1/2) Epoch 41, batch 200, loss[loss=0.2112, ctc_loss=0.1411, cr_loss=0.3502, over 20703.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1451, cr_loss=0.3703, over 2581167.99 frames. ], batch size: 68, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 07:01:25,374 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 07:01:46,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=724891.3333333334, ans=0.0 2024-09-18 07:01:52,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=724919.6666666666, ans=0.0 2024-09-18 07:02:10,326 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.863e+02 2.197e+02 2.300e+02 2.501e+02 4.580e+02, threshold=4.599e+02, percent-clipped=0.0 2024-09-18 07:02:21,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=724976.3333333334, ans=0.0 2024-09-18 07:02:40,702 INFO [train.py:1198] (1/2) Epoch 41, batch 250, loss[loss=0.2952, ctc_loss=0.2052, cr_loss=0.4497, over 14116.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.1452, cr_loss=0.3711, over 2922355.40 frames. ], batch size: 150, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 07:02:49,177 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.08 vs. limit=15.0 2024-09-18 07:03:15,173 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.46 vs. 
limit=15.0 2024-09-18 07:03:30,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=725089.6666666666, ans=0.125 2024-09-18 07:03:56,401 INFO [train.py:1198] (1/2) Epoch 41, batch 300, loss[loss=0.2449, ctc_loss=0.1626, cr_loss=0.4114, over 20713.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.1452, cr_loss=0.3711, over 3183823.63 frames. ], batch size: 68, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 07:03:57,091 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.70 vs. limit=6.0 2024-09-18 07:03:58,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=725146.3333333334, ans=0.125 2024-09-18 07:04:13,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=725174.6666666666, ans=0.0 2024-09-18 07:04:42,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=725231.3333333334, ans=0.125 2024-09-18 07:04:43,991 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.968e+02 2.181e+02 2.315e+02 2.486e+02 2.972e+02, threshold=4.629e+02, percent-clipped=0.0 2024-09-18 07:05:11,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=725259.6666666666, ans=0.1 2024-09-18 07:05:14,420 INFO [train.py:1198] (1/2) Epoch 41, batch 350, loss[loss=0.2279, ctc_loss=0.1509, cr_loss=0.3847, over 20999.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.145, cr_loss=0.3704, over 3376218.37 frames. ], batch size: 55, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 07:05:44,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=725344.6666666666, ans=0.2 2024-09-18 07:05:59,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=725373.0, ans=0.0 2024-09-18 07:06:04,491 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.86 vs. limit=10.0 2024-09-18 07:06:13,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=725373.0, ans=0.125 2024-09-18 07:06:16,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=725401.3333333334, ans=0.1 2024-09-18 07:06:22,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=725401.3333333334, ans=0.0 2024-09-18 07:06:25,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=725401.3333333334, ans=0.125 2024-09-18 07:06:25,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=725401.3333333334, ans=0.0 2024-09-18 07:06:30,968 INFO [train.py:1198] (1/2) Epoch 41, batch 400, loss[loss=0.2242, ctc_loss=0.1482, cr_loss=0.3799, over 21004.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1443, cr_loss=0.369, over 3553221.59 frames. 
], batch size: 61, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 07:06:35,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=725429.6666666666, ans=0.025 2024-09-18 07:06:54,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=725458.0, ans=0.0 2024-09-18 07:06:54,799 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.50 vs. limit=22.5 2024-09-18 07:07:19,395 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.927e+02 2.240e+02 2.375e+02 2.534e+02 3.924e+02, threshold=4.750e+02, percent-clipped=0.0 2024-09-18 07:07:26,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=725514.6666666666, ans=0.1 2024-09-18 07:07:46,900 INFO [train.py:1198] (1/2) Epoch 41, batch 450, loss[loss=0.2409, ctc_loss=0.1599, cr_loss=0.4049, over 20864.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1443, cr_loss=0.3692, over 3666007.46 frames. ], batch size: 57, lr: 2.02e-03, grad_scale: 16.0 2024-09-18 07:07:47,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=725571.3333333334, ans=0.09899494936611666 2024-09-18 07:08:08,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=725599.6666666666, ans=0.125 2024-09-18 07:08:16,764 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. limit=6.0 2024-09-18 07:08:17,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=725628.0, ans=0.1 2024-09-18 07:08:28,943 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.83 vs. limit=10.0 2024-09-18 07:09:05,489 INFO [train.py:1198] (1/2) Epoch 41, batch 500, loss[loss=0.2024, ctc_loss=0.1327, cr_loss=0.3485, over 20991.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1438, cr_loss=0.369, over 3775437.03 frames. ], batch size: 55, lr: 2.02e-03, grad_scale: 16.0 2024-09-18 07:09:05,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=725713.0, ans=0.0 2024-09-18 07:09:51,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=725798.0, ans=0.125 2024-09-18 07:09:55,658 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.955e+02 2.198e+02 2.324e+02 2.477e+02 3.326e+02, threshold=4.649e+02, percent-clipped=0.0 2024-09-18 07:09:59,625 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.16 vs. limit=15.0 2024-09-18 07:10:21,612 INFO [train.py:1198] (1/2) Epoch 41, batch 550, loss[loss=0.1819, ctc_loss=0.119, cr_loss=0.3145, over 21046.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1429, cr_loss=0.3679, over 3845652.55 frames. 
], batch size: 53, lr: 2.02e-03, grad_scale: 16.0 2024-09-18 07:10:21,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=725854.6666666666, ans=0.0 2024-09-18 07:10:21,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=725854.6666666666, ans=0.0 2024-09-18 07:11:40,827 INFO [train.py:1198] (1/2) Epoch 41, batch 600, loss[loss=0.2416, ctc_loss=0.1586, cr_loss=0.415, over 21013.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1428, cr_loss=0.3672, over 3908561.95 frames. ], batch size: 63, lr: 2.02e-03, grad_scale: 16.0 2024-09-18 07:12:08,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=726024.6666666666, ans=0.125 2024-09-18 07:12:30,520 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.898e+02 2.188e+02 2.318e+02 2.434e+02 8.782e+02, threshold=4.636e+02, percent-clipped=1.0 2024-09-18 07:12:34,351 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.36 vs. limit=15.0 2024-09-18 07:12:50,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=726109.6666666666, ans=0.2 2024-09-18 07:12:56,069 INFO [train.py:1198] (1/2) Epoch 41, batch 650, loss[loss=0.2538, ctc_loss=0.1677, cr_loss=0.4302, over 21037.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1437, cr_loss=0.3691, over 3957208.61 frames. ], batch size: 62, lr: 2.02e-03, grad_scale: 16.0 2024-09-18 07:13:16,590 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.26 vs. limit=15.0 2024-09-18 07:13:31,400 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.28 vs. limit=15.0 2024-09-18 07:13:40,164 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.38 vs. limit=15.0 2024-09-18 07:14:14,093 INFO [train.py:1198] (1/2) Epoch 41, batch 700, loss[loss=0.2136, ctc_loss=0.1414, cr_loss=0.3608, over 20935.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1449, cr_loss=0.3712, over 3976399.67 frames. ], batch size: 60, lr: 2.02e-03, grad_scale: 16.0 2024-09-18 07:14:21,289 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.90 vs. 
limit=8.0 2024-09-18 07:14:44,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=726336.3333333334, ans=0.015 2024-09-18 07:14:59,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=726364.6666666666, ans=0.125 2024-09-18 07:15:03,705 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.960e+02 2.249e+02 2.365e+02 2.549e+02 3.781e+02, threshold=4.731e+02, percent-clipped=0.0 2024-09-18 07:15:12,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=726393.0, ans=0.0 2024-09-18 07:15:20,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=726393.0, ans=0.125 2024-09-18 07:15:29,418 INFO [train.py:1198] (1/2) Epoch 41, batch 750, loss[loss=0.2028, ctc_loss=0.1315, cr_loss=0.3568, over 20787.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1449, cr_loss=0.3713, over 3990625.65 frames. ], batch size: 53, lr: 2.02e-03, grad_scale: 16.0 2024-09-18 07:15:34,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=726421.3333333334, ans=0.0 2024-09-18 07:15:43,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=726449.6666666666, ans=0.0 2024-09-18 07:15:52,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=726449.6666666666, ans=0.125 2024-09-18 07:15:57,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=726449.6666666666, ans=0.1 2024-09-18 07:16:07,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=726478.0, ans=0.0 2024-09-18 07:16:48,270 INFO [train.py:1198] (1/2) Epoch 41, batch 800, loss[loss=0.2341, ctc_loss=0.1541, cr_loss=0.3999, over 20842.00 frames. ], tot_loss[loss=0.2203, ctc_loss=0.1458, cr_loss=0.3723, over 3997457.43 frames. ], batch size: 59, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 07:17:08,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=726591.3333333334, ans=0.0 2024-09-18 07:17:14,607 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.25 vs. limit=6.0 2024-09-18 07:17:37,611 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.874e+02 2.274e+02 2.412e+02 2.552e+02 4.256e+02, threshold=4.824e+02, percent-clipped=0.0 2024-09-18 07:17:53,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=726676.3333333334, ans=0.125 2024-09-18 07:18:03,266 INFO [train.py:1198] (1/2) Epoch 41, batch 850, loss[loss=0.2621, ctc_loss=0.1784, cr_loss=0.4182, over 18490.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1457, cr_loss=0.3718, over 4018922.27 frames. 
], batch size: 108, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 07:18:51,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=726789.6666666666, ans=0.125 2024-09-18 07:19:01,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=726789.6666666666, ans=0.2 2024-09-18 07:19:18,707 INFO [train.py:1198] (1/2) Epoch 41, batch 900, loss[loss=0.2412, ctc_loss=0.159, cr_loss=0.4106, over 20690.00 frames. ], tot_loss[loss=0.22, ctc_loss=0.1457, cr_loss=0.3716, over 4027193.61 frames. ], batch size: 68, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 07:19:28,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=726846.3333333334, ans=0.125 2024-09-18 07:19:32,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=726874.6666666666, ans=0.0 2024-09-18 07:19:37,155 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 07:19:47,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=726874.6666666666, ans=0.025 2024-09-18 07:19:59,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=726903.0, ans=0.0 2024-09-18 07:20:01,944 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.66 vs. limit=15.0 2024-09-18 07:20:11,724 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.972e+02 2.202e+02 2.312e+02 2.459e+02 4.292e+02, threshold=4.624e+02, percent-clipped=0.0 2024-09-18 07:20:36,892 INFO [train.py:1198] (1/2) Epoch 41, batch 950, loss[loss=0.2314, ctc_loss=0.156, cr_loss=0.3771, over 19385.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.1455, cr_loss=0.3717, over 4040799.14 frames. ], batch size: 90, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 07:20:43,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=726988.0, ans=0.125 2024-09-18 07:20:59,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=727016.3333333334, ans=0.125 2024-09-18 07:21:13,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=727044.6666666666, ans=0.0 2024-09-18 07:21:34,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=727073.0, ans=0.0 2024-09-18 07:21:52,344 INFO [train.py:1198] (1/2) Epoch 41, batch 1000, loss[loss=0.196, ctc_loss=0.1298, cr_loss=0.331, over 20781.00 frames. ], tot_loss[loss=0.2195, ctc_loss=0.1453, cr_loss=0.371, over 4057314.26 frames. ], batch size: 56, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 07:22:36,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn2.whiten.whitening_limit, batch_count=727186.3333333334, ans=22.5 2024-09-18 07:22:45,612 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.913e+02 2.216e+02 2.380e+02 2.511e+02 4.521e+02, threshold=4.761e+02, percent-clipped=0.0 2024-09-18 07:23:11,707 INFO [train.py:1198] (1/2) Epoch 41, batch 1050, loss[loss=0.2347, ctc_loss=0.1565, cr_loss=0.3913, over 21055.00 frames. 
], tot_loss[loss=0.2197, ctc_loss=0.1454, cr_loss=0.3714, over 4065695.09 frames. ], batch size: 56, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 07:24:12,299 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.33 vs. limit=15.0 2024-09-18 07:24:19,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=727384.6666666666, ans=0.125 2024-09-18 07:24:26,601 INFO [train.py:1198] (1/2) Epoch 41, batch 1100, loss[loss=0.254, ctc_loss=0.173, cr_loss=0.4049, over 13947.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.1466, cr_loss=0.374, over 4063535.10 frames. ], batch size: 149, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 07:25:02,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=727469.6666666666, ans=0.125 2024-09-18 07:25:02,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=727469.6666666666, ans=0.125 2024-09-18 07:25:09,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=727469.6666666666, ans=0.125 2024-09-18 07:25:16,521 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.947e+02 2.218e+02 2.397e+02 2.583e+02 4.244e+02, threshold=4.794e+02, percent-clipped=0.0 2024-09-18 07:25:32,995 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.70 vs. limit=8.0 2024-09-18 07:25:45,328 INFO [train.py:1198] (1/2) Epoch 41, batch 1150, loss[loss=0.1997, ctc_loss=0.1322, cr_loss=0.3374, over 20833.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1464, cr_loss=0.3733, over 4081622.46 frames. ], batch size: 59, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 07:25:57,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=727554.6666666666, ans=10.0 2024-09-18 07:26:08,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=727583.0, ans=0.015 2024-09-18 07:26:39,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=727639.6666666666, ans=0.0 2024-09-18 07:27:01,798 INFO [train.py:1198] (1/2) Epoch 41, batch 1200, loss[loss=0.2097, ctc_loss=0.1406, cr_loss=0.3456, over 20865.00 frames. ], tot_loss[loss=0.2197, ctc_loss=0.1453, cr_loss=0.3718, over 4095153.40 frames. ], batch size: 54, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 07:27:02,541 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.63 vs. limit=22.5 2024-09-18 07:27:56,575 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.787e+02 2.218e+02 2.368e+02 2.540e+02 3.737e+02, threshold=4.737e+02, percent-clipped=0.0 2024-09-18 07:28:16,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=727809.6666666666, ans=0.125 2024-09-18 07:28:20,883 INFO [train.py:1198] (1/2) Epoch 41, batch 1250, loss[loss=0.2043, ctc_loss=0.1339, cr_loss=0.3522, over 20949.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1443, cr_loss=0.3704, over 4105756.35 frames. 
], batch size: 60, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 07:28:42,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=727866.3333333334, ans=0.0 2024-09-18 07:28:44,273 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=7.97 vs. limit=22.5 2024-09-18 07:28:52,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=727894.6666666666, ans=0.2 2024-09-18 07:29:06,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=727923.0, ans=0.0 2024-09-18 07:29:13,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=727923.0, ans=0.1 2024-09-18 07:29:35,841 INFO [train.py:1198] (1/2) Epoch 41, batch 1300, loss[loss=0.2294, ctc_loss=0.1521, cr_loss=0.3862, over 20992.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1445, cr_loss=0.3704, over 4102667.75 frames. ], batch size: 48, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 07:29:39,682 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.40 vs. limit=12.0 2024-09-18 07:29:47,408 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.19 vs. limit=15.0 2024-09-18 07:29:54,925 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.06 vs. limit=15.0 2024-09-18 07:30:27,198 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.983e+02 2.243e+02 2.391e+02 2.591e+02 6.088e+02, threshold=4.783e+02, percent-clipped=1.0 2024-09-18 07:30:30,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=728064.6666666666, ans=0.025 2024-09-18 07:30:51,190 INFO [train.py:1198] (1/2) Epoch 41, batch 1350, loss[loss=0.2152, ctc_loss=0.1395, cr_loss=0.3787, over 21062.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1445, cr_loss=0.37, over 4094322.79 frames. ], batch size: 62, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 07:31:02,605 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.63 vs. limit=22.5 2024-09-18 07:31:20,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=728149.6666666666, ans=0.025 2024-09-18 07:31:48,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=728206.3333333334, ans=0.1 2024-09-18 07:32:10,537 INFO [train.py:1198] (1/2) Epoch 41, batch 1400, loss[loss=0.2241, ctc_loss=0.1495, cr_loss=0.3728, over 20842.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1435, cr_loss=0.3682, over 4100437.69 frames. ], batch size: 59, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 07:33:04,007 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.933e+02 2.297e+02 2.441e+02 2.623e+02 8.118e+02, threshold=4.882e+02, percent-clipped=1.0 2024-09-18 07:33:29,613 INFO [train.py:1198] (1/2) Epoch 41, batch 1450, loss[loss=0.19, ctc_loss=0.1234, cr_loss=0.333, over 21004.00 frames. 
], tot_loss[loss=0.2174, ctc_loss=0.1437, cr_loss=0.3684, over 4105246.82 frames. ], batch size: 48, lr: 2.02e-03, grad_scale: 16.0 2024-09-18 07:34:12,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=728461.3333333334, ans=0.2 2024-09-18 07:34:27,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=728489.6666666666, ans=0.125 2024-09-18 07:34:45,290 INFO [train.py:1198] (1/2) Epoch 41, batch 1500, loss[loss=0.2469, ctc_loss=0.1634, cr_loss=0.4173, over 21017.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.144, cr_loss=0.3687, over 4102461.35 frames. ], batch size: 63, lr: 2.02e-03, grad_scale: 16.0 2024-09-18 07:35:12,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=728574.6666666666, ans=0.0 2024-09-18 07:35:28,757 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.00 vs. limit=15.0 2024-09-18 07:35:38,527 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.910e+02 2.219e+02 2.350e+02 2.490e+02 3.492e+02, threshold=4.700e+02, percent-clipped=0.0 2024-09-18 07:36:00,905 INFO [train.py:1198] (1/2) Epoch 41, batch 1550, loss[loss=0.204, ctc_loss=0.133, cr_loss=0.355, over 20888.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.1441, cr_loss=0.3692, over 4112880.53 frames. ], batch size: 57, lr: 2.02e-03, grad_scale: 16.0 2024-09-18 07:36:34,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=728744.6666666666, ans=0.0 2024-09-18 07:36:46,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=728773.0, ans=0.0 2024-09-18 07:37:02,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=728773.0, ans=0.125 2024-09-18 07:37:08,762 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.53 vs. limit=15.0 2024-09-18 07:37:12,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=728801.3333333334, ans=0.0 2024-09-18 07:37:19,819 INFO [train.py:1198] (1/2) Epoch 41, batch 1600, loss[loss=0.1927, ctc_loss=0.1259, cr_loss=0.334, over 20991.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.1441, cr_loss=0.3687, over 4116384.39 frames. ], batch size: 55, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 07:37:20,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=728829.6666666666, ans=0.2 2024-09-18 07:37:37,148 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.17 vs. 
limit=15.0 2024-09-18 07:37:42,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=728858.0, ans=0.125 2024-09-18 07:37:48,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=728886.3333333334, ans=0.125 2024-09-18 07:38:10,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=728914.6666666666, ans=0.0 2024-09-18 07:38:12,907 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.041e+02 2.192e+02 2.282e+02 2.402e+02 2.882e+02, threshold=4.563e+02, percent-clipped=0.0 2024-09-18 07:38:35,455 INFO [train.py:1198] (1/2) Epoch 41, batch 1650, loss[loss=0.2364, ctc_loss=0.1571, cr_loss=0.3962, over 20956.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1437, cr_loss=0.3683, over 4116998.19 frames. ], batch size: 64, lr: 2.02e-03, grad_scale: 16.0 2024-09-18 07:38:56,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=728999.6666666666, ans=0.0 2024-09-18 07:39:54,782 INFO [train.py:1198] (1/2) Epoch 41, batch 1700, loss[loss=0.2367, ctc_loss=0.1564, cr_loss=0.4017, over 21031.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.143, cr_loss=0.3675, over 4124478.58 frames. ], batch size: 62, lr: 2.02e-03, grad_scale: 16.0 2024-09-18 07:39:57,393 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.36 vs. limit=10.0 2024-09-18 07:39:58,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=729113.0, ans=0.5 2024-09-18 07:40:04,687 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.50 vs. limit=12.0 2024-09-18 07:40:22,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=729141.3333333334, ans=0.0 2024-09-18 07:40:23,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=729169.6666666666, ans=0.125 2024-09-18 07:40:36,753 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.76 vs. limit=15.0 2024-09-18 07:40:49,386 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.895e+02 2.205e+02 2.353e+02 2.501e+02 3.279e+02, threshold=4.706e+02, percent-clipped=0.0 2024-09-18 07:41:10,554 INFO [train.py:1198] (1/2) Epoch 41, batch 1750, loss[loss=0.2179, ctc_loss=0.1443, cr_loss=0.3679, over 20874.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1434, cr_loss=0.368, over 4123095.11 frames. 
], batch size: 54, lr: 2.02e-03, grad_scale: 16.0 2024-09-18 07:41:19,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=729254.6666666666, ans=0.1 2024-09-18 07:42:01,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=729339.6666666666, ans=0.0 2024-09-18 07:42:10,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=729368.0, ans=0.2 2024-09-18 07:42:23,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=729368.0, ans=0.125 2024-09-18 07:42:26,484 INFO [train.py:1198] (1/2) Epoch 41, batch 1800, loss[loss=0.188, ctc_loss=0.1239, cr_loss=0.3205, over 20773.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1438, cr_loss=0.3691, over 4123384.08 frames. ], batch size: 56, lr: 2.02e-03, grad_scale: 16.0 2024-09-18 07:43:01,590 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.87 vs. limit=15.0 2024-09-18 07:43:25,014 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.958e+02 2.201e+02 2.314e+02 2.529e+02 3.521e+02, threshold=4.629e+02, percent-clipped=0.0 2024-09-18 07:43:39,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=729509.6666666666, ans=0.1 2024-09-18 07:43:44,984 INFO [train.py:1198] (1/2) Epoch 41, batch 1850, loss[loss=0.2109, ctc_loss=0.1381, cr_loss=0.3644, over 21072.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1436, cr_loss=0.3687, over 4115292.22 frames. ], batch size: 53, lr: 2.02e-03, grad_scale: 8.0 2024-09-18 07:44:00,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=729566.3333333334, ans=0.125 2024-09-18 07:44:14,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=729594.6666666666, ans=0.0 2024-09-18 07:44:23,398 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.75 vs. limit=15.0 2024-09-18 07:44:26,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=729594.6666666666, ans=0.0 2024-09-18 07:44:29,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=729623.0, ans=0.125 2024-09-18 07:44:57,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=729651.3333333334, ans=0.125 2024-09-18 07:45:00,099 INFO [train.py:1198] (1/2) Epoch 41, batch 1900, loss[loss=0.203, ctc_loss=0.1355, cr_loss=0.3374, over 20951.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1434, cr_loss=0.3677, over 4117463.30 frames. 
], batch size: 50, lr: 2.01e-03, grad_scale: 8.0 2024-09-18 07:45:00,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=729679.6666666666, ans=0.125 2024-09-18 07:45:05,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=729679.6666666666, ans=0.0 2024-09-18 07:45:44,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=729736.3333333334, ans=0.125 2024-09-18 07:45:56,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=729764.6666666666, ans=0.1 2024-09-18 07:45:58,746 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.950e+02 2.184e+02 2.286e+02 2.420e+02 2.896e+02, threshold=4.572e+02, percent-clipped=0.0 2024-09-18 07:46:03,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=729793.0, ans=0.2 2024-09-18 07:46:12,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=729793.0, ans=0.125 2024-09-18 07:46:18,570 INFO [train.py:1198] (1/2) Epoch 41, batch 1950, loss[loss=0.2082, ctc_loss=0.1404, cr_loss=0.3387, over 20874.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.1441, cr_loss=0.3683, over 4108217.46 frames. ], batch size: 57, lr: 2.01e-03, grad_scale: 8.0 2024-09-18 07:46:19,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=729821.3333333334, ans=0.95 2024-09-18 07:46:38,996 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.65 vs. limit=12.0 2024-09-18 07:47:07,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=729906.3333333334, ans=0.0 2024-09-18 07:47:18,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=729934.6666666666, ans=0.125 2024-09-18 07:47:19,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=729934.6666666666, ans=0.2 2024-09-18 07:47:21,609 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 07:47:24,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=729934.6666666666, ans=0.125 2024-09-18 07:47:24,880 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.12 vs. limit=10.0 2024-09-18 07:47:34,969 INFO [train.py:1198] (1/2) Epoch 41, batch 2000, loss[loss=0.213, ctc_loss=0.1396, cr_loss=0.367, over 20698.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1437, cr_loss=0.3674, over 4093933.48 frames. 
], batch size: 71, lr: 2.01e-03, grad_scale: 16.0 2024-09-18 07:47:49,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=729991.3333333334, ans=0.0 2024-09-18 07:48:16,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=730019.6666666666, ans=0.125 2024-09-18 07:48:31,256 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.014e+02 2.222e+02 2.343e+02 2.508e+02 4.807e+02, threshold=4.686e+02, percent-clipped=1.0 2024-09-18 07:48:53,977 INFO [train.py:1198] (1/2) Epoch 41, batch 2050, loss[loss=0.2322, ctc_loss=0.156, cr_loss=0.3808, over 21080.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1441, cr_loss=0.3676, over 4086429.16 frames. ], batch size: 59, lr: 2.01e-03, grad_scale: 16.0 2024-09-18 07:49:10,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=730133.0, ans=0.2 2024-09-18 07:49:10,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=730133.0, ans=0.125 2024-09-18 07:49:50,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=730189.6666666666, ans=0.0 2024-09-18 07:50:09,036 INFO [train.py:1198] (1/2) Epoch 41, batch 2100, loss[loss=0.2237, ctc_loss=0.1473, cr_loss=0.3822, over 20999.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.145, cr_loss=0.3693, over 4075182.96 frames. ], batch size: 55, lr: 2.01e-03, grad_scale: 16.0 2024-09-18 07:51:07,770 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.018e+02 2.196e+02 2.356e+02 2.510e+02 3.523e+02, threshold=4.713e+02, percent-clipped=0.0 2024-09-18 07:51:23,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=730359.6666666666, ans=0.0 2024-09-18 07:51:27,860 INFO [train.py:1198] (1/2) Epoch 41, batch 2150, loss[loss=0.2195, ctc_loss=0.1501, cr_loss=0.3469, over 20071.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1451, cr_loss=0.3694, over 4086679.27 frames. ], batch size: 80, lr: 2.01e-03, grad_scale: 16.0 2024-09-18 07:51:40,726 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.30 vs. limit=15.0 2024-09-18 07:51:41,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=730416.3333333334, ans=0.125 2024-09-18 07:52:15,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=730473.0, ans=0.125 2024-09-18 07:52:43,707 INFO [train.py:1198] (1/2) Epoch 41, batch 2200, loss[loss=0.1924, ctc_loss=0.1226, cr_loss=0.3491, over 20945.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.1442, cr_loss=0.3686, over 4100815.61 frames. ], batch size: 50, lr: 2.01e-03, grad_scale: 16.0 2024-09-18 07:53:10,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=730558.0, ans=0.04949747468305833 2024-09-18 07:53:23,439 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.13 vs. 
limit=15.0 2024-09-18 07:53:23,498 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.60 vs. limit=15.0 2024-09-18 07:53:39,181 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.875e+02 2.235e+02 2.379e+02 2.572e+02 3.091e+02, threshold=4.758e+02, percent-clipped=0.0 2024-09-18 07:53:54,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=730643.0, ans=0.125 2024-09-18 07:53:59,092 INFO [train.py:1198] (1/2) Epoch 41, batch 2250, loss[loss=0.2113, ctc_loss=0.1419, cr_loss=0.3466, over 20901.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.1442, cr_loss=0.3688, over 4101845.74 frames. ], batch size: 54, lr: 2.01e-03, grad_scale: 16.0 2024-09-18 07:54:11,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=730671.3333333334, ans=0.125 2024-09-18 07:54:49,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=730756.3333333334, ans=0.125 2024-09-18 07:54:54,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=730756.3333333334, ans=0.025 2024-09-18 07:55:17,671 INFO [train.py:1198] (1/2) Epoch 41, batch 2300, loss[loss=0.2019, ctc_loss=0.1324, cr_loss=0.3473, over 20970.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1447, cr_loss=0.3688, over 4093743.49 frames. ], batch size: 48, lr: 2.01e-03, grad_scale: 16.0 2024-09-18 07:55:18,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=730813.0, ans=0.09899494936611666 2024-09-18 07:55:38,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=730841.3333333334, ans=0.125 2024-09-18 07:56:13,444 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.983e+02 2.244e+02 2.374e+02 2.504e+02 4.440e+02, threshold=4.748e+02, percent-clipped=0.0 2024-09-18 07:56:13,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=730898.0, ans=0.125 2024-09-18 07:56:15,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=730898.0, ans=0.0 2024-09-18 07:56:32,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=730926.3333333334, ans=0.125 2024-09-18 07:56:36,269 INFO [train.py:1198] (1/2) Epoch 41, batch 2350, loss[loss=0.2089, ctc_loss=0.1369, cr_loss=0.3599, over 20811.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.1457, cr_loss=0.3706, over 4080261.69 frames. ], batch size: 59, lr: 2.01e-03, grad_scale: 16.0 2024-09-18 07:57:51,938 INFO [train.py:1198] (1/2) Epoch 41, batch 2400, loss[loss=0.2223, ctc_loss=0.1447, cr_loss=0.3882, over 20125.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.1452, cr_loss=0.3696, over 4082372.50 frames. ], batch size: 80, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 07:58:02,996 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.85 vs. 
limit=10.0 2024-09-18 07:58:47,900 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.052e+02 2.276e+02 2.402e+02 2.528e+02 4.233e+02, threshold=4.803e+02, percent-clipped=0.0 2024-09-18 07:59:00,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=731209.6666666666, ans=0.0 2024-09-18 07:59:07,552 INFO [train.py:1198] (1/2) Epoch 41, batch 2450, loss[loss=0.2152, ctc_loss=0.1439, cr_loss=0.3564, over 20615.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.1456, cr_loss=0.3708, over 4095255.24 frames. ], batch size: 75, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 07:59:09,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=731238.0, ans=0.0 2024-09-18 07:59:32,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=731266.3333333334, ans=0.125 2024-09-18 07:59:33,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=731266.3333333334, ans=0.0 2024-09-18 07:59:38,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=731294.6666666666, ans=0.125 2024-09-18 07:59:57,680 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 08:00:03,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=731323.0, ans=0.0 2024-09-18 08:00:20,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=731351.3333333334, ans=0.125 2024-09-18 08:00:25,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=731379.6666666666, ans=0.125 2024-09-18 08:00:25,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=731379.6666666666, ans=0.2 2024-09-18 08:00:26,443 INFO [train.py:1198] (1/2) Epoch 41, batch 2500, loss[loss=0.2321, ctc_loss=0.1536, cr_loss=0.3926, over 20841.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.1456, cr_loss=0.3714, over 4094988.91 frames. ], batch size: 65, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:01:03,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=731436.3333333334, ans=0.0 2024-09-18 08:01:22,569 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.927e+02 2.207e+02 2.324e+02 2.475e+02 4.356e+02, threshold=4.647e+02, percent-clipped=0.0 2024-09-18 08:01:42,276 INFO [train.py:1198] (1/2) Epoch 41, batch 2550, loss[loss=0.2381, ctc_loss=0.1608, cr_loss=0.3862, over 20026.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.1456, cr_loss=0.3712, over 4107201.25 frames. ], batch size: 80, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:02:00,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=731549.6666666666, ans=0.07 2024-09-18 08:02:17,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=731578.0, ans=0.125 2024-09-18 08:03:00,796 INFO [train.py:1198] (1/2) Epoch 41, batch 2600, loss[loss=0.1955, ctc_loss=0.1289, cr_loss=0.3329, over 20978.00 frames. 
], tot_loss[loss=0.2197, ctc_loss=0.1455, cr_loss=0.371, over 4097971.79 frames. ], batch size: 55, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:03:57,071 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.918e+02 2.195e+02 2.353e+02 2.539e+02 4.235e+02, threshold=4.706e+02, percent-clipped=0.0 2024-09-18 08:04:02,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=731776.3333333334, ans=0.0 2024-09-18 08:04:16,638 INFO [train.py:1198] (1/2) Epoch 41, batch 2650, loss[loss=0.2007, ctc_loss=0.1313, cr_loss=0.3469, over 20775.00 frames. ], tot_loss[loss=0.2187, ctc_loss=0.1448, cr_loss=0.3698, over 4093509.86 frames. ], batch size: 53, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:04:25,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=731804.6666666666, ans=0.0 2024-09-18 08:04:52,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=731861.3333333334, ans=0.0 2024-09-18 08:05:08,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=731889.6666666666, ans=0.0 2024-09-18 08:05:34,592 INFO [train.py:1198] (1/2) Epoch 41, batch 2700, loss[loss=0.2214, ctc_loss=0.1443, cr_loss=0.3853, over 21062.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1452, cr_loss=0.3699, over 4079620.40 frames. ], batch size: 56, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:05:59,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=731974.6666666666, ans=0.125 2024-09-18 08:06:18,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=732031.3333333334, ans=0.07 2024-09-18 08:06:30,441 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.929e+02 2.229e+02 2.379e+02 2.585e+02 3.513e+02, threshold=4.758e+02, percent-clipped=0.0 2024-09-18 08:06:42,090 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.92 vs. limit=15.0 2024-09-18 08:06:50,122 INFO [train.py:1198] (1/2) Epoch 41, batch 2750, loss[loss=0.2195, ctc_loss=0.1409, cr_loss=0.3928, over 21025.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1451, cr_loss=0.3693, over 4080514.98 frames. 
], batch size: 61, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:07:15,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=732116.3333333334, ans=0.1 2024-09-18 08:07:17,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=732116.3333333334, ans=0.95 2024-09-18 08:07:30,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=732144.6666666666, ans=0.1 2024-09-18 08:07:31,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=732144.6666666666, ans=0.1 2024-09-18 08:07:55,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=732201.3333333334, ans=0.125 2024-09-18 08:07:59,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=732201.3333333334, ans=0.2 2024-09-18 08:08:08,413 INFO [train.py:1198] (1/2) Epoch 41, batch 2800, loss[loss=0.2099, ctc_loss=0.1386, cr_loss=0.3566, over 20831.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.145, cr_loss=0.3697, over 4088590.93 frames. ], batch size: 59, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:08:23,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=732258.0, ans=0.1 2024-09-18 08:08:26,938 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=732258.0, ans=0.07 2024-09-18 08:08:45,781 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.79 vs. limit=12.0 2024-09-18 08:09:04,742 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.885e+02 2.212e+02 2.345e+02 2.470e+02 3.194e+02, threshold=4.689e+02, percent-clipped=0.0 2024-09-18 08:09:17,675 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.66 vs. limit=12.0 2024-09-18 08:09:24,318 INFO [train.py:1198] (1/2) Epoch 41, batch 2850, loss[loss=0.1959, ctc_loss=0.1278, cr_loss=0.3405, over 20963.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1452, cr_loss=0.3702, over 4082064.78 frames. ], batch size: 51, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:09:54,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=732428.0, ans=0.5 2024-09-18 08:10:18,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=732456.3333333334, ans=0.0 2024-09-18 08:10:22,749 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 08:10:40,275 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.30 vs. limit=15.0 2024-09-18 08:10:40,674 INFO [train.py:1198] (1/2) Epoch 41, batch 2900, loss[loss=0.2287, ctc_loss=0.1506, cr_loss=0.3906, over 21015.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1445, cr_loss=0.3693, over 4087427.25 frames. 
], batch size: 61, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:10:43,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=732513.0, ans=0.0 2024-09-18 08:11:03,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=732541.3333333334, ans=0.125 2024-09-18 08:11:25,098 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.95 vs. limit=15.0 2024-09-18 08:11:39,318 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.024e+02 2.249e+02 2.391e+02 2.541e+02 3.899e+02, threshold=4.782e+02, percent-clipped=0.0 2024-09-18 08:11:41,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=732598.0, ans=0.1 2024-09-18 08:11:59,081 INFO [train.py:1198] (1/2) Epoch 41, batch 2950, loss[loss=0.2549, ctc_loss=0.1798, cr_loss=0.3757, over 14089.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1451, cr_loss=0.3706, over 4087151.20 frames. ], batch size: 151, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:12:16,225 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.54 vs. limit=22.5 2024-09-18 08:12:49,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=732739.6666666666, ans=0.2 2024-09-18 08:13:03,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=732768.0, ans=0.125 2024-09-18 08:13:14,594 INFO [train.py:1198] (1/2) Epoch 41, batch 3000, loss[loss=0.2469, ctc_loss=0.17, cr_loss=0.3843, over 19484.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.1452, cr_loss=0.3706, over 4078388.03 frames. ], batch size: 90, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:13:14,594 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 08:13:34,556 INFO [train.py:1230] (1/2) Epoch 41, validation: loss=0.04006, ctc_loss=0.04006, cr_loss=1.437e-14, over 944034.00 frames. 2024-09-18 08:13:34,556 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-18 08:13:49,282 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.69 vs. limit=15.0 2024-09-18 08:13:50,699 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.60 vs. limit=10.0 2024-09-18 08:14:14,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=732853.0, ans=0.1 2024-09-18 08:14:17,746 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.97 vs. 
limit=15.0 2024-09-18 08:14:18,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=732881.3333333334, ans=0.025 2024-09-18 08:14:30,488 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.995e+02 2.190e+02 2.363e+02 2.508e+02 3.386e+02, threshold=4.726e+02, percent-clipped=0.0 2024-09-18 08:14:33,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=732909.6666666666, ans=0.125 2024-09-18 08:14:36,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=732909.6666666666, ans=0.125 2024-09-18 08:14:44,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=732909.6666666666, ans=0.1 2024-09-18 08:14:50,279 INFO [train.py:1198] (1/2) Epoch 41, batch 3050, loss[loss=0.2305, ctc_loss=0.155, cr_loss=0.3778, over 20961.00 frames. ], tot_loss[loss=0.2195, ctc_loss=0.1454, cr_loss=0.3705, over 4069058.52 frames. ], batch size: 64, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:15:47,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=733023.0, ans=0.0 2024-09-18 08:16:05,142 INFO [train.py:1198] (1/2) Epoch 41, batch 3100, loss[loss=0.2336, ctc_loss=0.1563, cr_loss=0.3868, over 20942.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.1457, cr_loss=0.3712, over 4078645.97 frames. ], batch size: 60, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:16:55,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=733164.6666666666, ans=0.125 2024-09-18 08:17:04,389 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.862e+02 2.250e+02 2.387e+02 2.580e+02 3.720e+02, threshold=4.773e+02, percent-clipped=0.0 2024-09-18 08:17:06,820 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.48 vs. limit=15.0 2024-09-18 08:17:13,023 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.74 vs. limit=15.0 2024-09-18 08:17:21,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=733193.0, ans=0.125 2024-09-18 08:17:24,147 INFO [train.py:1198] (1/2) Epoch 41, batch 3150, loss[loss=0.2111, ctc_loss=0.1375, cr_loss=0.3684, over 20998.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1457, cr_loss=0.3719, over 4072002.97 frames. 
], batch size: 63, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:17:27,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=733221.3333333334, ans=0.0 2024-09-18 08:18:01,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=733278.0, ans=0.2 2024-09-18 08:18:04,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=733278.0, ans=0.125 2024-09-18 08:18:08,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=733306.3333333334, ans=0.04949747468305833 2024-09-18 08:18:27,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=733334.6666666666, ans=0.0 2024-09-18 08:18:27,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=733334.6666666666, ans=0.125 2024-09-18 08:18:39,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=733363.0, ans=0.5 2024-09-18 08:18:40,454 INFO [train.py:1198] (1/2) Epoch 41, batch 3200, loss[loss=0.2116, ctc_loss=0.1425, cr_loss=0.3455, over 19329.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.1456, cr_loss=0.3717, over 4079298.73 frames. ], batch size: 90, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:18:40,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=733363.0, ans=0.025 2024-09-18 08:19:12,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=733419.6666666666, ans=10.0 2024-09-18 08:19:17,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=733419.6666666666, ans=0.0 2024-09-18 08:19:39,367 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.949e+02 2.245e+02 2.375e+02 2.572e+02 3.384e+02, threshold=4.749e+02, percent-clipped=0.0 2024-09-18 08:19:54,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=733476.3333333334, ans=0.125 2024-09-18 08:19:54,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=733476.3333333334, ans=0.125 2024-09-18 08:19:58,736 INFO [train.py:1198] (1/2) Epoch 41, batch 3250, loss[loss=0.2356, ctc_loss=0.1557, cr_loss=0.3993, over 20829.00 frames. ], tot_loss[loss=0.22, ctc_loss=0.1456, cr_loss=0.3718, over 4077893.28 frames. ], batch size: 65, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:20:43,009 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.21 vs. 
limit=15.0 2024-09-18 08:20:56,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=733589.6666666666, ans=0.0 2024-09-18 08:21:02,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=733618.0, ans=0.125 2024-09-18 08:21:12,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=733646.3333333334, ans=0.2 2024-09-18 08:21:13,793 INFO [train.py:1198] (1/2) Epoch 41, batch 3300, loss[loss=0.2175, ctc_loss=0.1412, cr_loss=0.3817, over 20869.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1448, cr_loss=0.3708, over 4081921.82 frames. ], batch size: 54, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:21:21,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=733646.3333333334, ans=0.2 2024-09-18 08:21:33,008 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.33 vs. limit=12.0 2024-09-18 08:22:05,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=733731.3333333334, ans=0.0 2024-09-18 08:22:08,570 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 08:22:09,583 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.881e+02 2.252e+02 2.363e+02 2.555e+02 4.123e+02, threshold=4.727e+02, percent-clipped=0.0 2024-09-18 08:22:21,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=733759.6666666666, ans=0.125 2024-09-18 08:22:32,065 INFO [train.py:1198] (1/2) Epoch 41, batch 3350, loss[loss=0.2335, ctc_loss=0.1547, cr_loss=0.3943, over 20834.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.144, cr_loss=0.3692, over 4098452.37 frames. ], batch size: 65, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:22:57,203 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.21 vs. limit=12.0 2024-09-18 08:23:09,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=733844.6666666666, ans=0.0 2024-09-18 08:23:10,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=733844.6666666666, ans=0.02 2024-09-18 08:23:13,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=733844.6666666666, ans=0.125 2024-09-18 08:23:18,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=733873.0, ans=0.125 2024-09-18 08:23:30,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=733873.0, ans=0.0 2024-09-18 08:23:38,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=733901.3333333334, ans=0.125 2024-09-18 08:23:42,068 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.68 vs. 
limit=15.0 2024-09-18 08:23:47,530 INFO [train.py:1198] (1/2) Epoch 41, batch 3400, loss[loss=0.2252, ctc_loss=0.1496, cr_loss=0.3782, over 20853.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1448, cr_loss=0.3702, over 4089254.10 frames. ], batch size: 57, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:24:46,634 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.960e+02 2.201e+02 2.323e+02 2.492e+02 4.629e+02, threshold=4.646e+02, percent-clipped=0.0 2024-09-18 08:25:06,208 INFO [train.py:1198] (1/2) Epoch 41, batch 3450, loss[loss=0.1672, ctc_loss=0.1053, cr_loss=0.3096, over 20018.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1446, cr_loss=0.3701, over 4098580.79 frames. ], batch size: 44, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:25:32,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=734099.6666666666, ans=0.125 2024-09-18 08:25:45,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=734128.0, ans=0.125 2024-09-18 08:26:18,176 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.50 vs. limit=15.0 2024-09-18 08:26:21,747 INFO [train.py:1198] (1/2) Epoch 41, batch 3500, loss[loss=0.2072, ctc_loss=0.1348, cr_loss=0.3622, over 20972.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1447, cr_loss=0.3695, over 4084120.75 frames. ], batch size: 49, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:26:25,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=734213.0, ans=0.1 2024-09-18 08:26:30,555 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.95 vs. limit=5.0 2024-09-18 08:26:53,064 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.86 vs. limit=15.0 2024-09-18 08:27:17,586 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.891e+02 2.242e+02 2.384e+02 2.510e+02 3.270e+02, threshold=4.768e+02, percent-clipped=0.0 2024-09-18 08:27:37,446 INFO [train.py:1198] (1/2) Epoch 41, batch 3550, loss[loss=0.2031, ctc_loss=0.1337, cr_loss=0.3469, over 20975.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.1456, cr_loss=0.3708, over 4085418.74 frames. ], batch size: 49, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:27:39,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=734354.6666666666, ans=0.025 2024-09-18 08:27:45,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=734354.6666666666, ans=0.0 2024-09-18 08:28:55,532 INFO [train.py:1198] (1/2) Epoch 41, batch 3600, loss[loss=0.2407, ctc_loss=0.1594, cr_loss=0.4068, over 20962.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1455, cr_loss=0.3705, over 4080706.61 frames. ], batch size: 64, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:28:58,011 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.13 vs. 
limit=15.0 2024-09-18 08:29:31,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=734553.0, ans=0.035 2024-09-18 08:29:37,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=734553.0, ans=0.0 2024-09-18 08:29:38,549 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.84 vs. limit=10.0 2024-09-18 08:29:52,646 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.983e+02 2.227e+02 2.331e+02 2.551e+02 3.365e+02, threshold=4.662e+02, percent-clipped=0.0 2024-09-18 08:29:54,975 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.49 vs. limit=22.5 2024-09-18 08:30:10,807 INFO [train.py:1198] (1/2) Epoch 41, batch 3650, loss[loss=0.2095, ctc_loss=0.1395, cr_loss=0.3501, over 20866.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.1452, cr_loss=0.3704, over 4097357.40 frames. ], batch size: 57, lr: 2.01e-03, grad_scale: 16.0 2024-09-18 08:30:22,329 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.74 vs. limit=15.0 2024-09-18 08:30:24,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=734638.0, ans=0.1 2024-09-18 08:30:33,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=734666.3333333334, ans=0.1 2024-09-18 08:30:53,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=734694.6666666666, ans=0.0 2024-09-18 08:30:59,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=734723.0, ans=0.125 2024-09-18 08:31:07,148 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=734723.0, ans=0.1 2024-09-18 08:31:13,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=734751.3333333334, ans=0.125 2024-09-18 08:31:20,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=734751.3333333334, ans=0.0 2024-09-18 08:31:27,380 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.83 vs. limit=15.0 2024-09-18 08:31:29,457 INFO [train.py:1198] (1/2) Epoch 41, batch 3700, loss[loss=0.2283, ctc_loss=0.1509, cr_loss=0.3871, over 20699.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1445, cr_loss=0.3681, over 4089189.94 frames. 
], batch size: 71, lr: 2.01e-03, grad_scale: 16.0 2024-09-18 08:32:05,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=734836.3333333334, ans=0.125 2024-09-18 08:32:10,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=734836.3333333334, ans=0.07 2024-09-18 08:32:26,934 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.014e+02 2.259e+02 2.355e+02 2.544e+02 4.484e+02, threshold=4.710e+02, percent-clipped=0.0 2024-09-18 08:32:31,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=734893.0, ans=0.125 2024-09-18 08:32:39,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=734893.0, ans=0.0 2024-09-18 08:32:42,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=734893.0, ans=0.125 2024-09-18 08:32:45,209 INFO [train.py:1198] (1/2) Epoch 41, batch 3750, loss[loss=0.2373, ctc_loss=0.1551, cr_loss=0.4109, over 20648.00 frames. ], tot_loss[loss=0.2183, ctc_loss=0.1446, cr_loss=0.3683, over 4096573.85 frames. ], batch size: 66, lr: 2.01e-03, grad_scale: 16.0 2024-09-18 08:32:46,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=734921.3333333334, ans=0.125 2024-09-18 08:32:48,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=734921.3333333334, ans=0.125 2024-09-18 08:32:57,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=734921.3333333334, ans=0.125 2024-09-18 08:33:15,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=734978.0, ans=0.125 2024-09-18 08:33:23,031 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 08:34:04,071 INFO [train.py:1198] (1/2) Epoch 41, batch 3800, loss[loss=0.201, ctc_loss=0.1329, cr_loss=0.3409, over 20837.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.1435, cr_loss=0.3666, over 4100062.20 frames. ], batch size: 59, lr: 2.01e-03, grad_scale: 16.0 2024-09-18 08:34:05,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=735063.0, ans=0.1 2024-09-18 08:34:18,325 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.83 vs. limit=22.5 2024-09-18 08:34:27,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=735091.3333333334, ans=0.5 2024-09-18 08:34:39,554 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.39 vs. 
limit=22.5 2024-09-18 08:34:51,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=735148.0, ans=0.125 2024-09-18 08:35:01,307 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.939e+02 2.236e+02 2.355e+02 2.550e+02 3.423e+02, threshold=4.710e+02, percent-clipped=0.0 2024-09-18 08:35:12,268 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 08:35:15,615 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=2.99 vs. limit=15.0 2024-09-18 08:35:19,263 INFO [train.py:1198] (1/2) Epoch 41, batch 3850, loss[loss=0.2297, ctc_loss=0.1537, cr_loss=0.3798, over 20677.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.1443, cr_loss=0.3681, over 4076032.27 frames. ], batch size: 71, lr: 2.01e-03, grad_scale: 16.0 2024-09-18 08:35:42,842 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. limit=6.0 2024-09-18 08:36:06,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=735289.6666666666, ans=0.125 2024-09-18 08:36:11,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=735289.6666666666, ans=0.1 2024-09-18 08:36:36,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=735346.3333333334, ans=0.2 2024-09-18 08:36:37,947 INFO [train.py:1198] (1/2) Epoch 41, batch 3900, loss[loss=0.2095, ctc_loss=0.137, cr_loss=0.3625, over 21034.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.1452, cr_loss=0.3695, over 4082468.16 frames. ], batch size: 62, lr: 2.01e-03, grad_scale: 16.0 2024-09-18 08:37:10,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=735403.0, ans=0.1 2024-09-18 08:37:35,563 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.012e+02 2.250e+02 2.371e+02 2.535e+02 8.170e+02, threshold=4.742e+02, percent-clipped=2.0 2024-09-18 08:37:38,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=735459.6666666666, ans=0.025 2024-09-18 08:37:40,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=735459.6666666666, ans=0.125 2024-09-18 08:37:53,734 INFO [train.py:1198] (1/2) Epoch 41, batch 3950, loss[loss=0.232, ctc_loss=0.1517, cr_loss=0.4015, over 21067.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1451, cr_loss=0.3698, over 4086484.00 frames. 
], batch size: 59, lr: 2.01e-03, grad_scale: 16.0 2024-09-18 08:38:14,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=735516.3333333334, ans=0.0 2024-09-18 08:39:03,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=735601.3333333334, ans=0.0 2024-09-18 08:39:06,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=735601.3333333334, ans=0.125 2024-09-18 08:39:11,576 INFO [train.py:1198] (1/2) Epoch 41, batch 4000, loss[loss=0.199, ctc_loss=0.129, cr_loss=0.3498, over 20986.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1447, cr_loss=0.3688, over 4068829.91 frames. ], batch size: 52, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:39:34,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=735658.0, ans=0.1 2024-09-18 08:39:37,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=735658.0, ans=0.125 2024-09-18 08:39:38,085 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.11 vs. limit=15.0 2024-09-18 08:40:03,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=735714.6666666666, ans=0.0 2024-09-18 08:40:09,600 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.828e+02 2.255e+02 2.383e+02 2.587e+02 3.614e+02, threshold=4.765e+02, percent-clipped=0.0 2024-09-18 08:40:13,520 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.17 vs. limit=22.5 2024-09-18 08:40:22,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=735743.0, ans=0.125 2024-09-18 08:40:22,745 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.18 vs. limit=15.0 2024-09-18 08:40:24,315 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.86 vs. limit=15.0 2024-09-18 08:40:28,121 INFO [train.py:1198] (1/2) Epoch 41, batch 4050, loss[loss=0.2153, ctc_loss=0.1412, cr_loss=0.3704, over 21043.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.145, cr_loss=0.3696, over 4078585.04 frames. ], batch size: 62, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:40:37,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=735771.3333333334, ans=0.125 2024-09-18 08:40:59,031 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.79 vs. 
limit=15.0 2024-09-18 08:41:01,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=735828.0, ans=0.125 2024-09-18 08:41:03,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=735828.0, ans=0.2 2024-09-18 08:41:24,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=735856.3333333334, ans=0.0 2024-09-18 08:41:42,608 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.66 vs. limit=10.0 2024-09-18 08:41:43,245 INFO [train.py:1198] (1/2) Epoch 41, batch 4100, loss[loss=0.2408, ctc_loss=0.1607, cr_loss=0.4004, over 19379.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.1453, cr_loss=0.37, over 4073011.72 frames. ], batch size: 90, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:41:57,360 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.50 vs. limit=15.0 2024-09-18 08:41:58,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=735913.0, ans=0.0 2024-09-18 08:42:25,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=735969.6666666666, ans=0.1 2024-09-18 08:42:33,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=735998.0, ans=0.125 2024-09-18 08:42:43,516 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.955e+02 2.220e+02 2.353e+02 2.500e+02 3.228e+02, threshold=4.707e+02, percent-clipped=0.0 2024-09-18 08:42:54,886 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.68 vs. limit=15.0 2024-09-18 08:42:58,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=736026.3333333334, ans=0.04949747468305833 2024-09-18 08:43:01,522 INFO [train.py:1198] (1/2) Epoch 41, batch 4150, loss[loss=0.203, ctc_loss=0.1332, cr_loss=0.3488, over 20997.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1449, cr_loss=0.3698, over 4079204.22 frames. ], batch size: 55, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:43:06,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=736054.6666666666, ans=0.125 2024-09-18 08:43:44,543 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.98 vs. limit=15.0 2024-09-18 08:44:03,872 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=736168.0, ans=0.125 2024-09-18 08:44:08,799 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.83 vs. limit=15.0 2024-09-18 08:44:17,037 INFO [train.py:1198] (1/2) Epoch 41, batch 4200, loss[loss=0.2056, ctc_loss=0.1353, cr_loss=0.3518, over 21017.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1451, cr_loss=0.3706, over 4087888.53 frames. 
], batch size: 63, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:44:38,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=736224.6666666666, ans=0.0 2024-09-18 08:44:59,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=736253.0, ans=0.0 2024-09-18 08:45:01,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=736253.0, ans=0.125 2024-09-18 08:45:17,478 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.037e+02 2.249e+02 2.413e+02 2.618e+02 4.722e+02, threshold=4.827e+02, percent-clipped=1.0 2024-09-18 08:45:19,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=736309.6666666666, ans=0.0 2024-09-18 08:45:27,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=736309.6666666666, ans=15.0 2024-09-18 08:45:35,318 INFO [train.py:1198] (1/2) Epoch 41, batch 4250, loss[loss=0.1936, ctc_loss=0.1265, cr_loss=0.3357, over 21059.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1443, cr_loss=0.3692, over 4090024.10 frames. ], batch size: 53, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:45:51,118 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.45 vs. limit=15.0 2024-09-18 08:45:55,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=736366.3333333334, ans=0.125 2024-09-18 08:46:40,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=736451.3333333334, ans=0.125 2024-09-18 08:46:42,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=736451.3333333334, ans=0.0 2024-09-18 08:46:44,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=736451.3333333334, ans=0.0 2024-09-18 08:46:51,652 INFO [train.py:1198] (1/2) Epoch 41, batch 4300, loss[loss=0.2108, ctc_loss=0.1386, cr_loss=0.3611, over 20814.00 frames. ], tot_loss[loss=0.2183, ctc_loss=0.1445, cr_loss=0.3691, over 4094173.37 frames. ], batch size: 59, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:46:59,730 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.35 vs. limit=15.0 2024-09-18 08:47:45,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=736564.6666666666, ans=0.025 2024-09-18 08:47:51,327 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.006e+02 2.275e+02 2.403e+02 2.606e+02 4.181e+02, threshold=4.807e+02, percent-clipped=0.0 2024-09-18 08:48:09,302 INFO [train.py:1198] (1/2) Epoch 41, batch 4350, loss[loss=0.237, ctc_loss=0.1573, cr_loss=0.3982, over 20976.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1446, cr_loss=0.3698, over 4100431.06 frames. 
], batch size: 55, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:49:22,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=736734.6666666666, ans=0.1 2024-09-18 08:49:25,574 INFO [train.py:1198] (1/2) Epoch 41, batch 4400, loss[loss=0.1836, ctc_loss=0.1164, cr_loss=0.336, over 20989.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1445, cr_loss=0.3702, over 4105728.14 frames. ], batch size: 51, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:49:54,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=736819.6666666666, ans=0.1 2024-09-18 08:50:22,732 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.951e+02 2.223e+02 2.365e+02 2.507e+02 2.832e+02, threshold=4.729e+02, percent-clipped=0.0 2024-09-18 08:50:29,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=736876.3333333334, ans=0.0 2024-09-18 08:50:43,953 INFO [train.py:1198] (1/2) Epoch 41, batch 4450, loss[loss=0.1965, ctc_loss=0.1325, cr_loss=0.3202, over 20955.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1439, cr_loss=0.3685, over 4115741.34 frames. ], batch size: 50, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:51:20,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=736961.3333333334, ans=0.125 2024-09-18 08:51:34,329 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 08:51:56,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=737018.0, ans=0.125 2024-09-18 08:51:59,732 INFO [train.py:1198] (1/2) Epoch 41, batch 4500, loss[loss=0.2142, ctc_loss=0.1385, cr_loss=0.3783, over 20889.00 frames. ], tot_loss[loss=0.2187, ctc_loss=0.1447, cr_loss=0.3702, over 4100708.90 frames. ], batch size: 57, lr: 2.00e-03, grad_scale: 16.0 2024-09-18 08:52:03,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=737046.3333333334, ans=0.0 2024-09-18 08:52:13,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=737074.6666666666, ans=0.125 2024-09-18 08:52:19,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=737074.6666666666, ans=0.125 2024-09-18 08:52:30,675 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.73 vs. limit=15.0 2024-09-18 08:52:42,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=737103.0, ans=0.025 2024-09-18 08:52:58,651 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.888e+02 2.193e+02 2.308e+02 2.500e+02 3.939e+02, threshold=4.615e+02, percent-clipped=0.0 2024-09-18 08:53:07,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=737159.6666666666, ans=0.2 2024-09-18 08:53:17,937 INFO [train.py:1198] (1/2) Epoch 41, batch 4550, loss[loss=0.2181, ctc_loss=0.1429, cr_loss=0.3764, over 20876.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.144, cr_loss=0.3698, over 4114352.87 frames. 
], batch size: 57, lr: 2.00e-03, grad_scale: 16.0 2024-09-18 08:53:27,565 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.05 vs. limit=6.0 2024-09-18 08:54:00,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=737244.6666666666, ans=0.0 2024-09-18 08:54:33,840 INFO [train.py:1198] (1/2) Epoch 41, batch 4600, loss[loss=0.2293, ctc_loss=0.1503, cr_loss=0.395, over 20887.00 frames. ], tot_loss[loss=0.2187, ctc_loss=0.1444, cr_loss=0.3712, over 4123578.43 frames. ], batch size: 57, lr: 2.00e-03, grad_scale: 16.0 2024-09-18 08:54:51,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=737358.0, ans=0.125 2024-09-18 08:55:13,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=737386.3333333334, ans=0.2 2024-09-18 08:55:26,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=737414.6666666666, ans=0.0 2024-09-18 08:55:29,379 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.60 vs. limit=12.0 2024-09-18 08:55:33,379 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.003e+02 2.274e+02 2.413e+02 2.588e+02 3.283e+02, threshold=4.827e+02, percent-clipped=0.0 2024-09-18 08:55:33,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=737443.0, ans=0.125 2024-09-18 08:55:41,573 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 08:55:41,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=737443.0, ans=0.0 2024-09-18 08:55:45,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=737443.0, ans=0.0 2024-09-18 08:55:50,104 INFO [train.py:1198] (1/2) Epoch 41, batch 4650, loss[loss=0.1964, ctc_loss=0.1291, cr_loss=0.3363, over 21043.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1439, cr_loss=0.3686, over 4102501.61 frames. ], batch size: 56, lr: 2.00e-03, grad_scale: 16.0 2024-09-18 08:56:10,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=737499.6666666666, ans=0.125 2024-09-18 08:56:10,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=737499.6666666666, ans=0.2 2024-09-18 08:56:13,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=737499.6666666666, ans=0.2 2024-09-18 08:56:44,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=737556.3333333334, ans=0.2 2024-09-18 08:56:46,830 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.16 vs. limit=12.0 2024-09-18 08:57:08,543 INFO [train.py:1198] (1/2) Epoch 41, batch 4700, loss[loss=0.207, ctc_loss=0.1357, cr_loss=0.3565, over 20881.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1434, cr_loss=0.3675, over 4103733.45 frames. 
], batch size: 54, lr: 2.00e-03, grad_scale: 16.0 2024-09-18 08:57:10,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=737613.0, ans=22.5 2024-09-18 08:58:07,068 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.997e+02 2.188e+02 2.340e+02 2.521e+02 3.459e+02, threshold=4.680e+02, percent-clipped=0.0 2024-09-18 08:58:23,715 INFO [train.py:1198] (1/2) Epoch 41, batch 4750, loss[loss=0.204, ctc_loss=0.1333, cr_loss=0.3534, over 19919.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1438, cr_loss=0.3675, over 4077999.42 frames. ], batch size: 44, lr: 2.00e-03, grad_scale: 16.0 2024-09-18 08:58:34,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=737754.6666666666, ans=0.125 2024-09-18 08:58:56,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=737811.3333333334, ans=0.025 2024-09-18 08:59:01,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=737811.3333333334, ans=0.125 2024-09-18 08:59:04,169 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0 2024-09-18 08:59:10,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=737839.6666666666, ans=10.0 2024-09-18 08:59:35,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=737868.0, ans=0.125 2024-09-18 08:59:35,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=737868.0, ans=0.0 2024-09-18 08:59:36,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=737868.0, ans=0.2 2024-09-18 08:59:39,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=737868.0, ans=0.09899494936611666 2024-09-18 08:59:42,543 INFO [train.py:1198] (1/2) Epoch 41, batch 4800, loss[loss=0.2027, ctc_loss=0.1342, cr_loss=0.3425, over 20966.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1442, cr_loss=0.3675, over 4066845.78 frames. ], batch size: 55, lr: 2.00e-03, grad_scale: 32.0 2024-09-18 09:00:25,712 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.37 vs. limit=22.5 2024-09-18 09:00:31,589 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 09:00:34,902 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.01 vs. limit=15.0 2024-09-18 09:00:41,708 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.879e+02 2.225e+02 2.346e+02 2.523e+02 3.172e+02, threshold=4.693e+02, percent-clipped=0.0 2024-09-18 09:00:48,758 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.50 vs. limit=15.0 2024-09-18 09:00:58,205 INFO [train.py:1198] (1/2) Epoch 41, batch 4850, loss[loss=0.2446, ctc_loss=0.1664, cr_loss=0.3914, over 18207.00 frames. 
], tot_loss[loss=0.2189, ctc_loss=0.145, cr_loss=0.3693, over 4063798.84 frames. ], batch size: 108, lr: 2.00e-03, grad_scale: 32.0 2024-09-18 09:01:07,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=738038.0, ans=0.1 2024-09-18 09:01:34,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=738094.6666666666, ans=0.0 2024-09-18 09:02:06,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=738151.3333333334, ans=0.125 2024-09-18 09:02:15,544 INFO [train.py:1198] (1/2) Epoch 41, batch 4900, loss[loss=0.1856, ctc_loss=0.1203, cr_loss=0.3267, over 20960.00 frames. ], tot_loss[loss=0.2197, ctc_loss=0.1456, cr_loss=0.3701, over 4051566.97 frames. ], batch size: 49, lr: 2.00e-03, grad_scale: 32.0 2024-09-18 09:02:17,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=738179.6666666666, ans=0.1 2024-09-18 09:02:48,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=738236.3333333334, ans=0.125 2024-09-18 09:02:50,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=738236.3333333334, ans=0.0 2024-09-18 09:02:54,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=738236.3333333334, ans=0.125 2024-09-18 09:03:03,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=738264.6666666666, ans=0.1 2024-09-18 09:03:08,592 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.81 vs. limit=15.0 2024-09-18 09:03:13,730 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.944e+02 2.274e+02 2.389e+02 2.557e+02 4.192e+02, threshold=4.778e+02, percent-clipped=0.0 2024-09-18 09:03:30,174 INFO [train.py:1198] (1/2) Epoch 41, batch 4950, loss[loss=0.2039, ctc_loss=0.137, cr_loss=0.3346, over 21060.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1452, cr_loss=0.3694, over 4066831.78 frames. ], batch size: 56, lr: 2.00e-03, grad_scale: 32.0 2024-09-18 09:03:45,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=738349.6666666666, ans=0.0 2024-09-18 09:03:50,103 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.80 vs. limit=15.0 2024-09-18 09:04:44,870 INFO [train.py:1198] (1/2) Epoch 41, batch 5000, loss[loss=0.2477, ctc_loss=0.1657, cr_loss=0.4099, over 18433.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1447, cr_loss=0.3687, over 4084791.72 frames. 
], batch size: 108, lr: 2.00e-03, grad_scale: 32.0 2024-09-18 09:04:48,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=738463.0, ans=0.125 2024-09-18 09:05:07,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=738491.3333333334, ans=0.0 2024-09-18 09:05:28,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=738548.0, ans=0.125 2024-09-18 09:05:42,758 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.875e+02 2.229e+02 2.329e+02 2.482e+02 3.017e+02, threshold=4.657e+02, percent-clipped=0.0 2024-09-18 09:05:53,989 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.36 vs. limit=15.0 2024-09-18 09:05:59,071 INFO [train.py:1198] (1/2) Epoch 41, batch 5050, loss[loss=0.2305, ctc_loss=0.1539, cr_loss=0.3829, over 20774.00 frames. ], tot_loss[loss=0.2188, ctc_loss=0.1449, cr_loss=0.3693, over 4083645.54 frames. ], batch size: 56, lr: 2.00e-03, grad_scale: 32.0 2024-09-18 09:06:21,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=738633.0, ans=0.0 2024-09-18 09:06:23,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=738633.0, ans=0.125 2024-09-18 09:06:23,698 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.45 vs. limit=15.0 2024-09-18 09:07:01,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=738718.0, ans=0.07 2024-09-18 09:07:16,867 INFO [train.py:1198] (1/2) Epoch 41, batch 5100, loss[loss=0.2311, ctc_loss=0.1561, cr_loss=0.3747, over 20327.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.145, cr_loss=0.3696, over 4090743.38 frames. ], batch size: 74, lr: 2.00e-03, grad_scale: 32.0 2024-09-18 09:08:11,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=738831.3333333334, ans=0.1 2024-09-18 09:08:14,304 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.014e+02 2.236e+02 2.364e+02 2.502e+02 9.245e+02, threshold=4.728e+02, percent-clipped=1.0 2024-09-18 09:08:30,730 INFO [train.py:1198] (1/2) Epoch 41, batch 5150, loss[loss=0.2352, ctc_loss=0.1591, cr_loss=0.3801, over 21056.00 frames. ], tot_loss[loss=0.2187, ctc_loss=0.1449, cr_loss=0.3691, over 4089287.21 frames. ], batch size: 59, lr: 2.00e-03, grad_scale: 32.0 2024-09-18 09:09:09,455 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.64 vs. limit=10.0 2024-09-18 09:09:34,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=739001.3333333334, ans=0.025 2024-09-18 09:09:44,317 INFO [train.py:1198] (1/2) Epoch 41, batch 5200, loss[loss=0.2424, ctc_loss=0.1606, cr_loss=0.4089, over 20638.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1443, cr_loss=0.3689, over 4095729.66 frames. 
], batch size: 68, lr: 2.00e-03, grad_scale: 32.0 2024-09-18 09:10:23,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=739086.3333333334, ans=0.125 2024-09-18 09:10:33,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=739114.6666666666, ans=0.125 2024-09-18 09:10:42,189 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.966e+02 2.243e+02 2.374e+02 2.484e+02 3.795e+02, threshold=4.748e+02, percent-clipped=0.0 2024-09-18 09:10:58,407 INFO [train.py:1198] (1/2) Epoch 41, batch 5250, loss[loss=0.206, ctc_loss=0.1362, cr_loss=0.3491, over 21003.00 frames. ], tot_loss[loss=0.2195, ctc_loss=0.1454, cr_loss=0.3706, over 4080069.00 frames. ], batch size: 61, lr: 2.00e-03, grad_scale: 32.0 2024-09-18 09:11:18,475 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.49 vs. limit=15.0 2024-09-18 09:11:32,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=739228.0, ans=0.0 2024-09-18 09:11:50,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=739256.3333333334, ans=0.125 2024-09-18 09:12:14,735 INFO [train.py:1198] (1/2) Epoch 41, batch 5300, loss[loss=0.2278, ctc_loss=0.1517, cr_loss=0.3806, over 20797.00 frames. ], tot_loss[loss=0.2202, ctc_loss=0.1459, cr_loss=0.3716, over 4066743.77 frames. ], batch size: 53, lr: 2.00e-03, grad_scale: 32.0 2024-09-18 09:12:15,463 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.92 vs. limit=6.0 2024-09-18 09:12:25,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=739313.0, ans=0.025 2024-09-18 09:13:01,268 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.39 vs. limit=12.0 2024-09-18 09:13:07,054 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.93 vs. limit=15.0 2024-09-18 09:13:09,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=739398.0, ans=0.125 2024-09-18 09:13:12,450 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.889e+02 2.236e+02 2.374e+02 2.566e+02 3.810e+02, threshold=4.747e+02, percent-clipped=0.0 2024-09-18 09:13:13,200 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.22 vs. limit=10.0 2024-09-18 09:13:17,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=739426.3333333334, ans=0.5 2024-09-18 09:13:24,611 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 09:13:28,796 INFO [train.py:1198] (1/2) Epoch 41, batch 5350, loss[loss=0.2272, ctc_loss=0.1502, cr_loss=0.3852, over 21041.00 frames. ], tot_loss[loss=0.22, ctc_loss=0.1457, cr_loss=0.3713, over 4084232.52 frames. 
], batch size: 62, lr: 2.00e-03, grad_scale: 32.0 2024-09-18 09:13:39,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=739454.6666666666, ans=0.125 2024-09-18 09:14:18,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=739539.6666666666, ans=0.125 2024-09-18 09:14:36,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=739568.0, ans=0.05 2024-09-18 09:14:41,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=739568.0, ans=0.0 2024-09-18 09:14:43,635 INFO [train.py:1198] (1/2) Epoch 41, batch 5400, loss[loss=0.2385, ctc_loss=0.1615, cr_loss=0.385, over 20673.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1449, cr_loss=0.3702, over 4094508.28 frames. ], batch size: 68, lr: 2.00e-03, grad_scale: 16.0 2024-09-18 09:14:48,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=739596.3333333334, ans=0.125 2024-09-18 09:14:51,875 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.02 vs. limit=22.5 2024-09-18 09:15:12,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=739653.0, ans=0.05 2024-09-18 09:15:19,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=739653.0, ans=0.125 2024-09-18 09:15:25,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=739653.0, ans=0.05 2024-09-18 09:15:26,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=739681.3333333334, ans=0.0 2024-09-18 09:15:38,239 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.00 vs. limit=15.0 2024-09-18 09:15:43,017 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.008e+02 2.240e+02 2.378e+02 2.529e+02 4.836e+02, threshold=4.755e+02, percent-clipped=1.0 2024-09-18 09:15:58,166 INFO [train.py:1198] (1/2) Epoch 41, batch 5450, loss[loss=0.2265, ctc_loss=0.1523, cr_loss=0.3706, over 21008.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1451, cr_loss=0.3708, over 4095446.60 frames. 
], batch size: 61, lr: 2.00e-03, grad_scale: 16.0 2024-09-18 09:16:13,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=739766.3333333334, ans=0.125 2024-09-18 09:16:25,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=739766.3333333334, ans=0.1 2024-09-18 09:16:55,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=739823.0, ans=0.04949747468305833 2024-09-18 09:16:57,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=739823.0, ans=0.04949747468305833 2024-09-18 09:17:03,535 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.15 vs. limit=22.5 2024-09-18 09:17:14,997 INFO [train.py:1198] (1/2) Epoch 41, batch 5500, loss[loss=0.2493, ctc_loss=0.1695, cr_loss=0.3989, over 20670.00 frames. ], tot_loss[loss=0.2187, ctc_loss=0.1446, cr_loss=0.3705, over 4094218.61 frames. ], batch size: 66, lr: 2.00e-03, grad_scale: 16.0 2024-09-18 09:17:27,417 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.75 vs. limit=15.0 2024-09-18 09:18:14,743 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.975e+02 2.213e+02 2.368e+02 2.525e+02 3.505e+02, threshold=4.735e+02, percent-clipped=0.0 2024-09-18 09:18:29,468 INFO [train.py:1198] (1/2) Epoch 41, batch 5550, loss[loss=0.186, ctc_loss=0.1218, cr_loss=0.3207, over 21011.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1444, cr_loss=0.3699, over 4103899.83 frames. ], batch size: 52, lr: 2.00e-03, grad_scale: 16.0 2024-09-18 09:18:52,614 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.05 vs. limit=10.0 2024-09-18 09:19:35,503 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=8.12 vs. limit=12.0 2024-09-18 09:19:43,451 INFO [train.py:1198] (1/2) Epoch 41, batch 5600, loss[loss=0.2032, ctc_loss=0.132, cr_loss=0.3562, over 20787.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1454, cr_loss=0.3711, over 4085029.88 frames. 
], batch size: 56, lr: 2.00e-03, grad_scale: 32.0 2024-09-18 09:19:57,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=740191.3333333334, ans=0.0 2024-09-18 09:20:19,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=740219.6666666666, ans=0.1 2024-09-18 09:20:32,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=740248.0, ans=0.1 2024-09-18 09:20:45,569 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.878e+02 2.191e+02 2.374e+02 2.584e+02 3.616e+02, threshold=4.748e+02, percent-clipped=0.0 2024-09-18 09:20:50,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=740276.3333333334, ans=0.1 2024-09-18 09:20:53,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=740276.3333333334, ans=0.0 2024-09-18 09:20:59,961 INFO [train.py:1198] (1/2) Epoch 41, batch 5650, loss[loss=0.1904, ctc_loss=0.1242, cr_loss=0.3307, over 20991.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1445, cr_loss=0.3693, over 4084665.67 frames. ], batch size: 48, lr: 2.00e-03, grad_scale: 32.0 2024-09-18 09:21:02,151 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.98 vs. limit=6.0 2024-09-18 09:21:11,107 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.82 vs. limit=15.0 2024-09-18 09:21:16,891 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.48 vs. limit=15.0 2024-09-18 09:21:22,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=740333.0, ans=0.1 2024-09-18 09:21:28,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=740361.3333333334, ans=15.0 2024-09-18 09:21:52,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=740389.6666666666, ans=0.025 2024-09-18 09:22:14,380 INFO [train.py:1198] (1/2) Epoch 41, batch 5700, loss[loss=0.1975, ctc_loss=0.1285, cr_loss=0.3449, over 21052.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1445, cr_loss=0.3695, over 4084468.59 frames. ], batch size: 53, lr: 2.00e-03, grad_scale: 32.0 2024-09-18 09:22:28,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=740474.6666666666, ans=0.0 2024-09-18 09:22:39,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=740474.6666666666, ans=0.125 2024-09-18 09:23:13,925 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.953e+02 2.191e+02 2.352e+02 2.519e+02 5.380e+02, threshold=4.703e+02, percent-clipped=1.0 2024-09-18 09:23:20,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=740559.6666666666, ans=0.1 2024-09-18 09:23:28,660 INFO [train.py:1198] (1/2) Epoch 41, batch 5750, loss[loss=0.2224, ctc_loss=0.1477, cr_loss=0.3735, over 21020.00 frames. 
], tot_loss[loss=0.2173, ctc_loss=0.1438, cr_loss=0.3677, over 4082065.06 frames. ], batch size: 61, lr: 2.00e-03, grad_scale: 32.0 2024-09-18 09:24:10,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=740644.6666666666, ans=0.0 2024-09-18 09:24:36,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=740701.3333333334, ans=0.0 2024-09-18 09:24:42,457 INFO [train.py:1198] (1/2) Epoch 41, batch 5800, loss[loss=0.2414, ctc_loss=0.1626, cr_loss=0.3941, over 20624.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1433, cr_loss=0.3677, over 4091661.82 frames. ], batch size: 66, lr: 2.00e-03, grad_scale: 32.0 2024-09-18 09:24:44,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=740729.6666666666, ans=0.0 2024-09-18 09:25:01,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=740758.0, ans=0.125 2024-09-18 09:25:06,274 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.56 vs. limit=10.0 2024-09-18 09:25:16,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=740786.3333333334, ans=0.1 2024-09-18 09:25:18,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=740786.3333333334, ans=0.125 2024-09-18 09:25:44,640 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.008e+02 2.230e+02 2.360e+02 2.568e+02 4.519e+02, threshold=4.720e+02, percent-clipped=0.0 2024-09-18 09:25:59,364 INFO [train.py:1198] (1/2) Epoch 41, batch 5850, loss[loss=0.2244, ctc_loss=0.1505, cr_loss=0.3696, over 20638.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1428, cr_loss=0.3665, over 4096787.37 frames. ], batch size: 68, lr: 2.00e-03, grad_scale: 32.0 2024-09-18 09:26:29,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=740928.0, ans=0.0 2024-09-18 09:26:34,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=740928.0, ans=0.1 2024-09-18 09:27:13,545 INFO [train.py:1198] (1/2) Epoch 41, batch 5900, loss[loss=0.22, ctc_loss=0.1463, cr_loss=0.3685, over 21035.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1427, cr_loss=0.3662, over 4103121.88 frames. ], batch size: 62, lr: 2.00e-03, grad_scale: 32.0 2024-09-18 09:27:18,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=741013.0, ans=0.035 2024-09-18 09:27:23,403 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.84 vs. 
limit=12.0 2024-09-18 09:27:34,874 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 09:28:04,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=741098.0, ans=0.2 2024-09-18 09:28:13,365 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.034e+02 2.246e+02 2.395e+02 2.503e+02 3.331e+02, threshold=4.789e+02, percent-clipped=0.0 2024-09-18 09:28:22,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=741126.3333333334, ans=0.025 2024-09-18 09:28:28,058 INFO [train.py:1198] (1/2) Epoch 41, batch 5950, loss[loss=0.2095, ctc_loss=0.1367, cr_loss=0.364, over 20832.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1432, cr_loss=0.3677, over 4110217.07 frames. ], batch size: 59, lr: 2.00e-03, grad_scale: 32.0 2024-09-18 09:28:28,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=741154.6666666666, ans=0.125 2024-09-18 09:29:01,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=741211.3333333334, ans=0.025 2024-09-18 09:29:01,761 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=7.97 vs. limit=22.5 2024-09-18 09:29:44,068 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.24 vs. limit=12.0 2024-09-18 09:29:44,854 INFO [train.py:1198] (1/2) Epoch 41, batch 6000, loss[loss=0.231, ctc_loss=0.1537, cr_loss=0.3865, over 20927.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.144, cr_loss=0.3692, over 4099423.64 frames. ], batch size: 60, lr: 2.00e-03, grad_scale: 32.0 2024-09-18 09:29:44,855 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 09:30:04,279 INFO [train.py:1230] (1/2) Epoch 41, validation: loss=0.03985, ctc_loss=0.03985, cr_loss=1.46e-14, over 944034.00 frames. 2024-09-18 09:30:04,279 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-18 09:30:55,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=741381.3333333334, ans=0.1 2024-09-18 09:31:04,710 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.975e+02 2.233e+02 2.355e+02 2.496e+02 4.074e+02, threshold=4.709e+02, percent-clipped=0.0 2024-09-18 09:31:10,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=741409.6666666666, ans=0.0 2024-09-18 09:31:13,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=741409.6666666666, ans=0.0 2024-09-18 09:31:19,445 INFO [train.py:1198] (1/2) Epoch 41, batch 6050, loss[loss=0.2472, ctc_loss=0.1654, cr_loss=0.4092, over 20950.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1439, cr_loss=0.3689, over 4095345.97 frames. 
], batch size: 64, lr: 2.00e-03, grad_scale: 32.0 2024-09-18 09:31:19,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=741438.0, ans=0.125 2024-09-18 09:31:51,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=741494.6666666666, ans=0.1 2024-09-18 09:31:52,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=741494.6666666666, ans=0.1 2024-09-18 09:32:09,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=741523.0, ans=0.125 2024-09-18 09:32:34,421 INFO [train.py:1198] (1/2) Epoch 41, batch 6100, loss[loss=0.2448, ctc_loss=0.1632, cr_loss=0.408, over 21011.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1439, cr_loss=0.3686, over 4085963.21 frames. ], batch size: 63, lr: 2.00e-03, grad_scale: 32.0 2024-09-18 09:32:36,527 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.31 vs. limit=15.0 2024-09-18 09:32:43,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=741579.6666666666, ans=0.0 2024-09-18 09:32:59,934 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.81 vs. limit=15.0 2024-09-18 09:33:10,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=741636.3333333334, ans=0.0 2024-09-18 09:33:17,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=741636.3333333334, ans=0.125 2024-09-18 09:33:21,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=741664.6666666666, ans=0.125 2024-09-18 09:33:23,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=741664.6666666666, ans=0.125 2024-09-18 09:33:24,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=741664.6666666666, ans=0.0 2024-09-18 09:33:34,992 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.894e+02 2.242e+02 2.359e+02 2.548e+02 5.006e+02, threshold=4.719e+02, percent-clipped=1.0 2024-09-18 09:33:38,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=741693.0, ans=0.125 2024-09-18 09:33:47,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=741693.0, ans=0.09899494936611666 2024-09-18 09:33:49,871 INFO [train.py:1198] (1/2) Epoch 41, batch 6150, loss[loss=0.1724, ctc_loss=0.1115, cr_loss=0.3044, over 20966.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1437, cr_loss=0.368, over 4080266.08 frames. 
], batch size: 52, lr: 2.00e-03, grad_scale: 32.0 2024-09-18 09:33:59,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=741721.3333333334, ans=0.0 2024-09-18 09:34:12,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=741749.6666666666, ans=0.0 2024-09-18 09:34:14,752 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.51 vs. limit=15.0 2024-09-18 09:34:24,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=741778.0, ans=0.025 2024-09-18 09:34:29,662 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.79 vs. limit=15.0 2024-09-18 09:34:33,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=741806.3333333334, ans=0.035 2024-09-18 09:34:47,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=741806.3333333334, ans=0.1 2024-09-18 09:34:47,253 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.59 vs. limit=10.0 2024-09-18 09:34:48,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=741834.6666666666, ans=0.125 2024-09-18 09:34:56,403 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.00 vs. limit=15.0 2024-09-18 09:35:04,551 INFO [train.py:1198] (1/2) Epoch 41, batch 6200, loss[loss=0.233, ctc_loss=0.1518, cr_loss=0.406, over 20681.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1443, cr_loss=0.3688, over 4074371.35 frames. ], batch size: 71, lr: 2.00e-03, grad_scale: 32.0 2024-09-18 09:35:13,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=741863.0, ans=0.125 2024-09-18 09:36:03,926 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.883e+02 2.224e+02 2.396e+02 2.713e+02 4.642e+02, threshold=4.792e+02, percent-clipped=0.0 2024-09-18 09:36:07,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=741976.3333333334, ans=0.1 2024-09-18 09:36:10,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=741976.3333333334, ans=0.2 2024-09-18 09:36:14,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=741976.3333333334, ans=0.125 2024-09-18 09:36:18,817 INFO [train.py:1198] (1/2) Epoch 41, batch 6250, loss[loss=0.1896, ctc_loss=0.1234, cr_loss=0.3309, over 20807.00 frames. ], tot_loss[loss=0.2197, ctc_loss=0.1455, cr_loss=0.3707, over 4052244.04 frames. 
], batch size: 53, lr: 2.00e-03, grad_scale: 32.0 2024-09-18 09:36:20,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=742004.6666666666, ans=0.125 2024-09-18 09:36:28,739 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.70 vs. limit=15.0 2024-09-18 09:36:33,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=742033.0, ans=0.0 2024-09-18 09:36:51,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=742061.3333333334, ans=0.2 2024-09-18 09:36:59,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=742061.3333333334, ans=0.2 2024-09-18 09:37:03,334 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.14 vs. limit=15.0 2024-09-18 09:37:34,201 INFO [train.py:1198] (1/2) Epoch 41, batch 6300, loss[loss=0.2539, ctc_loss=0.1802, cr_loss=0.3687, over 14244.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.1453, cr_loss=0.3692, over 4014595.59 frames. ], batch size: 149, lr: 2.00e-03, grad_scale: 32.0 2024-09-18 09:37:55,341 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.20 vs. limit=15.0 2024-09-18 09:38:31,560 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.949e+02 2.330e+02 2.542e+02 2.772e+02 5.785e+02, threshold=5.083e+02, percent-clipped=1.0 2024-09-18 09:38:34,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=742259.6666666666, ans=0.125 2024-09-18 09:38:37,983 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.74 vs. limit=12.0 2024-09-18 09:38:43,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=742259.6666666666, ans=0.1 2024-09-18 09:38:45,558 INFO [train.py:1198] (1/2) Epoch 41, batch 6350, loss[loss=0.2771, ctc_loss=0.1941, cr_loss=0.4153, over 14187.00 frames. ], tot_loss[loss=0.2233, ctc_loss=0.1489, cr_loss=0.3721, over 3877809.26 frames. ], batch size: 149, lr: 2.00e-03, grad_scale: 32.0 2024-09-18 09:39:16,700 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.01 vs. limit=15.0 2024-09-18 09:40:34,173 INFO [train.py:1198] (1/2) Epoch 42, batch 0, loss[loss=0.2269, ctc_loss=0.1492, cr_loss=0.388, over 21021.00 frames. ], tot_loss[loss=0.2269, ctc_loss=0.1492, cr_loss=0.388, over 21021.00 frames. ], batch size: 63, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 09:40:34,174 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 09:40:52,721 INFO [train.py:1230] (1/2) Epoch 42, validation: loss=0.03936, ctc_loss=0.03936, cr_loss=1.479e-14, over 944034.00 frames. 
2024-09-18 09:40:52,722 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-18 09:40:56,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=742404.1666666666, ans=0.125 2024-09-18 09:41:02,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=742404.1666666666, ans=0.0 2024-09-18 09:41:08,794 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.39 vs. limit=15.0 2024-09-18 09:41:12,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=742432.5, ans=0.125 2024-09-18 09:41:14,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=742432.5, ans=0.0 2024-09-18 09:41:23,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=742460.8333333334, ans=0.5 2024-09-18 09:41:34,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=742460.8333333334, ans=0.2 2024-09-18 09:42:06,601 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.970e+02 2.356e+02 2.604e+02 2.851e+02 3.491e+02, threshold=5.207e+02, percent-clipped=0.0 2024-09-18 09:42:08,086 INFO [train.py:1198] (1/2) Epoch 42, batch 50, loss[loss=0.2017, ctc_loss=0.1332, cr_loss=0.3427, over 20882.00 frames. ], tot_loss[loss=0.2183, ctc_loss=0.1445, cr_loss=0.3687, over 909869.01 frames. ], batch size: 57, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 09:42:14,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=742545.8333333334, ans=0.0 2024-09-18 09:42:22,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=742574.1666666666, ans=0.0 2024-09-18 09:42:55,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=742630.8333333334, ans=0.0 2024-09-18 09:43:13,433 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.58 vs. limit=10.0 2024-09-18 09:43:23,794 INFO [train.py:1198] (1/2) Epoch 42, batch 100, loss[loss=0.2104, ctc_loss=0.1366, cr_loss=0.3691, over 20783.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1439, cr_loss=0.3669, over 1619014.21 frames. ], batch size: 53, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 09:44:07,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=742772.5, ans=0.125 2024-09-18 09:44:23,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=742800.8333333334, ans=0.2 2024-09-18 09:44:37,277 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.895e+02 2.184e+02 2.334e+02 2.514e+02 3.309e+02, threshold=4.667e+02, percent-clipped=0.0 2024-09-18 09:44:38,767 INFO [train.py:1198] (1/2) Epoch 42, batch 150, loss[loss=0.239, ctc_loss=0.1565, cr_loss=0.4121, over 20945.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1435, cr_loss=0.3684, over 2180916.04 frames. 
], batch size: 60, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 09:45:02,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=742857.5, ans=0.125 2024-09-18 09:45:07,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=742857.5, ans=0.0 2024-09-18 09:45:15,042 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2024-09-18 09:45:17,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=742885.8333333334, ans=0.125 2024-09-18 09:45:22,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=742885.8333333334, ans=0.125 2024-09-18 09:45:36,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=742914.1666666666, ans=0.0 2024-09-18 09:45:40,245 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.11 vs. limit=15.0 2024-09-18 09:45:44,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=742942.5, ans=0.125 2024-09-18 09:45:59,225 INFO [train.py:1198] (1/2) Epoch 42, batch 200, loss[loss=0.2449, ctc_loss=0.1633, cr_loss=0.408, over 20631.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1444, cr_loss=0.37, over 2597067.25 frames. ], batch size: 68, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 09:46:17,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=742999.1666666666, ans=0.125 2024-09-18 09:46:58,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=743084.1666666666, ans=0.0 2024-09-18 09:46:59,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=743084.1666666666, ans=0.125 2024-09-18 09:47:13,011 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.929e+02 2.237e+02 2.378e+02 2.544e+02 3.361e+02, threshold=4.755e+02, percent-clipped=0.0 2024-09-18 09:47:14,503 INFO [train.py:1198] (1/2) Epoch 42, batch 250, loss[loss=0.2361, ctc_loss=0.159, cr_loss=0.3856, over 21085.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1433, cr_loss=0.3679, over 2930640.98 frames. ], batch size: 59, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 09:47:48,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=743169.1666666666, ans=0.2 2024-09-18 09:47:55,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=743169.1666666666, ans=0.1 2024-09-18 09:48:30,788 INFO [train.py:1198] (1/2) Epoch 42, batch 300, loss[loss=0.1818, ctc_loss=0.1182, cr_loss=0.3182, over 20975.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.143, cr_loss=0.3679, over 3193026.68 frames. 
], batch size: 51, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 09:48:31,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=743254.1666666666, ans=0.1 2024-09-18 09:48:31,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=743254.1666666666, ans=0.125 2024-09-18 09:48:49,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=743282.5, ans=10.0 2024-09-18 09:48:52,911 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.79 vs. limit=22.5 2024-09-18 09:49:44,456 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.829e+02 2.207e+02 2.319e+02 2.513e+02 3.537e+02, threshold=4.639e+02, percent-clipped=0.0 2024-09-18 09:49:46,034 INFO [train.py:1198] (1/2) Epoch 42, batch 350, loss[loss=0.1901, ctc_loss=0.1249, cr_loss=0.3261, over 21076.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1434, cr_loss=0.3687, over 3398289.40 frames. ], batch size: 53, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 09:49:51,356 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.43 vs. limit=15.0 2024-09-18 09:50:01,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=743424.1666666666, ans=0.1 2024-09-18 09:50:03,389 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.32 vs. limit=15.0 2024-09-18 09:50:10,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=743424.1666666666, ans=0.125 2024-09-18 09:50:19,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=743452.5, ans=0.0 2024-09-18 09:50:59,845 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.88 vs. limit=15.0 2024-09-18 09:51:04,994 INFO [train.py:1198] (1/2) Epoch 42, batch 400, loss[loss=0.2132, ctc_loss=0.1379, cr_loss=0.3767, over 20886.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1439, cr_loss=0.3691, over 3554510.96 frames. 
], batch size: 54, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 09:51:21,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=743565.8333333334, ans=0.125 2024-09-18 09:51:43,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=743594.1666666666, ans=0.0 2024-09-18 09:51:44,099 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 09:52:15,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=743650.8333333334, ans=0.125 2024-09-18 09:52:22,946 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.891e+02 2.219e+02 2.344e+02 2.535e+02 3.383e+02, threshold=4.687e+02, percent-clipped=0.0 2024-09-18 09:52:24,437 INFO [train.py:1198] (1/2) Epoch 42, batch 450, loss[loss=0.2129, ctc_loss=0.1409, cr_loss=0.3601, over 21088.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1437, cr_loss=0.3686, over 3667430.18 frames. ], batch size: 59, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 09:52:39,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=743707.5, ans=0.125 2024-09-18 09:53:13,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=743764.1666666666, ans=0.1 2024-09-18 09:53:36,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=743792.5, ans=0.125 2024-09-18 09:53:40,830 INFO [train.py:1198] (1/2) Epoch 42, batch 500, loss[loss=0.1809, ctc_loss=0.1194, cr_loss=0.3072, over 20956.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1431, cr_loss=0.368, over 3769843.46 frames. ], batch size: 52, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 09:54:07,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=743849.1666666666, ans=0.125 2024-09-18 09:54:15,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=743877.5, ans=0.1 2024-09-18 09:54:19,781 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.15 vs. limit=15.0 2024-09-18 09:54:29,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=743905.8333333334, ans=0.035 2024-09-18 09:54:31,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=743905.8333333334, ans=0.0 2024-09-18 09:54:32,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=743905.8333333334, ans=0.125 2024-09-18 09:54:37,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=743905.8333333334, ans=0.07 2024-09-18 09:54:55,133 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.885e+02 2.194e+02 2.364e+02 2.460e+02 3.175e+02, threshold=4.728e+02, percent-clipped=0.0 2024-09-18 09:54:56,717 INFO [train.py:1198] (1/2) Epoch 42, batch 550, loss[loss=0.2293, ctc_loss=0.1518, cr_loss=0.3874, over 20661.00 frames. 
], tot_loss[loss=0.2174, ctc_loss=0.1437, cr_loss=0.3685, over 3841479.99 frames. ], batch size: 68, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 09:55:25,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=744019.1666666666, ans=0.125 2024-09-18 09:55:54,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=744047.5, ans=10.0 2024-09-18 09:56:15,184 INFO [train.py:1198] (1/2) Epoch 42, batch 600, loss[loss=0.225, ctc_loss=0.1513, cr_loss=0.3682, over 20644.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.144, cr_loss=0.3695, over 3909516.90 frames. ], batch size: 66, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 09:56:30,391 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=744132.5, ans=0.05 2024-09-18 09:57:07,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=744189.1666666666, ans=0.0 2024-09-18 09:57:18,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=744217.5, ans=0.1 2024-09-18 09:57:24,248 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.81 vs. limit=15.0 2024-09-18 09:57:28,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=744217.5, ans=0.5 2024-09-18 09:57:30,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=744217.5, ans=0.0 2024-09-18 09:57:32,834 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.954e+02 2.188e+02 2.304e+02 2.442e+02 3.148e+02, threshold=4.608e+02, percent-clipped=0.0 2024-09-18 09:57:33,448 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.82 vs. limit=22.5 2024-09-18 09:57:34,363 INFO [train.py:1198] (1/2) Epoch 42, batch 650, loss[loss=0.2185, ctc_loss=0.1462, cr_loss=0.3614, over 20769.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1444, cr_loss=0.3702, over 3942815.25 frames. ], batch size: 71, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 09:58:08,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=744302.5, ans=0.125 2024-09-18 09:58:15,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=744302.5, ans=0.2 2024-09-18 09:58:40,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=744359.1666666666, ans=0.1 2024-09-18 09:58:50,074 INFO [train.py:1198] (1/2) Epoch 42, batch 700, loss[loss=0.1991, ctc_loss=0.1305, cr_loss=0.343, over 20779.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1432, cr_loss=0.3686, over 3980243.21 frames. 
], batch size: 56, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 09:58:51,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=744387.5, ans=0.125 2024-09-18 09:59:00,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=744387.5, ans=0.125 2024-09-18 09:59:06,021 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. limit=6.0 2024-09-18 09:59:57,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=744500.8333333334, ans=0.125 2024-09-18 10:00:04,651 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.968e+02 2.223e+02 2.320e+02 2.480e+02 3.406e+02, threshold=4.640e+02, percent-clipped=0.0 2024-09-18 10:00:06,221 INFO [train.py:1198] (1/2) Epoch 42, batch 750, loss[loss=0.2285, ctc_loss=0.1529, cr_loss=0.3781, over 20935.00 frames. ], tot_loss[loss=0.2188, ctc_loss=0.1447, cr_loss=0.3705, over 3993950.49 frames. ], batch size: 60, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 10:00:06,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=744529.1666666666, ans=0.125 2024-09-18 10:00:15,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=744529.1666666666, ans=0.125 2024-09-18 10:00:32,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=744557.5, ans=0.1 2024-09-18 10:00:53,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=744614.1666666666, ans=10.0 2024-09-18 10:01:11,515 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.25 vs. limit=15.0 2024-09-18 10:01:21,070 INFO [train.py:1198] (1/2) Epoch 42, batch 800, loss[loss=0.2337, ctc_loss=0.1599, cr_loss=0.3691, over 19339.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1449, cr_loss=0.3707, over 4020772.32 frames. ], batch size: 90, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 10:01:29,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=744670.8333333334, ans=0.125 2024-09-18 10:02:05,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=744727.5, ans=0.1 2024-09-18 10:02:22,302 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.88 vs. limit=12.0 2024-09-18 10:02:30,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=744784.1666666666, ans=0.025 2024-09-18 10:02:31,075 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.34 vs. 
limit=15.0 2024-09-18 10:02:37,845 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.025e+02 2.174e+02 2.338e+02 2.463e+02 3.185e+02, threshold=4.675e+02, percent-clipped=0.0 2024-09-18 10:02:39,474 INFO [train.py:1198] (1/2) Epoch 42, batch 850, loss[loss=0.1893, ctc_loss=0.1237, cr_loss=0.3281, over 20933.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1445, cr_loss=0.3699, over 4047933.43 frames. ], batch size: 49, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 10:02:45,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=744812.5, ans=0.0 2024-09-18 10:02:47,512 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.98 vs. limit=12.0 2024-09-18 10:03:02,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=744840.8333333334, ans=0.0 2024-09-18 10:03:05,755 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=15.0 2024-09-18 10:03:48,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=744925.8333333334, ans=0.2 2024-09-18 10:03:57,394 INFO [train.py:1198] (1/2) Epoch 42, batch 900, loss[loss=0.2377, ctc_loss=0.1581, cr_loss=0.398, over 20654.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1447, cr_loss=0.3697, over 4045742.27 frames. ], batch size: 66, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 10:04:14,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=744982.5, ans=0.0 2024-09-18 10:05:05,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=745067.5, ans=0.04949747468305833 2024-09-18 10:05:11,486 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.969e+02 2.215e+02 2.380e+02 2.533e+02 5.796e+02, threshold=4.760e+02, percent-clipped=0.0 2024-09-18 10:05:13,012 INFO [train.py:1198] (1/2) Epoch 42, batch 950, loss[loss=0.2367, ctc_loss=0.1567, cr_loss=0.4002, over 20842.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1447, cr_loss=0.3692, over 4046810.40 frames. ], batch size: 65, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 10:05:23,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=745095.8333333334, ans=0.0 2024-09-18 10:05:30,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=745124.1666666666, ans=0.1 2024-09-18 10:05:37,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=745124.1666666666, ans=0.0 2024-09-18 10:06:04,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=745180.8333333334, ans=0.0 2024-09-18 10:06:23,388 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.46 vs. limit=15.0 2024-09-18 10:06:28,476 INFO [train.py:1198] (1/2) Epoch 42, batch 1000, loss[loss=0.2274, ctc_loss=0.1502, cr_loss=0.3858, over 20641.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.1441, cr_loss=0.3688, over 4062097.78 frames. 
], batch size: 68, lr: 1.97e-03, grad_scale: 64.0 2024-09-18 10:06:31,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=745237.5, ans=0.2 2024-09-18 10:07:14,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=745322.5, ans=0.0 2024-09-18 10:07:18,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=745322.5, ans=0.2 2024-09-18 10:07:44,703 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.962e+02 2.247e+02 2.357e+02 2.504e+02 3.558e+02, threshold=4.714e+02, percent-clipped=1.0 2024-09-18 10:07:46,304 INFO [train.py:1198] (1/2) Epoch 42, batch 1050, loss[loss=0.2357, ctc_loss=0.1646, cr_loss=0.3556, over 18048.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1439, cr_loss=0.3684, over 4069378.61 frames. ], batch size: 108, lr: 1.97e-03, grad_scale: 64.0 2024-09-18 10:08:01,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=745407.5, ans=0.125 2024-09-18 10:08:15,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=745435.8333333334, ans=0.125 2024-09-18 10:08:40,792 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=745464.1666666666, ans=0.025 2024-09-18 10:08:55,027 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.87 vs. limit=15.0 2024-09-18 10:09:00,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=745492.5, ans=0.1 2024-09-18 10:09:04,863 INFO [train.py:1198] (1/2) Epoch 42, batch 1100, loss[loss=0.1884, ctc_loss=0.1224, cr_loss=0.3301, over 20962.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1435, cr_loss=0.3682, over 4076691.21 frames. ], batch size: 49, lr: 1.97e-03, grad_scale: 64.0 2024-09-18 10:09:05,628 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.83 vs. limit=22.5 2024-09-18 10:09:30,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=745549.1666666666, ans=0.125 2024-09-18 10:09:34,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=745577.5, ans=0.2 2024-09-18 10:09:39,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=745577.5, ans=0.2 2024-09-18 10:10:08,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=745634.1666666666, ans=0.2 2024-09-18 10:10:20,238 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.876e+02 2.218e+02 2.350e+02 2.561e+02 3.606e+02, threshold=4.699e+02, percent-clipped=0.0 2024-09-18 10:10:20,256 INFO [train.py:1198] (1/2) Epoch 42, batch 1150, loss[loss=0.2147, ctc_loss=0.139, cr_loss=0.3784, over 21068.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1431, cr_loss=0.3676, over 4090719.03 frames. 
], batch size: 59, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 10:10:22,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=745662.5, ans=0.1 2024-09-18 10:10:26,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=745662.5, ans=0.1 2024-09-18 10:10:34,645 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.72 vs. limit=15.0 2024-09-18 10:10:37,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=745690.8333333334, ans=0.025 2024-09-18 10:11:35,742 INFO [train.py:1198] (1/2) Epoch 42, batch 1200, loss[loss=0.1904, ctc_loss=0.1251, cr_loss=0.3262, over 20960.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1428, cr_loss=0.3672, over 4080717.10 frames. ], batch size: 49, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 10:11:48,467 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.71 vs. limit=6.0 2024-09-18 10:11:59,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=745832.5, ans=0.0 2024-09-18 10:12:02,317 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.55 vs. limit=15.0 2024-09-18 10:12:39,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=745917.5, ans=0.5 2024-09-18 10:12:51,513 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.946e+02 2.269e+02 2.420e+02 2.563e+02 3.088e+02, threshold=4.840e+02, percent-clipped=0.0 2024-09-18 10:12:51,533 INFO [train.py:1198] (1/2) Epoch 42, batch 1250, loss[loss=0.2445, ctc_loss=0.1637, cr_loss=0.4039, over 20972.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1434, cr_loss=0.3681, over 4088367.07 frames. ], batch size: 58, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 10:14:02,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=746059.1666666666, ans=0.1 2024-09-18 10:14:02,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=746059.1666666666, ans=0.0 2024-09-18 10:14:13,315 INFO [train.py:1198] (1/2) Epoch 42, batch 1300, loss[loss=0.223, ctc_loss=0.1497, cr_loss=0.3667, over 20652.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1433, cr_loss=0.3677, over 4092612.69 frames. 
], batch size: 71, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 10:14:18,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=746087.5, ans=0.125 2024-09-18 10:14:25,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=746087.5, ans=0.125 2024-09-18 10:14:48,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=746144.1666666666, ans=0.1 2024-09-18 10:15:00,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=746172.5, ans=0.1 2024-09-18 10:15:11,582 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.01 vs. limit=15.0 2024-09-18 10:15:16,198 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.02 vs. limit=15.0 2024-09-18 10:15:18,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=746200.8333333334, ans=0.0 2024-09-18 10:15:28,709 INFO [train.py:1198] (1/2) Epoch 42, batch 1350, loss[loss=0.2179, ctc_loss=0.1428, cr_loss=0.3755, over 20778.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1437, cr_loss=0.3684, over 4099016.83 frames. ], batch size: 56, lr: 1.97e-03, grad_scale: 16.0 2024-09-18 10:15:30,186 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.964e+02 2.222e+02 2.363e+02 2.515e+02 4.459e+02, threshold=4.726e+02, percent-clipped=0.0 2024-09-18 10:15:45,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=746257.5, ans=0.1 2024-09-18 10:16:14,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=746314.1666666666, ans=0.04949747468305833 2024-09-18 10:16:21,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=746314.1666666666, ans=0.125 2024-09-18 10:16:33,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=746342.5, ans=0.1 2024-09-18 10:16:40,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=746342.5, ans=0.0 2024-09-18 10:16:45,035 INFO [train.py:1198] (1/2) Epoch 42, batch 1400, loss[loss=0.265, ctc_loss=0.1878, cr_loss=0.3858, over 14285.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1444, cr_loss=0.3689, over 4094845.06 frames. 
], batch size: 149, lr: 1.97e-03, grad_scale: 16.0 2024-09-18 10:17:03,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=746399.1666666666, ans=0.04949747468305833 2024-09-18 10:17:05,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=746399.1666666666, ans=0.125 2024-09-18 10:17:05,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=746399.1666666666, ans=0.125 2024-09-18 10:17:30,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=746455.8333333334, ans=0.1 2024-09-18 10:17:36,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=746455.8333333334, ans=0.0 2024-09-18 10:17:44,926 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.47 vs. limit=22.5 2024-09-18 10:17:47,789 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.14 vs. limit=10.0 2024-09-18 10:18:00,747 INFO [train.py:1198] (1/2) Epoch 42, batch 1450, loss[loss=0.2463, ctc_loss=0.1617, cr_loss=0.4231, over 20717.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1444, cr_loss=0.3689, over 4103727.48 frames. ], batch size: 71, lr: 1.97e-03, grad_scale: 16.0 2024-09-18 10:18:02,283 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.969e+02 2.239e+02 2.394e+02 2.518e+02 6.410e+02, threshold=4.788e+02, percent-clipped=1.0 2024-09-18 10:18:07,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=746512.5, ans=0.125 2024-09-18 10:19:16,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=746625.8333333334, ans=0.1 2024-09-18 10:19:19,598 INFO [train.py:1198] (1/2) Epoch 42, batch 1500, loss[loss=0.2234, ctc_loss=0.1491, cr_loss=0.3718, over 20799.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1435, cr_loss=0.3681, over 4106588.46 frames. ], batch size: 53, lr: 1.97e-03, grad_scale: 16.0 2024-09-18 10:20:34,922 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 10:20:37,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=746795.8333333334, ans=0.0 2024-09-18 10:20:38,997 INFO [train.py:1198] (1/2) Epoch 42, batch 1550, loss[loss=0.2107, ctc_loss=0.1406, cr_loss=0.3504, over 20959.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1435, cr_loss=0.3679, over 4104815.20 frames. ], batch size: 64, lr: 1.97e-03, grad_scale: 16.0 2024-09-18 10:20:40,502 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.971e+02 2.218e+02 2.353e+02 2.529e+02 5.470e+02, threshold=4.706e+02, percent-clipped=1.0 2024-09-18 10:20:49,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=746795.8333333334, ans=0.0 2024-09-18 10:21:16,287 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.80 vs. 
limit=22.5 2024-09-18 10:21:29,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=746880.8333333334, ans=0.1 2024-09-18 10:21:55,015 INFO [train.py:1198] (1/2) Epoch 42, batch 1600, loss[loss=0.1906, ctc_loss=0.1214, cr_loss=0.346, over 20924.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1431, cr_loss=0.3671, over 4110026.76 frames. ], batch size: 49, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 10:22:18,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=746965.8333333334, ans=0.1 2024-09-18 10:22:22,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=746965.8333333334, ans=0.2 2024-09-18 10:23:02,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=747050.8333333334, ans=0.125 2024-09-18 10:23:03,361 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.21 vs. limit=22.5 2024-09-18 10:23:10,078 INFO [train.py:1198] (1/2) Epoch 42, batch 1650, loss[loss=0.2406, ctc_loss=0.1612, cr_loss=0.3974, over 20972.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1434, cr_loss=0.3675, over 4103438.49 frames. ], batch size: 58, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 10:23:11,622 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.850e+02 2.220e+02 2.344e+02 2.507e+02 5.464e+02, threshold=4.689e+02, percent-clipped=1.0 2024-09-18 10:23:16,453 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 10:23:32,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=747107.5, ans=0.0 2024-09-18 10:23:46,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=747135.8333333334, ans=0.2 2024-09-18 10:24:25,484 INFO [train.py:1198] (1/2) Epoch 42, batch 1700, loss[loss=0.1813, ctc_loss=0.116, cr_loss=0.3265, over 20965.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1438, cr_loss=0.3681, over 4103704.37 frames. ], batch size: 48, lr: 1.97e-03, grad_scale: 16.0 2024-09-18 10:24:53,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=747249.1666666666, ans=0.1 2024-09-18 10:25:22,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=747305.8333333334, ans=0.0 2024-09-18 10:25:47,578 INFO [train.py:1198] (1/2) Epoch 42, batch 1750, loss[loss=0.2327, ctc_loss=0.1537, cr_loss=0.3951, over 20666.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1433, cr_loss=0.3673, over 4105177.93 frames. ], batch size: 68, lr: 1.97e-03, grad_scale: 16.0 2024-09-18 10:25:50,641 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.861e+02 2.217e+02 2.340e+02 2.503e+02 9.604e+02, threshold=4.680e+02, percent-clipped=1.0 2024-09-18 10:25:54,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=747362.5, ans=0.2 2024-09-18 10:26:34,340 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.26 vs. 
limit=15.0 2024-09-18 10:26:41,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=747447.5, ans=0.0 2024-09-18 10:27:03,948 INFO [train.py:1198] (1/2) Epoch 42, batch 1800, loss[loss=0.2517, ctc_loss=0.1714, cr_loss=0.4015, over 20663.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.144, cr_loss=0.3681, over 4095498.01 frames. ], batch size: 68, lr: 1.97e-03, grad_scale: 16.0 2024-09-18 10:27:07,820 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.55 vs. limit=10.0 2024-09-18 10:27:15,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=747504.1666666666, ans=0.1 2024-09-18 10:27:19,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=747532.5, ans=0.0 2024-09-18 10:27:33,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=747560.8333333334, ans=0.0 2024-09-18 10:27:43,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=747560.8333333334, ans=0.0 2024-09-18 10:28:12,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=747617.5, ans=0.125 2024-09-18 10:28:19,998 INFO [train.py:1198] (1/2) Epoch 42, batch 1850, loss[loss=0.2409, ctc_loss=0.1596, cr_loss=0.4062, over 20837.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1437, cr_loss=0.3681, over 4099009.30 frames. ], batch size: 65, lr: 1.97e-03, grad_scale: 16.0 2024-09-18 10:28:23,018 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.896e+02 2.198e+02 2.365e+02 2.541e+02 3.359e+02, threshold=4.730e+02, percent-clipped=0.0 2024-09-18 10:28:27,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=747645.8333333334, ans=15.0 2024-09-18 10:28:29,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=747645.8333333334, ans=0.1 2024-09-18 10:28:34,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=747674.1666666666, ans=0.125 2024-09-18 10:29:29,606 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.14 vs. limit=15.0 2024-09-18 10:29:33,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=747759.1666666666, ans=0.125 2024-09-18 10:29:36,350 INFO [train.py:1198] (1/2) Epoch 42, batch 1900, loss[loss=0.252, ctc_loss=0.168, cr_loss=0.4202, over 20992.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.144, cr_loss=0.3692, over 4109089.46 frames. 
], batch size: 58, lr: 1.97e-03, grad_scale: 16.0 2024-09-18 10:29:36,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=747787.5, ans=0.125 2024-09-18 10:29:41,352 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 10:29:45,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=747787.5, ans=0.125 2024-09-18 10:29:50,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=747815.8333333334, ans=0.1 2024-09-18 10:30:20,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=747844.1666666666, ans=0.07 2024-09-18 10:30:55,058 INFO [train.py:1198] (1/2) Epoch 42, batch 1950, loss[loss=0.1892, ctc_loss=0.1231, cr_loss=0.3306, over 20999.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1435, cr_loss=0.3685, over 4106782.51 frames. ], batch size: 52, lr: 1.97e-03, grad_scale: 16.0 2024-09-18 10:30:58,066 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.903e+02 2.228e+02 2.376e+02 2.477e+02 7.835e+02, threshold=4.753e+02, percent-clipped=1.0 2024-09-18 10:31:04,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=747929.1666666666, ans=0.0 2024-09-18 10:31:14,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=747957.5, ans=0.2 2024-09-18 10:31:15,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=747957.5, ans=0.125 2024-09-18 10:32:15,418 INFO [train.py:1198] (1/2) Epoch 42, batch 2000, loss[loss=0.207, ctc_loss=0.1336, cr_loss=0.3671, over 21006.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.143, cr_loss=0.3677, over 4112149.24 frames. ], batch size: 63, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 10:32:49,435 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.31 vs. limit=22.5 2024-09-18 10:33:19,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=748184.1666666666, ans=0.125 2024-09-18 10:33:30,868 INFO [train.py:1198] (1/2) Epoch 42, batch 2050, loss[loss=0.2178, ctc_loss=0.1431, cr_loss=0.3739, over 20780.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1436, cr_loss=0.3687, over 4119603.95 frames. ], batch size: 56, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 10:33:33,831 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.980e+02 2.267e+02 2.395e+02 2.497e+02 3.540e+02, threshold=4.790e+02, percent-clipped=0.0 2024-09-18 10:34:39,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=748325.8333333334, ans=0.0 2024-09-18 10:34:46,600 INFO [train.py:1198] (1/2) Epoch 42, batch 2100, loss[loss=0.2083, ctc_loss=0.1354, cr_loss=0.3649, over 20963.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1443, cr_loss=0.3705, over 4104278.01 frames. 
], batch size: 49, lr: 1.97e-03, grad_scale: 16.0 2024-09-18 10:35:08,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=748382.5, ans=0.1 2024-09-18 10:35:23,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=748410.8333333334, ans=0.1 2024-09-18 10:35:24,966 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.71 vs. limit=15.0 2024-09-18 10:35:36,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=748439.1666666666, ans=0.0 2024-09-18 10:35:44,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=748439.1666666666, ans=0.125 2024-09-18 10:35:44,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=748439.1666666666, ans=0.0 2024-09-18 10:36:02,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=748467.5, ans=0.125 2024-09-18 10:36:05,146 INFO [train.py:1198] (1/2) Epoch 42, batch 2150, loss[loss=0.2318, ctc_loss=0.1517, cr_loss=0.4004, over 20976.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.1441, cr_loss=0.3691, over 4078585.01 frames. ], batch size: 64, lr: 1.97e-03, grad_scale: 16.0 2024-09-18 10:36:09,455 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.892e+02 2.228e+02 2.319e+02 2.504e+02 3.288e+02, threshold=4.639e+02, percent-clipped=0.0 2024-09-18 10:36:21,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=748524.1666666666, ans=0.0 2024-09-18 10:36:28,447 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.11 vs. limit=15.0 2024-09-18 10:36:50,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=748552.5, ans=0.125 2024-09-18 10:37:19,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=748609.1666666666, ans=0.1 2024-09-18 10:37:23,747 INFO [train.py:1198] (1/2) Epoch 42, batch 2200, loss[loss=0.2112, ctc_loss=0.1412, cr_loss=0.3499, over 20768.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.1441, cr_loss=0.3694, over 4077790.80 frames. ], batch size: 56, lr: 1.97e-03, grad_scale: 16.0 2024-09-18 10:37:30,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=748637.5, ans=0.2 2024-09-18 10:38:33,845 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.68 vs. limit=15.0 2024-09-18 10:38:39,177 INFO [train.py:1198] (1/2) Epoch 42, batch 2250, loss[loss=0.2284, ctc_loss=0.1499, cr_loss=0.3924, over 20839.00 frames. ], tot_loss[loss=0.2183, ctc_loss=0.1443, cr_loss=0.37, over 4082203.95 frames. 
], batch size: 59, lr: 1.96e-03, grad_scale: 16.0 2024-09-18 10:38:42,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=748779.1666666666, ans=0.125 2024-09-18 10:38:43,830 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.972e+02 2.301e+02 2.453e+02 2.611e+02 4.533e+02, threshold=4.905e+02, percent-clipped=0.0 2024-09-18 10:38:47,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=748779.1666666666, ans=0.125 2024-09-18 10:39:02,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=748807.5, ans=0.025 2024-09-18 10:39:03,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=748807.5, ans=0.0 2024-09-18 10:39:55,009 INFO [train.py:1198] (1/2) Epoch 42, batch 2300, loss[loss=0.2635, ctc_loss=0.1797, cr_loss=0.4191, over 18083.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.144, cr_loss=0.3694, over 4094052.63 frames. ], batch size: 108, lr: 1.96e-03, grad_scale: 16.0 2024-09-18 10:39:55,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=748920.8333333334, ans=0.125 2024-09-18 10:40:27,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=748977.5, ans=0.0 2024-09-18 10:41:10,792 INFO [train.py:1198] (1/2) Epoch 42, batch 2350, loss[loss=0.1925, ctc_loss=0.1241, cr_loss=0.3422, over 20953.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1444, cr_loss=0.3707, over 4092178.70 frames. ], batch size: 48, lr: 1.96e-03, grad_scale: 16.0 2024-09-18 10:41:15,378 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.841e+02 2.218e+02 2.333e+02 2.492e+02 6.561e+02, threshold=4.666e+02, percent-clipped=1.0 2024-09-18 10:41:24,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=749090.8333333334, ans=0.125 2024-09-18 10:41:40,093 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.60 vs. limit=15.0 2024-09-18 10:41:57,728 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 10:42:13,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=749175.8333333334, ans=0.125 2024-09-18 10:42:17,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=749175.8333333334, ans=0.07 2024-09-18 10:42:24,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=749175.8333333334, ans=0.125 2024-09-18 10:42:26,613 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.57 vs. limit=15.0 2024-09-18 10:42:28,944 INFO [train.py:1198] (1/2) Epoch 42, batch 2400, loss[loss=0.2344, ctc_loss=0.155, cr_loss=0.3969, over 20305.00 frames. ], tot_loss[loss=0.2197, ctc_loss=0.1454, cr_loss=0.3718, over 4082721.28 frames. 
], batch size: 74, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 10:42:30,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=749204.1666666666, ans=0.09899494936611666 2024-09-18 10:42:47,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=749232.5, ans=0.125 2024-09-18 10:42:48,998 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 10:42:53,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=749232.5, ans=0.125 2024-09-18 10:42:59,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=749232.5, ans=0.125 2024-09-18 10:43:13,835 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.85 vs. limit=15.0 2024-09-18 10:43:16,729 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 10:43:26,166 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.05 vs. limit=10.0 2024-09-18 10:43:47,834 INFO [train.py:1198] (1/2) Epoch 42, batch 2450, loss[loss=0.2498, ctc_loss=0.1705, cr_loss=0.3966, over 20824.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.1456, cr_loss=0.3717, over 4073067.45 frames. ], batch size: 65, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 10:43:52,431 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.926e+02 2.197e+02 2.347e+02 2.522e+02 5.398e+02, threshold=4.693e+02, percent-clipped=1.0 2024-09-18 10:43:54,691 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.05 vs. limit=12.0 2024-09-18 10:43:55,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=749345.8333333334, ans=0.1 2024-09-18 10:44:12,980 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.02 vs. limit=15.0 2024-09-18 10:44:23,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=749402.5, ans=0.125 2024-09-18 10:44:30,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=749402.5, ans=0.125 2024-09-18 10:45:04,000 INFO [train.py:1198] (1/2) Epoch 42, batch 2500, loss[loss=0.2414, ctc_loss=0.1619, cr_loss=0.3977, over 20967.00 frames. ], tot_loss[loss=0.2197, ctc_loss=0.1453, cr_loss=0.3718, over 4088556.28 frames. ], batch size: 64, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 10:46:00,376 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.64 vs. limit=15.0 2024-09-18 10:46:19,304 INFO [train.py:1198] (1/2) Epoch 42, batch 2550, loss[loss=0.2165, ctc_loss=0.1448, cr_loss=0.3583, over 20833.00 frames. ], tot_loss[loss=0.2187, ctc_loss=0.1447, cr_loss=0.3704, over 4093650.83 frames. 
], batch size: 65, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 10:46:23,755 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.992e+02 2.217e+02 2.341e+02 2.487e+02 4.101e+02, threshold=4.682e+02, percent-clipped=0.0 2024-09-18 10:46:24,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=749629.1666666666, ans=0.0 2024-09-18 10:46:36,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=749657.5, ans=0.125 2024-09-18 10:46:46,637 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=749657.5, ans=0.2 2024-09-18 10:46:57,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=749685.8333333334, ans=0.0 2024-09-18 10:47:33,855 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 10:47:37,929 INFO [train.py:1198] (1/2) Epoch 42, batch 2600, loss[loss=0.195, ctc_loss=0.1281, cr_loss=0.3345, over 20935.00 frames. ], tot_loss[loss=0.2188, ctc_loss=0.1448, cr_loss=0.3703, over 4081288.45 frames. ], batch size: 50, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 10:47:59,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=749799.1666666666, ans=0.0 2024-09-18 10:48:17,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=749827.5, ans=0.125 2024-09-18 10:48:56,553 INFO [train.py:1198] (1/2) Epoch 42, batch 2650, loss[loss=0.1726, ctc_loss=0.1108, cr_loss=0.3091, over 20956.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.1449, cr_loss=0.371, over 4093910.23 frames. ], batch size: 51, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 10:49:01,209 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.817e+02 2.245e+02 2.356e+02 2.477e+02 3.406e+02, threshold=4.713e+02, percent-clipped=0.0 2024-09-18 10:49:21,816 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.20 vs. limit=15.0 2024-09-18 10:49:24,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=749940.8333333334, ans=0.04949747468305833 2024-09-18 10:50:13,038 INFO [train.py:1198] (1/2) Epoch 42, batch 2700, loss[loss=0.2103, ctc_loss=0.1381, cr_loss=0.3609, over 21031.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1441, cr_loss=0.3699, over 4105244.66 frames. ], batch size: 62, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 10:50:18,373 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.62 vs. 
limit=22.5 2024-09-18 10:50:31,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=750082.5, ans=0.0 2024-09-18 10:50:42,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=750110.8333333334, ans=0.125 2024-09-18 10:50:51,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=750110.8333333334, ans=0.125 2024-09-18 10:51:13,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=750167.5, ans=0.125 2024-09-18 10:51:24,837 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=5.55 vs. limit=22.5 2024-09-18 10:51:28,552 INFO [train.py:1198] (1/2) Epoch 42, batch 2750, loss[loss=0.2489, ctc_loss=0.1667, cr_loss=0.4114, over 20646.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1445, cr_loss=0.3707, over 4099928.04 frames. ], batch size: 68, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 10:51:33,121 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.938e+02 2.208e+02 2.324e+02 2.458e+02 3.285e+02, threshold=4.649e+02, percent-clipped=0.0 2024-09-18 10:51:49,473 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.40 vs. limit=15.0 2024-09-18 10:51:51,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=750224.1666666666, ans=0.1 2024-09-18 10:52:04,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=750252.5, ans=0.035 2024-09-18 10:52:17,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=750280.8333333334, ans=0.025 2024-09-18 10:52:20,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=750280.8333333334, ans=0.1 2024-09-18 10:52:41,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=750309.1666666666, ans=0.1 2024-09-18 10:52:44,679 INFO [train.py:1198] (1/2) Epoch 42, batch 2800, loss[loss=0.1875, ctc_loss=0.1215, cr_loss=0.3301, over 21077.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1444, cr_loss=0.3702, over 4096119.82 frames. ], batch size: 53, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 10:53:19,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=750394.1666666666, ans=0.125 2024-09-18 10:53:22,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=750394.1666666666, ans=0.2 2024-09-18 10:54:03,345 INFO [train.py:1198] (1/2) Epoch 42, batch 2850, loss[loss=0.1816, ctc_loss=0.1188, cr_loss=0.3137, over 20916.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1437, cr_loss=0.369, over 4109696.48 frames. 
], batch size: 48, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 10:54:10,804 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.857e+02 2.233e+02 2.385e+02 2.523e+02 4.065e+02, threshold=4.770e+02, percent-clipped=0.0 2024-09-18 10:55:14,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=750592.5, ans=0.0 2024-09-18 10:55:17,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=750592.5, ans=0.125 2024-09-18 10:55:21,813 INFO [train.py:1198] (1/2) Epoch 42, batch 2900, loss[loss=0.1945, ctc_loss=0.1302, cr_loss=0.3213, over 20863.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1429, cr_loss=0.3673, over 4111155.26 frames. ], batch size: 57, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 10:55:44,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=750649.1666666666, ans=0.0 2024-09-18 10:55:47,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=750649.1666666666, ans=0.125 2024-09-18 10:55:50,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=750677.5, ans=0.0 2024-09-18 10:55:59,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=750677.5, ans=0.1 2024-09-18 10:56:18,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=750705.8333333334, ans=0.125 2024-09-18 10:56:37,839 INFO [train.py:1198] (1/2) Epoch 42, batch 2950, loss[loss=0.1842, ctc_loss=0.1181, cr_loss=0.3304, over 21069.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1432, cr_loss=0.3675, over 4106175.61 frames. ], batch size: 53, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 10:56:42,348 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.913e+02 2.223e+02 2.424e+02 2.596e+02 3.606e+02, threshold=4.848e+02, percent-clipped=0.0 2024-09-18 10:56:51,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=750790.8333333334, ans=0.0 2024-09-18 10:57:53,916 INFO [train.py:1198] (1/2) Epoch 42, batch 3000, loss[loss=0.2061, ctc_loss=0.1349, cr_loss=0.3562, over 20961.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1432, cr_loss=0.3672, over 4100964.49 frames. ], batch size: 55, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 10:57:53,917 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 10:58:25,080 INFO [train.py:1230] (1/2) Epoch 42, validation: loss=0.03965, ctc_loss=0.03965, cr_loss=1.495e-14, over 944034.00 frames. 2024-09-18 10:58:25,081 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-18 10:58:31,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=750904.1666666666, ans=0.1 2024-09-18 10:58:42,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=750932.5, ans=15.0 2024-09-18 10:58:48,900 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.39 vs. 
limit=15.0 2024-09-18 10:58:51,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=750932.5, ans=0.2 2024-09-18 10:59:16,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=750989.1666666666, ans=0.0 2024-09-18 10:59:44,271 INFO [train.py:1198] (1/2) Epoch 42, batch 3050, loss[loss=0.2261, ctc_loss=0.1473, cr_loss=0.394, over 20864.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1438, cr_loss=0.368, over 4102362.26 frames. ], batch size: 54, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 10:59:51,601 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.936e+02 2.222e+02 2.355e+02 2.517e+02 4.209e+02, threshold=4.709e+02, percent-clipped=0.0 2024-09-18 11:01:02,844 INFO [train.py:1198] (1/2) Epoch 42, batch 3100, loss[loss=0.2046, ctc_loss=0.1352, cr_loss=0.3469, over 20988.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1443, cr_loss=0.3691, over 4103064.83 frames. ], batch size: 48, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:01:12,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=751187.5, ans=0.125 2024-09-18 11:02:08,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=751300.8333333334, ans=0.125 2024-09-18 11:02:18,416 INFO [train.py:1198] (1/2) Epoch 42, batch 3150, loss[loss=0.2421, ctc_loss=0.1614, cr_loss=0.4034, over 20864.00 frames. ], tot_loss[loss=0.2183, ctc_loss=0.1445, cr_loss=0.369, over 4093153.49 frames. ], batch size: 57, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:02:22,908 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.951e+02 2.210e+02 2.304e+02 2.494e+02 3.038e+02, threshold=4.608e+02, percent-clipped=0.0 2024-09-18 11:02:28,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=751329.1666666666, ans=0.0 2024-09-18 11:02:51,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=751385.8333333334, ans=0.2 2024-09-18 11:03:34,040 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.39 vs. limit=22.5 2024-09-18 11:03:34,672 INFO [train.py:1198] (1/2) Epoch 42, batch 3200, loss[loss=0.2304, ctc_loss=0.1527, cr_loss=0.3883, over 20330.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1444, cr_loss=0.3687, over 4076341.10 frames. ], batch size: 74, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:03:42,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=751470.8333333334, ans=0.125 2024-09-18 11:04:49,078 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 11:04:53,059 INFO [train.py:1198] (1/2) Epoch 42, batch 3250, loss[loss=0.2201, ctc_loss=0.1438, cr_loss=0.3812, over 21031.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1439, cr_loss=0.3684, over 4072413.91 frames. 
], batch size: 61, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:04:57,505 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.915e+02 2.263e+02 2.396e+02 2.527e+02 4.570e+02, threshold=4.792e+02, percent-clipped=0.0 2024-09-18 11:05:08,845 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.55 vs. limit=12.0 2024-09-18 11:05:39,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=751697.5, ans=0.125 2024-09-18 11:05:51,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=751697.5, ans=0.0 2024-09-18 11:05:56,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=751725.8333333334, ans=0.2 2024-09-18 11:05:59,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=751725.8333333334, ans=0.0 2024-09-18 11:06:11,276 INFO [train.py:1198] (1/2) Epoch 42, batch 3300, loss[loss=0.1938, ctc_loss=0.1273, cr_loss=0.3324, over 20936.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.1451, cr_loss=0.3709, over 4085024.93 frames. ], batch size: 48, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:06:23,836 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 11:06:43,987 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.65 vs. limit=15.0 2024-09-18 11:07:08,008 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.16 vs. limit=15.0 2024-09-18 11:07:27,022 INFO [train.py:1198] (1/2) Epoch 42, batch 3350, loss[loss=0.2336, ctc_loss=0.1563, cr_loss=0.3865, over 20244.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1448, cr_loss=0.3704, over 4094152.62 frames. ], batch size: 74, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:07:31,654 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.952e+02 2.215e+02 2.369e+02 2.477e+02 3.546e+02, threshold=4.739e+02, percent-clipped=0.0 2024-09-18 11:08:22,254 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.54 vs. limit=15.0 2024-09-18 11:08:42,103 INFO [train.py:1198] (1/2) Epoch 42, batch 3400, loss[loss=0.2253, ctc_loss=0.149, cr_loss=0.3813, over 20927.00 frames. ], tot_loss[loss=0.2187, ctc_loss=0.1447, cr_loss=0.3704, over 4094788.91 frames. ], batch size: 60, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:09:10,792 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=752094.1666666666, ans=0.025 2024-09-18 11:10:00,512 INFO [train.py:1198] (1/2) Epoch 42, batch 3450, loss[loss=0.2109, ctc_loss=0.1377, cr_loss=0.3658, over 20876.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1448, cr_loss=0.3704, over 4089674.95 frames. 
], batch size: 57, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:10:05,021 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.971e+02 2.220e+02 2.371e+02 2.527e+02 4.053e+02, threshold=4.742e+02, percent-clipped=0.0 2024-09-18 11:10:53,391 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.02 vs. limit=10.0 2024-09-18 11:11:07,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=752292.5, ans=0.125 2024-09-18 11:11:19,761 INFO [train.py:1198] (1/2) Epoch 42, batch 3500, loss[loss=0.1886, ctc_loss=0.1228, cr_loss=0.3293, over 20952.00 frames. ], tot_loss[loss=0.2187, ctc_loss=0.1447, cr_loss=0.37, over 4093943.54 frames. ], batch size: 50, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:11:21,869 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.90 vs. limit=15.0 2024-09-18 11:12:11,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=752405.8333333334, ans=0.125 2024-09-18 11:12:13,344 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.07 vs. limit=15.0 2024-09-18 11:12:23,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=752434.1666666666, ans=0.125 2024-09-18 11:12:35,250 INFO [train.py:1198] (1/2) Epoch 42, batch 3550, loss[loss=0.2356, ctc_loss=0.158, cr_loss=0.3885, over 20951.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1454, cr_loss=0.3711, over 4100570.10 frames. ], batch size: 64, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:12:39,778 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.929e+02 2.270e+02 2.399e+02 2.600e+02 3.964e+02, threshold=4.798e+02, percent-clipped=0.0 2024-09-18 11:13:43,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=752575.8333333334, ans=0.125 2024-09-18 11:13:50,307 INFO [train.py:1198] (1/2) Epoch 42, batch 3600, loss[loss=0.2389, ctc_loss=0.1587, cr_loss=0.4012, over 20977.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1453, cr_loss=0.3715, over 4102246.19 frames. ], batch size: 67, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:14:08,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=752632.5, ans=0.0 2024-09-18 11:14:55,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=752717.5, ans=0.125 2024-09-18 11:14:59,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=752717.5, ans=0.0 2024-09-18 11:15:06,104 INFO [train.py:1198] (1/2) Epoch 42, batch 3650, loss[loss=0.23, ctc_loss=0.1517, cr_loss=0.3916, over 20948.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1438, cr_loss=0.3696, over 4108794.35 frames. 
], batch size: 60, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:15:10,527 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.988e+02 2.202e+02 2.348e+02 2.464e+02 3.004e+02, threshold=4.697e+02, percent-clipped=0.0 2024-09-18 11:16:25,387 INFO [train.py:1198] (1/2) Epoch 42, batch 3700, loss[loss=0.2253, ctc_loss=0.1499, cr_loss=0.3769, over 20661.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1432, cr_loss=0.3683, over 4102943.14 frames. ], batch size: 71, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:16:44,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=752915.8333333334, ans=0.2 2024-09-18 11:17:05,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=752944.1666666666, ans=0.95 2024-09-18 11:17:16,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=752972.5, ans=0.0 2024-09-18 11:17:43,512 INFO [train.py:1198] (1/2) Epoch 42, batch 3750, loss[loss=0.2065, ctc_loss=0.1351, cr_loss=0.3569, over 20990.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1436, cr_loss=0.3697, over 4110110.08 frames. ], batch size: 55, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:17:47,993 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.031e+02 2.231e+02 2.360e+02 2.537e+02 3.045e+02, threshold=4.719e+02, percent-clipped=0.0 2024-09-18 11:17:48,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=753029.1666666666, ans=0.2 2024-09-18 11:18:17,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=753085.8333333334, ans=0.025 2024-09-18 11:18:19,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=753085.8333333334, ans=0.125 2024-09-18 11:18:39,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=753114.1666666666, ans=0.125 2024-09-18 11:18:58,566 INFO [train.py:1198] (1/2) Epoch 42, batch 3800, loss[loss=0.2116, ctc_loss=0.1379, cr_loss=0.3683, over 20387.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1437, cr_loss=0.3691, over 4111466.07 frames. 
], batch size: 74, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:19:02,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=753170.8333333334, ans=0.0 2024-09-18 11:19:23,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=753199.1666666666, ans=0.125 2024-09-18 11:19:25,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=753199.1666666666, ans=0.125 2024-09-18 11:19:42,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=753255.8333333334, ans=0.125 2024-09-18 11:19:50,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=753255.8333333334, ans=0.125 2024-09-18 11:19:53,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=753255.8333333334, ans=0.0 2024-09-18 11:20:13,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=753312.5, ans=0.2 2024-09-18 11:20:14,568 INFO [train.py:1198] (1/2) Epoch 42, batch 3850, loss[loss=0.2126, ctc_loss=0.1409, cr_loss=0.3586, over 21002.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.1441, cr_loss=0.3693, over 4079761.39 frames. ], batch size: 61, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:20:19,096 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.009e+02 2.299e+02 2.403e+02 2.610e+02 3.655e+02, threshold=4.807e+02, percent-clipped=0.0 2024-09-18 11:20:26,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=753312.5, ans=0.125 2024-09-18 11:20:36,612 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.73 vs. limit=10.0 2024-09-18 11:20:39,593 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.42 vs. limit=15.0 2024-09-18 11:21:19,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=753425.8333333334, ans=0.125 2024-09-18 11:21:32,734 INFO [train.py:1198] (1/2) Epoch 42, batch 3900, loss[loss=0.2678, ctc_loss=0.177, cr_loss=0.454, over 20958.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1445, cr_loss=0.3702, over 4081080.57 frames. ], batch size: 64, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:21:48,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=753482.5, ans=0.125 2024-09-18 11:21:57,622 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.58 vs. limit=15.0 2024-09-18 11:22:03,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=753510.8333333334, ans=0.0 2024-09-18 11:22:51,260 INFO [train.py:1198] (1/2) Epoch 42, batch 3950, loss[loss=0.1863, ctc_loss=0.119, cr_loss=0.3367, over 20974.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1449, cr_loss=0.3708, over 4093535.40 frames. 
], batch size: 50, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:22:53,485 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.95 vs. limit=15.0 2024-09-18 11:22:55,808 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.923e+02 2.232e+02 2.359e+02 2.467e+02 3.381e+02, threshold=4.718e+02, percent-clipped=0.0 2024-09-18 11:23:10,300 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.26 vs. limit=15.0 2024-09-18 11:23:20,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=753652.5, ans=0.0 2024-09-18 11:23:40,782 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.88 vs. limit=15.0 2024-09-18 11:23:46,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=753680.8333333334, ans=0.125 2024-09-18 11:24:01,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=753709.1666666666, ans=15.0 2024-09-18 11:24:06,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=753737.5, ans=0.0 2024-09-18 11:24:07,205 INFO [train.py:1198] (1/2) Epoch 42, batch 4000, loss[loss=0.2564, ctc_loss=0.173, cr_loss=0.4171, over 20972.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1446, cr_loss=0.3702, over 4097423.98 frames. ], batch size: 64, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:24:14,015 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.06 vs. limit=22.5 2024-09-18 11:24:30,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=753765.8333333334, ans=0.1 2024-09-18 11:24:37,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=753794.1666666666, ans=0.125 2024-09-18 11:24:49,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=753794.1666666666, ans=0.1 2024-09-18 11:25:02,226 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.51 vs. limit=15.0 2024-09-18 11:25:22,407 INFO [train.py:1198] (1/2) Epoch 42, batch 4050, loss[loss=0.1892, ctc_loss=0.1252, cr_loss=0.3199, over 20997.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1443, cr_loss=0.3696, over 4103728.97 frames. 
], batch size: 52, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:25:28,631 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.930e+02 2.209e+02 2.341e+02 2.493e+02 2.885e+02, threshold=4.682e+02, percent-clipped=0.0 2024-09-18 11:25:31,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=753879.1666666666, ans=0.2 2024-09-18 11:25:33,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=753879.1666666666, ans=0.1 2024-09-18 11:25:37,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=753907.5, ans=0.0 2024-09-18 11:25:54,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=753935.8333333334, ans=0.0 2024-09-18 11:26:06,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=753964.1666666666, ans=0.025 2024-09-18 11:26:38,523 INFO [train.py:1198] (1/2) Epoch 42, batch 4100, loss[loss=0.2017, ctc_loss=0.1328, cr_loss=0.3447, over 20846.00 frames. ], tot_loss[loss=0.2197, ctc_loss=0.1454, cr_loss=0.3712, over 4088457.77 frames. ], batch size: 59, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:26:40,843 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.65 vs. limit=15.0 2024-09-18 11:26:46,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=754020.8333333334, ans=0.2 2024-09-18 11:27:00,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=754049.1666666666, ans=0.125 2024-09-18 11:27:01,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=754049.1666666666, ans=0.1 2024-09-18 11:27:22,879 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 11:28:00,915 INFO [train.py:1198] (1/2) Epoch 42, batch 4150, loss[loss=0.1949, ctc_loss=0.126, cr_loss=0.3446, over 20980.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.1449, cr_loss=0.3708, over 4096329.41 frames. ], batch size: 51, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:28:06,906 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.025e+02 2.211e+02 2.332e+02 2.484e+02 4.088e+02, threshold=4.664e+02, percent-clipped=0.0 2024-09-18 11:28:20,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=754190.8333333334, ans=0.1 2024-09-18 11:28:21,245 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.27 vs. 
limit=15.0 2024-09-18 11:28:41,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=754219.1666666666, ans=0.2 2024-09-18 11:28:42,656 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 11:29:01,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=754275.8333333334, ans=0.1 2024-09-18 11:29:04,794 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.19 vs. limit=12.0 2024-09-18 11:29:10,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=754275.8333333334, ans=0.125 2024-09-18 11:29:17,591 INFO [train.py:1198] (1/2) Epoch 42, batch 4200, loss[loss=0.2028, ctc_loss=0.1352, cr_loss=0.3381, over 20975.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1446, cr_loss=0.37, over 4102459.89 frames. ], batch size: 52, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:29:31,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=754332.5, ans=0.125 2024-09-18 11:29:47,219 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.16 vs. limit=15.0 2024-09-18 11:30:06,867 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 11:30:33,466 INFO [train.py:1198] (1/2) Epoch 42, batch 4250, loss[loss=0.2276, ctc_loss=0.1508, cr_loss=0.3837, over 20777.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.1441, cr_loss=0.3694, over 4113839.39 frames. ], batch size: 53, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:30:39,574 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.935e+02 2.235e+02 2.374e+02 2.515e+02 4.071e+02, threshold=4.749e+02, percent-clipped=0.0 2024-09-18 11:30:56,627 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 11:30:59,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=754474.1666666666, ans=0.125 2024-09-18 11:31:09,167 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.87 vs. limit=15.0 2024-09-18 11:31:13,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=754502.5, ans=0.1 2024-09-18 11:31:29,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=754530.8333333334, ans=0.0 2024-09-18 11:31:32,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=754559.1666666666, ans=0.125 2024-09-18 11:31:48,905 INFO [train.py:1198] (1/2) Epoch 42, batch 4300, loss[loss=0.2394, ctc_loss=0.1581, cr_loss=0.4061, over 20722.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1445, cr_loss=0.3701, over 4120253.54 frames. 
], batch size: 68, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:31:50,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=754587.5, ans=0.125 2024-09-18 11:31:58,606 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.01 vs. limit=15.0 2024-09-18 11:32:48,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=754672.5, ans=0.1 2024-09-18 11:32:55,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=754700.8333333334, ans=0.0 2024-09-18 11:32:58,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=754700.8333333334, ans=0.125 2024-09-18 11:33:07,179 INFO [train.py:1198] (1/2) Epoch 42, batch 4350, loss[loss=0.2302, ctc_loss=0.1508, cr_loss=0.3974, over 20766.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1438, cr_loss=0.3689, over 4114635.97 frames. ], batch size: 56, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:33:13,278 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.937e+02 2.238e+02 2.376e+02 2.514e+02 4.883e+02, threshold=4.753e+02, percent-clipped=1.0 2024-09-18 11:33:15,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=754729.1666666666, ans=0.2 2024-09-18 11:33:25,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=754757.5, ans=0.125 2024-09-18 11:33:28,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=754757.5, ans=0.0 2024-09-18 11:33:32,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=754757.5, ans=0.0 2024-09-18 11:33:57,611 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.47 vs. limit=15.0 2024-09-18 11:33:58,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=754814.1666666666, ans=0.125 2024-09-18 11:34:01,909 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 11:34:25,889 INFO [train.py:1198] (1/2) Epoch 42, batch 4400, loss[loss=0.2053, ctc_loss=0.1361, cr_loss=0.3459, over 21075.00 frames. ], tot_loss[loss=0.2183, ctc_loss=0.1444, cr_loss=0.3695, over 4102568.44 frames. ], batch size: 59, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:34:27,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=754870.8333333334, ans=0.125 2024-09-18 11:34:46,288 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.66 vs. limit=15.0 2024-09-18 11:35:04,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=754927.5, ans=0.0 2024-09-18 11:35:41,512 INFO [train.py:1198] (1/2) Epoch 42, batch 4450, loss[loss=0.1979, ctc_loss=0.1308, cr_loss=0.3353, over 20962.00 frames. 
], tot_loss[loss=0.2173, ctc_loss=0.1437, cr_loss=0.3683, over 4116193.71 frames. ], batch size: 50, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:35:46,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=755012.5, ans=0.035 2024-09-18 11:35:46,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=755012.5, ans=0.0 2024-09-18 11:35:47,562 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.900e+02 2.246e+02 2.370e+02 2.552e+02 4.794e+02, threshold=4.740e+02, percent-clipped=1.0 2024-09-18 11:35:55,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=755040.8333333334, ans=0.025 2024-09-18 11:35:59,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=755040.8333333334, ans=0.125 2024-09-18 11:36:10,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=755069.1666666666, ans=0.125 2024-09-18 11:36:22,985 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 11:36:57,192 INFO [train.py:1198] (1/2) Epoch 42, batch 4500, loss[loss=0.237, ctc_loss=0.1558, cr_loss=0.4056, over 20689.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1447, cr_loss=0.3698, over 4104133.21 frames. ], batch size: 68, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:37:06,930 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.08 vs. limit=6.0 2024-09-18 11:37:57,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=755267.5, ans=0.125 2024-09-18 11:38:13,892 INFO [train.py:1198] (1/2) Epoch 42, batch 4550, loss[loss=0.2148, ctc_loss=0.1404, cr_loss=0.3719, over 20998.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.1441, cr_loss=0.3691, over 4103810.39 frames. ], batch size: 55, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:38:19,941 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.943e+02 2.212e+02 2.352e+02 2.623e+02 4.675e+02, threshold=4.704e+02, percent-clipped=0.0 2024-09-18 11:39:06,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=755380.8333333334, ans=0.5 2024-09-18 11:39:19,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=755409.1666666666, ans=0.2 2024-09-18 11:39:22,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=755409.1666666666, ans=0.125 2024-09-18 11:39:35,509 INFO [train.py:1198] (1/2) Epoch 42, batch 4600, loss[loss=0.2517, ctc_loss=0.1727, cr_loss=0.3952, over 13946.00 frames. ], tot_loss[loss=0.2187, ctc_loss=0.1446, cr_loss=0.3704, over 4104892.04 frames. ], batch size: 150, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:40:22,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=755522.5, ans=0.1 2024-09-18 11:40:51,816 INFO [train.py:1198] (1/2) Epoch 42, batch 4650, loss[loss=0.2089, ctc_loss=0.1378, cr_loss=0.3552, over 21019.00 frames. 
], tot_loss[loss=0.2178, ctc_loss=0.144, cr_loss=0.3691, over 4104138.59 frames. ], batch size: 63, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:40:57,749 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.907e+02 2.215e+02 2.347e+02 2.486e+02 3.653e+02, threshold=4.694e+02, percent-clipped=0.0 2024-09-18 11:41:02,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=755579.1666666666, ans=0.125 2024-09-18 11:41:19,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=755607.5, ans=0.1 2024-09-18 11:42:07,860 INFO [train.py:1198] (1/2) Epoch 42, batch 4700, loss[loss=0.2202, ctc_loss=0.1484, cr_loss=0.3587, over 21016.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.1441, cr_loss=0.3689, over 4099976.02 frames. ], batch size: 63, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:42:08,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=755720.8333333334, ans=0.0 2024-09-18 11:42:55,807 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.38 vs. limit=5.0 2024-09-18 11:43:02,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=755805.8333333334, ans=0.0 2024-09-18 11:43:23,386 INFO [train.py:1198] (1/2) Epoch 42, batch 4750, loss[loss=0.2113, ctc_loss=0.1413, cr_loss=0.3497, over 20794.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1439, cr_loss=0.3687, over 4093640.02 frames. ], batch size: 56, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:43:29,264 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.962e+02 2.249e+02 2.385e+02 2.589e+02 4.538e+02, threshold=4.770e+02, percent-clipped=0.0 2024-09-18 11:43:41,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=755890.8333333334, ans=0.0 2024-09-18 11:44:41,859 INFO [train.py:1198] (1/2) Epoch 42, batch 4800, loss[loss=0.1993, ctc_loss=0.1284, cr_loss=0.3549, over 21050.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1442, cr_loss=0.3694, over 4104675.07 frames. ], batch size: 56, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:44:51,657 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.87 vs. limit=15.0 2024-09-18 11:45:16,862 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=756060.8333333334, ans=0.125 2024-09-18 11:45:35,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=756089.1666666666, ans=0.07 2024-09-18 11:45:39,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=756089.1666666666, ans=0.2 2024-09-18 11:45:56,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=756117.5, ans=0.125 2024-09-18 11:46:00,379 INFO [train.py:1198] (1/2) Epoch 42, batch 4850, loss[loss=0.2086, ctc_loss=0.1375, cr_loss=0.3557, over 20880.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1444, cr_loss=0.3707, over 4115433.04 frames. 
], batch size: 54, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:46:06,353 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.953e+02 2.238e+02 2.388e+02 2.577e+02 3.502e+02, threshold=4.776e+02, percent-clipped=0.0 2024-09-18 11:46:23,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=756174.1666666666, ans=0.0 2024-09-18 11:46:23,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=756174.1666666666, ans=0.1 2024-09-18 11:46:30,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=756202.5, ans=0.1 2024-09-18 11:46:38,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=756202.5, ans=0.025 2024-09-18 11:46:58,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=756230.8333333334, ans=0.125 2024-09-18 11:47:00,945 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 11:47:15,402 INFO [train.py:1198] (1/2) Epoch 42, batch 4900, loss[loss=0.1826, ctc_loss=0.1168, cr_loss=0.3291, over 20954.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1443, cr_loss=0.3705, over 4097733.43 frames. ], batch size: 48, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:47:15,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=756287.5, ans=0.2 2024-09-18 11:47:44,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=756344.1666666666, ans=0.1 2024-09-18 11:47:46,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=756344.1666666666, ans=0.125 2024-09-18 11:47:52,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=756344.1666666666, ans=0.125 2024-09-18 11:48:14,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=756400.8333333334, ans=0.0 2024-09-18 11:48:30,373 INFO [train.py:1198] (1/2) Epoch 42, batch 4950, loss[loss=0.2121, ctc_loss=0.1391, cr_loss=0.3652, over 20648.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1438, cr_loss=0.3694, over 4092588.32 frames. ], batch size: 71, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 11:48:32,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=756429.1666666666, ans=0.2 2024-09-18 11:48:36,211 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.835e+02 2.228e+02 2.343e+02 2.516e+02 5.057e+02, threshold=4.687e+02, percent-clipped=1.0 2024-09-18 11:49:27,574 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.13 vs. limit=15.0 2024-09-18 11:49:44,173 INFO [train.py:1198] (1/2) Epoch 42, batch 5000, loss[loss=0.236, ctc_loss=0.1554, cr_loss=0.4029, over 19351.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1436, cr_loss=0.369, over 4104867.82 frames. 
], batch size: 90, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 11:49:48,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=756570.8333333334, ans=0.0 2024-09-18 11:50:11,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=756599.1666666666, ans=0.125 2024-09-18 11:50:12,101 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.17 vs. limit=15.0 2024-09-18 11:50:59,009 INFO [train.py:1198] (1/2) Epoch 42, batch 5050, loss[loss=0.2584, ctc_loss=0.177, cr_loss=0.4072, over 18257.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1432, cr_loss=0.3688, over 4108067.40 frames. ], batch size: 108, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 11:51:04,924 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.837e+02 2.162e+02 2.307e+02 2.467e+02 3.044e+02, threshold=4.614e+02, percent-clipped=0.0 2024-09-18 11:52:01,019 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.31 vs. limit=8.0 2024-09-18 11:52:13,343 INFO [train.py:1198] (1/2) Epoch 42, batch 5100, loss[loss=0.2168, ctc_loss=0.1406, cr_loss=0.3813, over 21072.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1444, cr_loss=0.3708, over 4115368.11 frames. ], batch size: 56, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 11:53:30,731 INFO [train.py:1198] (1/2) Epoch 42, batch 5150, loss[loss=0.2118, ctc_loss=0.1371, cr_loss=0.3735, over 20862.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1439, cr_loss=0.3708, over 4117135.89 frames. ], batch size: 57, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 11:53:36,704 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.010e+02 2.200e+02 2.336e+02 2.475e+02 3.055e+02, threshold=4.671e+02, percent-clipped=0.0 2024-09-18 11:53:46,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=757024.1666666666, ans=0.125 2024-09-18 11:53:58,102 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 11:54:03,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=757052.5, ans=0.125 2024-09-18 11:54:06,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=757052.5, ans=0.0 2024-09-18 11:54:13,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=757052.5, ans=0.125 2024-09-18 11:54:15,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=757052.5, ans=0.1 2024-09-18 11:54:27,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=757080.8333333334, ans=0.125 2024-09-18 11:54:30,698 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.87 vs. limit=15.0 2024-09-18 11:54:47,764 INFO [train.py:1198] (1/2) Epoch 42, batch 5200, loss[loss=0.2103, ctc_loss=0.1382, cr_loss=0.3606, over 20829.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1436, cr_loss=0.37, over 4115921.38 frames. 
], batch size: 59, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 11:55:14,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=757165.8333333334, ans=0.125 2024-09-18 11:56:01,687 INFO [train.py:1198] (1/2) Epoch 42, batch 5250, loss[loss=0.244, ctc_loss=0.1625, cr_loss=0.4075, over 20957.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1433, cr_loss=0.3695, over 4121878.88 frames. ], batch size: 64, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 11:56:07,758 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.991e+02 2.203e+02 2.338e+02 2.493e+02 9.904e+02, threshold=4.677e+02, percent-clipped=1.0 2024-09-18 11:56:38,038 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.35 vs. limit=12.0 2024-09-18 11:56:52,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=757364.1666666666, ans=0.1 2024-09-18 11:56:55,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=757364.1666666666, ans=0.125 2024-09-18 11:57:15,612 INFO [train.py:1198] (1/2) Epoch 42, batch 5300, loss[loss=0.235, ctc_loss=0.1568, cr_loss=0.3909, over 20877.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1434, cr_loss=0.3689, over 4118946.47 frames. ], batch size: 65, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 11:57:19,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=757420.8333333334, ans=0.125 2024-09-18 11:57:20,965 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.13 vs. limit=15.0 2024-09-18 11:58:05,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=757505.8333333334, ans=0.1 2024-09-18 11:58:22,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=757534.1666666666, ans=0.015 2024-09-18 11:58:29,999 INFO [train.py:1198] (1/2) Epoch 42, batch 5350, loss[loss=0.1925, ctc_loss=0.1268, cr_loss=0.3283, over 21059.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.144, cr_loss=0.3689, over 4103649.51 frames. ], batch size: 56, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 11:58:35,858 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.880e+02 2.227e+02 2.362e+02 2.544e+02 3.275e+02, threshold=4.724e+02, percent-clipped=0.0 2024-09-18 11:58:40,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=757562.5, ans=0.125 2024-09-18 11:58:51,407 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.31 vs. 
limit=12.0 2024-09-18 11:58:54,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=757590.8333333334, ans=0.2 2024-09-18 11:58:58,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=757619.1666666666, ans=0.025 2024-09-18 11:59:11,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=757619.1666666666, ans=0.1 2024-09-18 11:59:17,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=757647.5, ans=0.125 2024-09-18 11:59:22,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=757647.5, ans=0.0 2024-09-18 11:59:23,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=757647.5, ans=0.0 2024-09-18 11:59:26,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=757647.5, ans=0.125 2024-09-18 11:59:34,351 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=757675.8333333334, ans=0.035 2024-09-18 11:59:44,550 INFO [train.py:1198] (1/2) Epoch 42, batch 5400, loss[loss=0.2177, ctc_loss=0.1418, cr_loss=0.3793, over 20994.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1445, cr_loss=0.3693, over 4092143.53 frames. ], batch size: 61, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 11:59:54,310 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.52 vs. limit=15.0 2024-09-18 12:00:12,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=757732.5, ans=0.2 2024-09-18 12:00:19,761 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 12:00:35,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=757789.1666666666, ans=0.2 2024-09-18 12:00:46,905 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.57 vs. limit=10.0 2024-09-18 12:00:59,375 INFO [train.py:1198] (1/2) Epoch 42, batch 5450, loss[loss=0.1944, ctc_loss=0.1266, cr_loss=0.3393, over 21000.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1442, cr_loss=0.3691, over 4093653.88 frames. 
], batch size: 52, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 12:01:05,238 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.895e+02 2.223e+02 2.334e+02 2.480e+02 3.545e+02, threshold=4.669e+02, percent-clipped=0.0 2024-09-18 12:01:16,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=757874.1666666666, ans=0.1 2024-09-18 12:01:16,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=757874.1666666666, ans=0.125 2024-09-18 12:01:23,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=757874.1666666666, ans=0.125 2024-09-18 12:01:32,468 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.37 vs. limit=22.5 2024-09-18 12:01:46,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=757930.8333333334, ans=0.125 2024-09-18 12:01:49,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=757930.8333333334, ans=0.0 2024-09-18 12:02:15,415 INFO [train.py:1198] (1/2) Epoch 42, batch 5500, loss[loss=0.2461, ctc_loss=0.1645, cr_loss=0.408, over 21068.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1443, cr_loss=0.3693, over 4099324.73 frames. ], batch size: 59, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 12:02:24,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=757987.5, ans=0.025 2024-09-18 12:02:30,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=758015.8333333334, ans=0.1 2024-09-18 12:02:44,243 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 12:02:56,349 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0 2024-09-18 12:03:32,568 INFO [train.py:1198] (1/2) Epoch 42, batch 5550, loss[loss=0.1704, ctc_loss=0.1076, cr_loss=0.3141, over 20922.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.1442, cr_loss=0.3692, over 4097411.11 frames. ], batch size: 49, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 12:03:38,465 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.891e+02 2.215e+02 2.333e+02 2.541e+02 3.490e+02, threshold=4.666e+02, percent-clipped=0.0 2024-09-18 12:04:17,487 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 12:04:22,408 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.20 vs. limit=22.5 2024-09-18 12:04:28,059 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.52 vs. limit=15.0 2024-09-18 12:04:35,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=758242.5, ans=0.0 2024-09-18 12:04:35,601 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.19 vs. 
limit=15.0 2024-09-18 12:04:36,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=758242.5, ans=0.0 2024-09-18 12:04:46,823 INFO [train.py:1198] (1/2) Epoch 42, batch 5600, loss[loss=0.22, ctc_loss=0.1438, cr_loss=0.3808, over 21063.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1445, cr_loss=0.37, over 4100758.96 frames. ], batch size: 59, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 12:04:57,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=758270.8333333334, ans=0.0 2024-09-18 12:05:36,440 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.74 vs. limit=15.0 2024-09-18 12:06:00,838 INFO [train.py:1198] (1/2) Epoch 42, batch 5650, loss[loss=0.1948, ctc_loss=0.1266, cr_loss=0.3409, over 21078.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1439, cr_loss=0.3691, over 4108114.16 frames. ], batch size: 53, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 12:06:06,679 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.994e+02 2.227e+02 2.333e+02 2.449e+02 4.721e+02, threshold=4.665e+02, percent-clipped=1.0 2024-09-18 12:06:14,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=758440.8333333334, ans=0.125 2024-09-18 12:06:19,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=758440.8333333334, ans=0.125 2024-09-18 12:06:51,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=758497.5, ans=0.125 2024-09-18 12:06:57,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=758497.5, ans=0.125 2024-09-18 12:07:04,440 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.04 vs. limit=10.0 2024-09-18 12:07:14,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=758554.1666666666, ans=0.2 2024-09-18 12:07:15,416 INFO [train.py:1198] (1/2) Epoch 42, batch 5700, loss[loss=0.2012, ctc_loss=0.1296, cr_loss=0.3577, over 21000.00 frames. ], tot_loss[loss=0.2187, ctc_loss=0.1446, cr_loss=0.3705, over 4118423.55 frames. ], batch size: 52, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 12:07:26,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=758554.1666666666, ans=0.125 2024-09-18 12:08:30,140 INFO [train.py:1198] (1/2) Epoch 42, batch 5750, loss[loss=0.2494, ctc_loss=0.1665, cr_loss=0.4145, over 20824.00 frames. ], tot_loss[loss=0.2187, ctc_loss=0.1446, cr_loss=0.3704, over 4109157.18 frames. ], batch size: 65, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 12:08:32,417 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.75 vs. 
limit=12.0 2024-09-18 12:08:36,185 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.918e+02 2.229e+02 2.357e+02 2.498e+02 3.142e+02, threshold=4.714e+02, percent-clipped=0.0 2024-09-18 12:08:41,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=758695.8333333334, ans=0.125 2024-09-18 12:09:19,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=758780.8333333334, ans=0.125 2024-09-18 12:09:37,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=758809.1666666666, ans=0.07 2024-09-18 12:09:45,197 INFO [train.py:1198] (1/2) Epoch 42, batch 5800, loss[loss=0.2518, ctc_loss=0.1703, cr_loss=0.4076, over 20810.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1447, cr_loss=0.3706, over 4102698.62 frames. ], batch size: 65, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 12:10:19,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=758894.1666666666, ans=0.0 2024-09-18 12:10:31,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=758922.5, ans=0.1 2024-09-18 12:11:01,290 INFO [train.py:1198] (1/2) Epoch 42, batch 5850, loss[loss=0.2338, ctc_loss=0.154, cr_loss=0.3986, over 20874.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1436, cr_loss=0.3687, over 4108847.09 frames. ], batch size: 54, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 12:11:07,131 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.918e+02 2.208e+02 2.371e+02 2.563e+02 6.085e+02, threshold=4.743e+02, percent-clipped=1.0 2024-09-18 12:12:18,066 INFO [train.py:1198] (1/2) Epoch 42, batch 5900, loss[loss=0.2343, ctc_loss=0.1541, cr_loss=0.4013, over 20885.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1437, cr_loss=0.3683, over 4098794.99 frames. ], batch size: 57, lr: 1.95e-03, grad_scale: 16.0 2024-09-18 12:12:37,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=759149.1666666666, ans=0.1 2024-09-18 12:12:56,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=759177.5, ans=0.125 2024-09-18 12:12:59,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=759177.5, ans=0.2 2024-09-18 12:13:28,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=759234.1666666666, ans=15.0 2024-09-18 12:13:32,353 INFO [train.py:1198] (1/2) Epoch 42, batch 5950, loss[loss=0.2354, ctc_loss=0.1565, cr_loss=0.3946, over 20728.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1445, cr_loss=0.3699, over 4095920.55 frames. ], batch size: 71, lr: 1.95e-03, grad_scale: 16.0 2024-09-18 12:13:40,003 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.988e+02 2.274e+02 2.405e+02 2.547e+02 4.054e+02, threshold=4.810e+02, percent-clipped=0.0 2024-09-18 12:13:40,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=759262.5, ans=0.2 2024-09-18 12:14:48,095 INFO [train.py:1198] (1/2) Epoch 42, batch 6000, loss[loss=0.2227, ctc_loss=0.1453, cr_loss=0.3866, over 21084.00 frames. 
], tot_loss[loss=0.2178, ctc_loss=0.144, cr_loss=0.369, over 4107540.38 frames. ], batch size: 59, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 12:14:48,095 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 12:15:10,456 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.5417, 1.7924, 1.7971, 2.2227, 2.4431, 2.4376, 2.1470, 2.1122], device='cuda:1') 2024-09-18 12:15:13,592 INFO [train.py:1230] (1/2) Epoch 42, validation: loss=0.03968, ctc_loss=0.03968, cr_loss=1.49e-14, over 944034.00 frames. 2024-09-18 12:15:13,592 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-18 12:15:24,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=759404.1666666666, ans=0.125 2024-09-18 12:15:25,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=759404.1666666666, ans=0.125 2024-09-18 12:15:41,398 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.91 vs. limit=15.0 2024-09-18 12:15:51,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=759460.8333333334, ans=0.025 2024-09-18 12:15:53,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=759460.8333333334, ans=0.125 2024-09-18 12:16:07,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=759489.1666666666, ans=0.2 2024-09-18 12:16:10,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=759489.1666666666, ans=0.1 2024-09-18 12:16:29,685 INFO [train.py:1198] (1/2) Epoch 42, batch 6050, loss[loss=0.2082, ctc_loss=0.1386, cr_loss=0.3482, over 20890.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.1441, cr_loss=0.3693, over 4104383.88 frames. ], batch size: 54, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 12:16:31,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=759545.8333333334, ans=0.2 2024-09-18 12:16:32,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=759545.8333333334, ans=0.0 2024-09-18 12:16:36,973 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.950e+02 2.236e+02 2.362e+02 2.572e+02 5.022e+02, threshold=4.725e+02, percent-clipped=1.0 2024-09-18 12:16:40,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=759545.8333333334, ans=0.1 2024-09-18 12:16:53,862 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=5.84 vs. 
limit=22.5 2024-09-18 12:17:01,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=759602.5, ans=0.125 2024-09-18 12:17:28,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=759659.1666666666, ans=0.0 2024-09-18 12:17:43,968 INFO [train.py:1198] (1/2) Epoch 42, batch 6100, loss[loss=0.2589, ctc_loss=0.1736, cr_loss=0.4265, over 18422.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1449, cr_loss=0.3704, over 4086862.84 frames. ], batch size: 108, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 12:17:45,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=759687.5, ans=0.125 2024-09-18 12:17:57,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=759715.8333333334, ans=0.2 2024-09-18 12:17:59,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=759715.8333333334, ans=0.125 2024-09-18 12:18:08,115 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.71 vs. limit=22.5 2024-09-18 12:19:00,114 INFO [train.py:1198] (1/2) Epoch 42, batch 6150, loss[loss=0.1859, ctc_loss=0.1216, cr_loss=0.3215, over 20965.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.1443, cr_loss=0.3689, over 4092130.05 frames. ], batch size: 48, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 12:19:07,462 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.979e+02 2.231e+02 2.384e+02 2.516e+02 6.261e+02, threshold=4.768e+02, percent-clipped=1.0 2024-09-18 12:19:40,963 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.17 vs. limit=15.0 2024-09-18 12:19:55,469 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.95 vs. limit=6.0 2024-09-18 12:20:05,679 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.94 vs. limit=15.0 2024-09-18 12:20:12,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=759942.5, ans=0.125 2024-09-18 12:20:12,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=759942.5, ans=0.02 2024-09-18 12:20:12,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=759942.5, ans=0.025 2024-09-18 12:20:12,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=759942.5, ans=0.1 2024-09-18 12:20:15,546 INFO [train.py:1198] (1/2) Epoch 42, batch 6200, loss[loss=0.2423, ctc_loss=0.1593, cr_loss=0.415, over 20658.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.1451, cr_loss=0.3712, over 4090805.09 frames. 
], batch size: 71, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 12:20:23,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=759970.8333333334, ans=6.0 2024-09-18 12:20:30,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=759999.1666666666, ans=0.125 2024-09-18 12:20:35,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=759999.1666666666, ans=0.0 2024-09-18 12:20:48,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=760027.5, ans=0.1 2024-09-18 12:20:56,651 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.00 vs. limit=15.0 2024-09-18 12:21:29,514 INFO [train.py:1198] (1/2) Epoch 42, batch 6250, loss[loss=0.1983, ctc_loss=0.1271, cr_loss=0.356, over 19887.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1449, cr_loss=0.3701, over 4065402.85 frames. ], batch size: 44, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 12:21:31,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=760112.5, ans=0.125 2024-09-18 12:21:36,549 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.827e+02 2.273e+02 2.403e+02 2.616e+02 3.818e+02, threshold=4.806e+02, percent-clipped=0.0 2024-09-18 12:22:19,156 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.27 vs. limit=15.0 2024-09-18 12:22:28,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=760225.8333333334, ans=0.1 2024-09-18 12:22:44,268 INFO [train.py:1198] (1/2) Epoch 42, batch 6300, loss[loss=0.1826, ctc_loss=0.1187, cr_loss=0.3195, over 20961.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.144, cr_loss=0.3685, over 4043400.06 frames. ], batch size: 48, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 12:23:09,813 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.52 vs. limit=15.0 2024-09-18 12:23:16,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=760310.8333333334, ans=0.125 2024-09-18 12:23:19,312 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 12:23:29,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=760339.1666666666, ans=0.1 2024-09-18 12:23:46,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=760367.5, ans=0.125 2024-09-18 12:23:55,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=760395.8333333334, ans=0.025 2024-09-18 12:23:55,956 INFO [train.py:1198] (1/2) Epoch 42, batch 6350, loss[loss=0.3017, ctc_loss=0.2136, cr_loss=0.4407, over 14409.00 frames. ], tot_loss[loss=0.2208, ctc_loss=0.1469, cr_loss=0.3693, over 3880519.29 frames. 
], batch size: 150, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 12:24:02,925 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.961e+02 2.356e+02 2.647e+02 2.907e+02 3.900e+02, threshold=5.294e+02, percent-clipped=0.0 2024-09-18 12:25:44,428 INFO [train.py:1198] (1/2) Epoch 43, batch 0, loss[loss=0.2253, ctc_loss=0.1511, cr_loss=0.3711, over 20133.00 frames. ], tot_loss[loss=0.2253, ctc_loss=0.1511, cr_loss=0.3711, over 20133.00 frames. ], batch size: 80, lr: 1.93e-03, grad_scale: 32.0 2024-09-18 12:25:44,428 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 12:25:54,090 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.4221, 4.9514, 4.8967, 5.2247], device='cuda:1') 2024-09-18 12:26:02,558 INFO [train.py:1230] (1/2) Epoch 43, validation: loss=0.03933, ctc_loss=0.03933, cr_loss=1.486e-14, over 944034.00 frames. 2024-09-18 12:26:02,559 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-18 12:26:20,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=760540.3333333334, ans=0.125 2024-09-18 12:26:42,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=760568.6666666666, ans=0.2 2024-09-18 12:26:46,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=760568.6666666666, ans=0.125 2024-09-18 12:27:01,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=760597.0, ans=0.125 2024-09-18 12:27:21,239 INFO [train.py:1198] (1/2) Epoch 43, batch 50, loss[loss=0.2349, ctc_loss=0.1568, cr_loss=0.3903, over 20832.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1429, cr_loss=0.3682, over 929853.29 frames. ], batch size: 65, lr: 1.93e-03, grad_scale: 32.0 2024-09-18 12:27:42,608 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.003e+02 2.311e+02 2.515e+02 2.748e+02 4.191e+02, threshold=5.029e+02, percent-clipped=0.0 2024-09-18 12:28:08,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=760738.6666666666, ans=0.0 2024-09-18 12:28:37,373 INFO [train.py:1198] (1/2) Epoch 43, batch 100, loss[loss=0.2179, ctc_loss=0.1447, cr_loss=0.3657, over 21014.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1415, cr_loss=0.3653, over 1637140.22 frames. ], batch size: 63, lr: 1.93e-03, grad_scale: 32.0 2024-09-18 12:29:08,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=760852.0, ans=0.025 2024-09-18 12:29:21,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=760880.3333333334, ans=0.125 2024-09-18 12:29:24,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=760880.3333333334, ans=0.1 2024-09-18 12:29:27,935 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.80 vs. 
limit=15.0 2024-09-18 12:29:45,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=760908.6666666666, ans=0.025 2024-09-18 12:29:52,660 INFO [train.py:1198] (1/2) Epoch 43, batch 150, loss[loss=0.2351, ctc_loss=0.1568, cr_loss=0.3915, over 19273.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1425, cr_loss=0.3669, over 2175851.69 frames. ], batch size: 90, lr: 1.93e-03, grad_scale: 32.0 2024-09-18 12:30:00,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=760937.0, ans=0.125 2024-09-18 12:30:01,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=760937.0, ans=0.5 2024-09-18 12:30:11,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=760965.3333333334, ans=0.1 2024-09-18 12:30:13,854 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.862e+02 2.219e+02 2.315e+02 2.493e+02 7.656e+02, threshold=4.631e+02, percent-clipped=1.0 2024-09-18 12:30:41,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=761022.0, ans=0.0 2024-09-18 12:31:11,946 INFO [train.py:1198] (1/2) Epoch 43, batch 200, loss[loss=0.2091, ctc_loss=0.1368, cr_loss=0.3619, over 20831.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1423, cr_loss=0.3674, over 2603053.46 frames. ], batch size: 59, lr: 1.93e-03, grad_scale: 32.0 2024-09-18 12:31:13,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=761078.6666666666, ans=0.125 2024-09-18 12:31:21,668 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=5.49 vs. limit=22.5 2024-09-18 12:31:22,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=761078.6666666666, ans=0.125 2024-09-18 12:31:56,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=761163.6666666666, ans=0.0 2024-09-18 12:32:07,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=761163.6666666666, ans=0.125 2024-09-18 12:32:30,284 INFO [train.py:1198] (1/2) Epoch 43, batch 250, loss[loss=0.2496, ctc_loss=0.169, cr_loss=0.4027, over 20945.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1441, cr_loss=0.3703, over 2926613.89 frames. 
], batch size: 64, lr: 1.93e-03, grad_scale: 16.0 2024-09-18 12:32:36,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=761220.3333333334, ans=0.0 2024-09-18 12:32:53,046 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.928e+02 2.235e+02 2.347e+02 2.466e+02 3.759e+02, threshold=4.695e+02, percent-clipped=0.0 2024-09-18 12:32:57,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=761248.6666666666, ans=0.0 2024-09-18 12:33:27,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=761305.3333333334, ans=0.125 2024-09-18 12:33:45,868 INFO [train.py:1198] (1/2) Epoch 43, batch 300, loss[loss=0.1685, ctc_loss=0.1088, cr_loss=0.2982, over 19504.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1433, cr_loss=0.3696, over 3188278.16 frames. ], batch size: 43, lr: 1.93e-03, grad_scale: 16.0 2024-09-18 12:33:55,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=761362.0, ans=0.2 2024-09-18 12:34:38,335 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.02 vs. limit=15.0 2024-09-18 12:34:39,987 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.35 vs. limit=22.5 2024-09-18 12:34:40,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=761447.0, ans=0.125 2024-09-18 12:34:44,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=761447.0, ans=0.0 2024-09-18 12:34:52,911 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=761475.3333333334, ans=0.0 2024-09-18 12:34:54,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=761475.3333333334, ans=0.125 2024-09-18 12:35:01,556 INFO [train.py:1198] (1/2) Epoch 43, batch 350, loss[loss=0.2175, ctc_loss=0.1459, cr_loss=0.3582, over 20615.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.1431, cr_loss=0.3686, over 3392288.51 frames. ], batch size: 68, lr: 1.93e-03, grad_scale: 16.0 2024-09-18 12:35:06,655 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.38 vs. limit=15.0 2024-09-18 12:35:24,142 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.007e+02 2.225e+02 2.365e+02 2.483e+02 3.085e+02, threshold=4.731e+02, percent-clipped=0.0 2024-09-18 12:36:19,882 INFO [train.py:1198] (1/2) Epoch 43, batch 400, loss[loss=0.1937, ctc_loss=0.1259, cr_loss=0.3389, over 20989.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1433, cr_loss=0.369, over 3561562.14 frames. 
], batch size: 55, lr: 1.93e-03, grad_scale: 32.0 2024-09-18 12:36:21,862 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=761645.3333333334, ans=0.1 2024-09-18 12:36:32,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=761645.3333333334, ans=0.0 2024-09-18 12:36:55,487 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.65 vs. limit=15.0 2024-09-18 12:37:17,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=761730.3333333334, ans=0.125 2024-09-18 12:37:34,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=761787.0, ans=0.1 2024-09-18 12:37:35,661 INFO [train.py:1198] (1/2) Epoch 43, batch 450, loss[loss=0.1841, ctc_loss=0.1195, cr_loss=0.3226, over 20944.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1427, cr_loss=0.3676, over 3683710.49 frames. ], batch size: 49, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 12:38:02,184 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.965e+02 2.276e+02 2.380e+02 2.495e+02 3.272e+02, threshold=4.761e+02, percent-clipped=0.0 2024-09-18 12:38:07,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=761815.3333333334, ans=0.2 2024-09-18 12:38:38,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=761900.3333333334, ans=0.1 2024-09-18 12:38:50,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=761900.3333333334, ans=0.125 2024-09-18 12:38:55,023 INFO [train.py:1198] (1/2) Epoch 43, batch 500, loss[loss=0.2042, ctc_loss=0.1322, cr_loss=0.36, over 20762.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.144, cr_loss=0.369, over 3755997.42 frames. ], batch size: 56, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 12:39:06,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=761928.6666666666, ans=0.125 2024-09-18 12:39:18,448 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.07 vs. limit=22.5 2024-09-18 12:39:19,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=761957.0, ans=0.2 2024-09-18 12:39:19,871 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.27 vs. 
limit=22.5 2024-09-18 12:39:22,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=761957.0, ans=0.125 2024-09-18 12:39:35,095 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=761985.3333333334, ans=0.125 2024-09-18 12:40:08,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=762042.0, ans=0.125 2024-09-18 12:40:10,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=762070.3333333334, ans=0.0 2024-09-18 12:40:11,549 INFO [train.py:1198] (1/2) Epoch 43, batch 550, loss[loss=0.2275, ctc_loss=0.1508, cr_loss=0.3839, over 20835.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1433, cr_loss=0.3686, over 3846061.45 frames. ], batch size: 59, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 12:40:24,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=762070.3333333334, ans=0.025 2024-09-18 12:40:36,315 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.982e+02 2.265e+02 2.358e+02 2.495e+02 4.134e+02, threshold=4.716e+02, percent-clipped=0.0 2024-09-18 12:41:28,292 INFO [train.py:1198] (1/2) Epoch 43, batch 600, loss[loss=0.22, ctc_loss=0.1477, cr_loss=0.3618, over 20518.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1438, cr_loss=0.3695, over 3901187.94 frames. ], batch size: 75, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 12:41:31,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=762212.0, ans=0.125 2024-09-18 12:41:33,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=762212.0, ans=0.025 2024-09-18 12:41:41,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=762212.0, ans=0.09899494936611666 2024-09-18 12:41:48,931 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.79 vs. limit=15.0 2024-09-18 12:41:59,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=762240.3333333334, ans=0.0 2024-09-18 12:42:24,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=762297.0, ans=0.025 2024-09-18 12:42:46,869 INFO [train.py:1198] (1/2) Epoch 43, batch 650, loss[loss=0.1909, ctc_loss=0.1251, cr_loss=0.3288, over 19978.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1441, cr_loss=0.3704, over 3940538.87 frames. 
], batch size: 44, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 12:42:57,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=762353.6666666666, ans=0.0 2024-09-18 12:43:08,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=762382.0, ans=0.125 2024-09-18 12:43:11,296 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.025e+02 2.202e+02 2.371e+02 2.474e+02 3.120e+02, threshold=4.742e+02, percent-clipped=0.0 2024-09-18 12:43:41,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=762438.6666666666, ans=0.125 2024-09-18 12:43:51,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=762467.0, ans=0.0 2024-09-18 12:44:06,127 INFO [train.py:1198] (1/2) Epoch 43, batch 700, loss[loss=0.1844, ctc_loss=0.121, cr_loss=0.317, over 21012.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1432, cr_loss=0.3686, over 3974483.09 frames. ], batch size: 52, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 12:44:20,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=762523.6666666666, ans=0.125 2024-09-18 12:44:53,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=762580.3333333334, ans=0.1 2024-09-18 12:44:59,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=762580.3333333334, ans=0.07 2024-09-18 12:45:03,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=762580.3333333334, ans=0.125 2024-09-18 12:45:05,391 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=762608.6666666666, ans=0.125 2024-09-18 12:45:08,839 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.78 vs. limit=15.0 2024-09-18 12:45:21,824 INFO [train.py:1198] (1/2) Epoch 43, batch 750, loss[loss=0.2838, ctc_loss=0.1981, cr_loss=0.4287, over 14255.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1437, cr_loss=0.3687, over 3989758.37 frames. ], batch size: 150, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 12:45:41,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=762665.3333333334, ans=0.1 2024-09-18 12:45:46,103 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.012e+02 2.195e+02 2.345e+02 2.486e+02 3.872e+02, threshold=4.690e+02, percent-clipped=0.0 2024-09-18 12:46:25,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=762750.3333333334, ans=0.2 2024-09-18 12:46:38,167 INFO [train.py:1198] (1/2) Epoch 43, batch 800, loss[loss=0.1874, ctc_loss=0.1236, cr_loss=0.319, over 20380.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1433, cr_loss=0.3684, over 4014407.14 frames. 
], batch size: 45, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 12:46:50,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=762778.6666666666, ans=0.1 2024-09-18 12:46:59,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=762807.0, ans=0.125 2024-09-18 12:47:13,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=762835.3333333334, ans=0.0 2024-09-18 12:47:26,033 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.99 vs. limit=15.0 2024-09-18 12:47:29,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=762863.6666666666, ans=0.125 2024-09-18 12:47:30,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=762863.6666666666, ans=0.125 2024-09-18 12:47:56,966 INFO [train.py:1198] (1/2) Epoch 43, batch 850, loss[loss=0.1835, ctc_loss=0.1196, cr_loss=0.3195, over 19909.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1435, cr_loss=0.3683, over 4030825.62 frames. ], batch size: 44, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 12:48:11,543 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.81 vs. limit=15.0 2024-09-18 12:48:21,246 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.951e+02 2.267e+02 2.379e+02 2.498e+02 4.743e+02, threshold=4.757e+02, percent-clipped=1.0 2024-09-18 12:48:34,351 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.58 vs. limit=15.0 2024-09-18 12:48:52,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=763005.3333333334, ans=0.0 2024-09-18 12:49:09,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=763033.6666666666, ans=0.035 2024-09-18 12:49:12,388 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.31 vs. limit=15.0 2024-09-18 12:49:13,297 INFO [train.py:1198] (1/2) Epoch 43, batch 900, loss[loss=0.1879, ctc_loss=0.1226, cr_loss=0.3265, over 20964.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.1423, cr_loss=0.366, over 4049334.91 frames. ], batch size: 50, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 12:49:36,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=763090.3333333334, ans=0.125 2024-09-18 12:49:48,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=763118.6666666666, ans=0.015 2024-09-18 12:50:31,576 INFO [train.py:1198] (1/2) Epoch 43, batch 950, loss[loss=0.2334, ctc_loss=0.1567, cr_loss=0.3836, over 20870.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1433, cr_loss=0.3682, over 4058491.78 frames. 
], batch size: 57, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 12:50:56,116 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.953e+02 2.275e+02 2.427e+02 2.617e+02 7.443e+02, threshold=4.854e+02, percent-clipped=1.0 2024-09-18 12:51:32,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=763317.0, ans=0.125 2024-09-18 12:51:33,224 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. limit=6.0 2024-09-18 12:51:47,511 INFO [train.py:1198] (1/2) Epoch 43, batch 1000, loss[loss=0.2145, ctc_loss=0.1402, cr_loss=0.3716, over 20912.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1438, cr_loss=0.3684, over 4062693.53 frames. ], batch size: 60, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 12:51:55,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=763345.3333333334, ans=0.125 2024-09-18 12:52:00,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=763345.3333333334, ans=0.95 2024-09-18 12:52:07,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=763373.6666666666, ans=0.2 2024-09-18 12:52:18,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=763402.0, ans=0.125 2024-09-18 12:53:02,640 INFO [train.py:1198] (1/2) Epoch 43, batch 1050, loss[loss=0.2763, ctc_loss=0.1921, cr_loss=0.4208, over 14216.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1434, cr_loss=0.368, over 4076252.03 frames. ], batch size: 149, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 12:53:26,784 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.040e+02 2.289e+02 2.431e+02 2.556e+02 3.579e+02, threshold=4.862e+02, percent-clipped=0.0 2024-09-18 12:53:38,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=763543.6666666666, ans=0.125 2024-09-18 12:53:55,697 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.63 vs. limit=15.0 2024-09-18 12:54:19,779 INFO [train.py:1198] (1/2) Epoch 43, batch 1100, loss[loss=0.2484, ctc_loss=0.1634, cr_loss=0.4251, over 21011.00 frames. ], tot_loss[loss=0.2195, ctc_loss=0.1452, cr_loss=0.3712, over 4074891.31 frames. ], batch size: 61, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 12:55:06,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=763713.6666666666, ans=0.0 2024-09-18 12:55:07,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=763713.6666666666, ans=0.125 2024-09-18 12:55:16,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=763713.6666666666, ans=0.1 2024-09-18 12:55:24,221 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. 
limit=6.0 2024-09-18 12:55:28,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=763742.0, ans=0.035 2024-09-18 12:55:28,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=763742.0, ans=0.1 2024-09-18 12:55:37,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=763770.3333333334, ans=0.0 2024-09-18 12:55:38,339 INFO [train.py:1198] (1/2) Epoch 43, batch 1150, loss[loss=0.169, ctc_loss=0.1111, cr_loss=0.2894, over 19861.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.1451, cr_loss=0.371, over 4075448.06 frames. ], batch size: 44, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 12:56:02,125 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.001e+02 2.226e+02 2.401e+02 2.541e+02 3.971e+02, threshold=4.803e+02, percent-clipped=0.0 2024-09-18 12:56:40,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=763883.6666666666, ans=0.2 2024-09-18 12:56:46,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=763883.6666666666, ans=0.125 2024-09-18 12:56:53,598 INFO [train.py:1198] (1/2) Epoch 43, batch 1200, loss[loss=0.223, ctc_loss=0.1487, cr_loss=0.3711, over 20950.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1446, cr_loss=0.37, over 4088648.71 frames. ], batch size: 64, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 12:57:25,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=763968.6666666666, ans=0.0 2024-09-18 12:57:44,425 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.40 vs. limit=15.0 2024-09-18 12:57:49,308 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.61 vs. limit=8.0 2024-09-18 12:58:05,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=764025.3333333334, ans=0.125 2024-09-18 12:58:09,445 INFO [train.py:1198] (1/2) Epoch 43, batch 1250, loss[loss=0.186, ctc_loss=0.1202, cr_loss=0.329, over 20981.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1439, cr_loss=0.3689, over 4091911.51 frames. ], batch size: 52, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 12:58:15,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=764053.6666666666, ans=0.0 2024-09-18 12:58:30,313 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.07 vs. limit=10.0 2024-09-18 12:58:34,045 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.977e+02 2.215e+02 2.350e+02 2.505e+02 4.533e+02, threshold=4.700e+02, percent-clipped=0.0 2024-09-18 12:58:34,716 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=2.99 vs. limit=15.0 2024-09-18 12:59:28,288 INFO [train.py:1198] (1/2) Epoch 43, batch 1300, loss[loss=0.1728, ctc_loss=0.1095, cr_loss=0.3166, over 20949.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1437, cr_loss=0.3684, over 4090715.98 frames. 
], batch size: 48, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 12:59:34,893 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.93 vs. limit=15.0 2024-09-18 12:59:51,397 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.77 vs. limit=12.0 2024-09-18 13:00:22,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=764280.3333333334, ans=0.5 2024-09-18 13:00:29,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=764308.6666666666, ans=0.125 2024-09-18 13:00:43,164 INFO [train.py:1198] (1/2) Epoch 43, batch 1350, loss[loss=0.219, ctc_loss=0.1469, cr_loss=0.3605, over 20720.00 frames. ], tot_loss[loss=0.2183, ctc_loss=0.1444, cr_loss=0.3695, over 4078153.79 frames. ], batch size: 71, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 13:00:43,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=764337.0, ans=0.2 2024-09-18 13:01:07,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=764365.3333333334, ans=0.125 2024-09-18 13:01:09,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=764365.3333333334, ans=0.05 2024-09-18 13:01:10,555 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.977e+02 2.271e+02 2.416e+02 2.564e+02 3.017e+02, threshold=4.832e+02, percent-clipped=0.0 2024-09-18 13:01:18,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=764393.6666666666, ans=0.1 2024-09-18 13:01:21,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=764393.6666666666, ans=0.1 2024-09-18 13:02:02,023 INFO [train.py:1198] (1/2) Epoch 43, batch 1400, loss[loss=0.182, ctc_loss=0.1188, cr_loss=0.3159, over 20977.00 frames. ], tot_loss[loss=0.2187, ctc_loss=0.1447, cr_loss=0.37, over 4084827.74 frames. ], batch size: 50, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 13:02:46,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=764563.6666666666, ans=0.0 2024-09-18 13:02:46,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=764563.6666666666, ans=0.1 2024-09-18 13:03:18,093 INFO [train.py:1198] (1/2) Epoch 43, batch 1450, loss[loss=0.2314, ctc_loss=0.1519, cr_loss=0.3974, over 20941.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1443, cr_loss=0.3692, over 4092827.55 frames. 
], batch size: 67, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 13:03:18,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=764620.3333333334, ans=0.0 2024-09-18 13:03:18,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=764620.3333333334, ans=0.1 2024-09-18 13:03:41,935 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.958e+02 2.233e+02 2.357e+02 2.504e+02 4.834e+02, threshold=4.713e+02, percent-clipped=1.0 2024-09-18 13:03:50,315 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.74 vs. limit=22.5 2024-09-18 13:04:00,986 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.40 vs. limit=15.0 2024-09-18 13:04:12,030 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.96 vs. limit=22.5 2024-09-18 13:04:33,332 INFO [train.py:1198] (1/2) Epoch 43, batch 1500, loss[loss=0.2524, ctc_loss=0.1675, cr_loss=0.4242, over 19546.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.1441, cr_loss=0.3695, over 4101861.51 frames. ], batch size: 90, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 13:04:53,896 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.89 vs. limit=10.0 2024-09-18 13:05:34,284 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.01 vs. limit=15.0 2024-09-18 13:05:49,689 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.17 vs. limit=22.5 2024-09-18 13:05:51,700 INFO [train.py:1198] (1/2) Epoch 43, batch 1550, loss[loss=0.2008, ctc_loss=0.1309, cr_loss=0.3492, over 20971.00 frames. ], tot_loss[loss=0.2183, ctc_loss=0.1444, cr_loss=0.3696, over 4109299.10 frames. ], batch size: 64, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 13:05:53,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=764903.6666666666, ans=0.025 2024-09-18 13:06:15,984 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.981e+02 2.255e+02 2.371e+02 2.529e+02 3.069e+02, threshold=4.742e+02, percent-clipped=0.0 2024-09-18 13:06:25,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=764960.3333333334, ans=0.125 2024-09-18 13:06:32,346 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.05 vs. limit=15.0 2024-09-18 13:06:42,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=764988.6666666666, ans=0.125 2024-09-18 13:06:48,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=764988.6666666666, ans=0.125 2024-09-18 13:07:10,823 INFO [train.py:1198] (1/2) Epoch 43, batch 1600, loss[loss=0.2098, ctc_loss=0.1394, cr_loss=0.3518, over 20770.00 frames. 
], tot_loss[loss=0.2191, ctc_loss=0.1449, cr_loss=0.3706, over 4099821.78 frames. ], batch size: 56, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 13:07:24,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=765073.6666666666, ans=0.125 2024-09-18 13:07:34,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=765073.6666666666, ans=0.125 2024-09-18 13:07:35,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=765073.6666666666, ans=0.2 2024-09-18 13:08:26,679 INFO [train.py:1198] (1/2) Epoch 43, batch 1650, loss[loss=0.2355, ctc_loss=0.1559, cr_loss=0.3979, over 21075.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1442, cr_loss=0.3699, over 4110123.61 frames. ], batch size: 59, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 13:08:28,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=765187.0, ans=0.125 2024-09-18 13:08:50,925 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.026e+02 2.206e+02 2.330e+02 2.465e+02 8.182e+02, threshold=4.660e+02, percent-clipped=1.0 2024-09-18 13:09:15,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=765272.0, ans=0.2 2024-09-18 13:09:31,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=765300.3333333334, ans=0.0 2024-09-18 13:09:37,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=765300.3333333334, ans=0.125 2024-09-18 13:09:39,348 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.85 vs. limit=15.0 2024-09-18 13:09:41,789 INFO [train.py:1198] (1/2) Epoch 43, batch 1700, loss[loss=0.1891, ctc_loss=0.1219, cr_loss=0.3356, over 20376.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1437, cr_loss=0.3682, over 4103873.94 frames. ], batch size: 45, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 13:10:03,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=765357.0, ans=0.125 2024-09-18 13:10:18,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=765385.3333333334, ans=0.125 2024-09-18 13:10:41,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=765442.0, ans=0.0 2024-09-18 13:11:00,954 INFO [train.py:1198] (1/2) Epoch 43, batch 1750, loss[loss=0.24, ctc_loss=0.1617, cr_loss=0.3915, over 20145.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1431, cr_loss=0.3676, over 4095885.02 frames. ], batch size: 80, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 13:11:17,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=765498.6666666666, ans=0.0 2024-09-18 13:11:26,560 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.015e+02 2.225e+02 2.337e+02 2.490e+02 3.252e+02, threshold=4.674e+02, percent-clipped=0.0 2024-09-18 13:12:16,551 INFO [train.py:1198] (1/2) Epoch 43, batch 1800, loss[loss=0.2685, ctc_loss=0.1925, cr_loss=0.3804, over 14072.00 frames. 
], tot_loss[loss=0.2163, ctc_loss=0.1429, cr_loss=0.3667, over 4093568.48 frames. ], batch size: 149, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 13:12:59,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=765668.6666666666, ans=10.0 2024-09-18 13:13:05,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=765697.0, ans=0.0 2024-09-18 13:13:35,109 INFO [train.py:1198] (1/2) Epoch 43, batch 1850, loss[loss=0.2434, ctc_loss=0.1645, cr_loss=0.3949, over 20682.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1431, cr_loss=0.3668, over 4092327.51 frames. ], batch size: 66, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 13:13:37,240 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.59 vs. limit=22.5 2024-09-18 13:13:46,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=765753.6666666666, ans=0.125 2024-09-18 13:14:00,428 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.826e+02 2.224e+02 2.325e+02 2.496e+02 3.243e+02, threshold=4.650e+02, percent-clipped=0.0 2024-09-18 13:14:18,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=765838.6666666666, ans=0.125 2024-09-18 13:14:20,878 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.03 vs. limit=15.0 2024-09-18 13:14:49,934 INFO [train.py:1198] (1/2) Epoch 43, batch 1900, loss[loss=0.2553, ctc_loss=0.1711, cr_loss=0.4209, over 18376.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1443, cr_loss=0.3688, over 4094071.39 frames. ], batch size: 108, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 13:15:03,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=765923.6666666666, ans=0.125 2024-09-18 13:15:55,548 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.02 vs. limit=15.0 2024-09-18 13:16:05,626 INFO [train.py:1198] (1/2) Epoch 43, batch 1950, loss[loss=0.2048, ctc_loss=0.1354, cr_loss=0.347, over 21054.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1446, cr_loss=0.3693, over 4095384.82 frames. 
], batch size: 53, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 13:16:34,182 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.076e+02 2.258e+02 2.401e+02 2.571e+02 3.138e+02, threshold=4.801e+02, percent-clipped=0.0 2024-09-18 13:17:10,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=766150.3333333334, ans=0.1 2024-09-18 13:17:16,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=766150.3333333334, ans=0.1 2024-09-18 13:17:20,053 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 13:17:23,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=766178.6666666666, ans=0.2 2024-09-18 13:17:24,135 INFO [train.py:1198] (1/2) Epoch 43, batch 2000, loss[loss=0.2142, ctc_loss=0.1435, cr_loss=0.3539, over 21085.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1446, cr_loss=0.3697, over 4090677.42 frames. ], batch size: 59, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 13:17:30,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=766178.6666666666, ans=0.09899494936611666 2024-09-18 13:17:33,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=766178.6666666666, ans=0.04949747468305833 2024-09-18 13:17:56,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=766235.3333333334, ans=0.07 2024-09-18 13:18:42,751 INFO [train.py:1198] (1/2) Epoch 43, batch 2050, loss[loss=0.2138, ctc_loss=0.1396, cr_loss=0.3712, over 20889.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1434, cr_loss=0.3683, over 4090681.79 frames. ], batch size: 57, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 13:18:47,085 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.91 vs. limit=15.0 2024-09-18 13:19:01,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=766348.6666666666, ans=0.125 2024-09-18 13:19:09,890 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.954e+02 2.229e+02 2.367e+02 2.513e+02 4.426e+02, threshold=4.735e+02, percent-clipped=0.0 2024-09-18 13:19:16,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=766377.0, ans=0.05 2024-09-18 13:19:54,743 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.30 vs. limit=12.0 2024-09-18 13:19:58,243 INFO [train.py:1198] (1/2) Epoch 43, batch 2100, loss[loss=0.2469, ctc_loss=0.1626, cr_loss=0.4215, over 20984.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.144, cr_loss=0.3698, over 4090537.70 frames. ], batch size: 64, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 13:20:08,281 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.95 vs. limit=22.5 2024-09-18 13:20:18,920 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.83 vs. 
limit=10.0 2024-09-18 13:20:21,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=766490.3333333334, ans=0.125 2024-09-18 13:21:14,254 INFO [train.py:1198] (1/2) Epoch 43, batch 2150, loss[loss=0.1752, ctc_loss=0.113, cr_loss=0.3111, over 20967.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1436, cr_loss=0.3689, over 4102543.22 frames. ], batch size: 50, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 13:21:41,423 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.966e+02 2.228e+02 2.360e+02 2.498e+02 9.463e+02, threshold=4.720e+02, percent-clipped=1.0 2024-09-18 13:21:43,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=766660.3333333334, ans=0.125 2024-09-18 13:21:46,792 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.23 vs. limit=22.5 2024-09-18 13:22:16,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=766717.0, ans=0.125 2024-09-18 13:22:16,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=766717.0, ans=0.2 2024-09-18 13:22:32,901 INFO [train.py:1198] (1/2) Epoch 43, batch 2200, loss[loss=0.1922, ctc_loss=0.1281, cr_loss=0.3204, over 20991.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1439, cr_loss=0.3685, over 4095593.83 frames. ], batch size: 50, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 13:22:47,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=766773.6666666666, ans=0.1 2024-09-18 13:23:09,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=766802.0, ans=0.125 2024-09-18 13:23:43,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=766858.6666666666, ans=0.1 2024-09-18 13:23:48,831 INFO [train.py:1198] (1/2) Epoch 43, batch 2250, loss[loss=0.1843, ctc_loss=0.1208, cr_loss=0.3173, over 20986.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1435, cr_loss=0.3677, over 4104171.90 frames. ], batch size: 49, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 13:23:50,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=766887.0, ans=0.125 2024-09-18 13:24:19,058 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.992e+02 2.235e+02 2.370e+02 2.558e+02 3.752e+02, threshold=4.740e+02, percent-clipped=0.0 2024-09-18 13:24:31,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=766943.6666666666, ans=0.0 2024-09-18 13:24:34,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=766943.6666666666, ans=0.1 2024-09-18 13:24:47,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=766972.0, ans=0.125 2024-09-18 13:25:08,190 INFO [train.py:1198] (1/2) Epoch 43, batch 2300, loss[loss=0.1843, ctc_loss=0.1187, cr_loss=0.3279, over 20962.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.1422, cr_loss=0.366, over 4110432.92 frames. 
], batch size: 51, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 13:25:14,987 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.17 vs. limit=15.0 2024-09-18 13:25:54,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=767113.6666666666, ans=0.1 2024-09-18 13:25:59,403 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.40 vs. limit=15.0 2024-09-18 13:26:24,634 INFO [train.py:1198] (1/2) Epoch 43, batch 2350, loss[loss=0.227, ctc_loss=0.1535, cr_loss=0.3675, over 20335.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.1424, cr_loss=0.3662, over 4118595.15 frames. ], batch size: 74, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 13:26:51,360 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.986e+02 2.207e+02 2.330e+02 2.481e+02 3.401e+02, threshold=4.661e+02, percent-clipped=0.0 2024-09-18 13:27:11,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=767255.3333333334, ans=0.0 2024-09-18 13:27:28,666 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.83 vs. limit=22.5 2024-09-18 13:27:39,656 INFO [train.py:1198] (1/2) Epoch 43, batch 2400, loss[loss=0.2224, ctc_loss=0.1483, cr_loss=0.3705, over 20333.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1419, cr_loss=0.3648, over 4107388.40 frames. ], batch size: 74, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 13:27:53,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=767340.3333333334, ans=0.125 2024-09-18 13:28:13,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=767368.6666666666, ans=0.125 2024-09-18 13:28:24,519 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0 2024-09-18 13:28:43,669 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.32 vs. limit=15.0 2024-09-18 13:28:55,582 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=767425.3333333334, ans=0.1 2024-09-18 13:28:55,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=767425.3333333334, ans=0.0 2024-09-18 13:28:58,190 INFO [train.py:1198] (1/2) Epoch 43, batch 2450, loss[loss=0.2439, ctc_loss=0.1623, cr_loss=0.408, over 20808.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.1424, cr_loss=0.3658, over 4101089.11 frames. 
], batch size: 65, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 13:29:22,273 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 13:29:24,806 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.874e+02 2.236e+02 2.402e+02 2.533e+02 4.948e+02, threshold=4.804e+02, percent-clipped=1.0 2024-09-18 13:30:10,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=767567.0, ans=0.0 2024-09-18 13:30:15,866 INFO [train.py:1198] (1/2) Epoch 43, batch 2500, loss[loss=0.1953, ctc_loss=0.1299, cr_loss=0.3269, over 20989.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.143, cr_loss=0.3661, over 4083689.56 frames. ], batch size: 55, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 13:30:18,138 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.23 vs. limit=15.0 2024-09-18 13:31:03,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=767680.3333333334, ans=0.125 2024-09-18 13:31:13,660 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2024-09-18 13:31:19,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=767708.6666666666, ans=0.125 2024-09-18 13:31:22,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=767708.6666666666, ans=0.2 2024-09-18 13:31:31,300 INFO [train.py:1198] (1/2) Epoch 43, batch 2550, loss[loss=0.2122, ctc_loss=0.1409, cr_loss=0.3566, over 20786.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1427, cr_loss=0.3654, over 4083134.37 frames. ], batch size: 56, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 13:31:58,432 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.072e+02 2.306e+02 2.404e+02 2.637e+02 4.725e+02, threshold=4.807e+02, percent-clipped=0.0 2024-09-18 13:32:15,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=767822.0, ans=0.0 2024-09-18 13:32:35,139 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 13:32:46,778 INFO [train.py:1198] (1/2) Epoch 43, batch 2600, loss[loss=0.2269, ctc_loss=0.1521, cr_loss=0.374, over 20625.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1434, cr_loss=0.3663, over 4080180.73 frames. 
], batch size: 71, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 13:32:53,243 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=767878.6666666666, ans=0.0 2024-09-18 13:33:06,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=767907.0, ans=0.125 2024-09-18 13:33:26,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=767935.3333333334, ans=0.07 2024-09-18 13:33:35,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=767963.6666666666, ans=0.125 2024-09-18 13:33:46,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=767963.6666666666, ans=0.125 2024-09-18 13:34:01,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=767992.0, ans=0.0 2024-09-18 13:34:05,733 INFO [train.py:1198] (1/2) Epoch 43, batch 2650, loss[loss=0.2089, ctc_loss=0.1376, cr_loss=0.3564, over 21091.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.143, cr_loss=0.3653, over 4070680.44 frames. ], batch size: 56, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 13:34:20,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=768048.6666666666, ans=0.0 2024-09-18 13:34:21,423 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.67 vs. limit=15.0 2024-09-18 13:34:22,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=768048.6666666666, ans=0.125 2024-09-18 13:34:34,609 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.939e+02 2.262e+02 2.370e+02 2.506e+02 3.502e+02, threshold=4.740e+02, percent-clipped=0.0 2024-09-18 13:34:41,311 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.51 vs. limit=22.5 2024-09-18 13:35:08,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=768133.6666666666, ans=0.125 2024-09-18 13:35:12,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=768133.6666666666, ans=0.0 2024-09-18 13:35:20,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=768162.0, ans=0.0 2024-09-18 13:35:21,843 INFO [train.py:1198] (1/2) Epoch 43, batch 2700, loss[loss=0.1971, ctc_loss=0.1309, cr_loss=0.3309, over 20792.00 frames. ], tot_loss[loss=0.2137, ctc_loss=0.1411, cr_loss=0.363, over 4091580.36 frames. ], batch size: 56, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 13:35:46,707 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.02 vs. limit=15.0 2024-09-18 13:36:40,457 INFO [train.py:1198] (1/2) Epoch 43, batch 2750, loss[loss=0.1663, ctc_loss=0.1079, cr_loss=0.2922, over 20946.00 frames. ], tot_loss[loss=0.2135, ctc_loss=0.1409, cr_loss=0.3626, over 4091319.40 frames. 
], batch size: 51, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 13:37:09,249 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.046e+02 2.240e+02 2.347e+02 2.535e+02 3.931e+02, threshold=4.694e+02, percent-clipped=0.0 2024-09-18 13:37:27,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=768388.6666666666, ans=0.0 2024-09-18 13:37:56,224 INFO [train.py:1198] (1/2) Epoch 43, batch 2800, loss[loss=0.2095, ctc_loss=0.1373, cr_loss=0.3612, over 20778.00 frames. ], tot_loss[loss=0.2136, ctc_loss=0.141, cr_loss=0.3632, over 4092153.59 frames. ], batch size: 53, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 13:37:58,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=768445.3333333334, ans=0.0 2024-09-18 13:38:02,945 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. limit=6.0 2024-09-18 13:38:10,301 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.83 vs. limit=22.5 2024-09-18 13:38:11,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=768473.6666666666, ans=0.125 2024-09-18 13:38:26,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=768502.0, ans=0.0 2024-09-18 13:38:26,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=768502.0, ans=0.1 2024-09-18 13:39:12,093 INFO [train.py:1198] (1/2) Epoch 43, batch 2850, loss[loss=0.2078, ctc_loss=0.139, cr_loss=0.3436, over 21061.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.1419, cr_loss=0.3652, over 4095411.40 frames. ], batch size: 53, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 13:39:12,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=768587.0, ans=0.125 2024-09-18 13:39:14,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=768587.0, ans=0.09899494936611666 2024-09-18 13:39:33,983 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.50 vs. limit=15.0 2024-09-18 13:39:43,055 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.093e+02 2.235e+02 2.352e+02 2.505e+02 3.686e+02, threshold=4.704e+02, percent-clipped=0.0 2024-09-18 13:40:10,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=768672.0, ans=0.0 2024-09-18 13:40:23,324 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.15 vs. limit=15.0 2024-09-18 13:40:30,164 INFO [train.py:1198] (1/2) Epoch 43, batch 2900, loss[loss=0.2371, ctc_loss=0.1556, cr_loss=0.4074, over 20961.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1432, cr_loss=0.3674, over 4094260.71 frames. 
], batch size: 64, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 13:41:03,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=768785.3333333334, ans=0.05 2024-09-18 13:41:12,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=768785.3333333334, ans=0.035 2024-09-18 13:41:32,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=768842.0, ans=0.125 2024-09-18 13:41:34,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=768842.0, ans=0.125 2024-09-18 13:41:48,981 INFO [train.py:1198] (1/2) Epoch 43, batch 2950, loss[loss=0.2349, ctc_loss=0.1574, cr_loss=0.3875, over 20680.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1425, cr_loss=0.3667, over 4104639.37 frames. ], batch size: 71, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 13:41:50,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=768870.3333333334, ans=0.0 2024-09-18 13:42:07,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=768898.6666666666, ans=0.125 2024-09-18 13:42:17,702 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.935e+02 2.226e+02 2.369e+02 2.532e+02 3.433e+02, threshold=4.738e+02, percent-clipped=0.0 2024-09-18 13:42:28,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=768927.0, ans=0.125 2024-09-18 13:42:29,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=768927.0, ans=0.0 2024-09-18 13:43:00,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=768983.6666666666, ans=0.125 2024-09-18 13:43:04,468 INFO [train.py:1198] (1/2) Epoch 43, batch 3000, loss[loss=0.2402, ctc_loss=0.1693, cr_loss=0.3544, over 14527.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.1424, cr_loss=0.3662, over 4092797.22 frames. ], batch size: 149, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 13:43:04,468 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 13:43:25,973 INFO [train.py:1230] (1/2) Epoch 43, validation: loss=0.03935, ctc_loss=0.03935, cr_loss=1.469e-14, over 944034.00 frames. 
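Aside on the loss fields logged above: each per-batch record reports loss, ctc_loss and cr_loss, and the printed values are consistent with a fixed linear combination loss = ctc_loss + 0.2 * cr_loss (for the batch-3000 record, 0.1693 + 0.2 * 0.3544 = 0.2402; in the validation record just above, cr_loss is numerically zero, so loss equals ctc_loss). The running tot_loss behaves like a frame-weighted average; the fractional "over N frames" totals suggest the tracker also applies some decay or weighting. The sketch below is a minimal, hypothetical reconstruction of that bookkeeping, not the actual train.py code; combined_loss, RunningLoss and their arguments are invented names for illustration.

# Hypothetical sketch of the loss bookkeeping implied by these records.
# Not the actual train.py / icefall code; all names are invented.

def combined_loss(ctc_loss: float, cr_loss: float,
                  cr_loss_scale: float = 0.2) -> float:
    # Per-batch records satisfy loss ~= ctc_loss + 0.2 * cr_loss,
    # e.g. 0.1693 + 0.2 * 0.3544 = 0.2402 (batch 3000 above).
    return ctc_loss + cr_loss_scale * cr_loss

class RunningLoss:
    # Frame-weighted running average of the combined loss. A plain
    # weighted mean is shown; the fractional frame totals in the log
    # suggest the real tracker also decays old batches (assumption).
    def __init__(self) -> None:
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, loss: float, num_frames: float) -> None:
        self.loss_sum += loss * num_frames
        self.frames += num_frames

    @property
    def avg(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

# Sanity check against the batch-3000 record above.
assert abs(combined_loss(0.1693, 0.3544) - 0.2402) < 1e-3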
2024-09-18 13:43:25,974 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB
2024-09-18 13:43:36,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=769012.0, ans=0.125
2024-09-18 13:44:19,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=769097.0, ans=0.125
2024-09-18 13:44:28,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=769125.3333333334, ans=0.125
2024-09-18 13:44:30,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=769125.3333333334, ans=0.125
2024-09-18 13:44:33,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=769125.3333333334, ans=0.125
2024-09-18 13:44:39,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=769125.3333333334, ans=0.1
2024-09-18 13:44:42,299 INFO [train.py:1198] (1/2) Epoch 43, batch 3050, loss[loss=0.2311, ctc_loss=0.1542, cr_loss=0.3845, over 20634.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.1424, cr_loss=0.3664, over 4103825.47 frames. ], batch size: 71, lr: 1.92e-03, grad_scale: 32.0
2024-09-18 13:44:48,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=769153.6666666666, ans=0.0
2024-09-18 13:45:14,540 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.896e+02 2.197e+02 2.343e+02 2.553e+02 3.195e+02, threshold=4.687e+02, percent-clipped=0.0
2024-09-18 13:45:29,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=769238.6666666666, ans=0.125
2024-09-18 13:45:44,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=769267.0, ans=0.0
2024-09-18 13:46:00,960 INFO [train.py:1198] (1/2) Epoch 43, batch 3100, loss[loss=0.1908, ctc_loss=0.1257, cr_loss=0.3251, over 20986.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1426, cr_loss=0.3665, over 4096630.01 frames. ], batch size: 51, lr: 1.92e-03, grad_scale: 32.0
2024-09-18 13:46:10,756 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.84 vs. limit=22.5
2024-09-18 13:46:11,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=769295.3333333334, ans=0.125
2024-09-18 13:46:13,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=769295.3333333334, ans=0.0
2024-09-18 13:47:18,437 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.95 vs. limit=15.0
2024-09-18 13:47:19,106 INFO [train.py:1198] (1/2) Epoch 43, batch 3150, loss[loss=0.2399, ctc_loss=0.1598, cr_loss=0.4007, over 20968.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.143, cr_loss=0.3678, over 4099138.32 frames. ], batch size: 64, lr: 1.92e-03, grad_scale: 32.0
2024-09-18 13:47:47,671 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.013e+02 2.286e+02 2.429e+02 2.564e+02 3.660e+02, threshold=4.857e+02, percent-clipped=0.0
2024-09-18 13:48:35,302 INFO [train.py:1198] (1/2) Epoch 43, batch 3200, loss[loss=0.1829, ctc_loss=0.1187, cr_loss=0.3211, over 20960.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1429, cr_loss=0.3671, over 4103901.79 frames. ], batch size: 50, lr: 1.92e-03, grad_scale: 32.0
2024-09-18 13:48:43,243 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=769578.6666666666, ans=0.0
2024-09-18 13:49:32,127 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.16 vs. limit=15.0
2024-09-18 13:49:51,028 INFO [train.py:1198] (1/2) Epoch 43, batch 3250, loss[loss=0.2334, ctc_loss=0.1539, cr_loss=0.3975, over 20673.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.143, cr_loss=0.3673, over 4098939.00 frames. ], batch size: 66, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 13:49:55,918 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 13:50:12,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=769748.6666666666, ans=0.125
2024-09-18 13:50:19,910 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.953e+02 2.252e+02 2.376e+02 2.544e+02 3.528e+02, threshold=4.753e+02, percent-clipped=0.0
2024-09-18 13:50:41,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=769805.3333333334, ans=0.125
2024-09-18 13:51:09,690 INFO [train.py:1198] (1/2) Epoch 43, batch 3300, loss[loss=0.1912, ctc_loss=0.125, cr_loss=0.3314, over 20904.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.1424, cr_loss=0.366, over 4101294.07 frames. ], batch size: 54, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 13:51:11,717 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0
2024-09-18 13:52:21,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=769975.3333333334, ans=0.125
2024-09-18 13:52:25,221 INFO [train.py:1198] (1/2) Epoch 43, batch 3350, loss[loss=0.2192, ctc_loss=0.1452, cr_loss=0.3704, over 20874.00 frames. ], tot_loss[loss=0.2144, ctc_loss=0.1414, cr_loss=0.3651, over 4105237.45 frames. ], batch size: 57, lr: 1.91e-03, grad_scale: 32.0
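In each optim.py WARNING, the five numbers read naturally as the min/25%/50%/75%/max of recent gradient norms, and the printed threshold tracks 2.0 x the median (e.g. 2 x 2.429e+02 is approximately the 4.857e+02 just above), matching Clipping_scale=2.0; percent-clipped would then be the share, in percent, of recent batches whose norm exceeded the threshold. A sketch of that diagnostic under this reading (the exact bookkeeping lives in icefall's optim.py and may differ):

    import torch

    def grad_norm_report(norms: torch.Tensor, clipping_scale: float = 2.0):
        # norms: 1-D float tensor of recent per-batch gradient norms.
        q = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]  # 2.0 x median, as in the WARNINGs
        percent_clipped = 100.0 * (norms > threshold).float().mean()
        return q, threshold, percent_clipped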
2024-09-18 13:52:43,777 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.82 vs. limit=12.0
2024-09-18 13:52:56,923 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.905e+02 2.168e+02 2.314e+02 2.443e+02 3.479e+02, threshold=4.628e+02, percent-clipped=0.0
2024-09-18 13:53:07,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=770060.3333333334, ans=0.125
2024-09-18 13:53:14,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=770088.6666666666, ans=0.125
2024-09-18 13:53:23,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=770088.6666666666, ans=0.0
2024-09-18 13:53:29,843 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.33 vs. limit=22.5
2024-09-18 13:53:31,382 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.66 vs. limit=15.0
2024-09-18 13:53:44,226 INFO [train.py:1198] (1/2) Epoch 43, batch 3400, loss[loss=0.2084, ctc_loss=0.1377, cr_loss=0.3531, over 21074.00 frames. ], tot_loss[loss=0.2145, ctc_loss=0.1415, cr_loss=0.3653, over 4103396.50 frames. ], batch size: 59, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 13:53:56,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=770145.3333333334, ans=0.0
2024-09-18 13:54:04,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=770173.6666666666, ans=0.025
2024-09-18 13:54:27,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=770202.0, ans=0.125
2024-09-18 13:54:36,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=770230.3333333334, ans=0.07
2024-09-18 13:54:49,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=770258.6666666666, ans=0.1
2024-09-18 13:54:59,950 INFO [train.py:1198] (1/2) Epoch 43, batch 3450, loss[loss=0.2206, ctc_loss=0.1451, cr_loss=0.3772, over 20656.00 frames. ], tot_loss[loss=0.2141, ctc_loss=0.1412, cr_loss=0.3643, over 4101014.22 frames. ], batch size: 68, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 13:55:03,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=770287.0, ans=0.125
2024-09-18 13:55:05,057 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0
2024-09-18 13:55:15,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=770315.3333333334, ans=0.125
2024-09-18 13:55:28,748 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.019e+02 2.184e+02 2.337e+02 2.487e+02 5.895e+02, threshold=4.675e+02, percent-clipped=1.0
2024-09-18 13:55:29,408 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.33 vs. limit=22.5
2024-09-18 13:55:47,537 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.32 vs. limit=15.0
2024-09-18 13:56:15,461 INFO [train.py:1198] (1/2) Epoch 43, batch 3500, loss[loss=0.2406, ctc_loss=0.1643, cr_loss=0.3813, over 14765.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1417, cr_loss=0.3654, over 4090339.68 frames. ], batch size: 150, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 13:56:18,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=770428.6666666666, ans=0.1
2024-09-18 13:56:24,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=770428.6666666666, ans=0.125
2024-09-18 13:56:26,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=770428.6666666666, ans=0.0
2024-09-18 13:56:38,438 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.23 vs. limit=12.0
2024-09-18 13:57:09,210 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.72 vs. limit=22.5
2024-09-18 13:57:34,523 INFO [train.py:1198] (1/2) Epoch 43, batch 3550, loss[loss=0.2106, ctc_loss=0.1389, cr_loss=0.3587, over 20688.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1423, cr_loss=0.3667, over 4097460.89 frames. ], batch size: 71, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 13:58:02,962 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.882e+02 2.200e+02 2.321e+02 2.480e+02 3.083e+02, threshold=4.642e+02, percent-clipped=0.0
2024-09-18 13:58:21,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=770655.3333333334, ans=0.125
2024-09-18 13:58:32,292 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.98 vs. limit=15.0
2024-09-18 13:58:45,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=770683.6666666666, ans=0.125
2024-09-18 13:58:54,119 INFO [train.py:1198] (1/2) Epoch 43, batch 3600, loss[loss=0.2255, ctc_loss=0.1512, cr_loss=0.3715, over 20780.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1426, cr_loss=0.3671, over 4104115.26 frames. ], batch size: 53, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 13:58:55,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=770712.0, ans=0.025
2024-09-18 13:58:58,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=770712.0, ans=0.07
2024-09-18 13:59:01,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=770712.0, ans=0.2
2024-09-18 13:59:29,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=770768.6666666666, ans=0.0
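The ScheduledFloat lines trace hyperparameters (skip rates, balancer probabilities, dropout) that are annealed as a function of batch_count; by batch_count near 770k most skip rates have reached their final values. A sketch of such a schedule, assuming piecewise-linear interpolation in the spirit of icefall's scaling.ScheduledFloat; the breakpoints below are illustrative, not the recipe's actual schedule:

    import bisect

    class ScheduledFloat:
        # Piecewise-linear schedule over batch_count, clamped at both ends.
        def __init__(self, *points):
            self.xs = [x for x, _ in points]
            self.ys = [y for _, y in points]

        def value(self, batch_count: float) -> float:
            if batch_count <= self.xs[0]:
                return self.ys[0]
            if batch_count >= self.xs[-1]:
                return self.ys[-1]
            i = bisect.bisect_right(self.xs, batch_count)
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # Illustrative: a skip rate that starts at 0.5 and anneals to 0.0, which
    # would print "ans=0.0" at batch_count ~770k, as in the lines above.
    conv_skip_rate = ScheduledFloat((0.0, 0.5), (20000.0, 0.025), (40000.0, 0.0))
    assert conv_skip_rate.value(770768.67) == 0.0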
2024-09-18 13:59:40,242 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.50 vs. limit=22.5
2024-09-18 13:59:43,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=770797.0, ans=0.125
2024-09-18 13:59:56,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=770825.3333333334, ans=0.1
2024-09-18 13:59:58,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=770825.3333333334, ans=0.125
2024-09-18 14:00:09,871 INFO [train.py:1198] (1/2) Epoch 43, batch 3650, loss[loss=0.2185, ctc_loss=0.1456, cr_loss=0.3647, over 21004.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1427, cr_loss=0.367, over 4103256.46 frames. ], batch size: 63, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 14:00:38,661 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.944e+02 2.228e+02 2.379e+02 2.572e+02 3.473e+02, threshold=4.759e+02, percent-clipped=0.0
2024-09-18 14:00:39,153 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 14:00:39,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=770910.3333333334, ans=0.2
2024-09-18 14:00:58,327 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.53 vs. limit=8.0
2024-09-18 14:01:03,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=770938.6666666666, ans=0.125
2024-09-18 14:01:25,996 INFO [train.py:1198] (1/2) Epoch 43, batch 3700, loss[loss=0.2377, ctc_loss=0.1614, cr_loss=0.3816, over 20942.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.143, cr_loss=0.3673, over 4101015.96 frames. ], batch size: 60, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 14:01:56,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=771052.0, ans=0.125
2024-09-18 14:02:00,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=771052.0, ans=0.0
2024-09-18 14:02:11,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=771080.3333333334, ans=0.025
2024-09-18 14:02:37,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=771108.6666666666, ans=0.0
2024-09-18 14:02:41,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=771108.6666666666, ans=0.1
2024-09-18 14:02:44,077 INFO [train.py:1198] (1/2) Epoch 43, batch 3750, loss[loss=0.1901, ctc_loss=0.1214, cr_loss=0.3434, over 20976.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.143, cr_loss=0.3675, over 4099803.53 frames. ], batch size: 52, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 14:02:53,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=771137.0, ans=0.125
2024-09-18 14:03:07,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=771165.3333333334, ans=0.125
2024-09-18 14:03:13,050 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.973e+02 2.209e+02 2.315e+02 2.491e+02 3.607e+02, threshold=4.629e+02, percent-clipped=0.0
2024-09-18 14:03:18,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=771193.6666666666, ans=15.0
2024-09-18 14:03:39,635 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.71 vs. limit=12.0
2024-09-18 14:03:48,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=771250.3333333334, ans=0.125
2024-09-18 14:03:55,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=771250.3333333334, ans=0.125
2024-09-18 14:03:59,952 INFO [train.py:1198] (1/2) Epoch 43, batch 3800, loss[loss=0.1911, ctc_loss=0.1235, cr_loss=0.3381, over 20984.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1429, cr_loss=0.3674, over 4093863.77 frames. ], batch size: 52, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 14:04:21,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=771307.0, ans=0.04949747468305833
2024-09-18 14:05:05,837 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.50 vs. limit=15.0
2024-09-18 14:05:18,832 INFO [train.py:1198] (1/2) Epoch 43, batch 3850, loss[loss=0.2139, ctc_loss=0.1399, cr_loss=0.3699, over 20989.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1425, cr_loss=0.3676, over 4102455.09 frames. ], batch size: 61, lr: 1.91e-03, grad_scale: 16.0
2024-09-18 14:05:48,629 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.48 vs. limit=22.5
2024-09-18 14:05:49,392 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.851e+02 2.271e+02 2.397e+02 2.599e+02 4.820e+02, threshold=4.793e+02, percent-clipped=1.0
2024-09-18 14:06:00,736 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.66 vs. limit=22.5
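Each Whitening line reports a whiteness metric for a module's output against a configured limit (the whitening_limit entry just above shows such a limit itself being scheduled). A rough proxy for what a metric of this kind measures, assuming the real computation in icefall's scaling.py differs in detail: a statistic that is 1.0 for a perfectly white (isotropic) covariance and grows as the spectrum becomes unbalanced.

    import torch

    def whiteness_metric(feats: torch.Tensor) -> float:
        # feats: (num_frames, num_channels), assumed zero-mean.
        n, d = feats.shape
        cov = feats.T @ feats / n
        # d * tr(C @ C) / tr(C)^2 == 1.0 iff C is a multiple of the identity.
        return (d * torch.trace(cov @ cov) / torch.trace(cov) ** 2).item()

    feats = torch.randn(10000, 256)
    assert whiteness_metric(feats) < 1.2  # near-white random features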
2024-09-18 14:06:05,274 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.06 vs. limit=22.5
2024-09-18 14:06:09,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=771505.3333333334, ans=0.125
2024-09-18 14:06:32,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=771533.6666666666, ans=0.125
2024-09-18 14:06:33,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=771562.0, ans=0.0
2024-09-18 14:06:34,761 INFO [train.py:1198] (1/2) Epoch 43, batch 3900, loss[loss=0.1901, ctc_loss=0.1222, cr_loss=0.3395, over 20350.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.143, cr_loss=0.3688, over 4108286.10 frames. ], batch size: 45, lr: 1.91e-03, grad_scale: 16.0
2024-09-18 14:06:36,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=771562.0, ans=0.1
2024-09-18 14:07:08,442 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 14:07:35,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=771675.3333333334, ans=0.1
2024-09-18 14:07:50,645 INFO [train.py:1198] (1/2) Epoch 43, batch 3950, loss[loss=0.241, ctc_loss=0.1624, cr_loss=0.3931, over 20670.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1429, cr_loss=0.3686, over 4113240.31 frames. ], batch size: 66, lr: 1.91e-03, grad_scale: 16.0
2024-09-18 14:08:22,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=771760.3333333334, ans=0.2
2024-09-18 14:08:23,932 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.964e+02 2.259e+02 2.381e+02 2.565e+02 3.615e+02, threshold=4.762e+02, percent-clipped=0.0
2024-09-18 14:08:27,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=771760.3333333334, ans=0.125
2024-09-18 14:08:38,427 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.24 vs. limit=15.0
2024-09-18 14:08:57,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=771817.0, ans=0.0
2024-09-18 14:08:59,683 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.07 vs. limit=22.5
2024-09-18 14:09:09,459 INFO [train.py:1198] (1/2) Epoch 43, batch 4000, loss[loss=0.2307, ctc_loss=0.1513, cr_loss=0.3971, over 21084.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1436, cr_loss=0.3693, over 4103769.15 frames. ], batch size: 59, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 14:09:22,481 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.05 vs. limit=15.0
2024-09-18 14:10:11,792 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=771958.6666666666, ans=0.0
2024-09-18 14:10:17,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=771958.6666666666, ans=0.2
2024-09-18 14:10:25,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=771958.6666666666, ans=0.125
2024-09-18 14:10:28,031 INFO [train.py:1198] (1/2) Epoch 43, batch 4050, loss[loss=0.1875, ctc_loss=0.1211, cr_loss=0.3321, over 20987.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1444, cr_loss=0.371, over 4091657.42 frames. ], batch size: 52, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 14:10:28,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=771987.0, ans=0.125
2024-09-18 14:10:58,423 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.910e+02 2.209e+02 2.348e+02 2.489e+02 3.597e+02, threshold=4.697e+02, percent-clipped=0.0
2024-09-18 14:11:02,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=772043.6666666666, ans=0.125
2024-09-18 14:11:17,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=772072.0, ans=0.125
2024-09-18 14:11:22,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=772072.0, ans=0.2
2024-09-18 14:11:38,423 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.65 vs. limit=15.0
2024-09-18 14:11:43,630 INFO [train.py:1198] (1/2) Epoch 43, batch 4100, loss[loss=0.1933, ctc_loss=0.1277, cr_loss=0.3283, over 21056.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1445, cr_loss=0.3705, over 4090507.34 frames. ], batch size: 53, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 14:11:45,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=772128.6666666666, ans=0.125
2024-09-18 14:11:52,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=772128.6666666666, ans=0.1
2024-09-18 14:11:53,551 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.86 vs. limit=15.0
2024-09-18 14:11:56,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=772128.6666666666, ans=0.0
2024-09-18 14:12:08,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=772157.0, ans=0.125
2024-09-18 14:12:18,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=772185.3333333334, ans=0.2
2024-09-18 14:12:18,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=772185.3333333334, ans=0.2
2024-09-18 14:12:23,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=772185.3333333334, ans=0.0
2024-09-18 14:12:37,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=772213.6666666666, ans=0.125
2024-09-18 14:12:46,735 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.12 vs. limit=15.0
2024-09-18 14:12:54,045 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 14:12:59,575 INFO [train.py:1198] (1/2) Epoch 43, batch 4150, loss[loss=0.2341, ctc_loss=0.1562, cr_loss=0.3892, over 19538.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1438, cr_loss=0.3694, over 4098866.53 frames. ], batch size: 90, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 14:13:25,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=772298.6666666666, ans=0.125
2024-09-18 14:13:29,777 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.779e+02 2.236e+02 2.358e+02 2.482e+02 3.933e+02, threshold=4.715e+02, percent-clipped=0.0
2024-09-18 14:13:58,008 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.33 vs. limit=15.0
2024-09-18 14:13:58,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=772355.3333333334, ans=0.125
2024-09-18 14:14:17,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=772412.0, ans=0.2
2024-09-18 14:14:18,349 INFO [train.py:1198] (1/2) Epoch 43, batch 4200, loss[loss=0.2479, ctc_loss=0.1651, cr_loss=0.4141, over 20892.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.144, cr_loss=0.3699, over 4098655.41 frames. ], batch size: 57, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 14:14:32,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=772440.3333333334, ans=0.0
2024-09-18 14:14:50,637 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=772468.6666666666, ans=0.125
2024-09-18 14:14:57,412 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.77 vs. limit=15.0
2024-09-18 14:15:33,998 INFO [train.py:1198] (1/2) Epoch 43, batch 4250, loss[loss=0.2763, ctc_loss=0.1872, cr_loss=0.4458, over 18349.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1433, cr_loss=0.3691, over 4102859.36 frames. ], batch size: 108, lr: 1.91e-03, grad_scale: 32.0
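Each train.py:1198 line pairs the current batch's loss ("over N frames") with a smoothed tot_loss ("over ~4.1M frames"). A minimal sketch, assuming tot_loss is a frame-weighted running average with exponential forgetting; with roughly 21k-frame batches, a decay near 0.995 would keep the effective window around the ~4.1M frames seen in the log (the exact smoothing in train.py may differ):

    class RunningLoss:
        # Frame-weighted running average with exponential forgetting.
        def __init__(self, decay: float = 0.995):
            self.decay = decay
            self.weighted_loss = 0.0
            self.frames = 0.0

        def update(self, batch_loss: float, batch_frames: float) -> float:
            self.weighted_loss = self.decay * self.weighted_loss + batch_loss * batch_frames
            self.frames = self.decay * self.frames + batch_frames
            return self.weighted_loss / self.frames  # reported as tot_loss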
2024-09-18 14:16:07,471 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.009e+02 2.245e+02 2.375e+02 2.541e+02 3.751e+02, threshold=4.749e+02, percent-clipped=0.0
2024-09-18 14:16:27,816 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.39 vs. limit=15.0
2024-09-18 14:16:50,655 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.50 vs. limit=12.0
2024-09-18 14:16:52,661 INFO [train.py:1198] (1/2) Epoch 43, batch 4300, loss[loss=0.2415, ctc_loss=0.1608, cr_loss=0.4037, over 20005.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1427, cr_loss=0.3678, over 4113057.16 frames. ], batch size: 80, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 14:17:17,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=772723.6666666666, ans=0.1
2024-09-18 14:17:38,148 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=772780.3333333334, ans=0.125
2024-09-18 14:18:07,979 INFO [train.py:1198] (1/2) Epoch 43, batch 4350, loss[loss=0.2941, ctc_loss=0.212, cr_loss=0.4108, over 13782.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.143, cr_loss=0.3675, over 4097647.37 frames. ], batch size: 149, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 14:18:22,603 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.59 vs. limit=10.0
2024-09-18 14:18:23,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=772865.3333333334, ans=0.2
2024-09-18 14:18:34,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=772865.3333333334, ans=0.0
2024-09-18 14:18:38,357 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.961e+02 2.255e+02 2.375e+02 2.544e+02 5.936e+02, threshold=4.750e+02, percent-clipped=1.0
2024-09-18 14:18:40,607 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.00 vs. limit=10.0
2024-09-18 14:19:27,223 INFO [train.py:1198] (1/2) Epoch 43, batch 4400, loss[loss=0.213, ctc_loss=0.1374, cr_loss=0.3778, over 20657.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.143, cr_loss=0.3678, over 4105327.95 frames. ], batch size: 71, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 14:19:38,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=772978.6666666666, ans=0.125
2024-09-18 14:19:39,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=772978.6666666666, ans=0.2
2024-09-18 14:19:55,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=773007.0, ans=0.125
2024-09-18 14:19:56,972 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. limit=6.0
2024-09-18 14:20:03,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=773035.3333333334, ans=0.125
2024-09-18 14:20:43,124 INFO [train.py:1198] (1/2) Epoch 43, batch 4450, loss[loss=0.2287, ctc_loss=0.1499, cr_loss=0.3939, over 21016.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1432, cr_loss=0.3679, over 4103551.60 frames. ], batch size: 63, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 14:20:43,753 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.06 vs. limit=15.0
2024-09-18 14:20:57,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=773148.6666666666, ans=0.0
2024-09-18 14:21:02,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=773148.6666666666, ans=0.025
2024-09-18 14:21:13,095 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.940e+02 2.242e+02 2.358e+02 2.562e+02 3.848e+02, threshold=4.715e+02, percent-clipped=0.0
2024-09-18 14:21:19,687 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.62 vs. limit=15.0
2024-09-18 14:21:56,052 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=3.98 vs. limit=15.0
2024-09-18 14:22:01,235 INFO [train.py:1198] (1/2) Epoch 43, batch 4500, loss[loss=0.1865, ctc_loss=0.1173, cr_loss=0.3464, over 19860.00 frames. ], tot_loss[loss=0.2153, ctc_loss=0.142, cr_loss=0.3663, over 4108919.14 frames. ], batch size: 44, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 14:22:48,455 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.30 vs. limit=15.0
2024-09-18 14:23:16,384 INFO [train.py:1198] (1/2) Epoch 43, batch 4550, loss[loss=0.2489, ctc_loss=0.166, cr_loss=0.4142, over 19574.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1428, cr_loss=0.3675, over 4102430.79 frames. ], batch size: 90, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 14:23:18,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=773403.6666666666, ans=0.125
2024-09-18 14:23:46,702 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.978e+02 2.187e+02 2.341e+02 2.498e+02 6.645e+02, threshold=4.683e+02, percent-clipped=1.0
2024-09-18 14:24:08,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=773488.6666666666, ans=0.125
2024-09-18 14:24:26,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=773517.0, ans=0.0
2024-09-18 14:24:32,270 INFO [train.py:1198] (1/2) Epoch 43, batch 4600, loss[loss=0.2177, ctc_loss=0.1425, cr_loss=0.376, over 20985.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.1432, cr_loss=0.3682, over 4111268.88 frames. ], batch size: 58, lr: 1.91e-03, grad_scale: 32.0
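The grad_scale field in the batch lines is the AMP loss scale; in this stretch it moves between 16.0 and 32.0 as the scaler backs off after a float16 overflow and later regrows. A generic sketch of that mechanism with torch.cuda.amp, not the recipe's actual training loop:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0)

    def train_step(model, batch, optimizer):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = model(batch)
        scaler.scale(loss).backward()
        scaler.step(optimizer)  # skipped internally if inf/nan grads are found
        scaler.update()         # halves the scale on overflow; doubles it again
                                # after enough clean steps (growth_interval)
        return loss.detach(), scaler.get_scale()  # ~ the logged grad_scale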
2024-09-18 14:24:47,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=773573.6666666666, ans=0.035
2024-09-18 14:24:58,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=773573.6666666666, ans=0.1
2024-09-18 14:25:11,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=773602.0, ans=0.0
2024-09-18 14:25:34,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=773658.6666666666, ans=0.125
2024-09-18 14:25:50,618 INFO [train.py:1198] (1/2) Epoch 43, batch 4650, loss[loss=0.2203, ctc_loss=0.1456, cr_loss=0.3734, over 21057.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1428, cr_loss=0.3679, over 4106999.69 frames. ], batch size: 62, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 14:25:55,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=773687.0, ans=0.1
2024-09-18 14:26:01,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=773687.0, ans=0.125
2024-09-18 14:26:10,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=773715.3333333334, ans=0.2
2024-09-18 14:26:12,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=773715.3333333334, ans=0.0
2024-09-18 14:26:20,821 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.944e+02 2.314e+02 2.429e+02 2.559e+02 3.308e+02, threshold=4.857e+02, percent-clipped=0.0
2024-09-18 14:26:51,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=773800.3333333334, ans=0.2
2024-09-18 14:26:59,850 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.69 vs. limit=22.5
2024-09-18 14:27:09,637 INFO [train.py:1198] (1/2) Epoch 43, batch 4700, loss[loss=0.2181, ctc_loss=0.1452, cr_loss=0.3642, over 21075.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1433, cr_loss=0.3687, over 4098834.48 frames. ], batch size: 56, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 14:27:29,633 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.14 vs. limit=12.0
2024-09-18 14:27:37,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=773857.0, ans=0.125
2024-09-18 14:28:24,960 INFO [train.py:1198] (1/2) Epoch 43, batch 4750, loss[loss=0.2245, ctc_loss=0.1482, cr_loss=0.3812, over 20976.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1426, cr_loss=0.367, over 4096972.77 frames. ], batch size: 55, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 14:28:55,282 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.799e+02 2.217e+02 2.357e+02 2.516e+02 3.017e+02, threshold=4.715e+02, percent-clipped=0.0
2024-09-18 14:29:05,019 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=5.43 vs. limit=22.5
2024-09-18 14:29:06,623 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.71 vs. limit=10.0
2024-09-18 14:29:19,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=774055.3333333334, ans=0.125
2024-09-18 14:29:40,666 INFO [train.py:1198] (1/2) Epoch 43, batch 4800, loss[loss=0.2295, ctc_loss=0.1553, cr_loss=0.3707, over 21059.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1424, cr_loss=0.3667, over 4102013.37 frames. ], batch size: 56, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 14:30:27,374 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0
2024-09-18 14:30:59,964 INFO [train.py:1198] (1/2) Epoch 43, batch 4850, loss[loss=0.2597, ctc_loss=0.1756, cr_loss=0.4208, over 19324.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1436, cr_loss=0.369, over 4102227.83 frames. ], batch size: 90, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 14:31:24,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=774282.0, ans=0.125
2024-09-18 14:31:30,528 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.012e+02 2.252e+02 2.349e+02 2.491e+02 3.780e+02, threshold=4.697e+02, percent-clipped=0.0
2024-09-18 14:32:16,079 INFO [train.py:1198] (1/2) Epoch 43, batch 4900, loss[loss=0.2292, ctc_loss=0.1516, cr_loss=0.3878, over 21072.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1435, cr_loss=0.3692, over 4105855.57 frames. ], batch size: 59, lr: 1.91e-03, grad_scale: 16.0
2024-09-18 14:32:33,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=774423.6666666666, ans=0.125
2024-09-18 14:33:10,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=774480.3333333334, ans=0.125
2024-09-18 14:33:10,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=774480.3333333334, ans=0.2
2024-09-18 14:33:34,754 INFO [train.py:1198] (1/2) Epoch 43, batch 4950, loss[loss=0.2543, ctc_loss=0.1738, cr_loss=0.4027, over 13803.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1438, cr_loss=0.369, over 4095697.72 frames. ], batch size: 149, lr: 1.91e-03, grad_scale: 16.0
2024-09-18 14:33:36,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=774537.0, ans=0.0
2024-09-18 14:33:52,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=774565.3333333334, ans=0.125
2024-09-18 14:34:05,980 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.961e+02 2.233e+02 2.344e+02 2.447e+02 3.997e+02, threshold=4.688e+02, percent-clipped=0.0
2024-09-18 14:34:10,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=774593.6666666666, ans=0.0
2024-09-18 14:34:32,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=774622.0, ans=0.125
2024-09-18 14:34:38,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=774650.3333333334, ans=0.025
2024-09-18 14:34:40,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=774650.3333333334, ans=0.125
2024-09-18 14:34:50,432 INFO [train.py:1198] (1/2) Epoch 43, batch 5000, loss[loss=0.2323, ctc_loss=0.1565, cr_loss=0.379, over 19376.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.144, cr_loss=0.3699, over 4099833.09 frames. ], batch size: 90, lr: 1.91e-03, grad_scale: 16.0
2024-09-18 14:35:06,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=774707.0, ans=0.125
2024-09-18 14:35:57,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=774792.0, ans=0.1
2024-09-18 14:36:00,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=774792.0, ans=0.1
2024-09-18 14:36:04,297 INFO [train.py:1198] (1/2) Epoch 43, batch 5050, loss[loss=0.1854, ctc_loss=0.1216, cr_loss=0.319, over 20974.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1439, cr_loss=0.369, over 4097034.53 frames. ], batch size: 48, lr: 1.91e-03, grad_scale: 16.0
2024-09-18 14:36:07,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=774820.3333333334, ans=0.2
2024-09-18 14:36:35,661 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.009e+02 2.244e+02 2.351e+02 2.522e+02 4.464e+02, threshold=4.701e+02, percent-clipped=0.0
2024-09-18 14:36:53,582 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=774905.3333333334, ans=0.0
2024-09-18 14:36:55,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=774905.3333333334, ans=0.07
2024-09-18 14:36:56,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=774905.3333333334, ans=0.125
2024-09-18 14:37:06,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=774933.6666666666, ans=0.125
2024-09-18 14:37:18,366 INFO [train.py:1198] (1/2) Epoch 43, batch 5100, loss[loss=0.2346, ctc_loss=0.1551, cr_loss=0.3978, over 20822.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.1432, cr_loss=0.3679, over 4094836.19 frames. ], batch size: 59, lr: 1.91e-03, grad_scale: 16.0
2024-09-18 14:37:23,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=774962.0, ans=0.1
2024-09-18 14:37:47,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=775018.6666666666, ans=0.0
2024-09-18 14:37:59,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=775018.6666666666, ans=0.125
2024-09-18 14:38:09,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=775047.0, ans=0.0
2024-09-18 14:38:31,709 INFO [train.py:1198] (1/2) Epoch 43, batch 5150, loss[loss=0.2196, ctc_loss=0.1473, cr_loss=0.3615, over 20852.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1435, cr_loss=0.368, over 4093370.53 frames. ], batch size: 65, lr: 1.91e-03, grad_scale: 16.0
2024-09-18 14:38:34,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=775103.6666666666, ans=0.1
2024-09-18 14:38:34,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=775103.6666666666, ans=0.1
2024-09-18 14:39:02,365 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.927e+02 2.237e+02 2.354e+02 2.501e+02 3.053e+02, threshold=4.709e+02, percent-clipped=0.0
2024-09-18 14:39:48,121 INFO [train.py:1198] (1/2) Epoch 43, batch 5200, loss[loss=0.2386, ctc_loss=0.1572, cr_loss=0.4073, over 20946.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1436, cr_loss=0.3688, over 4102945.21 frames. ], batch size: 67, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 14:40:12,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=775273.6666666666, ans=0.025
2024-09-18 14:40:34,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=775330.3333333334, ans=0.0
2024-09-18 14:41:02,542 INFO [train.py:1198] (1/2) Epoch 43, batch 5250, loss[loss=0.2651, ctc_loss=0.1798, cr_loss=0.4262, over 17917.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1429, cr_loss=0.3682, over 4103938.77 frames. ], batch size: 108, lr: 1.91e-03, grad_scale: 16.0
2024-09-18 14:41:35,240 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.025e+02 2.242e+02 2.378e+02 2.558e+02 5.799e+02, threshold=4.757e+02, percent-clipped=1.0
2024-09-18 14:41:44,792 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=775443.6666666666, ans=0.04949747468305833
2024-09-18 14:42:17,332 INFO [train.py:1198] (1/2) Epoch 43, batch 5300, loss[loss=0.2153, ctc_loss=0.1396, cr_loss=0.3786, over 20270.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1424, cr_loss=0.3673, over 4110299.56 frames. ], batch size: 74, lr: 1.91e-03, grad_scale: 16.0
2024-09-18 14:42:20,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=775528.6666666666, ans=0.1
2024-09-18 14:42:45,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=775557.0, ans=0.0
2024-09-18 14:42:47,311 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.91 vs. limit=15.0
2024-09-18 14:42:54,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=775585.3333333334, ans=0.125
2024-09-18 14:43:03,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=775613.6666666666, ans=0.125
2024-09-18 14:43:03,733 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.84 vs. limit=15.0
2024-09-18 14:43:08,690 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.11 vs. limit=22.5
2024-09-18 14:43:12,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=775613.6666666666, ans=0.125
2024-09-18 14:43:34,275 INFO [train.py:1198] (1/2) Epoch 43, batch 5350, loss[loss=0.2081, ctc_loss=0.137, cr_loss=0.3555, over 21059.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1427, cr_loss=0.3674, over 4103528.08 frames. ], batch size: 62, lr: 1.91e-03, grad_scale: 16.0
2024-09-18 14:43:49,648 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 14:43:49,766 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0
2024-09-18 14:43:56,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=775698.6666666666, ans=0.0
2024-09-18 14:44:07,184 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.966e+02 2.243e+02 2.344e+02 2.525e+02 3.319e+02, threshold=4.688e+02, percent-clipped=0.0
2024-09-18 14:44:08,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=775727.0, ans=0.125
2024-09-18 14:44:24,366 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.26 vs. limit=12.0
2024-09-18 14:44:49,420 INFO [train.py:1198] (1/2) Epoch 43, batch 5400, loss[loss=0.2216, ctc_loss=0.1447, cr_loss=0.3848, over 20851.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.143, cr_loss=0.3681, over 4102768.19 frames. ], batch size: 59, lr: 1.91e-03, grad_scale: 16.0
2024-09-18 14:45:37,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=775897.0, ans=0.0
2024-09-18 14:46:03,616 INFO [train.py:1198] (1/2) Epoch 43, batch 5450, loss[loss=0.2145, ctc_loss=0.1425, cr_loss=0.3601, over 20972.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.1422, cr_loss=0.3662, over 4107441.02 frames. ], batch size: 58, lr: 1.91e-03, grad_scale: 16.0
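The scaling.py:1120 WithLoss lines track an auxiliary penalty attached to the attention-weight modules; loss-sum=0.000e+00 simply means nothing was penalized on that batch. A guess at the general shape of such a penalty (the real hook is in icefall's scaling.py and may be defined quite differently): sum up how far values flowing through the module stray outside an allowed range, and add that sum, scaled, to the training loss.

    import torch

    def out_of_range_penalty(x: torch.Tensor, limit: float) -> torch.Tensor:
        # Zero when all values lie within [-limit, limit]; grows linearly outside.
        excess = (x.abs() - limit).clamp(min=0.0)
        return excess.sum()  # printed as loss-sum in the log

    assert out_of_range_penalty(torch.tensor([0.2, -0.5]), limit=1.0).item() == 0.0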
2024-09-18 14:46:36,520 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.942e+02 2.278e+02 2.397e+02 2.570e+02 3.769e+02, threshold=4.794e+02, percent-clipped=0.0
2024-09-18 14:46:36,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=776010.3333333334, ans=0.5
2024-09-18 14:47:18,174 INFO [train.py:1198] (1/2) Epoch 43, batch 5500, loss[loss=0.2161, ctc_loss=0.1432, cr_loss=0.3646, over 20834.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.1422, cr_loss=0.3659, over 4104181.14 frames. ], batch size: 59, lr: 1.91e-03, grad_scale: 16.0
2024-09-18 14:47:27,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=776095.3333333334, ans=0.1
2024-09-18 14:47:50,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=776152.0, ans=0.125
2024-09-18 14:48:35,628 INFO [train.py:1198] (1/2) Epoch 43, batch 5550, loss[loss=0.1736, ctc_loss=0.1123, cr_loss=0.3066, over 20950.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1429, cr_loss=0.3667, over 4087928.06 frames. ], batch size: 49, lr: 1.91e-03, grad_scale: 16.0
2024-09-18 14:49:05,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=776293.6666666666, ans=0.1
2024-09-18 14:49:08,157 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.947e+02 2.237e+02 2.345e+02 2.506e+02 4.825e+02, threshold=4.689e+02, percent-clipped=1.0
2024-09-18 14:49:48,005 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.34 vs. limit=15.0
2024-09-18 14:49:49,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=776378.6666666666, ans=0.07
2024-09-18 14:49:50,332 INFO [train.py:1198] (1/2) Epoch 43, batch 5600, loss[loss=0.201, ctc_loss=0.1298, cr_loss=0.356, over 21053.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1434, cr_loss=0.3685, over 4092505.87 frames. ], batch size: 56, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 14:50:08,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=776407.0, ans=0.125
2024-09-18 14:50:21,901 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 14:50:26,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=776435.3333333334, ans=0.2
2024-09-18 14:50:29,748 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. limit=6.0
2024-09-18 14:50:51,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=776492.0, ans=0.2
2024-09-18 14:51:01,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=776492.0, ans=0.125
2024-09-18 14:51:04,579 INFO [train.py:1198] (1/2) Epoch 43, batch 5650, loss[loss=0.2024, ctc_loss=0.133, cr_loss=0.3473, over 21091.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1428, cr_loss=0.3672, over 4098508.13 frames. ], batch size: 59, lr: 1.91e-03, grad_scale: 16.0
2024-09-18 14:51:24,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=776548.6666666666, ans=0.125
2024-09-18 14:51:34,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=776577.0, ans=10.0
2024-09-18 14:51:41,148 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.006e+02 2.286e+02 2.408e+02 2.557e+02 4.624e+02, threshold=4.815e+02, percent-clipped=0.0
2024-09-18 14:51:53,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=776605.3333333334, ans=0.125
2024-09-18 14:52:09,241 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 14:52:20,639 INFO [train.py:1198] (1/2) Epoch 43, batch 5700, loss[loss=0.2414, ctc_loss=0.1607, cr_loss=0.4036, over 21009.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1429, cr_loss=0.3673, over 4097428.58 frames. ], batch size: 63, lr: 1.91e-03, grad_scale: 16.0
2024-09-18 14:52:37,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=776690.3333333334, ans=0.125
2024-09-18 14:52:40,737 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.84 vs. limit=15.0
2024-09-18 14:52:41,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=776690.3333333334, ans=0.125
2024-09-18 14:52:58,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=776718.6666666666, ans=0.1
2024-09-18 14:53:21,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=776775.3333333334, ans=0.125
2024-09-18 14:53:29,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=776775.3333333334, ans=0.125
2024-09-18 14:53:35,239 INFO [train.py:1198] (1/2) Epoch 43, batch 5750, loss[loss=0.1986, ctc_loss=0.1296, cr_loss=0.3453, over 20930.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.1423, cr_loss=0.3668, over 4103423.50 frames. ], batch size: 49, lr: 1.91e-03, grad_scale: 16.0
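Many of the scheduled names above are Balancer parameters (balancer.prob, min_abs/max_abs, min_positive): per-channel activation constraints that, with probability prob per batch, are enforced by nudging gradients when channel statistics drift out of bounds. The toy version below only measures violations of the kind those knobs describe; the real scaling.Balancer modifies gradients in the backward pass, and the bound values here are illustrative:

    import torch

    def balancer_violations(x, min_abs=0.2, max_abs=10.0, min_positive=0.05):
        # x: (..., num_channels); reduce statistics over all leading dims.
        dims = tuple(range(x.dim() - 1))
        mean_abs = x.abs().mean(dim=dims)
        frac_positive = (x > 0).float().mean(dim=dims)
        return {
            "below_min_abs": int((mean_abs < min_abs).sum()),
            "above_max_abs": int((mean_abs > max_abs).sum()),
            "below_min_positive": int((frac_positive < min_positive).sum()),
        }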
limit=15.0 2024-09-18 14:53:54,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=776832.0, ans=0.125 2024-09-18 14:53:54,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=776832.0, ans=0.125 2024-09-18 14:54:09,161 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.898e+02 2.180e+02 2.325e+02 2.491e+02 3.054e+02, threshold=4.650e+02, percent-clipped=0.0 2024-09-18 14:54:20,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=776888.6666666666, ans=0.0 2024-09-18 14:54:29,260 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 14:54:39,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=776917.0, ans=0.1 2024-09-18 14:54:49,851 INFO [train.py:1198] (1/2) Epoch 43, batch 5800, loss[loss=0.1919, ctc_loss=0.1235, cr_loss=0.3419, over 20966.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1432, cr_loss=0.3683, over 4093055.50 frames. ], batch size: 50, lr: 1.91e-03, grad_scale: 16.0 2024-09-18 14:55:12,975 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.07 vs. limit=15.0 2024-09-18 14:55:18,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=777002.0, ans=0.1 2024-09-18 14:56:00,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=777058.6666666666, ans=0.2 2024-09-18 14:56:00,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=777058.6666666666, ans=0.0 2024-09-18 14:56:05,030 INFO [train.py:1198] (1/2) Epoch 43, batch 5850, loss[loss=0.2379, ctc_loss=0.1571, cr_loss=0.4038, over 20953.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.1432, cr_loss=0.3679, over 4093859.71 frames. ], batch size: 64, lr: 1.91e-03, grad_scale: 16.0 2024-09-18 14:56:08,668 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.93 vs. limit=15.0 2024-09-18 14:56:39,740 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.958e+02 2.219e+02 2.331e+02 2.519e+02 5.601e+02, threshold=4.663e+02, percent-clipped=1.0 2024-09-18 14:56:59,488 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=777172.0, ans=0.125 2024-09-18 14:57:19,898 INFO [train.py:1198] (1/2) Epoch 43, batch 5900, loss[loss=0.226, ctc_loss=0.1501, cr_loss=0.3798, over 20756.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1425, cr_loss=0.367, over 4101990.85 frames. ], batch size: 71, lr: 1.91e-03, grad_scale: 16.0 2024-09-18 14:57:31,763 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. 
limit=6.0 2024-09-18 14:58:15,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=777313.6666666666, ans=0.1 2024-09-18 14:58:36,428 INFO [train.py:1198] (1/2) Epoch 43, batch 5950, loss[loss=0.2437, ctc_loss=0.1631, cr_loss=0.403, over 20079.00 frames. ], tot_loss[loss=0.2153, ctc_loss=0.1421, cr_loss=0.3662, over 4095165.59 frames. ], batch size: 80, lr: 1.91e-03, grad_scale: 16.0 2024-09-18 14:59:04,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=777427.0, ans=0.0 2024-09-18 14:59:10,483 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.010e+02 2.295e+02 2.422e+02 2.594e+02 3.751e+02, threshold=4.844e+02, percent-clipped=0.0 2024-09-18 14:59:34,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=777483.6666666666, ans=0.125 2024-09-18 14:59:50,746 INFO [train.py:1198] (1/2) Epoch 43, batch 6000, loss[loss=0.2153, ctc_loss=0.1417, cr_loss=0.3679, over 20986.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1433, cr_loss=0.3672, over 4095003.61 frames. ], batch size: 51, lr: 1.91e-03, grad_scale: 32.0 2024-09-18 14:59:50,746 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 15:00:05,014 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.6998, 4.7167, 4.6257, 4.1661], device='cuda:1') 2024-09-18 15:00:11,066 INFO [train.py:1230] (1/2) Epoch 43, validation: loss=0.039, ctc_loss=0.039, cr_loss=1.514e-14, over 944034.00 frames. 2024-09-18 15:00:11,067 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-18 15:00:29,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=777540.3333333334, ans=0.125 2024-09-18 15:00:46,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=777568.6666666666, ans=0.0 2024-09-18 15:00:54,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=777597.0, ans=0.2 2024-09-18 15:00:59,349 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.90 vs. limit=15.0 2024-09-18 15:01:24,843 INFO [train.py:1198] (1/2) Epoch 43, batch 6050, loss[loss=0.2254, ctc_loss=0.1507, cr_loss=0.3734, over 21014.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1426, cr_loss=0.3667, over 4099403.17 frames. ], batch size: 61, lr: 1.91e-03, grad_scale: 32.0 2024-09-18 15:01:59,005 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.933e+02 2.298e+02 2.460e+02 2.615e+02 3.264e+02, threshold=4.919e+02, percent-clipped=0.0 2024-09-18 15:02:00,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=777710.3333333334, ans=0.0 2024-09-18 15:02:32,459 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. limit=6.0 2024-09-18 15:02:36,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=777767.0, ans=0.1 2024-09-18 15:02:40,534 INFO [train.py:1198] (1/2) Epoch 43, batch 6100, loss[loss=0.2652, ctc_loss=0.1826, cr_loss=0.4133, over 14759.00 frames. 
], tot_loss[loss=0.2172, ctc_loss=0.1435, cr_loss=0.3685, over 4100965.82 frames. ], batch size: 149, lr: 1.90e-03, grad_scale: 32.0 2024-09-18 15:02:58,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=777823.6666666666, ans=0.125 2024-09-18 15:03:35,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=777880.3333333334, ans=0.125 2024-09-18 15:03:55,030 INFO [train.py:1198] (1/2) Epoch 43, batch 6150, loss[loss=0.2029, ctc_loss=0.132, cr_loss=0.3542, over 20980.00 frames. ], tot_loss[loss=0.2183, ctc_loss=0.1444, cr_loss=0.3694, over 4081997.80 frames. ], batch size: 49, lr: 1.90e-03, grad_scale: 32.0 2024-09-18 15:04:01,698 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.18 vs. limit=22.5 2024-09-18 15:04:28,653 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.977e+02 2.235e+02 2.400e+02 2.551e+02 3.329e+02, threshold=4.800e+02, percent-clipped=0.0 2024-09-18 15:04:45,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=778022.0, ans=0.125 2024-09-18 15:05:08,116 INFO [train.py:1198] (1/2) Epoch 43, batch 6200, loss[loss=0.222, ctc_loss=0.145, cr_loss=0.385, over 20124.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1444, cr_loss=0.3691, over 4057090.55 frames. ], batch size: 80, lr: 1.90e-03, grad_scale: 32.0 2024-09-18 15:05:28,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=778107.0, ans=0.2 2024-09-18 15:05:38,206 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.76 vs. limit=22.5 2024-09-18 15:06:23,318 INFO [train.py:1198] (1/2) Epoch 43, batch 6250, loss[loss=0.2723, ctc_loss=0.1832, cr_loss=0.4457, over 18286.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.1461, cr_loss=0.3714, over 4027788.23 frames. ], batch size: 108, lr: 1.90e-03, grad_scale: 32.0 2024-09-18 15:06:47,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=778248.6666666666, ans=0.1 2024-09-18 15:06:54,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=778277.0, ans=0.1 2024-09-18 15:06:55,304 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.00 vs. limit=10.0 2024-09-18 15:06:57,237 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.006e+02 2.292e+02 2.418e+02 2.579e+02 4.063e+02, threshold=4.837e+02, percent-clipped=0.0 2024-09-18 15:07:14,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=778305.3333333334, ans=0.0 2024-09-18 15:07:35,717 INFO [train.py:1198] (1/2) Epoch 43, batch 6300, loss[loss=0.2586, ctc_loss=0.1797, cr_loss=0.3945, over 14395.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.1472, cr_loss=0.3712, over 3941600.11 frames. 
], batch size: 150, lr: 1.90e-03, grad_scale: 32.0 2024-09-18 15:07:37,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=778362.0, ans=0.0 2024-09-18 15:07:55,535 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.75 vs. limit=10.0 2024-09-18 15:08:13,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=778418.6666666666, ans=0.0 2024-09-18 15:08:15,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=778418.6666666666, ans=0.125 2024-09-18 15:08:25,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=778447.0, ans=0.125 2024-09-18 15:08:38,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=778475.3333333334, ans=0.1 2024-09-18 15:08:48,037 INFO [train.py:1198] (1/2) Epoch 43, batch 6350, loss[loss=0.2835, ctc_loss=0.1983, cr_loss=0.4259, over 14830.00 frames. ], tot_loss[loss=0.2276, ctc_loss=0.1526, cr_loss=0.3753, over 3717277.82 frames. ], batch size: 149, lr: 1.90e-03, grad_scale: 32.0 2024-09-18 15:08:48,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=778503.6666666666, ans=0.0 2024-09-18 15:09:21,174 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.095e+02 2.681e+02 2.898e+02 3.118e+02 4.424e+02, threshold=5.797e+02, percent-clipped=0.0 2024-09-18 15:09:44,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=778617.0, ans=0.125 2024-09-18 15:10:38,528 INFO [train.py:1198] (1/2) Epoch 44, batch 0, loss[loss=0.2549, ctc_loss=0.1712, cr_loss=0.4185, over 20926.00 frames. ], tot_loss[loss=0.2549, ctc_loss=0.1712, cr_loss=0.4185, over 20926.00 frames. ], batch size: 64, lr: 1.88e-03, grad_scale: 32.0 2024-09-18 15:10:38,529 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 15:10:56,830 INFO [train.py:1230] (1/2) Epoch 44, validation: loss=0.03933, ctc_loss=0.03933, cr_loss=1.444e-14, over 944034.00 frames. 2024-09-18 15:10:56,831 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-18 15:11:22,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=778648.1666666666, ans=0.0 2024-09-18 15:12:03,322 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.68 vs. limit=15.0 2024-09-18 15:12:14,859 INFO [train.py:1198] (1/2) Epoch 44, batch 50, loss[loss=0.2302, ctc_loss=0.1529, cr_loss=0.3869, over 19536.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1425, cr_loss=0.3676, over 930505.92 frames. 
], batch size: 90, lr: 1.88e-03, grad_scale: 32.0 2024-09-18 15:12:21,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=778761.5, ans=0.025 2024-09-18 15:12:28,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=778789.8333333334, ans=0.025 2024-09-18 15:12:32,523 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.86 vs. limit=22.5 2024-09-18 15:13:05,000 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.997e+02 2.200e+02 2.332e+02 2.528e+02 3.841e+02, threshold=4.664e+02, percent-clipped=0.0 2024-09-18 15:13:12,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=778846.5, ans=0.0 2024-09-18 15:13:17,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=778874.8333333334, ans=0.025 2024-09-18 15:13:23,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=778874.8333333334, ans=0.2 2024-09-18 15:13:30,708 INFO [train.py:1198] (1/2) Epoch 44, batch 100, loss[loss=0.22, ctc_loss=0.1439, cr_loss=0.3803, over 20649.00 frames. ], tot_loss[loss=0.2145, ctc_loss=0.1414, cr_loss=0.3652, over 1640546.53 frames. ], batch size: 66, lr: 1.88e-03, grad_scale: 16.0 2024-09-18 15:13:34,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=778903.1666666666, ans=0.025 2024-09-18 15:13:46,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=778931.5, ans=0.0 2024-09-18 15:14:05,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=778959.8333333334, ans=0.025 2024-09-18 15:14:18,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=778988.1666666666, ans=0.1 2024-09-18 15:14:34,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=779016.5, ans=0.125 2024-09-18 15:14:45,750 INFO [train.py:1198] (1/2) Epoch 44, batch 150, loss[loss=0.2179, ctc_loss=0.1422, cr_loss=0.3783, over 21001.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1444, cr_loss=0.3708, over 2183684.24 frames. ], batch size: 52, lr: 1.88e-03, grad_scale: 16.0 2024-09-18 15:15:09,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=779073.1666666666, ans=0.1 2024-09-18 15:15:22,816 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.01 vs. 
limit=15.0 2024-09-18 15:15:26,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=779101.5, ans=0.0 2024-09-18 15:15:28,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=779101.5, ans=0.125 2024-09-18 15:15:35,931 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.044e+02 2.231e+02 2.391e+02 2.493e+02 5.095e+02, threshold=4.781e+02, percent-clipped=2.0 2024-09-18 15:15:50,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=779158.1666666666, ans=0.125 2024-09-18 15:16:04,865 INFO [train.py:1198] (1/2) Epoch 44, batch 200, loss[loss=0.2134, ctc_loss=0.139, cr_loss=0.3723, over 20783.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1446, cr_loss=0.3721, over 2610958.90 frames. ], batch size: 56, lr: 1.88e-03, grad_scale: 16.0 2024-09-18 15:16:08,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=779186.5, ans=0.125 2024-09-18 15:16:39,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=779243.1666666666, ans=0.1 2024-09-18 15:16:57,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=779271.5, ans=0.1 2024-09-18 15:17:16,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=779299.8333333334, ans=0.125 2024-09-18 15:17:19,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=779299.8333333334, ans=0.125 2024-09-18 15:17:23,328 INFO [train.py:1198] (1/2) Epoch 44, batch 250, loss[loss=0.2119, ctc_loss=0.1386, cr_loss=0.3666, over 21046.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.1438, cr_loss=0.3705, over 2937644.14 frames. ], batch size: 56, lr: 1.88e-03, grad_scale: 16.0 2024-09-18 15:17:32,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=779328.1666666666, ans=0.125 2024-09-18 15:17:44,985 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.06 vs. limit=15.0 2024-09-18 15:18:10,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=779413.1666666666, ans=0.125 2024-09-18 15:18:12,900 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.982e+02 2.204e+02 2.348e+02 2.462e+02 5.022e+02, threshold=4.695e+02, percent-clipped=1.0 2024-09-18 15:18:23,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=779441.5, ans=0.125 2024-09-18 15:18:38,347 INFO [train.py:1198] (1/2) Epoch 44, batch 300, loss[loss=0.2105, ctc_loss=0.1385, cr_loss=0.3599, over 20962.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.144, cr_loss=0.3704, over 3198337.61 frames. ], batch size: 55, lr: 1.88e-03, grad_scale: 16.0 2024-09-18 15:18:54,130 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.00 vs. 
limit=15.0 2024-09-18 15:19:07,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=779526.5, ans=0.125 2024-09-18 15:19:54,137 INFO [train.py:1198] (1/2) Epoch 44, batch 350, loss[loss=0.2177, ctc_loss=0.1461, cr_loss=0.358, over 20963.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.1453, cr_loss=0.3721, over 3394597.08 frames. ], batch size: 58, lr: 1.88e-03, grad_scale: 16.0 2024-09-18 15:20:09,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=779639.8333333334, ans=0.125 2024-09-18 15:20:40,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=779696.5, ans=0.1 2024-09-18 15:20:43,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=779696.5, ans=0.0 2024-09-18 15:20:44,403 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.975e+02 2.249e+02 2.382e+02 2.585e+02 6.896e+02, threshold=4.763e+02, percent-clipped=1.0 2024-09-18 15:20:46,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=779696.5, ans=0.125 2024-09-18 15:21:10,226 INFO [train.py:1198] (1/2) Epoch 44, batch 400, loss[loss=0.2054, ctc_loss=0.1383, cr_loss=0.3357, over 21030.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1441, cr_loss=0.3701, over 3543574.99 frames. ], batch size: 62, lr: 1.88e-03, grad_scale: 16.0 2024-09-18 15:21:20,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=779753.1666666666, ans=0.2 2024-09-18 15:22:13,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=779866.5, ans=0.2 2024-09-18 15:22:24,712 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.66 vs. limit=15.0 2024-09-18 15:22:28,643 INFO [train.py:1198] (1/2) Epoch 44, batch 450, loss[loss=0.2495, ctc_loss=0.168, cr_loss=0.4075, over 18372.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1436, cr_loss=0.3693, over 3655114.72 frames. ], batch size: 109, lr: 1.88e-03, grad_scale: 16.0 2024-09-18 15:22:33,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=779894.8333333334, ans=0.125 2024-09-18 15:23:22,934 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.001e+02 2.261e+02 2.364e+02 2.599e+02 3.244e+02, threshold=4.729e+02, percent-clipped=0.0 2024-09-18 15:23:35,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=780008.1666666666, ans=0.0 2024-09-18 15:23:47,015 INFO [train.py:1198] (1/2) Epoch 44, batch 500, loss[loss=0.2015, ctc_loss=0.1316, cr_loss=0.3495, over 21014.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1418, cr_loss=0.366, over 3753799.08 frames. 
], batch size: 63, lr: 1.88e-03, grad_scale: 16.0 2024-09-18 15:24:43,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=780121.5, ans=0.125 2024-09-18 15:24:44,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=780121.5, ans=0.125 2024-09-18 15:25:01,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=780178.1666666666, ans=0.1 2024-09-18 15:25:02,020 INFO [train.py:1198] (1/2) Epoch 44, batch 550, loss[loss=0.2231, ctc_loss=0.1491, cr_loss=0.3701, over 19389.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1434, cr_loss=0.3678, over 3822313.92 frames. ], batch size: 90, lr: 1.88e-03, grad_scale: 16.0 2024-09-18 15:25:10,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=780178.1666666666, ans=0.1 2024-09-18 15:25:53,770 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.951e+02 2.294e+02 2.377e+02 2.572e+02 1.494e+03, threshold=4.754e+02, percent-clipped=2.0 2024-09-18 15:26:17,803 INFO [train.py:1198] (1/2) Epoch 44, batch 600, loss[loss=0.2116, ctc_loss=0.1406, cr_loss=0.3552, over 20790.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1439, cr_loss=0.369, over 3885600.39 frames. ], batch size: 56, lr: 1.88e-03, grad_scale: 16.0 2024-09-18 15:26:33,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=780348.1666666666, ans=0.125 2024-09-18 15:27:14,474 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=12.06 vs. limit=12.0 2024-09-18 15:27:36,976 INFO [train.py:1198] (1/2) Epoch 44, batch 650, loss[loss=0.296, ctc_loss=0.206, cr_loss=0.4499, over 14599.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1437, cr_loss=0.3688, over 3929440.37 frames. ], batch size: 149, lr: 1.88e-03, grad_scale: 16.0 2024-09-18 15:27:45,519 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.07 vs. limit=12.0 2024-09-18 15:28:07,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=780518.1666666666, ans=0.5 2024-09-18 15:28:25,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=780546.5, ans=0.125 2024-09-18 15:28:31,087 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.981e+02 2.233e+02 2.345e+02 2.494e+02 3.601e+02, threshold=4.689e+02, percent-clipped=0.0 2024-09-18 15:28:32,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=780546.5, ans=0.1 2024-09-18 15:28:51,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=780574.8333333334, ans=0.0 2024-09-18 15:28:55,563 INFO [train.py:1198] (1/2) Epoch 44, batch 700, loss[loss=0.2068, ctc_loss=0.1363, cr_loss=0.3524, over 20801.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1438, cr_loss=0.3696, over 3968134.00 frames. 
], batch size: 53, lr: 1.88e-03, grad_scale: 16.0 2024-09-18 15:28:57,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=780603.1666666666, ans=0.125 2024-09-18 15:29:16,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=780631.5, ans=0.125 2024-09-18 15:29:46,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=780688.1666666666, ans=0.125 2024-09-18 15:30:10,965 INFO [train.py:1198] (1/2) Epoch 44, batch 750, loss[loss=0.2379, ctc_loss=0.1576, cr_loss=0.4015, over 21039.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1432, cr_loss=0.3683, over 3993668.84 frames. ], batch size: 62, lr: 1.88e-03, grad_scale: 16.0 2024-09-18 15:30:33,144 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.19 vs. limit=10.0 2024-09-18 15:30:37,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=780773.1666666666, ans=0.125 2024-09-18 15:30:50,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=780801.5, ans=0.2 2024-09-18 15:31:02,434 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.899e+02 2.229e+02 2.376e+02 2.533e+02 3.347e+02, threshold=4.753e+02, percent-clipped=0.0 2024-09-18 15:31:26,652 INFO [train.py:1198] (1/2) Epoch 44, batch 800, loss[loss=0.2541, ctc_loss=0.1727, cr_loss=0.4071, over 20956.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1438, cr_loss=0.3687, over 4010052.32 frames. ], batch size: 64, lr: 1.88e-03, grad_scale: 32.0 2024-09-18 15:31:37,658 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 15:31:39,249 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.72 vs. limit=15.0 2024-09-18 15:32:16,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=780971.5, ans=0.125 2024-09-18 15:32:30,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=780999.8333333334, ans=0.125 2024-09-18 15:32:42,230 INFO [train.py:1198] (1/2) Epoch 44, batch 850, loss[loss=0.2736, ctc_loss=0.1895, cr_loss=0.4205, over 14374.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1439, cr_loss=0.3692, over 4028419.03 frames. 
], batch size: 150, lr: 1.88e-03, grad_scale: 32.0 2024-09-18 15:32:42,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=781028.1666666666, ans=0.125 2024-09-18 15:33:18,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=781084.8333333334, ans=0.125 2024-09-18 15:33:23,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=781084.8333333334, ans=0.125 2024-09-18 15:33:38,118 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.016e+02 2.222e+02 2.405e+02 2.536e+02 3.684e+02, threshold=4.811e+02, percent-clipped=0.0 2024-09-18 15:33:41,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=781113.1666666666, ans=0.125 2024-09-18 15:33:52,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=781141.5, ans=0.1 2024-09-18 15:34:03,776 INFO [train.py:1198] (1/2) Epoch 44, batch 900, loss[loss=0.1819, ctc_loss=0.1203, cr_loss=0.308, over 19949.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1445, cr_loss=0.37, over 4033892.71 frames. ], batch size: 44, lr: 1.88e-03, grad_scale: 16.0 2024-09-18 15:34:14,768 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=781169.8333333334, ans=0.125 2024-09-18 15:34:19,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=781198.1666666666, ans=0.125 2024-09-18 15:35:07,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=781283.1666666666, ans=0.025 2024-09-18 15:35:20,475 INFO [train.py:1198] (1/2) Epoch 44, batch 950, loss[loss=0.2407, ctc_loss=0.1694, cr_loss=0.3563, over 14205.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1435, cr_loss=0.3675, over 4041533.51 frames. ], batch size: 150, lr: 1.88e-03, grad_scale: 16.0 2024-09-18 15:35:25,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=781311.5, ans=0.0 2024-09-18 15:35:42,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=781339.8333333334, ans=10.0 2024-09-18 15:35:56,011 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.10 vs. limit=6.0 2024-09-18 15:36:01,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=781368.1666666666, ans=0.125 2024-09-18 15:36:03,602 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.40 vs. 
limit=15.0 2024-09-18 15:36:13,302 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.982e+02 2.223e+02 2.386e+02 2.536e+02 3.288e+02, threshold=4.771e+02, percent-clipped=0.0 2024-09-18 15:36:18,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=781396.5, ans=0.0 2024-09-18 15:36:35,686 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 15:36:36,708 INFO [train.py:1198] (1/2) Epoch 44, batch 1000, loss[loss=0.217, ctc_loss=0.1432, cr_loss=0.3693, over 20954.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1436, cr_loss=0.368, over 4059819.29 frames. ], batch size: 64, lr: 1.88e-03, grad_scale: 16.0 2024-09-18 15:36:52,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=781481.5, ans=0.125 2024-09-18 15:36:58,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=781481.5, ans=0.125 2024-09-18 15:37:15,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=781509.8333333334, ans=0.125 2024-09-18 15:37:23,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=781538.1666666666, ans=0.1 2024-09-18 15:37:52,027 INFO [train.py:1198] (1/2) Epoch 44, batch 1050, loss[loss=0.2501, ctc_loss=0.1639, cr_loss=0.4309, over 20063.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1426, cr_loss=0.3667, over 4074786.14 frames. ], batch size: 80, lr: 1.88e-03, grad_scale: 16.0 2024-09-18 15:37:52,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=781594.8333333334, ans=0.0 2024-09-18 15:37:52,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=781594.8333333334, ans=0.0 2024-09-18 15:38:02,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=781594.8333333334, ans=0.125 2024-09-18 15:38:22,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=781651.5, ans=0.125 2024-09-18 15:38:25,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=781651.5, ans=0.125 2024-09-18 15:38:45,560 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.916e+02 2.240e+02 2.329e+02 2.481e+02 2.734e+02, threshold=4.658e+02, percent-clipped=0.0 2024-09-18 15:38:58,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=781708.1666666666, ans=0.025 2024-09-18 15:39:06,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=781708.1666666666, ans=0.125 2024-09-18 15:39:07,664 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 15:39:11,748 INFO [train.py:1198] (1/2) Epoch 44, batch 1100, loss[loss=0.2217, ctc_loss=0.1445, cr_loss=0.3858, over 20925.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1433, cr_loss=0.3679, over 4076590.41 frames. 
], batch size: 60, lr: 1.88e-03, grad_scale: 16.0 2024-09-18 15:39:42,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=781793.1666666666, ans=0.1 2024-09-18 15:40:11,745 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.03 vs. limit=6.0 2024-09-18 15:40:17,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=781849.8333333334, ans=0.125 2024-09-18 15:40:26,896 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.73 vs. limit=15.0 2024-09-18 15:40:30,691 INFO [train.py:1198] (1/2) Epoch 44, batch 1150, loss[loss=0.2271, ctc_loss=0.1511, cr_loss=0.3799, over 20830.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1435, cr_loss=0.3684, over 4089176.80 frames. ], batch size: 59, lr: 1.88e-03, grad_scale: 16.0 2024-09-18 15:40:46,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=781906.5, ans=0.09899494936611666 2024-09-18 15:40:58,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=781906.5, ans=0.125 2024-09-18 15:41:10,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=781934.8333333334, ans=0.2 2024-09-18 15:41:23,202 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 15:41:24,241 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.892e+02 2.278e+02 2.383e+02 2.503e+02 4.330e+02, threshold=4.766e+02, percent-clipped=0.0 2024-09-18 15:41:39,854 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.17 vs. limit=22.5 2024-09-18 15:41:47,977 INFO [train.py:1198] (1/2) Epoch 44, batch 1200, loss[loss=0.2555, ctc_loss=0.1737, cr_loss=0.409, over 18114.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1444, cr_loss=0.3702, over 4095613.54 frames. ], batch size: 108, lr: 1.88e-03, grad_scale: 32.0 2024-09-18 15:42:13,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=782048.1666666666, ans=0.2 2024-09-18 15:42:18,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=782076.5, ans=0.1 2024-09-18 15:42:28,167 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.03 vs. limit=15.0 2024-09-18 15:42:33,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=782104.8333333334, ans=0.07 2024-09-18 15:42:35,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=782104.8333333334, ans=0.125 2024-09-18 15:42:42,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=782104.8333333334, ans=0.0 2024-09-18 15:43:03,658 INFO [train.py:1198] (1/2) Epoch 44, batch 1250, loss[loss=0.2464, ctc_loss=0.162, cr_loss=0.4223, over 20948.00 frames. 
], tot_loss[loss=0.2191, ctc_loss=0.1448, cr_loss=0.3711, over 4090559.11 frames. ], batch size: 64, lr: 1.88e-03, grad_scale: 32.0 2024-09-18 15:43:13,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=782161.5, ans=0.125 2024-09-18 15:43:23,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=782189.8333333334, ans=0.2 2024-09-18 15:43:29,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=782189.8333333334, ans=0.05 2024-09-18 15:43:56,982 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.933e+02 2.222e+02 2.361e+02 2.534e+02 5.802e+02, threshold=4.722e+02, percent-clipped=2.0 2024-09-18 15:43:57,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=782246.5, ans=0.0 2024-09-18 15:44:19,788 INFO [train.py:1198] (1/2) Epoch 44, batch 1300, loss[loss=0.2262, ctc_loss=0.1512, cr_loss=0.375, over 20656.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.143, cr_loss=0.3683, over 4098891.39 frames. ], batch size: 71, lr: 1.88e-03, grad_scale: 32.0 2024-09-18 15:45:00,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=782359.8333333334, ans=0.125 2024-09-18 15:45:31,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=782416.5, ans=0.125 2024-09-18 15:45:38,305 INFO [train.py:1198] (1/2) Epoch 44, batch 1350, loss[loss=0.1717, ctc_loss=0.1122, cr_loss=0.2975, over 20980.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1436, cr_loss=0.3692, over 4090161.78 frames. ], batch size: 49, lr: 1.88e-03, grad_scale: 32.0 2024-09-18 15:46:34,053 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.993e+02 2.263e+02 2.364e+02 2.570e+02 4.080e+02, threshold=4.728e+02, percent-clipped=0.0 2024-09-18 15:46:43,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=782558.1666666666, ans=0.2 2024-09-18 15:46:55,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=782586.5, ans=0.0 2024-09-18 15:46:56,902 INFO [train.py:1198] (1/2) Epoch 44, batch 1400, loss[loss=0.221, ctc_loss=0.1471, cr_loss=0.3694, over 21031.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.144, cr_loss=0.3688, over 4079587.16 frames. ], batch size: 62, lr: 1.88e-03, grad_scale: 32.0 2024-09-18 15:47:23,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=782614.8333333334, ans=0.2 2024-09-18 15:47:33,726 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.40 vs. limit=15.0 2024-09-18 15:47:35,275 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.40 vs. limit=22.5 2024-09-18 15:47:36,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=782643.1666666666, ans=0.125 2024-09-18 15:48:12,449 INFO [train.py:1198] (1/2) Epoch 44, batch 1450, loss[loss=0.2114, ctc_loss=0.136, cr_loss=0.3775, over 20876.00 frames. 
], tot_loss[loss=0.2167, ctc_loss=0.1431, cr_loss=0.3677, over 4089138.16 frames. ], batch size: 54, lr: 1.88e-03, grad_scale: 32.0 2024-09-18 15:48:55,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=782784.8333333334, ans=0.0 2024-09-18 15:49:05,693 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.910e+02 2.211e+02 2.362e+02 2.505e+02 3.838e+02, threshold=4.723e+02, percent-clipped=0.0 2024-09-18 15:49:09,590 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.94 vs. limit=6.0 2024-09-18 15:49:10,488 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=782813.1666666666, ans=0.05 2024-09-18 15:49:28,402 INFO [train.py:1198] (1/2) Epoch 44, batch 1500, loss[loss=0.2409, ctc_loss=0.1611, cr_loss=0.3991, over 20646.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1437, cr_loss=0.3689, over 4091587.87 frames. ], batch size: 71, lr: 1.88e-03, grad_scale: 32.0 2024-09-18 15:49:34,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=782869.8333333334, ans=0.125 2024-09-18 15:49:42,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=782898.1666666666, ans=0.0 2024-09-18 15:49:49,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=782898.1666666666, ans=0.0 2024-09-18 15:50:20,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=782954.8333333334, ans=0.125 2024-09-18 15:50:24,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=782954.8333333334, ans=0.07 2024-09-18 15:50:40,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=782983.1666666666, ans=0.125 2024-09-18 15:50:46,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=783011.5, ans=0.125 2024-09-18 15:50:47,737 INFO [train.py:1198] (1/2) Epoch 44, batch 1550, loss[loss=0.236, ctc_loss=0.1561, cr_loss=0.3995, over 20635.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1433, cr_loss=0.3683, over 4088867.13 frames. ], batch size: 71, lr: 1.88e-03, grad_scale: 32.0 2024-09-18 15:51:11,236 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.46 vs. 
limit=22.5 2024-09-18 15:51:43,739 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.963e+02 2.221e+02 2.341e+02 2.547e+02 3.742e+02, threshold=4.682e+02, percent-clipped=0.0 2024-09-18 15:51:51,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=783124.8333333334, ans=0.1 2024-09-18 15:51:56,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=783124.8333333334, ans=0.1 2024-09-18 15:52:02,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=783124.8333333334, ans=0.2 2024-09-18 15:52:06,567 INFO [train.py:1198] (1/2) Epoch 44, batch 1600, loss[loss=0.2715, ctc_loss=0.1824, cr_loss=0.4455, over 20964.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1433, cr_loss=0.3688, over 4101886.77 frames. ], batch size: 64, lr: 1.88e-03, grad_scale: 32.0 2024-09-18 15:52:28,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=783181.5, ans=0.125 2024-09-18 15:52:34,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=783181.5, ans=0.04949747468305833 2024-09-18 15:52:35,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=783209.8333333334, ans=0.0 2024-09-18 15:52:38,714 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 15:52:57,375 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.13 vs. limit=15.0 2024-09-18 15:53:01,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=783238.1666666666, ans=0.1 2024-09-18 15:53:07,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=783266.5, ans=0.025 2024-09-18 15:53:22,061 INFO [train.py:1198] (1/2) Epoch 44, batch 1650, loss[loss=0.2014, ctc_loss=0.1314, cr_loss=0.3496, over 20957.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1429, cr_loss=0.3683, over 4103456.34 frames. ], batch size: 48, lr: 1.88e-03, grad_scale: 32.0 2024-09-18 15:53:26,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=783294.8333333334, ans=0.125 2024-09-18 15:53:40,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=783323.1666666666, ans=0.125 2024-09-18 15:54:15,587 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.035e+02 2.298e+02 2.409e+02 2.609e+02 3.009e+02, threshold=4.818e+02, percent-clipped=0.0 2024-09-18 15:54:25,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=783408.1666666666, ans=0.2 2024-09-18 15:54:38,530 INFO [train.py:1198] (1/2) Epoch 44, batch 1700, loss[loss=0.2316, ctc_loss=0.1509, cr_loss=0.4035, over 20878.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1427, cr_loss=0.3675, over 4107616.01 frames. 
], batch size: 54, lr: 1.88e-03, grad_scale: 32.0 2024-09-18 15:54:51,754 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.02 vs. limit=15.0 2024-09-18 15:55:19,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=783493.1666666666, ans=0.2 2024-09-18 15:55:36,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=783521.5, ans=0.125 2024-09-18 15:55:38,550 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.32 vs. limit=15.0 2024-09-18 15:55:54,318 INFO [train.py:1198] (1/2) Epoch 44, batch 1750, loss[loss=0.2159, ctc_loss=0.1428, cr_loss=0.3653, over 21037.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1428, cr_loss=0.3677, over 4101884.05 frames. ], batch size: 56, lr: 1.88e-03, grad_scale: 32.0 2024-09-18 15:55:56,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=783578.1666666666, ans=0.125 2024-09-18 15:56:03,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=783578.1666666666, ans=0.125 2024-09-18 15:56:44,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=783663.1666666666, ans=22.5 2024-09-18 15:56:50,354 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.051e+02 2.237e+02 2.340e+02 2.460e+02 5.468e+02, threshold=4.679e+02, percent-clipped=1.0 2024-09-18 15:57:01,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=783691.5, ans=0.0 2024-09-18 15:57:16,308 INFO [train.py:1198] (1/2) Epoch 44, batch 1800, loss[loss=0.1908, ctc_loss=0.1253, cr_loss=0.3272, over 20993.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1432, cr_loss=0.3688, over 4107285.66 frames. ], batch size: 51, lr: 1.88e-03, grad_scale: 32.0 2024-09-18 15:57:21,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=783719.8333333334, ans=0.125 2024-09-18 15:57:49,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=783776.5, ans=0.125 2024-09-18 15:57:49,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=783776.5, ans=0.0 2024-09-18 15:58:06,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=783804.8333333334, ans=0.2 2024-09-18 15:58:26,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=783833.1666666666, ans=0.125 2024-09-18 15:58:31,831 INFO [train.py:1198] (1/2) Epoch 44, batch 1850, loss[loss=0.2415, ctc_loss=0.1617, cr_loss=0.3989, over 19360.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1431, cr_loss=0.3689, over 4111796.06 frames. ], batch size: 90, lr: 1.88e-03, grad_scale: 16.0 2024-09-18 15:58:34,181 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.78 vs. 
limit=22.5 2024-09-18 15:58:38,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=783861.5, ans=0.125 2024-09-18 15:59:11,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=783918.1666666666, ans=0.125 2024-09-18 15:59:19,762 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.80 vs. limit=15.0 2024-09-18 15:59:26,404 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.958e+02 2.230e+02 2.364e+02 2.564e+02 3.958e+02, threshold=4.728e+02, percent-clipped=0.0 2024-09-18 15:59:45,406 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.09 vs. limit=6.0 2024-09-18 15:59:47,675 INFO [train.py:1198] (1/2) Epoch 44, batch 1900, loss[loss=0.2047, ctc_loss=0.1316, cr_loss=0.3655, over 20861.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1431, cr_loss=0.3692, over 4118160.18 frames. ], batch size: 57, lr: 1.88e-03, grad_scale: 16.0 2024-09-18 15:59:49,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=784003.1666666666, ans=0.125 2024-09-18 16:00:19,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=784059.8333333334, ans=0.2 2024-09-18 16:00:25,707 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.43 vs. limit=15.0 2024-09-18 16:00:43,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=784088.1666666666, ans=15.0 2024-09-18 16:00:46,487 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=784116.5, ans=0.05 2024-09-18 16:00:51,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=784116.5, ans=0.125 2024-09-18 16:01:00,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=784116.5, ans=0.125 2024-09-18 16:01:02,716 INFO [train.py:1198] (1/2) Epoch 44, batch 1950, loss[loss=0.2135, ctc_loss=0.1375, cr_loss=0.3801, over 20955.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.143, cr_loss=0.3685, over 4117647.89 frames. 
], batch size: 60, lr: 1.88e-03, grad_scale: 16.0 2024-09-18 16:01:27,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=784173.1666666666, ans=0.0 2024-09-18 16:01:32,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=784201.5, ans=0.0 2024-09-18 16:01:40,215 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.whiten.whitening_limit, batch_count=784201.5, ans=12.0 2024-09-18 16:01:41,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=784201.5, ans=0.0 2024-09-18 16:01:49,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=784229.8333333334, ans=0.125 2024-09-18 16:01:57,748 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.858e+02 2.222e+02 2.340e+02 2.472e+02 5.398e+02, threshold=4.681e+02, percent-clipped=1.0 2024-09-18 16:02:17,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=784286.5, ans=0.125 2024-09-18 16:02:18,799 INFO [train.py:1198] (1/2) Epoch 44, batch 2000, loss[loss=0.2102, ctc_loss=0.1362, cr_loss=0.3697, over 21067.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1433, cr_loss=0.369, over 4115391.62 frames. ], batch size: 56, lr: 1.88e-03, grad_scale: 32.0 2024-09-18 16:02:29,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=784286.5, ans=0.1 2024-09-18 16:02:35,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=784314.8333333334, ans=0.2 2024-09-18 16:02:45,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=784314.8333333334, ans=0.0 2024-09-18 16:03:24,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=784399.8333333334, ans=0.025 2024-09-18 16:03:39,697 INFO [train.py:1198] (1/2) Epoch 44, batch 2050, loss[loss=0.1929, ctc_loss=0.1233, cr_loss=0.3484, over 20954.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1432, cr_loss=0.369, over 4116344.96 frames. 
], batch size: 50, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:03:44,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=784428.1666666666, ans=0.1 2024-09-18 16:04:04,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=784456.5, ans=0.125 2024-09-18 16:04:07,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=784456.5, ans=0.125 2024-09-18 16:04:34,350 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.942e+02 2.235e+02 2.350e+02 2.509e+02 4.746e+02, threshold=4.699e+02, percent-clipped=1.0 2024-09-18 16:04:34,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=784513.1666666666, ans=0.2 2024-09-18 16:04:48,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=784541.5, ans=0.125 2024-09-18 16:04:50,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=784541.5, ans=0.07 2024-09-18 16:04:50,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=784541.5, ans=0.125 2024-09-18 16:04:51,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=784541.5, ans=0.025 2024-09-18 16:04:55,787 INFO [train.py:1198] (1/2) Epoch 44, batch 2100, loss[loss=0.2387, ctc_loss=0.1603, cr_loss=0.3921, over 19396.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.143, cr_loss=0.3684, over 4108769.20 frames. ], batch size: 90, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:05:11,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=784598.1666666666, ans=0.125 2024-09-18 16:05:28,643 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.99 vs. limit=15.0 2024-09-18 16:05:37,777 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.66 vs. limit=22.5 2024-09-18 16:05:38,970 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 16:06:12,050 INFO [train.py:1198] (1/2) Epoch 44, batch 2150, loss[loss=0.239, ctc_loss=0.1599, cr_loss=0.3959, over 20980.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1434, cr_loss=0.3695, over 4093732.30 frames. ], batch size: 58, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:06:24,976 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0 2024-09-18 16:06:55,225 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.70 vs. limit=15.0 2024-09-18 16:07:01,326 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.33 vs. 
limit=15.0 2024-09-18 16:07:06,394 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.890e+02 2.326e+02 2.412e+02 2.683e+02 9.743e+02, threshold=4.825e+02, percent-clipped=1.0 2024-09-18 16:07:23,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=784824.8333333334, ans=0.2 2024-09-18 16:07:25,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=784824.8333333334, ans=0.125 2024-09-18 16:07:27,785 INFO [train.py:1198] (1/2) Epoch 44, batch 2200, loss[loss=0.2233, ctc_loss=0.1487, cr_loss=0.3727, over 20846.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1426, cr_loss=0.3674, over 4096641.23 frames. ], batch size: 65, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:07:31,697 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.48 vs. limit=15.0 2024-09-18 16:07:41,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=784881.5, ans=0.0 2024-09-18 16:07:57,050 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 16:08:33,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=784966.5, ans=0.05 2024-09-18 16:08:48,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=784994.8333333334, ans=0.1 2024-09-18 16:08:49,723 INFO [train.py:1198] (1/2) Epoch 44, batch 2250, loss[loss=0.2235, ctc_loss=0.1471, cr_loss=0.3821, over 20944.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1426, cr_loss=0.3672, over 4104948.55 frames. ], batch size: 64, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:09:18,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=785051.5, ans=0.07 2024-09-18 16:09:23,718 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.75 vs. limit=6.0 2024-09-18 16:09:44,058 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.985e+02 2.232e+02 2.356e+02 2.516e+02 5.738e+02, threshold=4.713e+02, percent-clipped=1.0 2024-09-18 16:09:47,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=785079.8333333334, ans=0.025 2024-09-18 16:10:03,412 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.27 vs. limit=15.0 2024-09-18 16:10:04,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=785136.5, ans=0.025 2024-09-18 16:10:05,572 INFO [train.py:1198] (1/2) Epoch 44, batch 2300, loss[loss=0.2416, ctc_loss=0.1611, cr_loss=0.4023, over 20661.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.1439, cr_loss=0.3693, over 4088256.07 frames. ], batch size: 66, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:10:22,728 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.52 vs. 
limit=15.0 2024-09-18 16:10:40,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=785193.1666666666, ans=0.0 2024-09-18 16:10:55,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=785221.5, ans=0.0 2024-09-18 16:11:20,037 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=785278.1666666666, ans=0.0 2024-09-18 16:11:21,188 INFO [train.py:1198] (1/2) Epoch 44, batch 2350, loss[loss=0.2376, ctc_loss=0.1557, cr_loss=0.4097, over 20992.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.143, cr_loss=0.3672, over 4089762.61 frames. ], batch size: 64, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:11:41,596 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.02 vs. limit=15.0 2024-09-18 16:12:10,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=785363.1666666666, ans=0.125 2024-09-18 16:12:15,889 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.935e+02 2.230e+02 2.373e+02 2.587e+02 6.933e+02, threshold=4.747e+02, percent-clipped=1.0 2024-09-18 16:12:28,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=785391.5, ans=0.2 2024-09-18 16:12:29,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=785391.5, ans=0.2 2024-09-18 16:12:29,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=785391.5, ans=0.0 2024-09-18 16:12:37,196 INFO [train.py:1198] (1/2) Epoch 44, batch 2400, loss[loss=0.2414, ctc_loss=0.1618, cr_loss=0.3977, over 20338.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1436, cr_loss=0.3683, over 4085371.58 frames. ], batch size: 74, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:13:18,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=785476.5, ans=0.015 2024-09-18 16:13:27,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=785504.8333333334, ans=0.125 2024-09-18 16:13:44,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=785533.1666666666, ans=0.0 2024-09-18 16:13:56,674 INFO [train.py:1198] (1/2) Epoch 44, batch 2450, loss[loss=0.2161, ctc_loss=0.1397, cr_loss=0.3815, over 20895.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.144, cr_loss=0.3691, over 4095560.98 frames. 
], batch size: 65, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:14:13,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=785589.8333333334, ans=0.125 2024-09-18 16:14:37,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=785618.1666666666, ans=0.0 2024-09-18 16:14:44,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=785646.5, ans=10.0 2024-09-18 16:14:54,501 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.931e+02 2.191e+02 2.329e+02 2.465e+02 4.448e+02, threshold=4.657e+02, percent-clipped=0.0 2024-09-18 16:15:15,309 INFO [train.py:1198] (1/2) Epoch 44, batch 2500, loss[loss=0.2034, ctc_loss=0.132, cr_loss=0.3567, over 21071.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1438, cr_loss=0.3685, over 4099271.95 frames. ], batch size: 56, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:15:56,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=785759.8333333334, ans=0.2 2024-09-18 16:16:27,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=785816.5, ans=0.125 2024-09-18 16:16:31,328 INFO [train.py:1198] (1/2) Epoch 44, batch 2550, loss[loss=0.1803, ctc_loss=0.1156, cr_loss=0.3234, over 21060.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.143, cr_loss=0.368, over 4109262.56 frames. ], batch size: 56, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:16:43,389 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.29 vs. limit=15.0 2024-09-18 16:17:18,970 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 16:17:26,279 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.999e+02 2.269e+02 2.356e+02 2.486e+02 3.153e+02, threshold=4.712e+02, percent-clipped=0.0 2024-09-18 16:17:47,677 INFO [train.py:1198] (1/2) Epoch 44, batch 2600, loss[loss=0.2462, ctc_loss=0.1658, cr_loss=0.402, over 20277.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1441, cr_loss=0.3701, over 4117665.27 frames. ], batch size: 74, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:18:12,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=786014.8333333334, ans=0.1 2024-09-18 16:18:14,106 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.84 vs. limit=12.0 2024-09-18 16:18:27,768 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.10 vs. 
limit=15.0 2024-09-18 16:18:28,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=786043.1666666666, ans=0.0 2024-09-18 16:18:33,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=786071.5, ans=0.0 2024-09-18 16:18:36,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=786071.5, ans=0.035 2024-09-18 16:19:03,600 INFO [train.py:1198] (1/2) Epoch 44, batch 2650, loss[loss=0.2409, ctc_loss=0.1602, cr_loss=0.4036, over 20707.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1437, cr_loss=0.3696, over 4100519.43 frames. ], batch size: 71, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:19:22,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=786156.5, ans=0.125 2024-09-18 16:19:43,795 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.98 vs. limit=6.0 2024-09-18 16:20:01,380 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.960e+02 2.215e+02 2.398e+02 2.540e+02 4.601e+02, threshold=4.795e+02, percent-clipped=0.0 2024-09-18 16:20:22,470 INFO [train.py:1198] (1/2) Epoch 44, batch 2700, loss[loss=0.1858, ctc_loss=0.1208, cr_loss=0.3248, over 20986.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1431, cr_loss=0.3679, over 4109864.32 frames. ], batch size: 55, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:21:00,937 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.59 vs. limit=10.0 2024-09-18 16:21:08,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=786326.5, ans=0.125 2024-09-18 16:21:41,606 INFO [train.py:1198] (1/2) Epoch 44, batch 2750, loss[loss=0.213, ctc_loss=0.1384, cr_loss=0.373, over 21019.00 frames. ], tot_loss[loss=0.2188, ctc_loss=0.1447, cr_loss=0.3704, over 4099450.35 frames. ], batch size: 61, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:22:06,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=786439.8333333334, ans=0.1 2024-09-18 16:22:16,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=786468.1666666666, ans=0.2 2024-09-18 16:22:35,917 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.970e+02 2.277e+02 2.396e+02 2.598e+02 3.085e+02, threshold=4.792e+02, percent-clipped=0.0 2024-09-18 16:22:57,378 INFO [train.py:1198] (1/2) Epoch 44, batch 2800, loss[loss=0.1839, ctc_loss=0.1188, cr_loss=0.3254, over 19920.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1438, cr_loss=0.3693, over 4099021.34 frames. 
], batch size: 44, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:23:00,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=786553.1666666666, ans=0.025 2024-09-18 16:23:05,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=786553.1666666666, ans=0.125 2024-09-18 16:23:10,182 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.81 vs. limit=22.5 2024-09-18 16:23:11,853 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.21 vs. limit=15.0 2024-09-18 16:24:01,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=786666.5, ans=10.0 2024-09-18 16:24:13,098 INFO [train.py:1198] (1/2) Epoch 44, batch 2850, loss[loss=0.1947, ctc_loss=0.1274, cr_loss=0.3362, over 21066.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1439, cr_loss=0.369, over 4091030.58 frames. ], batch size: 53, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:24:15,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=786694.8333333334, ans=0.025 2024-09-18 16:24:47,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=786751.5, ans=0.1 2024-09-18 16:25:07,085 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.973e+02 2.256e+02 2.398e+02 2.542e+02 7.526e+02, threshold=4.796e+02, percent-clipped=1.0 2024-09-18 16:25:31,636 INFO [train.py:1198] (1/2) Epoch 44, batch 2900, loss[loss=0.2032, ctc_loss=0.1321, cr_loss=0.3555, over 20974.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1435, cr_loss=0.3685, over 4092411.91 frames. ], batch size: 58, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:25:36,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=786836.5, ans=0.2 2024-09-18 16:25:40,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=786836.5, ans=0.125 2024-09-18 16:25:50,399 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.83 vs. limit=15.0 2024-09-18 16:26:36,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=786949.8333333334, ans=0.125 2024-09-18 16:26:48,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=786978.1666666666, ans=0.1 2024-09-18 16:26:50,152 INFO [train.py:1198] (1/2) Epoch 44, batch 2950, loss[loss=0.1923, ctc_loss=0.126, cr_loss=0.3312, over 20966.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1424, cr_loss=0.3666, over 4094728.17 frames. 
], batch size: 51, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:26:58,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=786978.1666666666, ans=0.2 2024-09-18 16:27:44,738 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.944e+02 2.241e+02 2.367e+02 2.490e+02 3.421e+02, threshold=4.734e+02, percent-clipped=0.0 2024-09-18 16:28:04,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=787119.8333333334, ans=0.2 2024-09-18 16:28:06,203 INFO [train.py:1198] (1/2) Epoch 44, batch 3000, loss[loss=0.2409, ctc_loss=0.1602, cr_loss=0.4038, over 20850.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.1432, cr_loss=0.3681, over 4092687.68 frames. ], batch size: 65, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:28:06,204 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 16:28:24,416 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.9831, 3.9498, 3.8969, 3.4857], device='cuda:1') 2024-09-18 16:28:28,051 INFO [train.py:1230] (1/2) Epoch 44, validation: loss=0.03963, ctc_loss=0.03963, cr_loss=1.646e-14, over 944034.00 frames. 2024-09-18 16:28:28,051 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-18 16:28:32,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=787119.8333333334, ans=0.0 2024-09-18 16:28:32,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=787119.8333333334, ans=0.125 2024-09-18 16:28:35,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=787119.8333333334, ans=0.125 2024-09-18 16:28:52,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=787148.1666666666, ans=0.07 2024-09-18 16:29:42,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=787261.5, ans=0.125 2024-09-18 16:29:44,038 INFO [train.py:1198] (1/2) Epoch 44, batch 3050, loss[loss=0.2323, ctc_loss=0.1558, cr_loss=0.3824, over 21086.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1431, cr_loss=0.3682, over 4100395.38 frames. ], batch size: 59, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:30:19,646 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.67 vs. limit=6.0 2024-09-18 16:30:25,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=787318.1666666666, ans=0.0 2024-09-18 16:30:38,179 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.30 vs. limit=15.0 2024-09-18 16:30:38,882 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.963e+02 2.194e+02 2.360e+02 2.503e+02 3.910e+02, threshold=4.720e+02, percent-clipped=0.0 2024-09-18 16:31:03,260 INFO [train.py:1198] (1/2) Epoch 44, batch 3100, loss[loss=0.226, ctc_loss=0.1496, cr_loss=0.3824, over 20646.00 frames. ], tot_loss[loss=0.2153, ctc_loss=0.142, cr_loss=0.3662, over 4109036.28 frames. 
], batch size: 66, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:31:11,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=787403.1666666666, ans=0.1 2024-09-18 16:31:38,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=787459.8333333334, ans=0.025 2024-09-18 16:31:58,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=787488.1666666666, ans=0.2 2024-09-18 16:32:03,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=787488.1666666666, ans=0.2 2024-09-18 16:32:21,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=787544.8333333334, ans=0.025 2024-09-18 16:32:22,861 INFO [train.py:1198] (1/2) Epoch 44, batch 3150, loss[loss=0.2302, ctc_loss=0.153, cr_loss=0.3863, over 21070.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.1417, cr_loss=0.366, over 4117974.07 frames. ], batch size: 59, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:33:17,346 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.995e+02 2.239e+02 2.339e+02 2.498e+02 3.418e+02, threshold=4.677e+02, percent-clipped=0.0 2024-09-18 16:33:22,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=787658.1666666666, ans=0.0 2024-09-18 16:33:38,833 INFO [train.py:1198] (1/2) Epoch 44, batch 3200, loss[loss=0.2426, ctc_loss=0.1655, cr_loss=0.3855, over 20870.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.1422, cr_loss=0.3669, over 4115333.56 frames. ], batch size: 65, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:33:46,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=787686.5, ans=0.125 2024-09-18 16:34:39,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=787799.8333333334, ans=0.0 2024-09-18 16:34:39,996 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.31 vs. limit=15.0 2024-09-18 16:34:54,554 INFO [train.py:1198] (1/2) Epoch 44, batch 3250, loss[loss=0.2289, ctc_loss=0.1507, cr_loss=0.3913, over 20955.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1428, cr_loss=0.3688, over 4116121.14 frames. ], batch size: 64, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:35:02,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=787828.1666666666, ans=0.0 2024-09-18 16:35:13,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=787856.5, ans=0.125 2024-09-18 16:35:49,021 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.848e+02 2.261e+02 2.346e+02 2.501e+02 3.187e+02, threshold=4.692e+02, percent-clipped=0.0 2024-09-18 16:36:10,342 INFO [train.py:1198] (1/2) Epoch 44, batch 3300, loss[loss=0.2029, ctc_loss=0.1294, cr_loss=0.3678, over 20972.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1429, cr_loss=0.3688, over 4117360.95 frames. 
], batch size: 55, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:36:37,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=787998.1666666666, ans=0.125 2024-09-18 16:37:28,604 INFO [train.py:1198] (1/2) Epoch 44, batch 3350, loss[loss=0.2219, ctc_loss=0.1472, cr_loss=0.3738, over 20981.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1428, cr_loss=0.3683, over 4118569.04 frames. ], batch size: 64, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:38:00,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=788168.1666666666, ans=0.0 2024-09-18 16:38:08,202 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.35 vs. limit=12.0 2024-09-18 16:38:25,592 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.007e+02 2.223e+02 2.376e+02 2.583e+02 6.569e+02, threshold=4.752e+02, percent-clipped=1.0 2024-09-18 16:38:32,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=788224.8333333334, ans=0.1 2024-09-18 16:38:44,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=788224.8333333334, ans=0.125 2024-09-18 16:38:46,719 INFO [train.py:1198] (1/2) Epoch 44, batch 3400, loss[loss=0.2061, ctc_loss=0.1329, cr_loss=0.3658, over 20976.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.143, cr_loss=0.3683, over 4102494.20 frames. ], batch size: 55, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:38:54,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=788253.1666666666, ans=0.125 2024-09-18 16:39:09,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=788281.5, ans=0.025 2024-09-18 16:39:17,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=788309.8333333334, ans=0.0 2024-09-18 16:39:31,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=788338.1666666666, ans=0.125 2024-09-18 16:39:43,865 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.57 vs. limit=15.0 2024-09-18 16:40:02,482 INFO [train.py:1198] (1/2) Epoch 44, batch 3450, loss[loss=0.2123, ctc_loss=0.1425, cr_loss=0.3488, over 21019.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1432, cr_loss=0.369, over 4104678.65 frames. 
], batch size: 61, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:40:21,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=788423.1666666666, ans=0.025 2024-09-18 16:40:42,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=788451.5, ans=0.125 2024-09-18 16:40:57,123 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.040e+02 2.290e+02 2.415e+02 2.549e+02 3.500e+02, threshold=4.830e+02, percent-clipped=0.0 2024-09-18 16:41:12,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=788508.1666666666, ans=0.0 2024-09-18 16:41:18,281 INFO [train.py:1198] (1/2) Epoch 44, batch 3500, loss[loss=0.2257, ctc_loss=0.1498, cr_loss=0.3791, over 20835.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.1431, cr_loss=0.3681, over 4087723.99 frames. ], batch size: 59, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:41:22,218 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.57 vs. limit=22.5 2024-09-18 16:41:35,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=788564.8333333334, ans=0.125 2024-09-18 16:42:37,161 INFO [train.py:1198] (1/2) Epoch 44, batch 3550, loss[loss=0.2263, ctc_loss=0.1496, cr_loss=0.3832, over 20990.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1431, cr_loss=0.3679, over 4091214.73 frames. ], batch size: 67, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:42:57,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=788706.5, ans=0.0 2024-09-18 16:43:04,911 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=788706.5, ans=0.0 2024-09-18 16:43:06,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=788734.8333333334, ans=0.125 2024-09-18 16:43:15,851 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.87 vs. limit=15.0 2024-09-18 16:43:34,806 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.063e+02 2.266e+02 2.401e+02 2.547e+02 4.043e+02, threshold=4.801e+02, percent-clipped=0.0 2024-09-18 16:43:35,580 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.36 vs. limit=15.0 2024-09-18 16:43:56,143 INFO [train.py:1198] (1/2) Epoch 44, batch 3600, loss[loss=0.2362, ctc_loss=0.156, cr_loss=0.4011, over 20956.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1429, cr_loss=0.3677, over 4089443.02 frames. ], batch size: 60, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:44:12,534 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.91 vs. 
limit=15.0 2024-09-18 16:44:22,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=788848.1666666666, ans=0.125 2024-09-18 16:44:29,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=788876.5, ans=0.125 2024-09-18 16:44:38,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=788876.5, ans=0.0 2024-09-18 16:45:12,092 INFO [train.py:1198] (1/2) Epoch 44, batch 3650, loss[loss=0.2175, ctc_loss=0.1444, cr_loss=0.3657, over 21042.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1431, cr_loss=0.368, over 4089969.37 frames. ], batch size: 62, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:45:26,656 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. limit=6.0 2024-09-18 16:45:37,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=788989.8333333334, ans=0.125 2024-09-18 16:46:07,398 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.913e+02 2.289e+02 2.448e+02 2.618e+02 3.925e+02, threshold=4.895e+02, percent-clipped=0.0 2024-09-18 16:46:09,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=789046.5, ans=0.0 2024-09-18 16:46:28,875 INFO [train.py:1198] (1/2) Epoch 44, batch 3700, loss[loss=0.2063, ctc_loss=0.1355, cr_loss=0.3544, over 21039.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1433, cr_loss=0.3689, over 4090098.50 frames. ], batch size: 63, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:46:38,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=789103.1666666666, ans=15.0 2024-09-18 16:47:26,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=789188.1666666666, ans=0.1 2024-09-18 16:47:44,153 INFO [train.py:1198] (1/2) Epoch 44, batch 3750, loss[loss=0.2201, ctc_loss=0.1442, cr_loss=0.3791, over 21023.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1432, cr_loss=0.3685, over 4091262.12 frames. ], batch size: 63, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:48:29,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=789301.5, ans=0.1 2024-09-18 16:48:41,454 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.000e+02 2.254e+02 2.407e+02 2.587e+02 3.240e+02, threshold=4.814e+02, percent-clipped=0.0 2024-09-18 16:49:04,856 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.81 vs. limit=15.0 2024-09-18 16:49:05,610 INFO [train.py:1198] (1/2) Epoch 44, batch 3800, loss[loss=0.1929, ctc_loss=0.1269, cr_loss=0.3302, over 20281.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1423, cr_loss=0.3674, over 4093115.86 frames. ], batch size: 45, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:49:25,791 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.45 vs. 
limit=10.0 2024-09-18 16:49:39,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=789443.1666666666, ans=0.125 2024-09-18 16:49:46,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=789443.1666666666, ans=0.125 2024-09-18 16:50:13,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=789499.8333333334, ans=0.125 2024-09-18 16:50:13,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=789499.8333333334, ans=0.125 2024-09-18 16:50:21,117 INFO [train.py:1198] (1/2) Epoch 44, batch 3850, loss[loss=0.2112, ctc_loss=0.143, cr_loss=0.3406, over 20981.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.1421, cr_loss=0.3671, over 4088301.84 frames. ], batch size: 58, lr: 1.87e-03, grad_scale: 64.0 2024-09-18 16:50:23,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=789528.1666666666, ans=0.0 2024-09-18 16:50:27,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=789528.1666666666, ans=0.04949747468305833 2024-09-18 16:51:04,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=789584.8333333334, ans=0.125 2024-09-18 16:51:06,168 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.43 vs. limit=15.0 2024-09-18 16:51:15,865 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.949e+02 2.170e+02 2.352e+02 2.495e+02 3.572e+02, threshold=4.704e+02, percent-clipped=0.0 2024-09-18 16:51:19,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=789613.1666666666, ans=0.2 2024-09-18 16:51:29,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=789641.5, ans=0.125 2024-09-18 16:51:37,027 INFO [train.py:1198] (1/2) Epoch 44, batch 3900, loss[loss=0.23, ctc_loss=0.1489, cr_loss=0.4054, over 21024.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.142, cr_loss=0.3673, over 4103984.71 frames. ], batch size: 61, lr: 1.87e-03, grad_scale: 64.0 2024-09-18 16:51:49,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=789669.8333333334, ans=0.0 2024-09-18 16:52:06,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=789726.5, ans=0.1 2024-09-18 16:52:16,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=789726.5, ans=0.125 2024-09-18 16:52:52,658 INFO [train.py:1198] (1/2) Epoch 44, batch 3950, loss[loss=0.2062, ctc_loss=0.1355, cr_loss=0.3532, over 20236.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.1421, cr_loss=0.3668, over 4094265.29 frames. ], batch size: 74, lr: 1.87e-03, grad_scale: 64.0 2024-09-18 16:52:56,289 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.22 vs. 
limit=10.0 2024-09-18 16:53:06,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=789839.8333333334, ans=0.125 2024-09-18 16:53:15,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=789839.8333333334, ans=0.1 2024-09-18 16:53:36,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=789896.5, ans=0.0 2024-09-18 16:53:49,584 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.008e+02 2.272e+02 2.415e+02 2.674e+02 3.646e+02, threshold=4.829e+02, percent-clipped=0.0 2024-09-18 16:53:56,287 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 16:54:10,903 INFO [train.py:1198] (1/2) Epoch 44, batch 4000, loss[loss=0.2123, ctc_loss=0.1368, cr_loss=0.3776, over 20981.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1424, cr_loss=0.367, over 4092808.06 frames. ], batch size: 51, lr: 1.87e-03, grad_scale: 64.0 2024-09-18 16:54:11,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=789953.1666666666, ans=0.125 2024-09-18 16:54:28,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=789981.5, ans=0.07 2024-09-18 16:54:46,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=790009.8333333334, ans=0.125 2024-09-18 16:54:56,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=790009.8333333334, ans=0.125 2024-09-18 16:55:23,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=790066.5, ans=0.125 2024-09-18 16:55:30,406 INFO [train.py:1198] (1/2) Epoch 44, batch 4050, loss[loss=0.2559, ctc_loss=0.1696, cr_loss=0.4314, over 20827.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1419, cr_loss=0.3661, over 4093161.90 frames. ], batch size: 65, lr: 1.87e-03, grad_scale: 64.0 2024-09-18 16:55:52,544 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.09 vs. limit=12.0 2024-09-18 16:56:04,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=790151.5, ans=0.125 2024-09-18 16:56:26,498 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.933e+02 2.223e+02 2.372e+02 2.511e+02 3.889e+02, threshold=4.744e+02, percent-clipped=0.0 2024-09-18 16:56:37,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=790208.1666666666, ans=0.0 2024-09-18 16:56:40,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=790208.1666666666, ans=0.2 2024-09-18 16:56:46,372 INFO [train.py:1198] (1/2) Epoch 44, batch 4100, loss[loss=0.1731, ctc_loss=0.1116, cr_loss=0.3074, over 20301.00 frames. ], tot_loss[loss=0.2147, ctc_loss=0.1415, cr_loss=0.3661, over 4094443.90 frames. 
], batch size: 45, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:57:38,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=790321.5, ans=0.2 2024-09-18 16:57:49,489 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.32 vs. limit=22.5 2024-09-18 16:58:02,125 INFO [train.py:1198] (1/2) Epoch 44, batch 4150, loss[loss=0.2276, ctc_loss=0.1512, cr_loss=0.3817, over 19898.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1414, cr_loss=0.366, over 4098332.45 frames. ], batch size: 80, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:58:27,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=790406.5, ans=0.125 2024-09-18 16:58:59,024 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.999e+02 2.251e+02 2.338e+02 2.466e+02 3.795e+02, threshold=4.675e+02, percent-clipped=0.0 2024-09-18 16:59:11,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=790491.5, ans=0.125 2024-09-18 16:59:17,556 INFO [train.py:1198] (1/2) Epoch 44, batch 4200, loss[loss=0.1912, ctc_loss=0.1245, cr_loss=0.3332, over 20960.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1418, cr_loss=0.3663, over 4094453.16 frames. ], batch size: 48, lr: 1.87e-03, grad_scale: 16.0 2024-09-18 16:59:46,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=790548.1666666666, ans=0.125 2024-09-18 16:59:48,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=790548.1666666666, ans=0.1 2024-09-18 16:59:51,838 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.77 vs. limit=15.0 2024-09-18 16:59:56,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=790576.5, ans=0.125 2024-09-18 17:00:00,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=790576.5, ans=0.125 2024-09-18 17:00:08,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=790604.8333333334, ans=0.125 2024-09-18 17:00:22,933 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.23 vs. limit=15.0 2024-09-18 17:00:23,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=790633.1666666666, ans=0.125 2024-09-18 17:00:37,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=790633.1666666666, ans=0.1 2024-09-18 17:00:39,883 INFO [train.py:1198] (1/2) Epoch 44, batch 4250, loss[loss=0.2625, ctc_loss=0.1802, cr_loss=0.4112, over 19399.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1418, cr_loss=0.3663, over 4100858.04 frames. 
], batch size: 90, lr: 1.87e-03, grad_scale: 16.0 2024-09-18 17:01:09,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=790718.1666666666, ans=0.125 2024-09-18 17:01:37,648 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.946e+02 2.223e+02 2.363e+02 2.512e+02 3.647e+02, threshold=4.727e+02, percent-clipped=0.0 2024-09-18 17:01:47,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=790774.8333333334, ans=0.0 2024-09-18 17:01:56,172 INFO [train.py:1198] (1/2) Epoch 44, batch 4300, loss[loss=0.2284, ctc_loss=0.15, cr_loss=0.3922, over 20830.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1423, cr_loss=0.3671, over 4100791.50 frames. ], batch size: 59, lr: 1.87e-03, grad_scale: 16.0 2024-09-18 17:02:34,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=790859.8333333334, ans=0.125 2024-09-18 17:03:11,811 INFO [train.py:1198] (1/2) Epoch 44, batch 4350, loss[loss=0.2462, ctc_loss=0.1653, cr_loss=0.4041, over 20851.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1429, cr_loss=0.3686, over 4111868.13 frames. ], batch size: 65, lr: 1.87e-03, grad_scale: 16.0 2024-09-18 17:03:46,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=791001.5, ans=0.09899494936611666 2024-09-18 17:03:48,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=791001.5, ans=0.125 2024-09-18 17:04:09,372 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.888e+02 2.240e+02 2.398e+02 2.515e+02 3.522e+02, threshold=4.796e+02, percent-clipped=0.0 2024-09-18 17:04:27,598 INFO [train.py:1198] (1/2) Epoch 44, batch 4400, loss[loss=0.2086, ctc_loss=0.137, cr_loss=0.3584, over 20875.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1436, cr_loss=0.3701, over 4105346.55 frames. ], batch size: 54, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 17:05:46,686 INFO [train.py:1198] (1/2) Epoch 44, batch 4450, loss[loss=0.2125, ctc_loss=0.1387, cr_loss=0.369, over 20780.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1435, cr_loss=0.3697, over 4107664.94 frames. ], batch size: 56, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 17:05:54,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=791228.1666666666, ans=0.1 2024-09-18 17:06:08,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=791256.5, ans=0.125 2024-09-18 17:06:34,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=791313.1666666666, ans=0.125 2024-09-18 17:06:47,419 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.034e+02 2.237e+02 2.340e+02 2.513e+02 5.679e+02, threshold=4.681e+02, percent-clipped=1.0 2024-09-18 17:06:53,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=791341.5, ans=0.125 2024-09-18 17:06:53,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=791341.5, ans=0.2 2024-09-18 17:07:05,516 INFO [train.py:1198] (1/2) Epoch 44, batch 4500, loss[loss=0.2252, ctc_loss=0.1453, cr_loss=0.3995, over 20959.00 frames. 
], tot_loss[loss=0.2162, ctc_loss=0.1427, cr_loss=0.3678, over 4109500.78 frames. ], batch size: 64, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 17:07:19,970 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.08 vs. limit=6.0 2024-09-18 17:07:28,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=791398.1666666666, ans=0.125 2024-09-18 17:07:49,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=791454.8333333334, ans=0.125 2024-09-18 17:08:09,099 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.09 vs. limit=6.0 2024-09-18 17:08:18,529 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=791483.1666666666, ans=0.125 2024-09-18 17:08:21,508 INFO [train.py:1198] (1/2) Epoch 44, batch 4550, loss[loss=0.2136, ctc_loss=0.1397, cr_loss=0.3695, over 20891.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1429, cr_loss=0.3681, over 4110067.34 frames. ], batch size: 57, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 17:08:48,184 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.66 vs. limit=6.0 2024-09-18 17:09:09,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=791596.5, ans=0.125 2024-09-18 17:09:18,895 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.935e+02 2.279e+02 2.386e+02 2.531e+02 3.881e+02, threshold=4.771e+02, percent-clipped=0.0 2024-09-18 17:09:19,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=791596.5, ans=0.1 2024-09-18 17:09:37,395 INFO [train.py:1198] (1/2) Epoch 44, batch 4600, loss[loss=0.194, ctc_loss=0.1234, cr_loss=0.3531, over 20990.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1434, cr_loss=0.369, over 4086605.52 frames. ], batch size: 52, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 17:09:47,244 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.65 vs. limit=15.0 2024-09-18 17:10:39,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=791766.5, ans=0.2 2024-09-18 17:10:52,918 INFO [train.py:1198] (1/2) Epoch 44, batch 4650, loss[loss=0.2203, ctc_loss=0.1452, cr_loss=0.3758, over 20712.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1433, cr_loss=0.3687, over 4089120.18 frames. ], batch size: 71, lr: 1.87e-03, grad_scale: 16.0 2024-09-18 17:11:38,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=791851.5, ans=0.0 2024-09-18 17:11:55,193 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.932e+02 2.226e+02 2.335e+02 2.506e+02 4.223e+02, threshold=4.670e+02, percent-clipped=0.0 2024-09-18 17:12:14,836 INFO [train.py:1198] (1/2) Epoch 44, batch 4700, loss[loss=0.1828, ctc_loss=0.1181, cr_loss=0.3234, over 20978.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.1438, cr_loss=0.3706, over 4092060.96 frames. 
], batch size: 51, lr: 1.87e-03, grad_scale: 16.0 2024-09-18 17:13:15,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=792049.8333333334, ans=0.0 2024-09-18 17:13:30,237 INFO [train.py:1198] (1/2) Epoch 44, batch 4750, loss[loss=0.2031, ctc_loss=0.1321, cr_loss=0.3547, over 20885.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1434, cr_loss=0.3702, over 4095916.00 frames. ], batch size: 54, lr: 1.87e-03, grad_scale: 16.0 2024-09-18 17:13:43,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=792106.5, ans=0.2 2024-09-18 17:14:05,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=792134.8333333334, ans=0.0 2024-09-18 17:14:11,215 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=792134.8333333334, ans=0.5 2024-09-18 17:14:17,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=792163.1666666666, ans=0.1 2024-09-18 17:14:28,758 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.979e+02 2.260e+02 2.389e+02 2.525e+02 3.580e+02, threshold=4.778e+02, percent-clipped=0.0 2024-09-18 17:14:30,900 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.83 vs. limit=15.0 2024-09-18 17:14:45,726 INFO [train.py:1198] (1/2) Epoch 44, batch 4800, loss[loss=0.2131, ctc_loss=0.1409, cr_loss=0.3609, over 20975.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1441, cr_loss=0.3707, over 4088518.52 frames. ], batch size: 55, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 17:15:02,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=792248.1666666666, ans=0.125 2024-09-18 17:15:08,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=792248.1666666666, ans=0.0 2024-09-18 17:15:08,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=792248.1666666666, ans=0.0 2024-09-18 17:15:33,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=792304.8333333334, ans=0.2 2024-09-18 17:15:48,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=792333.1666666666, ans=0.2 2024-09-18 17:16:01,533 INFO [train.py:1198] (1/2) Epoch 44, batch 4850, loss[loss=0.2559, ctc_loss=0.1714, cr_loss=0.4225, over 20633.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1432, cr_loss=0.3695, over 4095033.30 frames. 
], batch size: 66, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 17:16:21,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=792389.8333333334, ans=0.125 2024-09-18 17:16:29,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=792389.8333333334, ans=0.1 2024-09-18 17:16:47,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=792446.5, ans=0.0 2024-09-18 17:17:02,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=792446.5, ans=0.125 2024-09-18 17:17:04,134 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.972e+02 2.276e+02 2.403e+02 2.587e+02 5.923e+02, threshold=4.806e+02, percent-clipped=1.0 2024-09-18 17:17:20,686 INFO [train.py:1198] (1/2) Epoch 44, batch 4900, loss[loss=0.2205, ctc_loss=0.1443, cr_loss=0.3811, over 20963.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.143, cr_loss=0.3688, over 4103560.40 frames. ], batch size: 58, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 17:17:21,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=792503.1666666666, ans=0.1 2024-09-18 17:17:33,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=792503.1666666666, ans=0.125 2024-09-18 17:17:49,421 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 17:17:58,798 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.17 vs. limit=10.0 2024-09-18 17:17:59,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=792559.8333333334, ans=0.04949747468305833 2024-09-18 17:18:09,336 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.42 vs. limit=22.5 2024-09-18 17:18:16,914 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.81 vs. limit=15.0 2024-09-18 17:18:20,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=792588.1666666666, ans=0.025 2024-09-18 17:18:38,358 INFO [train.py:1198] (1/2) Epoch 44, batch 4950, loss[loss=0.2227, ctc_loss=0.146, cr_loss=0.3836, over 21037.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.143, cr_loss=0.3688, over 4102245.80 frames. ], batch size: 62, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 17:18:51,059 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.57 vs. 
limit=15.0 2024-09-18 17:19:17,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=792701.5, ans=0.09899494936611666 2024-09-18 17:19:36,706 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.008e+02 2.237e+02 2.369e+02 2.526e+02 3.939e+02, threshold=4.738e+02, percent-clipped=0.0 2024-09-18 17:19:53,593 INFO [train.py:1198] (1/2) Epoch 44, batch 5000, loss[loss=0.2089, ctc_loss=0.1342, cr_loss=0.3734, over 21008.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1432, cr_loss=0.3691, over 4092818.28 frames. ], batch size: 61, lr: 1.86e-03, grad_scale: 32.0 2024-09-18 17:20:14,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=792814.8333333334, ans=0.125 2024-09-18 17:20:24,354 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.11 vs. limit=10.0 2024-09-18 17:20:35,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=792843.1666666666, ans=0.125 2024-09-18 17:21:01,454 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. limit=6.0 2024-09-18 17:21:08,205 INFO [train.py:1198] (1/2) Epoch 44, batch 5050, loss[loss=0.1922, ctc_loss=0.1236, cr_loss=0.3434, over 21011.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1436, cr_loss=0.3698, over 4084202.37 frames. ], batch size: 52, lr: 1.86e-03, grad_scale: 32.0 2024-09-18 17:22:05,973 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.967e+02 2.238e+02 2.368e+02 2.535e+02 3.170e+02, threshold=4.737e+02, percent-clipped=0.0 2024-09-18 17:22:22,289 INFO [train.py:1198] (1/2) Epoch 44, batch 5100, loss[loss=0.2497, ctc_loss=0.1675, cr_loss=0.4112, over 19451.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1444, cr_loss=0.3705, over 4078699.98 frames. ], batch size: 90, lr: 1.86e-03, grad_scale: 32.0 2024-09-18 17:22:38,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=793098.1666666666, ans=10.0 2024-09-18 17:22:43,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=793098.1666666666, ans=0.125 2024-09-18 17:23:13,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=793154.8333333334, ans=0.125 2024-09-18 17:23:18,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=793154.8333333334, ans=0.125 2024-09-18 17:23:36,496 INFO [train.py:1198] (1/2) Epoch 44, batch 5150, loss[loss=0.2589, ctc_loss=0.1737, cr_loss=0.4259, over 18182.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1438, cr_loss=0.3699, over 4082706.68 frames. ], batch size: 108, lr: 1.86e-03, grad_scale: 32.0 2024-09-18 17:23:57,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=793239.8333333334, ans=0.0 2024-09-18 17:24:06,878 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.18 vs. 
limit=22.5 2024-09-18 17:24:12,095 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=793268.1666666666, ans=0.2 2024-09-18 17:24:26,085 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.02 vs. limit=10.0 2024-09-18 17:24:31,824 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.45 vs. limit=15.0 2024-09-18 17:24:33,976 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.881e+02 2.263e+02 2.379e+02 2.589e+02 3.092e+02, threshold=4.758e+02, percent-clipped=0.0 2024-09-18 17:24:37,797 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.33 vs. limit=15.0 2024-09-18 17:24:50,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=793353.1666666666, ans=0.0 2024-09-18 17:24:51,597 INFO [train.py:1198] (1/2) Epoch 44, batch 5200, loss[loss=0.2257, ctc_loss=0.1477, cr_loss=0.39, over 20977.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1434, cr_loss=0.3696, over 4079903.94 frames. ], batch size: 64, lr: 1.86e-03, grad_scale: 32.0 2024-09-18 17:24:53,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=793353.1666666666, ans=0.125 2024-09-18 17:25:10,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=793381.5, ans=0.125 2024-09-18 17:25:35,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=793438.1666666666, ans=0.125 2024-09-18 17:25:44,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=793438.1666666666, ans=0.0 2024-09-18 17:26:05,998 INFO [train.py:1198] (1/2) Epoch 44, batch 5250, loss[loss=0.2407, ctc_loss=0.1615, cr_loss=0.3957, over 20664.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1424, cr_loss=0.3681, over 4085856.77 frames. ], batch size: 68, lr: 1.86e-03, grad_scale: 32.0 2024-09-18 17:27:07,682 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.949e+02 2.198e+02 2.344e+02 2.484e+02 5.021e+02, threshold=4.687e+02, percent-clipped=1.0 2024-09-18 17:27:24,243 INFO [train.py:1198] (1/2) Epoch 44, batch 5300, loss[loss=0.184, ctc_loss=0.1202, cr_loss=0.319, over 20987.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1423, cr_loss=0.3679, over 4082727.14 frames. ], batch size: 52, lr: 1.86e-03, grad_scale: 32.0 2024-09-18 17:27:28,165 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.47 vs. limit=12.0 2024-09-18 17:28:40,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=793778.1666666666, ans=0.2 2024-09-18 17:28:41,712 INFO [train.py:1198] (1/2) Epoch 44, batch 5350, loss[loss=0.2266, ctc_loss=0.1489, cr_loss=0.3883, over 20983.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1426, cr_loss=0.3678, over 4088310.31 frames. 
], batch size: 64, lr: 1.86e-03, grad_scale: 32.0 2024-09-18 17:29:33,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=793863.1666666666, ans=0.07 2024-09-18 17:29:38,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=793863.1666666666, ans=0.125 2024-09-18 17:29:39,561 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.903e+02 2.206e+02 2.376e+02 2.538e+02 3.029e+02, threshold=4.751e+02, percent-clipped=0.0 2024-09-18 17:29:50,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=793891.5, ans=0.125 2024-09-18 17:29:55,659 INFO [train.py:1198] (1/2) Epoch 44, batch 5400, loss[loss=0.2383, ctc_loss=0.1578, cr_loss=0.4027, over 20880.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1427, cr_loss=0.3681, over 4096755.63 frames. ], batch size: 65, lr: 1.86e-03, grad_scale: 32.0 2024-09-18 17:30:54,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=794033.1666666666, ans=0.125 2024-09-18 17:30:57,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=794033.1666666666, ans=0.125 2024-09-18 17:31:10,156 INFO [train.py:1198] (1/2) Epoch 44, batch 5450, loss[loss=0.2345, ctc_loss=0.1573, cr_loss=0.3858, over 20947.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1426, cr_loss=0.3682, over 4110542.61 frames. ], batch size: 64, lr: 1.86e-03, grad_scale: 16.0 2024-09-18 17:31:21,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=794061.5, ans=0.125 2024-09-18 17:32:10,372 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.005e+02 2.224e+02 2.387e+02 2.567e+02 3.134e+02, threshold=4.774e+02, percent-clipped=0.0 2024-09-18 17:32:25,115 INFO [train.py:1198] (1/2) Epoch 44, batch 5500, loss[loss=0.2192, ctc_loss=0.144, cr_loss=0.3761, over 20211.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1431, cr_loss=0.3688, over 4108241.59 frames. ], batch size: 74, lr: 1.86e-03, grad_scale: 16.0 2024-09-18 17:32:28,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=794203.1666666666, ans=0.1 2024-09-18 17:32:29,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=794203.1666666666, ans=0.1 2024-09-18 17:33:02,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=794259.8333333334, ans=0.0 2024-09-18 17:33:02,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=794259.8333333334, ans=0.0 2024-09-18 17:33:04,762 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.40 vs. limit=22.5 2024-09-18 17:33:08,817 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.19 vs. limit=10.0 2024-09-18 17:33:39,276 INFO [train.py:1198] (1/2) Epoch 44, batch 5550, loss[loss=0.2167, ctc_loss=0.1436, cr_loss=0.3652, over 20818.00 frames. 
], tot_loss[loss=0.2167, ctc_loss=0.1431, cr_loss=0.3682, over 4092903.85 frames. ], batch size: 59, lr: 1.86e-03, grad_scale: 16.0 2024-09-18 17:34:29,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=794429.8333333334, ans=0.125 2024-09-18 17:34:34,748 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 17:34:38,919 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.977e+02 2.257e+02 2.381e+02 2.509e+02 3.359e+02, threshold=4.761e+02, percent-clipped=0.0 2024-09-18 17:34:40,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=794458.1666666666, ans=0.1 2024-09-18 17:34:53,908 INFO [train.py:1198] (1/2) Epoch 44, batch 5600, loss[loss=0.2171, ctc_loss=0.1435, cr_loss=0.3679, over 20252.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1428, cr_loss=0.3679, over 4091314.82 frames. ], batch size: 74, lr: 1.86e-03, grad_scale: 32.0 2024-09-18 17:34:54,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=794486.5, ans=0.2 2024-09-18 17:35:06,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=794486.5, ans=0.2 2024-09-18 17:35:06,450 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.70 vs. limit=6.0 2024-09-18 17:35:32,806 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.80 vs. limit=15.0 2024-09-18 17:35:33,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=794543.1666666666, ans=0.125 2024-09-18 17:35:57,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=794599.8333333334, ans=0.0 2024-09-18 17:36:10,624 INFO [train.py:1198] (1/2) Epoch 44, batch 5650, loss[loss=0.2376, ctc_loss=0.1589, cr_loss=0.3935, over 20672.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1427, cr_loss=0.3678, over 4086850.94 frames. ], batch size: 68, lr: 1.86e-03, grad_scale: 32.0 2024-09-18 17:36:17,568 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. limit=6.0 2024-09-18 17:36:19,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=794628.1666666666, ans=0.125 2024-09-18 17:36:22,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=794628.1666666666, ans=0.0 2024-09-18 17:36:55,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=794684.8333333334, ans=0.0 2024-09-18 17:37:11,800 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.19 vs. 
limit=10.0 2024-09-18 17:37:13,937 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.843e+02 2.240e+02 2.370e+02 2.562e+02 3.826e+02, threshold=4.739e+02, percent-clipped=0.0 2024-09-18 17:37:27,350 INFO [train.py:1198] (1/2) Epoch 44, batch 5700, loss[loss=0.2193, ctc_loss=0.1445, cr_loss=0.3739, over 21048.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1433, cr_loss=0.3691, over 4092553.88 frames. ], batch size: 62, lr: 1.86e-03, grad_scale: 16.0 2024-09-18 17:37:27,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=794769.8333333334, ans=0.1 2024-09-18 17:37:44,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=794798.1666666666, ans=0.125 2024-09-18 17:37:46,180 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.07 vs. limit=15.0 2024-09-18 17:38:03,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=794826.5, ans=0.025 2024-09-18 17:38:29,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=794883.1666666666, ans=0.0 2024-09-18 17:38:35,363 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 17:38:42,456 INFO [train.py:1198] (1/2) Epoch 44, batch 5750, loss[loss=0.191, ctc_loss=0.125, cr_loss=0.3298, over 20778.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1437, cr_loss=0.3694, over 4093023.28 frames. ], batch size: 56, lr: 1.86e-03, grad_scale: 16.0 2024-09-18 17:38:51,896 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.87 vs. limit=15.0 2024-09-18 17:39:39,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=794996.5, ans=0.125 2024-09-18 17:39:42,296 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 17:39:43,339 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.944e+02 2.301e+02 2.405e+02 2.544e+02 5.354e+02, threshold=4.810e+02, percent-clipped=1.0 2024-09-18 17:39:46,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=795024.8333333334, ans=0.07 2024-09-18 17:39:56,743 INFO [train.py:1198] (1/2) Epoch 44, batch 5800, loss[loss=0.1715, ctc_loss=0.1107, cr_loss=0.3036, over 20971.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1441, cr_loss=0.3701, over 4087140.58 frames. 
], batch size: 50, lr: 1.86e-03, grad_scale: 16.0 2024-09-18 17:40:08,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=795053.1666666666, ans=0.125 2024-09-18 17:40:16,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=795081.5, ans=0.125 2024-09-18 17:40:16,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=795081.5, ans=0.09899494936611666 2024-09-18 17:40:26,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=795109.8333333334, ans=0.035 2024-09-18 17:41:00,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=795166.5, ans=0.125 2024-09-18 17:41:10,612 INFO [train.py:1198] (1/2) Epoch 44, batch 5850, loss[loss=0.205, ctc_loss=0.1372, cr_loss=0.339, over 21021.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1447, cr_loss=0.3709, over 4066556.86 frames. ], batch size: 61, lr: 1.86e-03, grad_scale: 16.0 2024-09-18 17:41:31,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=795223.1666666666, ans=0.125 2024-09-18 17:41:39,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=795251.5, ans=0.0 2024-09-18 17:42:11,176 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.928e+02 2.307e+02 2.427e+02 2.615e+02 7.127e+02, threshold=4.855e+02, percent-clipped=1.0 2024-09-18 17:42:24,560 INFO [train.py:1198] (1/2) Epoch 44, batch 5900, loss[loss=0.2498, ctc_loss=0.1685, cr_loss=0.4064, over 20666.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.1448, cr_loss=0.3712, over 4070093.05 frames. ], batch size: 66, lr: 1.86e-03, grad_scale: 16.0 2024-09-18 17:42:35,389 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.82 vs. limit=15.0 2024-09-18 17:42:53,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=795393.1666666666, ans=0.125 2024-09-18 17:43:38,666 INFO [train.py:1198] (1/2) Epoch 44, batch 5950, loss[loss=0.2214, ctc_loss=0.1441, cr_loss=0.3866, over 20771.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.144, cr_loss=0.3704, over 4084729.41 frames. ], batch size: 56, lr: 1.86e-03, grad_scale: 16.0 2024-09-18 17:44:03,881 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.46 vs. limit=15.0 2024-09-18 17:44:18,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=795534.8333333334, ans=0.04949747468305833 2024-09-18 17:44:42,200 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.917e+02 2.247e+02 2.351e+02 2.487e+02 3.184e+02, threshold=4.702e+02, percent-clipped=0.0 2024-09-18 17:44:55,862 INFO [train.py:1198] (1/2) Epoch 44, batch 6000, loss[loss=0.1872, ctc_loss=0.1221, cr_loss=0.3256, over 20361.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1441, cr_loss=0.3704, over 4072132.13 frames. 
], batch size: 45, lr: 1.86e-03, grad_scale: 32.0 2024-09-18 17:44:55,863 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 17:45:15,463 INFO [train.py:1230] (1/2) Epoch 44, validation: loss=0.03935, ctc_loss=0.03935, cr_loss=1.572e-14, over 944034.00 frames. 2024-09-18 17:45:15,464 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-18 17:45:21,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=795619.8333333334, ans=0.125 2024-09-18 17:45:39,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=795648.1666666666, ans=0.07 2024-09-18 17:45:41,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=795648.1666666666, ans=0.0 2024-09-18 17:46:04,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=795704.8333333334, ans=0.025 2024-09-18 17:46:08,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=795704.8333333334, ans=0.125 2024-09-18 17:46:29,338 INFO [train.py:1198] (1/2) Epoch 44, batch 6050, loss[loss=0.2404, ctc_loss=0.1602, cr_loss=0.4008, over 20670.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1444, cr_loss=0.371, over 4059100.25 frames. ], batch size: 66, lr: 1.86e-03, grad_scale: 32.0 2024-09-18 17:46:51,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=795789.8333333334, ans=0.125 2024-09-18 17:47:06,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=795818.1666666666, ans=0.125 2024-09-18 17:47:06,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=795818.1666666666, ans=0.5 2024-09-18 17:47:30,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=795874.8333333334, ans=0.025 2024-09-18 17:47:31,557 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.049e+02 2.254e+02 2.386e+02 2.577e+02 4.407e+02, threshold=4.772e+02, percent-clipped=0.0 2024-09-18 17:47:45,134 INFO [train.py:1198] (1/2) Epoch 44, batch 6100, loss[loss=0.2072, ctc_loss=0.1362, cr_loss=0.3548, over 20978.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1435, cr_loss=0.3698, over 4079855.59 frames. ], batch size: 55, lr: 1.86e-03, grad_scale: 32.0 2024-09-18 17:48:02,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=795931.5, ans=0.95 2024-09-18 17:48:27,345 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.44 vs. limit=15.0 2024-09-18 17:48:35,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=795988.1666666666, ans=0.95 2024-09-18 17:48:49,214 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.02 vs. 
limit=15.0 2024-09-18 17:48:54,829 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.82 vs. limit=15.0 2024-09-18 17:49:00,134 INFO [train.py:1198] (1/2) Epoch 44, batch 6150, loss[loss=0.2472, ctc_loss=0.1663, cr_loss=0.4044, over 20111.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1433, cr_loss=0.3693, over 4080182.52 frames. ], batch size: 80, lr: 1.86e-03, grad_scale: 32.0 2024-09-18 17:49:13,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=796073.1666666666, ans=0.0 2024-09-18 17:49:14,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=796073.1666666666, ans=0.0 2024-09-18 17:49:35,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=796101.5, ans=0.125 2024-09-18 17:50:00,796 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.914e+02 2.226e+02 2.383e+02 2.566e+02 3.697e+02, threshold=4.765e+02, percent-clipped=0.0 2024-09-18 17:50:14,543 INFO [train.py:1198] (1/2) Epoch 44, batch 6200, loss[loss=0.2318, ctc_loss=0.1553, cr_loss=0.3824, over 20665.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.1439, cr_loss=0.3694, over 4063187.88 frames. ], batch size: 68, lr: 1.86e-03, grad_scale: 32.0 2024-09-18 17:50:31,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=796214.8333333334, ans=0.125 2024-09-18 17:50:45,823 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.45 vs. limit=12.0 2024-09-18 17:50:57,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=796271.5, ans=0.125 2024-09-18 17:50:58,478 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=796271.5, ans=0.1 2024-09-18 17:51:24,963 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.07 vs. limit=15.0 2024-09-18 17:51:27,099 INFO [train.py:1198] (1/2) Epoch 44, batch 6250, loss[loss=0.2555, ctc_loss=0.1766, cr_loss=0.3944, over 14233.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1444, cr_loss=0.3689, over 4021524.84 frames. ], batch size: 149, lr: 1.86e-03, grad_scale: 32.0 2024-09-18 17:51:27,783 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=5.50 vs. limit=22.5 2024-09-18 17:51:31,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=796328.1666666666, ans=0.125 2024-09-18 17:52:01,768 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=796384.8333333334, ans=0.2 2024-09-18 17:52:13,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=796413.1666666666, ans=0.025 2024-09-18 17:52:13,763 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.38 vs. 
limit=15.0 2024-09-18 17:52:19,426 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.21 vs. limit=10.0 2024-09-18 17:52:27,301 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.784e+02 2.280e+02 2.451e+02 2.671e+02 5.085e+02, threshold=4.902e+02, percent-clipped=1.0 2024-09-18 17:52:40,868 INFO [train.py:1198] (1/2) Epoch 44, batch 6300, loss[loss=0.1849, ctc_loss=0.1188, cr_loss=0.3305, over 19729.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1463, cr_loss=0.3712, over 3956933.09 frames. ], batch size: 44, lr: 1.86e-03, grad_scale: 32.0 2024-09-18 17:52:41,420 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.42 vs. limit=22.5 2024-09-18 17:52:56,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=796498.1666666666, ans=0.0 2024-09-18 17:53:12,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=796526.5, ans=0.125 2024-09-18 17:53:26,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=796554.8333333334, ans=0.0 2024-09-18 17:53:27,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=796554.8333333334, ans=0.125 2024-09-18 17:53:27,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=796554.8333333334, ans=0.125 2024-09-18 17:53:33,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=796554.8333333334, ans=0.125 2024-09-18 17:53:51,334 INFO [train.py:1198] (1/2) Epoch 44, batch 6350, loss[loss=0.2506, ctc_loss=0.1746, cr_loss=0.38, over 14308.00 frames. ], tot_loss[loss=0.2255, ctc_loss=0.1507, cr_loss=0.3739, over 3781427.24 frames. ], batch size: 150, lr: 1.86e-03, grad_scale: 32.0 2024-09-18 17:54:27,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=796668.1666666666, ans=0.125 2024-09-18 17:55:40,103 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.167e+02 2.667e+02 2.899e+02 3.128e+02 5.853e+02, threshold=5.799e+02, percent-clipped=3.0 2024-09-18 17:55:40,122 INFO [train.py:1198] (1/2) Epoch 45, batch 0, loss[loss=0.231, ctc_loss=0.1541, cr_loss=0.3847, over 20964.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1541, cr_loss=0.3847, over 20964.00 frames. ], batch size: 58, lr: 1.84e-03, grad_scale: 32.0 2024-09-18 17:55:40,122 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 17:55:58,285 INFO [train.py:1230] (1/2) Epoch 45, validation: loss=0.03872, ctc_loss=0.03872, cr_loss=1.531e-14, over 944034.00 frames. 
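[annotation] The loss records in this excerpt are internally consistent with a weighted sum loss = ctc_loss + 0.2 * cr_loss (e.g. 0.1434 + 0.2 * 0.3702 ≈ 0.2175 at Epoch 44, batch 4750), while the validation records just above report cr_loss on the order of 1e-14. That pattern matches a consistency-regularization term computed between two differently augmented views of each utterance: with augmentation disabled at validation the two views coincide and the term collapses to numerical noise. A minimal sketch of such an objective follows; the helper name cr_ctc_loss and the symmetric-KL form of the consistency term are assumptions for illustration, not this repository's exact implementation.

```python
import torch.nn.functional as F

def cr_ctc_loss(log_probs_a, log_probs_b, targets, input_lens, target_lens,
                cr_scale=0.2):
    # log_probs_a / log_probs_b: (T, N, C) log-softmax outputs of the model
    # on two independently time-masked views of the same batch.
    ctc = 0.5 * (
        F.ctc_loss(log_probs_a, targets, input_lens, target_lens,
                   blank=0, reduction="sum", zero_infinity=True)
        + F.ctc_loss(log_probs_b, targets, input_lens, target_lens,
                     blank=0, reduction="sum", zero_infinity=True)
    )
    # Symmetric KL between the two views' frame posteriors; identical views
    # (no augmentation, as at validation time) give a value near zero,
    # matching the cr_loss ~ 1e-14 seen in the validation records above.
    cr = 0.5 * (
        F.kl_div(log_probs_a, log_probs_b, reduction="sum", log_target=True)
        + F.kl_div(log_probs_b, log_probs_a, reduction="sum", log_target=True)
    )
    return ctc + cr_scale * cr
```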
2024-09-18 17:55:58,285 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-18 17:56:43,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=796812.6666666666, ans=0.1 2024-09-18 17:56:46,640 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 17:56:47,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=796812.6666666666, ans=0.0 2024-09-18 17:56:49,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=796812.6666666666, ans=0.125 2024-09-18 17:56:52,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=796812.6666666666, ans=0.125 2024-09-18 17:56:57,560 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.05 vs. limit=15.0 2024-09-18 17:56:58,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=796841.0, ans=0.125 2024-09-18 17:57:04,897 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 17:57:13,626 INFO [train.py:1198] (1/2) Epoch 45, batch 50, loss[loss=0.276, ctc_loss=0.1959, cr_loss=0.4005, over 13860.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.1438, cr_loss=0.3704, over 925006.87 frames. ], batch size: 149, lr: 1.84e-03, grad_scale: 32.0 2024-09-18 17:57:21,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=796869.3333333334, ans=0.0 2024-09-18 17:57:36,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=796897.6666666666, ans=0.0 2024-09-18 17:57:41,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=796897.6666666666, ans=0.0 2024-09-18 17:57:43,146 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.92 vs. limit=15.0 2024-09-18 17:57:48,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=796926.0, ans=0.1 2024-09-18 17:57:48,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=796926.0, ans=0.125 2024-09-18 17:57:50,495 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.99 vs. 
limit=15.0 2024-09-18 17:58:06,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=796954.3333333334, ans=0.0 2024-09-18 17:58:27,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=797011.0, ans=0.125 2024-09-18 17:58:28,893 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.961e+02 2.219e+02 2.318e+02 2.504e+02 2.972e+02, threshold=4.635e+02, percent-clipped=0.0 2024-09-18 17:58:28,914 INFO [train.py:1198] (1/2) Epoch 45, batch 100, loss[loss=0.2047, ctc_loss=0.133, cr_loss=0.3582, over 20768.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.144, cr_loss=0.369, over 1629483.79 frames. ], batch size: 56, lr: 1.84e-03, grad_scale: 32.0 2024-09-18 17:59:12,956 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 17:59:14,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=797096.0, ans=0.125 2024-09-18 17:59:47,777 INFO [train.py:1198] (1/2) Epoch 45, batch 150, loss[loss=0.1984, ctc_loss=0.1294, cr_loss=0.3451, over 21069.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1442, cr_loss=0.3699, over 2173744.07 frames. ], batch size: 56, lr: 1.84e-03, grad_scale: 16.0 2024-09-18 17:59:48,206 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 17:59:57,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=797152.6666666666, ans=0.1 2024-09-18 18:00:38,809 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.22 vs. limit=15.0 2024-09-18 18:00:49,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=797237.6666666666, ans=0.125 2024-09-18 18:00:50,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=797266.0, ans=0.0 2024-09-18 18:01:06,691 INFO [train.py:1198] (1/2) Epoch 45, batch 200, loss[loss=0.2074, ctc_loss=0.1337, cr_loss=0.3686, over 20901.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.1437, cr_loss=0.3701, over 2606927.03 frames. 
], batch size: 54, lr: 1.84e-03, grad_scale: 16.0 2024-09-18 18:01:08,175 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.893e+02 2.215e+02 2.309e+02 2.528e+02 3.607e+02, threshold=4.617e+02, percent-clipped=0.0 2024-09-18 18:01:25,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=797322.6666666666, ans=0.0 2024-09-18 18:01:41,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=797351.0, ans=0.125 2024-09-18 18:01:41,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=797351.0, ans=0.2 2024-09-18 18:01:49,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=797351.0, ans=0.0 2024-09-18 18:01:55,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=797379.3333333334, ans=0.0 2024-09-18 18:02:22,091 INFO [train.py:1198] (1/2) Epoch 45, batch 250, loss[loss=0.2226, ctc_loss=0.1482, cr_loss=0.3722, over 20882.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1436, cr_loss=0.3695, over 2922224.03 frames. ], batch size: 57, lr: 1.84e-03, grad_scale: 16.0 2024-09-18 18:02:22,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=797436.0, ans=0.125 2024-09-18 18:03:01,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=797492.6666666666, ans=0.125 2024-09-18 18:03:37,498 INFO [train.py:1198] (1/2) Epoch 45, batch 300, loss[loss=0.2084, ctc_loss=0.1385, cr_loss=0.3496, over 20937.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.145, cr_loss=0.3718, over 3160747.32 frames. ], batch size: 60, lr: 1.84e-03, grad_scale: 16.0 2024-09-18 18:03:39,021 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.004e+02 2.263e+02 2.431e+02 2.595e+02 4.391e+02, threshold=4.861e+02, percent-clipped=0.0 2024-09-18 18:03:42,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=797577.6666666666, ans=0.0 2024-09-18 18:03:42,637 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.23 vs. limit=15.0 2024-09-18 18:04:11,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=797634.3333333334, ans=0.0 2024-09-18 18:04:12,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=797634.3333333334, ans=0.125 2024-09-18 18:04:53,371 INFO [train.py:1198] (1/2) Epoch 45, batch 350, loss[loss=0.1732, ctc_loss=0.109, cr_loss=0.3214, over 19852.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.1441, cr_loss=0.3693, over 3345989.53 frames. 
], batch size: 44, lr: 1.84e-03, grad_scale: 16.0 2024-09-18 18:05:04,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=797719.3333333334, ans=0.125 2024-09-18 18:05:26,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=797776.0, ans=0.125 2024-09-18 18:05:33,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=797776.0, ans=0.0 2024-09-18 18:05:47,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=797804.3333333334, ans=0.125 2024-09-18 18:05:59,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=797832.6666666666, ans=0.125 2024-09-18 18:06:15,555 INFO [train.py:1198] (1/2) Epoch 45, batch 400, loss[loss=0.2317, ctc_loss=0.1537, cr_loss=0.3899, over 21029.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.1449, cr_loss=0.3711, over 3495029.12 frames. ], batch size: 62, lr: 1.84e-03, grad_scale: 32.0 2024-09-18 18:06:17,083 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.911e+02 2.233e+02 2.338e+02 2.589e+02 3.115e+02, threshold=4.676e+02, percent-clipped=0.0 2024-09-18 18:06:23,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=797861.0, ans=0.0 2024-09-18 18:06:50,122 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.79 vs. limit=12.0 2024-09-18 18:07:09,224 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 18:07:24,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=797974.3333333334, ans=0.0 2024-09-18 18:07:31,428 INFO [train.py:1198] (1/2) Epoch 45, batch 450, loss[loss=0.2423, ctc_loss=0.1642, cr_loss=0.3904, over 20316.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.144, cr_loss=0.37, over 3631991.45 frames. ], batch size: 74, lr: 1.84e-03, grad_scale: 32.0 2024-09-18 18:07:51,711 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.05 vs. limit=12.0 2024-09-18 18:08:19,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=798087.6666666666, ans=0.1 2024-09-18 18:08:33,633 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.11 vs. limit=10.0 2024-09-18 18:08:46,374 INFO [train.py:1198] (1/2) Epoch 45, batch 500, loss[loss=0.2011, ctc_loss=0.129, cr_loss=0.3602, over 20971.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1433, cr_loss=0.369, over 3739147.27 frames. ], batch size: 50, lr: 1.84e-03, grad_scale: 32.0 2024-09-18 18:08:47,834 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.016e+02 2.224e+02 2.357e+02 2.491e+02 6.322e+02, threshold=4.714e+02, percent-clipped=1.0 2024-09-18 18:08:57,372 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.11 vs. 
limit=10.0 2024-09-18 18:09:19,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=798201.0, ans=0.5 2024-09-18 18:09:33,867 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.03 vs. limit=15.0 2024-09-18 18:09:39,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=798229.3333333334, ans=0.2 2024-09-18 18:09:56,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=798257.6666666666, ans=0.125 2024-09-18 18:10:02,022 INFO [train.py:1198] (1/2) Epoch 45, batch 550, loss[loss=0.2264, ctc_loss=0.1522, cr_loss=0.3714, over 21043.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1431, cr_loss=0.3694, over 3826468.39 frames. ], batch size: 62, lr: 1.84e-03, grad_scale: 16.0 2024-09-18 18:10:10,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=798286.0, ans=0.125 2024-09-18 18:10:23,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=798314.3333333334, ans=0.04949747468305833 2024-09-18 18:10:26,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=798314.3333333334, ans=0.5 2024-09-18 18:10:41,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=798342.6666666666, ans=0.025 2024-09-18 18:10:48,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=798371.0, ans=0.07 2024-09-18 18:10:49,306 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.38 vs. limit=15.0 2024-09-18 18:10:57,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=798371.0, ans=0.0 2024-09-18 18:10:59,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=798371.0, ans=0.125 2024-09-18 18:11:20,367 INFO [train.py:1198] (1/2) Epoch 45, batch 600, loss[loss=0.2253, ctc_loss=0.1508, cr_loss=0.3723, over 20647.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.143, cr_loss=0.3683, over 3881869.35 frames. ], batch size: 71, lr: 1.84e-03, grad_scale: 16.0 2024-09-18 18:11:23,567 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.005e+02 2.258e+02 2.386e+02 2.547e+02 3.258e+02, threshold=4.772e+02, percent-clipped=0.0 2024-09-18 18:11:26,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=798427.6666666666, ans=0.125 2024-09-18 18:12:13,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=798512.6666666666, ans=0.0 2024-09-18 18:12:26,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=798541.0, ans=0.125 2024-09-18 18:12:28,506 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.30 vs. 
limit=15.0 2024-09-18 18:12:39,538 INFO [train.py:1198] (1/2) Epoch 45, batch 650, loss[loss=0.2206, ctc_loss=0.1462, cr_loss=0.3717, over 20787.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1438, cr_loss=0.3692, over 3916927.98 frames. ], batch size: 56, lr: 1.84e-03, grad_scale: 16.0 2024-09-18 18:12:41,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=798569.3333333334, ans=0.1 2024-09-18 18:12:55,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=798597.6666666666, ans=0.125 2024-09-18 18:13:05,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=798597.6666666666, ans=0.125 2024-09-18 18:13:19,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=798626.0, ans=0.125 2024-09-18 18:13:28,743 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.75 vs. limit=15.0 2024-09-18 18:13:55,099 INFO [train.py:1198] (1/2) Epoch 45, batch 700, loss[loss=0.2422, ctc_loss=0.1598, cr_loss=0.4119, over 20977.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1441, cr_loss=0.3705, over 3951500.65 frames. ], batch size: 58, lr: 1.84e-03, grad_scale: 8.0 2024-09-18 18:13:59,546 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.970e+02 2.285e+02 2.417e+02 2.573e+02 3.290e+02, threshold=4.834e+02, percent-clipped=0.0 2024-09-18 18:14:20,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=798739.3333333334, ans=0.025 2024-09-18 18:14:29,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=798767.6666666666, ans=0.125 2024-09-18 18:14:47,768 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=798796.0, ans=0.125 2024-09-18 18:15:10,468 INFO [train.py:1198] (1/2) Epoch 45, batch 750, loss[loss=0.2363, ctc_loss=0.1564, cr_loss=0.3992, over 20891.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1444, cr_loss=0.3708, over 3967955.67 frames. ], batch size: 57, lr: 1.84e-03, grad_scale: 8.0 2024-09-18 18:15:12,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=798852.6666666666, ans=0.025 2024-09-18 18:15:16,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=798852.6666666666, ans=0.0 2024-09-18 18:15:27,696 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.07 vs. limit=15.0 2024-09-18 18:15:44,529 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.20 vs. 
limit=10.0 2024-09-18 18:16:07,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=798937.6666666666, ans=0.125 2024-09-18 18:16:22,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=798966.0, ans=0.125 2024-09-18 18:16:25,417 INFO [train.py:1198] (1/2) Epoch 45, batch 800, loss[loss=0.1939, ctc_loss=0.1264, cr_loss=0.3374, over 21059.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1434, cr_loss=0.3693, over 4002420.05 frames. ], batch size: 56, lr: 1.84e-03, grad_scale: 16.0 2024-09-18 18:16:27,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=798994.3333333334, ans=0.1 2024-09-18 18:16:29,967 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.989e+02 2.236e+02 2.375e+02 2.577e+02 6.506e+02, threshold=4.749e+02, percent-clipped=1.0 2024-09-18 18:16:40,149 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.17 vs. limit=15.0 2024-09-18 18:17:27,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=799079.3333333334, ans=0.1 2024-09-18 18:17:46,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=799136.0, ans=0.025 2024-09-18 18:17:47,220 INFO [train.py:1198] (1/2) Epoch 45, batch 850, loss[loss=0.2056, ctc_loss=0.1359, cr_loss=0.3486, over 20768.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.144, cr_loss=0.37, over 4022635.46 frames. ], batch size: 56, lr: 1.84e-03, grad_scale: 16.0 2024-09-18 18:18:06,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=799164.3333333334, ans=0.0 2024-09-18 18:18:08,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=799164.3333333334, ans=0.2 2024-09-18 18:18:14,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=799164.3333333334, ans=0.1 2024-09-18 18:18:25,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=799192.6666666666, ans=0.0 2024-09-18 18:18:28,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=799192.6666666666, ans=0.0 2024-09-18 18:18:40,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=799221.0, ans=0.2 2024-09-18 18:18:42,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=799221.0, ans=0.1 2024-09-18 18:19:02,991 INFO [train.py:1198] (1/2) Epoch 45, batch 900, loss[loss=0.2008, ctc_loss=0.1318, cr_loss=0.3451, over 21037.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.143, cr_loss=0.3684, over 4046353.22 frames. 
], batch size: 61, lr: 1.84e-03, grad_scale: 16.0 2024-09-18 18:19:07,609 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.849e+02 2.270e+02 2.356e+02 2.546e+02 3.184e+02, threshold=4.712e+02, percent-clipped=0.0 2024-09-18 18:19:54,222 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.91 vs. limit=8.0 2024-09-18 18:20:14,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=799391.0, ans=0.125 2024-09-18 18:20:15,215 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.54 vs. limit=15.0 2024-09-18 18:20:18,849 INFO [train.py:1198] (1/2) Epoch 45, batch 950, loss[loss=0.2296, ctc_loss=0.1517, cr_loss=0.3893, over 20928.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1434, cr_loss=0.3693, over 4060183.92 frames. ], batch size: 60, lr: 1.84e-03, grad_scale: 16.0 2024-09-18 18:20:44,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=799447.6666666666, ans=0.1 2024-09-18 18:21:13,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=799504.3333333334, ans=0.025 2024-09-18 18:21:13,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=799504.3333333334, ans=0.125 2024-09-18 18:21:19,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=799532.6666666666, ans=0.125 2024-09-18 18:21:21,987 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.22 vs. limit=22.5 2024-09-18 18:21:24,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=799532.6666666666, ans=0.0 2024-09-18 18:21:33,946 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.48 vs. limit=22.5 2024-09-18 18:21:34,825 INFO [train.py:1198] (1/2) Epoch 45, batch 1000, loss[loss=0.2172, ctc_loss=0.1423, cr_loss=0.3744, over 20978.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1431, cr_loss=0.3693, over 4079888.07 frames. ], batch size: 55, lr: 1.84e-03, grad_scale: 16.0 2024-09-18 18:21:39,514 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.888e+02 2.263e+02 2.432e+02 2.613e+02 3.713e+02, threshold=4.863e+02, percent-clipped=0.0 2024-09-18 18:21:41,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=799561.0, ans=0.0 2024-09-18 18:22:06,002 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.94 vs. limit=6.0 2024-09-18 18:22:40,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=799674.3333333334, ans=0.125 2024-09-18 18:22:54,113 INFO [train.py:1198] (1/2) Epoch 45, batch 1050, loss[loss=0.2263, ctc_loss=0.1484, cr_loss=0.3897, over 20886.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1427, cr_loss=0.3687, over 4093462.70 frames. 
], batch size: 57, lr: 1.84e-03, grad_scale: 16.0 2024-09-18 18:23:03,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=799702.6666666666, ans=0.0 2024-09-18 18:23:31,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=799759.3333333334, ans=0.125 2024-09-18 18:23:48,934 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.14 vs. limit=22.5 2024-09-18 18:23:54,827 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.36 vs. limit=22.5 2024-09-18 18:24:12,444 INFO [train.py:1198] (1/2) Epoch 45, batch 1100, loss[loss=0.1837, ctc_loss=0.1206, cr_loss=0.3152, over 19879.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1427, cr_loss=0.3684, over 4091620.85 frames. ], batch size: 44, lr: 1.84e-03, grad_scale: 16.0 2024-09-18 18:24:17,057 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.002e+02 2.246e+02 2.365e+02 2.530e+02 3.099e+02, threshold=4.731e+02, percent-clipped=0.0 2024-09-18 18:24:20,395 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 18:24:32,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=799872.6666666666, ans=0.125 2024-09-18 18:25:04,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=799929.3333333334, ans=0.125 2024-09-18 18:25:07,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=799929.3333333334, ans=0.0 2024-09-18 18:25:09,408 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.88 vs. limit=15.0 2024-09-18 18:25:19,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=799957.6666666666, ans=0.125 2024-09-18 18:25:28,321 INFO [train.py:1198] (1/2) Epoch 45, batch 1150, loss[loss=0.1908, ctc_loss=0.1226, cr_loss=0.3409, over 21049.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1428, cr_loss=0.3683, over 4091766.10 frames. ], batch size: 53, lr: 1.84e-03, grad_scale: 16.0 2024-09-18 18:25:42,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=800014.3333333334, ans=0.125 2024-09-18 18:25:47,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=800014.3333333334, ans=0.09899494936611666 2024-09-18 18:25:52,833 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.11 vs. 
limit=15.0 2024-09-18 18:25:58,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=800042.6666666666, ans=0.125 2024-09-18 18:26:18,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=800071.0, ans=0.125 2024-09-18 18:26:26,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=800071.0, ans=0.2 2024-09-18 18:26:29,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=800099.3333333334, ans=0.0 2024-09-18 18:26:33,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=800099.3333333334, ans=0.2 2024-09-18 18:26:44,296 INFO [train.py:1198] (1/2) Epoch 45, batch 1200, loss[loss=0.1806, ctc_loss=0.1145, cr_loss=0.3304, over 20944.00 frames. ], tot_loss[loss=0.2153, ctc_loss=0.142, cr_loss=0.3664, over 4085298.05 frames. ], batch size: 48, lr: 1.84e-03, grad_scale: 32.0 2024-09-18 18:26:49,016 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.997e+02 2.224e+02 2.342e+02 2.520e+02 5.310e+02, threshold=4.684e+02, percent-clipped=1.0 2024-09-18 18:26:52,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=800127.6666666666, ans=0.125 2024-09-18 18:27:26,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=800184.3333333334, ans=0.0 2024-09-18 18:27:47,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=800241.0, ans=0.125 2024-09-18 18:28:00,351 INFO [train.py:1198] (1/2) Epoch 45, batch 1250, loss[loss=0.2196, ctc_loss=0.1441, cr_loss=0.3773, over 20886.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1426, cr_loss=0.3674, over 4082612.37 frames. ], batch size: 54, lr: 1.84e-03, grad_scale: 32.0 2024-09-18 18:28:00,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=800269.3333333334, ans=0.025 2024-09-18 18:29:08,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=800382.6666666666, ans=0.0 2024-09-18 18:29:21,888 INFO [train.py:1198] (1/2) Epoch 45, batch 1300, loss[loss=0.2445, ctc_loss=0.1647, cr_loss=0.3991, over 20846.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1426, cr_loss=0.3674, over 4080716.67 frames. 
], batch size: 65, lr: 1.83e-03, grad_scale: 32.0 2024-09-18 18:29:26,372 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.929e+02 2.230e+02 2.357e+02 2.489e+02 3.462e+02, threshold=4.713e+02, percent-clipped=0.0 2024-09-18 18:29:40,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=800439.3333333334, ans=0.125 2024-09-18 18:29:50,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=800467.6666666666, ans=0.0 2024-09-18 18:29:56,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=800467.6666666666, ans=0.0 2024-09-18 18:30:31,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=800524.3333333334, ans=0.025 2024-09-18 18:30:37,597 INFO [train.py:1198] (1/2) Epoch 45, batch 1350, loss[loss=0.2231, ctc_loss=0.1467, cr_loss=0.3819, over 20352.00 frames. ], tot_loss[loss=0.2153, ctc_loss=0.142, cr_loss=0.3669, over 4087457.34 frames. ], batch size: 74, lr: 1.83e-03, grad_scale: 32.0 2024-09-18 18:31:11,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=800609.3333333334, ans=0.2 2024-09-18 18:31:46,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=800666.0, ans=10.0 2024-09-18 18:31:49,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=800666.0, ans=0.0 2024-09-18 18:31:52,173 INFO [train.py:1198] (1/2) Epoch 45, batch 1400, loss[loss=0.2124, ctc_loss=0.1425, cr_loss=0.3496, over 21018.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.1422, cr_loss=0.367, over 4102726.33 frames. ], batch size: 62, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 18:31:55,630 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 18:31:58,162 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.030e+02 2.315e+02 2.423e+02 2.618e+02 8.259e+02, threshold=4.847e+02, percent-clipped=1.0 2024-09-18 18:32:09,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=800722.6666666666, ans=0.1 2024-09-18 18:32:37,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=800779.3333333334, ans=0.125 2024-09-18 18:33:07,192 INFO [train.py:1198] (1/2) Epoch 45, batch 1450, loss[loss=0.2208, ctc_loss=0.1439, cr_loss=0.3845, over 20767.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1425, cr_loss=0.3675, over 4093533.67 frames. 
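The optim.py WARNING lines report quartiles (min/25%/50%/75%/max) of recently seen parameter-gradient norms. In each instance the logged threshold is Clipping_scale (2.0) times the median, e.g. 2.0 * 2.357e+02 ~= 4.713e+02 just above, and percent-clipped is the fraction of recent steps whose norm exceeded that threshold. A minimal sketch of median-based clipping in that style; the history length and exact bookkeeping are assumptions, not the optimizer's actual code:

    import collections
    import torch

    class MedianGradClipper:
        # Clip the global grad norm to clipping_scale * median(recent norms).
        def __init__(self, clipping_scale: float = 2.0, history: int = 128):
            self.clipping_scale = clipping_scale
            self.norms = collections.deque(maxlen=history)
            self.num_steps = 0
            self.num_clipped = 0

        def __call__(self, parameters) -> float:
            grads = [p.grad for p in parameters if p.grad is not None]
            norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
            self.norms.append(norm)
            quartiles = torch.quantile(torch.tensor(list(self.norms)),
                                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
            threshold = self.clipping_scale * quartiles[2].item()  # 2.0 * median
            self.num_steps += 1
            if norm > threshold:
                self.num_clipped += 1
                for g in grads:
                    g.mul_(threshold / norm)
            return 100.0 * self.num_clipped / self.num_steps  # percent-clipped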
], batch size: 56, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 18:33:32,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=800864.3333333334, ans=0.1 2024-09-18 18:33:36,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=800892.6666666666, ans=0.2 2024-09-18 18:33:51,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=800921.0, ans=0.125 2024-09-18 18:34:21,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=800949.3333333334, ans=0.125 2024-09-18 18:34:26,126 INFO [train.py:1198] (1/2) Epoch 45, batch 1500, loss[loss=0.2252, ctc_loss=0.1491, cr_loss=0.3804, over 21080.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1431, cr_loss=0.3683, over 4087923.16 frames. ], batch size: 59, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 18:34:31,903 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.983e+02 2.306e+02 2.399e+02 2.560e+02 3.657e+02, threshold=4.798e+02, percent-clipped=0.0 2024-09-18 18:35:39,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=801091.0, ans=0.125 2024-09-18 18:35:43,571 INFO [train.py:1198] (1/2) Epoch 45, batch 1550, loss[loss=0.2143, ctc_loss=0.1416, cr_loss=0.3635, over 20838.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1426, cr_loss=0.3674, over 4079377.03 frames. ], batch size: 59, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 18:35:46,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=801119.3333333334, ans=0.05 2024-09-18 18:35:54,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=801119.3333333334, ans=0.125 2024-09-18 18:36:20,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=801176.0, ans=0.125 2024-09-18 18:36:27,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=801204.3333333334, ans=0.1 2024-09-18 18:36:42,247 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.01 vs. limit=10.0 2024-09-18 18:36:44,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=801232.6666666666, ans=0.2 2024-09-18 18:36:59,563 INFO [train.py:1198] (1/2) Epoch 45, batch 1600, loss[loss=0.2096, ctc_loss=0.1369, cr_loss=0.3634, over 21020.00 frames. ], tot_loss[loss=0.2153, ctc_loss=0.1421, cr_loss=0.3661, over 4092828.53 frames. ], batch size: 63, lr: 1.83e-03, grad_scale: 32.0 2024-09-18 18:37:05,651 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.022e+02 2.255e+02 2.375e+02 2.515e+02 3.208e+02, threshold=4.750e+02, percent-clipped=0.0 2024-09-18 18:37:12,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=801261.0, ans=0.125 2024-09-18 18:37:26,315 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. 
limit=6.0 2024-09-18 18:37:48,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=801346.0, ans=0.2 2024-09-18 18:37:49,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=801346.0, ans=0.1 2024-09-18 18:37:51,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=801346.0, ans=0.0 2024-09-18 18:37:54,660 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.73 vs. limit=15.0 2024-09-18 18:38:00,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=801374.3333333334, ans=0.125 2024-09-18 18:38:15,174 INFO [train.py:1198] (1/2) Epoch 45, batch 1650, loss[loss=0.2349, ctc_loss=0.1544, cr_loss=0.4022, over 20918.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.1417, cr_loss=0.366, over 4101580.83 frames. ], batch size: 64, lr: 1.83e-03, grad_scale: 32.0 2024-09-18 18:38:35,470 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.89 vs. limit=10.0 2024-09-18 18:38:38,611 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.85 vs. limit=22.5 2024-09-18 18:39:30,701 INFO [train.py:1198] (1/2) Epoch 45, batch 1700, loss[loss=0.243, ctc_loss=0.1637, cr_loss=0.3961, over 20866.00 frames. ], tot_loss[loss=0.2141, ctc_loss=0.1411, cr_loss=0.3653, over 4111889.50 frames. ], batch size: 57, lr: 1.83e-03, grad_scale: 32.0 2024-09-18 18:39:35,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=801544.3333333334, ans=0.0 2024-09-18 18:39:36,819 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.946e+02 2.253e+02 2.361e+02 2.504e+02 5.463e+02, threshold=4.722e+02, percent-clipped=1.0 2024-09-18 18:40:52,308 INFO [train.py:1198] (1/2) Epoch 45, batch 1750, loss[loss=0.2085, ctc_loss=0.1343, cr_loss=0.3709, over 20787.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1424, cr_loss=0.3676, over 4102924.33 frames. ], batch size: 53, lr: 1.83e-03, grad_scale: 32.0 2024-09-18 18:40:54,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=801686.0, ans=0.0 2024-09-18 18:41:07,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=801714.3333333334, ans=0.0 2024-09-18 18:41:52,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=801799.3333333334, ans=0.0 2024-09-18 18:42:07,892 INFO [train.py:1198] (1/2) Epoch 45, batch 1800, loss[loss=0.1865, ctc_loss=0.1201, cr_loss=0.3318, over 20981.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1423, cr_loss=0.368, over 4104645.40 frames. 
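Most scaling.py:214 lines track ScheduledFloat values: module constants such as skip rates, dropout probabilities and whitening limits that are interpolated as a function of the global batch_count rather than held fixed (at batch_count ~800k, as here, almost every schedule has long since reached its final value). A minimal sketch of a piecewise-linear schedule in that spirit; the breakpoints are illustrative, not the ones used in this run:

    import bisect

    class ScheduledFloat:
        # A float interpolated piecewise-linearly in batch_count.
        def __init__(self, *points):
            # points: (batch_count, value) pairs, sorted by batch_count.
            self.xs = [p[0] for p in points]
            self.ys = [p[1] for p in points]

        def value(self, batch_count: float) -> float:
            i = bisect.bisect_right(self.xs, batch_count)
            if i == 0:
                return self.ys[0]
            if i == len(self.xs):
                return self.ys[-1]
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # E.g. a skip rate that decays from 0.5 to 0.0 over the first 20k batches:
    skip_rate = ScheduledFloat((0.0, 0.5), (20000.0, 0.0))
    print(skip_rate.value(801500.0))  # 0.0, long past the end of the schedule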
], batch size: 52, lr: 1.83e-03, grad_scale: 32.0 2024-09-18 18:42:14,112 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.989e+02 2.228e+02 2.327e+02 2.526e+02 4.210e+02, threshold=4.654e+02, percent-clipped=0.0 2024-09-18 18:42:18,785 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=801827.6666666666, ans=0.0 2024-09-18 18:42:29,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=801856.0, ans=0.125 2024-09-18 18:43:01,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=801912.6666666666, ans=0.2 2024-09-18 18:43:13,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=801941.0, ans=10.0 2024-09-18 18:43:24,125 INFO [train.py:1198] (1/2) Epoch 45, batch 1850, loss[loss=0.1844, ctc_loss=0.1197, cr_loss=0.3234, over 20981.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.142, cr_loss=0.3673, over 4097062.36 frames. ], batch size: 50, lr: 1.83e-03, grad_scale: 32.0 2024-09-18 18:43:26,433 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.94 vs. limit=15.0 2024-09-18 18:43:34,037 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.04 vs. limit=6.0 2024-09-18 18:43:38,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=801997.6666666666, ans=0.125 2024-09-18 18:43:42,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=801997.6666666666, ans=0.025 2024-09-18 18:43:45,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=801997.6666666666, ans=0.125 2024-09-18 18:44:39,733 INFO [train.py:1198] (1/2) Epoch 45, batch 1900, loss[loss=0.2571, ctc_loss=0.1786, cr_loss=0.3924, over 14480.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.142, cr_loss=0.3669, over 4090734.07 frames. ], batch size: 149, lr: 1.83e-03, grad_scale: 32.0 2024-09-18 18:44:45,625 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.953e+02 2.299e+02 2.434e+02 2.578e+02 3.526e+02, threshold=4.868e+02, percent-clipped=0.0 2024-09-18 18:44:54,166 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.73 vs. limit=22.5 2024-09-18 18:45:01,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=802139.3333333334, ans=0.07 2024-09-18 18:45:11,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=802167.6666666666, ans=0.0 2024-09-18 18:45:37,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=802196.0, ans=0.1 2024-09-18 18:45:57,308 INFO [train.py:1198] (1/2) Epoch 45, batch 1950, loss[loss=0.2012, ctc_loss=0.1324, cr_loss=0.344, over 20773.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1425, cr_loss=0.3684, over 4093374.23 frames. 
], batch size: 56, lr: 1.83e-03, grad_scale: 32.0 2024-09-18 18:46:10,054 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.17 vs. limit=22.5 2024-09-18 18:46:12,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=802252.6666666666, ans=0.0 2024-09-18 18:46:37,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=802309.3333333334, ans=0.125 2024-09-18 18:46:59,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=802366.0, ans=0.2 2024-09-18 18:47:15,490 INFO [train.py:1198] (1/2) Epoch 45, batch 2000, loss[loss=0.2093, ctc_loss=0.1391, cr_loss=0.351, over 20827.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1428, cr_loss=0.3685, over 4095196.65 frames. ], batch size: 59, lr: 1.83e-03, grad_scale: 32.0 2024-09-18 18:47:17,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=802394.3333333334, ans=0.0 2024-09-18 18:47:21,506 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.035e+02 2.275e+02 2.408e+02 2.614e+02 4.325e+02, threshold=4.817e+02, percent-clipped=0.0 2024-09-18 18:48:07,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=802479.3333333334, ans=15.0 2024-09-18 18:48:10,534 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.11 vs. limit=15.0 2024-09-18 18:48:30,857 INFO [train.py:1198] (1/2) Epoch 45, batch 2050, loss[loss=0.2233, ctc_loss=0.1464, cr_loss=0.3844, over 20878.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.143, cr_loss=0.3687, over 4096010.71 frames. ], batch size: 57, lr: 1.83e-03, grad_scale: 32.0 2024-09-18 18:49:45,841 INFO [train.py:1198] (1/2) Epoch 45, batch 2100, loss[loss=0.2396, ctc_loss=0.1568, cr_loss=0.4141, over 20715.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1436, cr_loss=0.3695, over 4095693.58 frames. ], batch size: 71, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 18:49:52,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=802677.6666666666, ans=10.0 2024-09-18 18:49:53,327 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.005e+02 2.287e+02 2.405e+02 2.555e+02 6.408e+02, threshold=4.809e+02, percent-clipped=1.0 2024-09-18 18:51:01,147 INFO [train.py:1198] (1/2) Epoch 45, batch 2150, loss[loss=0.2522, ctc_loss=0.1755, cr_loss=0.3837, over 13934.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1437, cr_loss=0.3694, over 4092078.50 frames. 
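The scaling.py:1024 Whitening lines compare a whiteness statistic of a module's activations against its (scheduled) limit; the penalty that nudges activations toward an identity-like covariance engages only when the metric exceeds the limit, so entries such as "metric=11.17 vs. limit=22.5" record a no-op. One plausible form of such a metric, equal to 1.0 when each group's covariance is a multiple of the identity and growing as the eigenvalues spread, is sketched below; the exact formula in icefall's scaling.py may differ in detail:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        # x: (num_frames, num_channels).  Returns >= 1.0, with equality iff
        # each channel group's covariance is a multiple of the identity.
        n, c = x.shape
        cpg = c // num_groups                                # channels per group
        x = x.reshape(n, num_groups, cpg).transpose(0, 1)    # (groups, n, cpg)
        cov = torch.matmul(x.transpose(1, 2), x) / n         # per-group covariance
        num = cpg * (cov ** 2).sum(dim=(1, 2))               # cpg * sum(lambda_i^2)
        den = cov.diagonal(dim1=1, dim2=2).sum(dim=1) ** 2   # (sum lambda_i)^2
        return (num / den).mean().item()

    x = torch.randn(1000, 256)                 # near-white Gaussian features
    print(whitening_metric(x, num_groups=1))   # slightly above 1.0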
], batch size: 150, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 18:51:18,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=802847.6666666666, ans=0.125 2024-09-18 18:51:19,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=802847.6666666666, ans=0.0 2024-09-18 18:51:52,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=802904.3333333334, ans=0.0 2024-09-18 18:52:06,781 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.54 vs. limit=15.0 2024-09-18 18:52:07,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=802932.6666666666, ans=0.125 2024-09-18 18:52:19,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=802932.6666666666, ans=0.0 2024-09-18 18:52:21,934 INFO [train.py:1198] (1/2) Epoch 45, batch 2200, loss[loss=0.2195, ctc_loss=0.146, cr_loss=0.3676, over 20832.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.1438, cr_loss=0.3701, over 4081281.90 frames. ], batch size: 59, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 18:52:29,526 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.012e+02 2.281e+02 2.422e+02 2.611e+02 3.786e+02, threshold=4.845e+02, percent-clipped=0.0 2024-09-18 18:52:31,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=802961.0, ans=0.0 2024-09-18 18:52:32,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=802961.0, ans=0.0 2024-09-18 18:52:54,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=803017.6666666666, ans=0.0 2024-09-18 18:52:54,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=803017.6666666666, ans=0.0 2024-09-18 18:52:54,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=803017.6666666666, ans=0.2 2024-09-18 18:53:01,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=803017.6666666666, ans=0.125 2024-09-18 18:53:07,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=803046.0, ans=0.1 2024-09-18 18:53:31,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=803074.3333333334, ans=0.2 2024-09-18 18:53:37,443 INFO [train.py:1198] (1/2) Epoch 45, batch 2250, loss[loss=0.2229, ctc_loss=0.1472, cr_loss=0.3785, over 20976.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1437, cr_loss=0.3695, over 4074771.07 frames. 
], batch size: 55, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 18:54:09,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=803159.3333333334, ans=0.125 2024-09-18 18:54:10,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=803159.3333333334, ans=0.125 2024-09-18 18:54:14,168 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.17 vs. limit=10.0 2024-09-18 18:54:15,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=803159.3333333334, ans=0.125 2024-09-18 18:54:19,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=803159.3333333334, ans=0.0 2024-09-18 18:54:22,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=803187.6666666666, ans=0.125 2024-09-18 18:54:27,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=803187.6666666666, ans=0.125 2024-09-18 18:54:43,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=803216.0, ans=0.125 2024-09-18 18:54:49,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=803216.0, ans=0.125 2024-09-18 18:54:52,622 INFO [train.py:1198] (1/2) Epoch 45, batch 2300, loss[loss=0.1942, ctc_loss=0.126, cr_loss=0.3412, over 20887.00 frames. ], tot_loss[loss=0.2183, ctc_loss=0.1442, cr_loss=0.3706, over 4076575.23 frames. ], batch size: 54, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 18:55:00,158 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.994e+02 2.237e+02 2.377e+02 2.585e+02 3.314e+02, threshold=4.753e+02, percent-clipped=0.0 2024-09-18 18:55:48,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=803329.3333333334, ans=0.125 2024-09-18 18:56:08,030 INFO [train.py:1198] (1/2) Epoch 45, batch 2350, loss[loss=0.2521, ctc_loss=0.17, cr_loss=0.4102, over 20823.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.144, cr_loss=0.3704, over 4084160.93 frames. ], batch size: 59, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 18:56:17,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=803386.0, ans=0.125 2024-09-18 18:56:25,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=803414.3333333334, ans=0.125 2024-09-18 18:56:28,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=803414.3333333334, ans=0.2 2024-09-18 18:57:11,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=803499.3333333334, ans=0.125 2024-09-18 18:57:22,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=803499.3333333334, ans=0.1 2024-09-18 18:57:29,454 INFO [train.py:1198] (1/2) Epoch 45, batch 2400, loss[loss=0.1728, ctc_loss=0.1124, cr_loss=0.3021, over 20995.00 frames. 
], tot_loss[loss=0.2179, ctc_loss=0.1439, cr_loss=0.3704, over 4097749.33 frames. ], batch size: 48, lr: 1.83e-03, grad_scale: 32.0 2024-09-18 18:57:37,131 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.960e+02 2.266e+02 2.397e+02 2.555e+02 3.642e+02, threshold=4.794e+02, percent-clipped=0.0 2024-09-18 18:57:40,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=803527.6666666666, ans=0.2 2024-09-18 18:57:46,781 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.78 vs. limit=15.0 2024-09-18 18:57:50,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=803556.0, ans=0.1 2024-09-18 18:57:55,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=803556.0, ans=0.125 2024-09-18 18:58:16,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=803612.6666666666, ans=0.0 2024-09-18 18:58:44,657 INFO [train.py:1198] (1/2) Epoch 45, batch 2450, loss[loss=0.2449, ctc_loss=0.1641, cr_loss=0.4038, over 20836.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1429, cr_loss=0.3685, over 4093173.07 frames. ], batch size: 65, lr: 1.83e-03, grad_scale: 32.0 2024-09-18 18:58:51,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=803669.3333333334, ans=0.0 2024-09-18 18:58:53,020 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.08 vs. limit=15.0 2024-09-18 18:59:53,181 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.81 vs. limit=22.5 2024-09-18 18:59:59,802 INFO [train.py:1198] (1/2) Epoch 45, batch 2500, loss[loss=0.2386, ctc_loss=0.1549, cr_loss=0.4185, over 19984.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1427, cr_loss=0.3675, over 4089396.17 frames. ], batch size: 80, lr: 1.83e-03, grad_scale: 32.0 2024-09-18 19:00:07,077 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.984e+02 2.203e+02 2.346e+02 2.471e+02 5.155e+02, threshold=4.693e+02, percent-clipped=1.0 2024-09-18 19:00:34,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=803867.6666666666, ans=0.125 2024-09-18 19:00:49,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=803896.0, ans=0.125 2024-09-18 19:01:14,966 INFO [train.py:1198] (1/2) Epoch 45, batch 2550, loss[loss=0.2447, ctc_loss=0.1658, cr_loss=0.3941, over 17908.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1435, cr_loss=0.3684, over 4077809.79 frames. ], batch size: 108, lr: 1.83e-03, grad_scale: 32.0 2024-09-18 19:01:24,737 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.96 vs. 
limit=15.0 2024-09-18 19:01:25,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=803952.6666666666, ans=0.125 2024-09-18 19:01:44,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=804009.3333333334, ans=0.0 2024-09-18 19:02:04,526 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.22 vs. limit=12.0 2024-09-18 19:02:15,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=804037.6666666666, ans=0.0 2024-09-18 19:02:26,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=804066.0, ans=0.0 2024-09-18 19:02:29,283 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.38 vs. limit=22.5 2024-09-18 19:02:33,136 INFO [train.py:1198] (1/2) Epoch 45, batch 2600, loss[loss=0.1883, ctc_loss=0.1198, cr_loss=0.3425, over 21008.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1436, cr_loss=0.3689, over 4081282.60 frames. ], batch size: 52, lr: 1.83e-03, grad_scale: 32.0 2024-09-18 19:02:40,633 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.863e+02 2.253e+02 2.400e+02 2.597e+02 3.513e+02, threshold=4.801e+02, percent-clipped=0.0 2024-09-18 19:02:41,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=804094.3333333334, ans=0.125 2024-09-18 19:02:50,276 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.33 vs. limit=15.0 2024-09-18 19:03:03,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=804122.6666666666, ans=0.125 2024-09-18 19:03:51,349 INFO [train.py:1198] (1/2) Epoch 45, batch 2650, loss[loss=0.1872, ctc_loss=0.1215, cr_loss=0.3286, over 20778.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1435, cr_loss=0.3688, over 4074744.42 frames. ], batch size: 53, lr: 1.83e-03, grad_scale: 32.0 2024-09-18 19:04:06,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=804264.3333333334, ans=0.125 2024-09-18 19:04:10,826 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 19:04:15,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=804264.3333333334, ans=0.2 2024-09-18 19:04:35,608 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.70 vs. limit=22.5 2024-09-18 19:04:36,821 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.00 vs. limit=15.0 2024-09-18 19:05:06,009 INFO [train.py:1198] (1/2) Epoch 45, batch 2700, loss[loss=0.2391, ctc_loss=0.1611, cr_loss=0.3898, over 20976.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.145, cr_loss=0.371, over 4076058.15 frames. 
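The grad_scale field flips between 16.0 and 32.0 through this stretch (16.0 around batches 2100-2350, 32.0 from batch 2400, back to 16.0 by batch 3000). With fp16 AMP enabled this is ordinary dynamic loss scaling: the scale doubles after a run of overflow-free steps and is halved whenever an inf/nan gradient appears. A minimal AMP step in that style (the optimizer wiring and growth_interval here are illustrative):

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_factor=2.0,
                                       backoff_factor=0.5, growth_interval=200)

    def train_step(model, batch, optimizer):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = model(batch)
        scaler.scale(loss).backward()  # gradients carry the current scale
        scaler.step(optimizer)         # unscales first; skips the step on inf/nan
        scaler.update()                # doubles or halves the scale as needed
        return loss.detach()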
], batch size: 58, lr: 1.83e-03, grad_scale: 32.0 2024-09-18 19:05:13,432 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.917e+02 2.277e+02 2.422e+02 2.633e+02 3.547e+02, threshold=4.844e+02, percent-clipped=0.0 2024-09-18 19:05:16,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=804377.6666666666, ans=0.0 2024-09-18 19:06:16,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=804491.0, ans=0.125 2024-09-18 19:06:20,859 INFO [train.py:1198] (1/2) Epoch 45, batch 2750, loss[loss=0.2512, ctc_loss=0.1699, cr_loss=0.4066, over 20825.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1448, cr_loss=0.3707, over 4081744.57 frames. ], batch size: 65, lr: 1.83e-03, grad_scale: 32.0 2024-09-18 19:06:56,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=804576.0, ans=0.1 2024-09-18 19:06:58,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=804576.0, ans=0.125 2024-09-18 19:07:01,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=804576.0, ans=0.0 2024-09-18 19:07:02,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=804576.0, ans=0.1 2024-09-18 19:07:16,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=804604.3333333334, ans=0.1 2024-09-18 19:07:20,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=804632.6666666666, ans=0.015 2024-09-18 19:07:35,572 INFO [train.py:1198] (1/2) Epoch 45, batch 2800, loss[loss=0.2179, ctc_loss=0.1426, cr_loss=0.3767, over 20334.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1441, cr_loss=0.3699, over 4087854.94 frames. ], batch size: 74, lr: 1.83e-03, grad_scale: 32.0 2024-09-18 19:07:47,068 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.983e+02 2.260e+02 2.383e+02 2.615e+02 3.904e+02, threshold=4.766e+02, percent-clipped=0.0 2024-09-18 19:07:59,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=804689.3333333334, ans=0.1 2024-09-18 19:08:05,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn1.whiten.whitening_limit, batch_count=804689.3333333334, ans=22.5 2024-09-18 19:08:21,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=804717.6666666666, ans=0.125 2024-09-18 19:08:34,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=804746.0, ans=0.2 2024-09-18 19:08:34,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=804746.0, ans=0.125 2024-09-18 19:08:49,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=804774.3333333334, ans=0.125 2024-09-18 19:08:51,621 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.27 vs. 
limit=15.0 2024-09-18 19:08:55,401 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.80 vs. limit=5.0 2024-09-18 19:08:58,649 INFO [train.py:1198] (1/2) Epoch 45, batch 2850, loss[loss=0.1963, ctc_loss=0.1286, cr_loss=0.3381, over 20979.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1438, cr_loss=0.3687, over 4067932.43 frames. ], batch size: 50, lr: 1.83e-03, grad_scale: 32.0 2024-09-18 19:09:30,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=804859.3333333334, ans=0.2 2024-09-18 19:09:37,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=804859.3333333334, ans=0.1 2024-09-18 19:09:57,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=804916.0, ans=0.2 2024-09-18 19:10:07,252 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0 2024-09-18 19:10:11,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=804916.0, ans=0.1 2024-09-18 19:10:12,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=804944.3333333334, ans=0.0 2024-09-18 19:10:14,058 INFO [train.py:1198] (1/2) Epoch 45, batch 2900, loss[loss=0.2185, ctc_loss=0.1424, cr_loss=0.3808, over 20884.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1434, cr_loss=0.3684, over 4079978.22 frames. ], batch size: 54, lr: 1.83e-03, grad_scale: 32.0 2024-09-18 19:10:18,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=804944.3333333334, ans=10.0 2024-09-18 19:10:21,356 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.909e+02 2.238e+02 2.395e+02 2.569e+02 3.756e+02, threshold=4.791e+02, percent-clipped=0.0 2024-09-18 19:10:47,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=805001.0, ans=0.125 2024-09-18 19:10:59,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=805029.3333333334, ans=0.025 2024-09-18 19:11:14,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=805057.6666666666, ans=0.2 2024-09-18 19:11:29,747 INFO [train.py:1198] (1/2) Epoch 45, batch 2950, loss[loss=0.2277, ctc_loss=0.1497, cr_loss=0.3899, over 21070.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1426, cr_loss=0.3673, over 4083046.04 frames. ], batch size: 59, lr: 1.83e-03, grad_scale: 32.0 2024-09-18 19:11:46,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=805114.3333333334, ans=0.2 2024-09-18 19:11:57,515 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.63 vs. 
limit=15.0 2024-09-18 19:12:21,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=805171.0, ans=0.125 2024-09-18 19:12:26,173 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. limit=6.0 2024-09-18 19:12:44,800 INFO [train.py:1198] (1/2) Epoch 45, batch 3000, loss[loss=0.1991, ctc_loss=0.1301, cr_loss=0.3447, over 21063.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1429, cr_loss=0.3677, over 4079972.12 frames. ], batch size: 53, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 19:12:44,801 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 19:13:03,159 INFO [train.py:1230] (1/2) Epoch 45, validation: loss=0.03897, ctc_loss=0.03897, cr_loss=1.53e-14, over 944034.00 frames. 2024-09-18 19:13:03,160 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-18 19:13:14,999 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.011e+02 2.226e+02 2.374e+02 2.629e+02 4.930e+02, threshold=4.749e+02, percent-clipped=1.0 2024-09-18 19:14:18,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=805341.0, ans=0.125 2024-09-18 19:14:21,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=805341.0, ans=0.1 2024-09-18 19:14:23,692 INFO [train.py:1198] (1/2) Epoch 45, batch 3050, loss[loss=0.1966, ctc_loss=0.1314, cr_loss=0.3261, over 19844.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.142, cr_loss=0.3661, over 4080805.61 frames. ], batch size: 44, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 19:14:28,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=805369.3333333334, ans=0.0 2024-09-18 19:14:55,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=805426.0, ans=0.125 2024-09-18 19:15:21,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=805454.3333333334, ans=0.1 2024-09-18 19:15:38,621 INFO [train.py:1198] (1/2) Epoch 45, batch 3100, loss[loss=0.2627, ctc_loss=0.1798, cr_loss=0.4143, over 18421.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1425, cr_loss=0.3671, over 4078576.12 frames. ], batch size: 108, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 19:15:41,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=805511.0, ans=0.125 2024-09-18 19:15:47,838 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.930e+02 2.199e+02 2.341e+02 2.490e+02 3.106e+02, threshold=4.683e+02, percent-clipped=0.0 2024-09-18 19:15:48,796 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.06 vs. 
limit=15.0 2024-09-18 19:16:00,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=805539.3333333334, ans=15.0 2024-09-18 19:16:04,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=805539.3333333334, ans=0.125 2024-09-18 19:16:15,834 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.75 vs. limit=15.0 2024-09-18 19:16:21,418 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 19:16:29,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=805596.0, ans=0.0 2024-09-18 19:16:54,657 INFO [train.py:1198] (1/2) Epoch 45, batch 3150, loss[loss=0.2323, ctc_loss=0.1552, cr_loss=0.3856, over 20872.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1432, cr_loss=0.3685, over 4086559.77 frames. ], batch size: 57, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 19:17:26,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=805709.3333333334, ans=0.07 2024-09-18 19:18:10,052 INFO [train.py:1198] (1/2) Epoch 45, batch 3200, loss[loss=0.2001, ctc_loss=0.1322, cr_loss=0.3394, over 21001.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.143, cr_loss=0.3682, over 4081531.02 frames. ], batch size: 63, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 19:18:20,425 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.070e+02 2.271e+02 2.400e+02 2.558e+02 6.870e+02, threshold=4.801e+02, percent-clipped=1.0 2024-09-18 19:18:36,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=805822.6666666666, ans=0.125 2024-09-18 19:18:59,336 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 19:19:24,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=805907.6666666666, ans=0.2 2024-09-18 19:19:27,472 INFO [train.py:1198] (1/2) Epoch 45, batch 3250, loss[loss=0.231, ctc_loss=0.1506, cr_loss=0.402, over 20668.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.1422, cr_loss=0.3669, over 4077236.80 frames. ], batch size: 68, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 19:19:30,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=805936.0, ans=0.0 2024-09-18 19:19:45,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=805964.3333333334, ans=0.125 2024-09-18 19:20:34,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=806049.3333333334, ans=0.125 2024-09-18 19:20:42,158 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.93 vs. limit=15.0 2024-09-18 19:20:45,761 INFO [train.py:1198] (1/2) Epoch 45, batch 3300, loss[loss=0.22, ctc_loss=0.1447, cr_loss=0.3769, over 21074.00 frames. ], tot_loss[loss=0.2145, ctc_loss=0.1414, cr_loss=0.3652, over 4090427.39 frames. 
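At batch 3000 above, training pauses to compute the validation loss; note that the validation cr_loss is numerically zero (1.53e-14), so the validation loss reduces to the pure CTC term, plausibly because the consistency term compares two differently-masked forward passes and no such masking is applied in evaluation. The accompanying "Maximum memory allocated" line is the standard CUDA peak-allocation counter, e.g.:

    import torch

    def peak_memory_message(device: torch.device) -> str:
        # Peak bytes ever allocated on this device (since the last reset),
        # formatted like the train.py log lines above.
        mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        return f"Maximum memory allocated so far is {mb}MB"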
], batch size: 56, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 19:20:46,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=806077.6666666666, ans=0.125 2024-09-18 19:20:46,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=806077.6666666666, ans=0.125 2024-09-18 19:20:53,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=806077.6666666666, ans=0.1 2024-09-18 19:20:55,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=806077.6666666666, ans=0.1 2024-09-18 19:20:56,353 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.893e+02 2.201e+02 2.322e+02 2.511e+02 2.958e+02, threshold=4.644e+02, percent-clipped=0.0 2024-09-18 19:20:58,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=806077.6666666666, ans=0.125 2024-09-18 19:21:04,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=806106.0, ans=0.125 2024-09-18 19:21:07,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=806106.0, ans=0.125 2024-09-18 19:21:19,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=806134.3333333334, ans=0.125 2024-09-18 19:21:43,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=806162.6666666666, ans=0.125 2024-09-18 19:22:00,993 INFO [train.py:1198] (1/2) Epoch 45, batch 3350, loss[loss=0.2145, ctc_loss=0.139, cr_loss=0.3774, over 20844.00 frames. ], tot_loss[loss=0.2147, ctc_loss=0.1415, cr_loss=0.3656, over 4098372.46 frames. ], batch size: 65, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 19:22:01,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=806219.3333333334, ans=0.05 2024-09-18 19:22:23,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=806247.6666666666, ans=0.0 2024-09-18 19:22:47,333 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.02 vs. limit=15.0 2024-09-18 19:22:57,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=806304.3333333334, ans=0.125 2024-09-18 19:23:08,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=806332.6666666666, ans=0.2 2024-09-18 19:23:15,947 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.77 vs. limit=10.0 2024-09-18 19:23:16,649 INFO [train.py:1198] (1/2) Epoch 45, batch 3400, loss[loss=0.2823, ctc_loss=0.1985, cr_loss=0.4189, over 13775.00 frames. ], tot_loss[loss=0.2142, ctc_loss=0.1412, cr_loss=0.3652, over 4087121.30 frames. 
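In each batch record, loss[...] is the current batch while tot_loss[...] is a running average; its frame count hovers near 4.1e6 because the tracker decays old statistics, giving an effective window of roughly reset_interval = 200 batches (200 * ~20.5k frames/batch ~= 4.1M, matching the figures above). A sketch of such a decaying tracker; the update rule is an assumption consistent with these logs, not a copy of icefall's MetricsTracker:

    class DecayingLossTracker:
        # Accumulate (loss_sum, frames) with decay 1 - 1/reset_interval, so the
        # steady-state frame count is about reset_interval * frames_per_batch.
        def __init__(self, reset_interval: int = 200):
            self.decay = 1.0 - 1.0 / reset_interval
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss_sum: float, batch_frames: float) -> None:
            self.loss_sum = self.loss_sum * self.decay + batch_loss_sum
            self.frames = self.frames * self.decay + batch_frames

        @property
        def avg_loss(self) -> float:
            return self.loss_sum / max(self.frames, 1.0)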
], batch size: 150, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 19:23:27,444 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.935e+02 2.263e+02 2.399e+02 2.569e+02 3.908e+02, threshold=4.799e+02, percent-clipped=0.0 2024-09-18 19:24:11,896 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.30 vs. limit=15.0 2024-09-18 19:24:32,386 INFO [train.py:1198] (1/2) Epoch 45, batch 3450, loss[loss=0.2118, ctc_loss=0.1365, cr_loss=0.3766, over 20988.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.1421, cr_loss=0.3664, over 4085204.64 frames. ], batch size: 55, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 19:24:38,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=806502.6666666666, ans=0.125 2024-09-18 19:24:58,507 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.10 vs. limit=15.0 2024-09-18 19:25:25,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=806587.6666666666, ans=0.0 2024-09-18 19:25:53,812 INFO [train.py:1198] (1/2) Epoch 45, batch 3500, loss[loss=0.2509, ctc_loss=0.1708, cr_loss=0.4004, over 18415.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.1422, cr_loss=0.367, over 4095308.40 frames. ], batch size: 108, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 19:26:04,327 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.917e+02 2.268e+02 2.378e+02 2.564e+02 4.975e+02, threshold=4.757e+02, percent-clipped=1.0 2024-09-18 19:26:06,037 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=806644.3333333334, ans=0.025 2024-09-18 19:26:47,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=806729.3333333334, ans=0.0 2024-09-18 19:26:48,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=806729.3333333334, ans=0.125 2024-09-18 19:27:09,328 INFO [train.py:1198] (1/2) Epoch 45, batch 3550, loss[loss=0.2211, ctc_loss=0.1471, cr_loss=0.3698, over 21030.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1415, cr_loss=0.3665, over 4106352.40 frames. ], batch size: 61, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 19:27:23,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=806814.3333333334, ans=0.125 2024-09-18 19:27:40,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=806842.6666666666, ans=0.04949747468305833 2024-09-18 19:28:08,931 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.53 vs. limit=15.0 2024-09-18 19:28:11,534 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 19:28:24,595 INFO [train.py:1198] (1/2) Epoch 45, batch 3600, loss[loss=0.2227, ctc_loss=0.1452, cr_loss=0.3879, over 21039.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1417, cr_loss=0.3669, over 4092534.70 frames. 
], batch size: 62, lr: 1.83e-03, grad_scale: 32.0 2024-09-18 19:28:35,278 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.822e+02 2.229e+02 2.367e+02 2.528e+02 3.256e+02, threshold=4.733e+02, percent-clipped=0.0 2024-09-18 19:28:45,101 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.21 vs. limit=15.0 2024-09-18 19:29:09,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=807012.6666666666, ans=0.07 2024-09-18 19:29:24,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=807041.0, ans=0.1 2024-09-18 19:29:29,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=807041.0, ans=0.125 2024-09-18 19:29:39,742 INFO [train.py:1198] (1/2) Epoch 45, batch 3650, loss[loss=0.1834, ctc_loss=0.1165, cr_loss=0.3348, over 21053.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1422, cr_loss=0.368, over 4092939.78 frames. ], batch size: 53, lr: 1.83e-03, grad_scale: 32.0 2024-09-18 19:30:13,636 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.24 vs. limit=22.5 2024-09-18 19:30:24,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=807126.0, ans=0.125 2024-09-18 19:30:41,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=807182.6666666666, ans=0.0 2024-09-18 19:30:58,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=807182.6666666666, ans=0.95 2024-09-18 19:31:00,792 INFO [train.py:1198] (1/2) Epoch 45, batch 3700, loss[loss=0.202, ctc_loss=0.1327, cr_loss=0.3464, over 20761.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1429, cr_loss=0.3688, over 4082002.47 frames. ], batch size: 56, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 19:31:01,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=807211.0, ans=0.1 2024-09-18 19:31:13,128 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.979e+02 2.278e+02 2.418e+02 2.561e+02 3.251e+02, threshold=4.835e+02, percent-clipped=0.0 2024-09-18 19:31:23,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=807239.3333333334, ans=0.125 2024-09-18 19:31:30,150 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 19:31:47,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=807296.0, ans=0.125 2024-09-18 19:31:58,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=807296.0, ans=15.0 2024-09-18 19:32:07,941 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.10 vs. 
limit=10.0 2024-09-18 19:32:16,256 INFO [train.py:1198] (1/2) Epoch 45, batch 3750, loss[loss=0.2382, ctc_loss=0.158, cr_loss=0.4008, over 20660.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.143, cr_loss=0.369, over 4078680.01 frames. ], batch size: 71, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 19:32:16,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=807352.6666666666, ans=0.0 2024-09-18 19:32:40,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=807381.0, ans=0.0 2024-09-18 19:32:43,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=807381.0, ans=0.0 2024-09-18 19:32:54,561 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.21 vs. limit=10.0 2024-09-18 19:33:31,271 INFO [train.py:1198] (1/2) Epoch 45, batch 3800, loss[loss=0.2106, ctc_loss=0.136, cr_loss=0.3731, over 20774.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1431, cr_loss=0.3699, over 4082972.84 frames. ], batch size: 53, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 19:33:43,695 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.940e+02 2.188e+02 2.339e+02 2.511e+02 3.025e+02, threshold=4.679e+02, percent-clipped=0.0 2024-09-18 19:33:48,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=807522.6666666666, ans=0.1 2024-09-18 19:33:49,193 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.20 vs. limit=10.0 2024-09-18 19:33:50,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=807522.6666666666, ans=6.0 2024-09-18 19:34:15,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn2.whiten.whitening_limit, batch_count=807551.0, ans=22.5 2024-09-18 19:34:35,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=807607.6666666666, ans=0.015 2024-09-18 19:34:43,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=807607.6666666666, ans=0.125 2024-09-18 19:34:47,564 INFO [train.py:1198] (1/2) Epoch 45, batch 3850, loss[loss=0.1767, ctc_loss=0.1142, cr_loss=0.3125, over 20953.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1427, cr_loss=0.3691, over 4090581.79 frames. ], batch size: 51, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 19:34:52,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=807636.0, ans=0.0 2024-09-18 19:35:13,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=807664.3333333334, ans=0.0 2024-09-18 19:35:54,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=807749.3333333334, ans=0.1 2024-09-18 19:36:05,637 INFO [train.py:1198] (1/2) Epoch 45, batch 3900, loss[loss=0.2256, ctc_loss=0.15, cr_loss=0.378, over 19476.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1435, cr_loss=0.3701, over 4095865.39 frames. 
], batch size: 90, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 19:36:16,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=807777.6666666666, ans=0.0 2024-09-18 19:36:17,894 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.927e+02 2.294e+02 2.399e+02 2.557e+02 3.460e+02, threshold=4.798e+02, percent-clipped=0.0 2024-09-18 19:36:26,120 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.79 vs. limit=15.0 2024-09-18 19:36:30,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=807806.0, ans=0.125 2024-09-18 19:37:09,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=807891.0, ans=0.125 2024-09-18 19:37:24,225 INFO [train.py:1198] (1/2) Epoch 45, batch 3950, loss[loss=0.2473, ctc_loss=0.1659, cr_loss=0.4074, over 19340.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1433, cr_loss=0.3697, over 4103724.64 frames. ], batch size: 90, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 19:37:36,942 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 19:38:05,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=807976.0, ans=0.125 2024-09-18 19:38:19,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=808004.3333333334, ans=0.0 2024-09-18 19:38:30,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=808032.6666666666, ans=0.0 2024-09-18 19:38:34,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=808032.6666666666, ans=0.2 2024-09-18 19:38:40,562 INFO [train.py:1198] (1/2) Epoch 45, batch 4000, loss[loss=0.1946, ctc_loss=0.1288, cr_loss=0.3288, over 20978.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.144, cr_loss=0.37, over 4084777.56 frames. ], batch size: 50, lr: 1.83e-03, grad_scale: 32.0 2024-09-18 19:38:51,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=808061.0, ans=0.125 2024-09-18 19:38:52,764 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.899e+02 2.284e+02 2.406e+02 2.540e+02 3.120e+02, threshold=4.813e+02, percent-clipped=0.0 2024-09-18 19:38:57,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=808089.3333333334, ans=0.035 2024-09-18 19:38:59,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=808089.3333333334, ans=0.125 2024-09-18 19:39:05,102 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 19:39:14,164 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 19:39:14,479 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.17 vs. 
limit=10.0 2024-09-18 19:39:45,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=808174.3333333334, ans=0.0 2024-09-18 19:39:47,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=808174.3333333334, ans=0.0 2024-09-18 19:39:56,067 INFO [train.py:1198] (1/2) Epoch 45, batch 4050, loss[loss=0.2259, ctc_loss=0.1486, cr_loss=0.3862, over 20625.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1434, cr_loss=0.369, over 4092413.50 frames. ], batch size: 71, lr: 1.83e-03, grad_scale: 32.0 2024-09-18 19:40:16,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=808231.0, ans=0.0 2024-09-18 19:40:40,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=808287.6666666666, ans=0.125 2024-09-18 19:41:00,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=808316.0, ans=0.2 2024-09-18 19:41:10,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=808344.3333333334, ans=0.0 2024-09-18 19:41:11,469 INFO [train.py:1198] (1/2) Epoch 45, batch 4100, loss[loss=0.235, ctc_loss=0.1543, cr_loss=0.4035, over 20611.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.1423, cr_loss=0.3668, over 4093845.79 frames. ], batch size: 71, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 19:41:19,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=808344.3333333334, ans=0.125 2024-09-18 19:41:24,937 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.056e+02 2.233e+02 2.357e+02 2.491e+02 5.288e+02, threshold=4.714e+02, percent-clipped=1.0 2024-09-18 19:42:01,837 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.36 vs. limit=15.0 2024-09-18 19:42:32,248 INFO [train.py:1198] (1/2) Epoch 45, batch 4150, loss[loss=0.1756, ctc_loss=0.1122, cr_loss=0.3166, over 20965.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1433, cr_loss=0.3683, over 4082387.32 frames. ], batch size: 48, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 19:42:33,087 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.74 vs. limit=15.0 2024-09-18 19:42:45,112 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.64 vs. 
limit=10.0 2024-09-18 19:43:10,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=808542.6666666666, ans=0.0 2024-09-18 19:43:31,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=808599.3333333334, ans=0.0 2024-09-18 19:43:37,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=808599.3333333334, ans=0.125 2024-09-18 19:43:42,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=808599.3333333334, ans=0.0 2024-09-18 19:43:44,211 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.79 vs. limit=15.0 2024-09-18 19:43:46,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=808627.6666666666, ans=0.125 2024-09-18 19:43:47,720 INFO [train.py:1198] (1/2) Epoch 45, batch 4200, loss[loss=0.2515, ctc_loss=0.1673, cr_loss=0.4208, over 20975.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1434, cr_loss=0.3685, over 4077367.96 frames. ], batch size: 58, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 19:43:49,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=808627.6666666666, ans=0.2 2024-09-18 19:44:01,258 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.017e+02 2.247e+02 2.404e+02 2.517e+02 7.160e+02, threshold=4.809e+02, percent-clipped=1.0 2024-09-18 19:44:06,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=808656.0, ans=0.025 2024-09-18 19:44:59,816 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.53 vs. limit=15.0 2024-09-18 19:45:03,400 INFO [train.py:1198] (1/2) Epoch 45, batch 4250, loss[loss=0.1993, ctc_loss=0.1301, cr_loss=0.3456, over 20925.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1431, cr_loss=0.3682, over 4073980.62 frames. ], batch size: 50, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 19:45:11,874 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.96 vs. limit=15.0 2024-09-18 19:45:12,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=808769.3333333334, ans=0.1 2024-09-18 19:45:21,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=808797.6666666666, ans=0.0 2024-09-18 19:45:51,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=808854.3333333334, ans=0.0 2024-09-18 19:46:18,554 INFO [train.py:1198] (1/2) Epoch 45, batch 4300, loss[loss=0.2131, ctc_loss=0.141, cr_loss=0.3604, over 20352.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1432, cr_loss=0.3683, over 4075342.47 frames. 
], batch size: 74, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 19:46:32,142 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.027e+02 2.198e+02 2.379e+02 2.525e+02 3.771e+02, threshold=4.758e+02, percent-clipped=0.0 2024-09-18 19:46:34,271 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.01 vs. limit=15.0 2024-09-18 19:46:56,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=808967.6666666666, ans=0.0 2024-09-18 19:47:03,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=808996.0, ans=0.1 2024-09-18 19:47:08,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=808996.0, ans=0.125 2024-09-18 19:47:11,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=808996.0, ans=0.0 2024-09-18 19:47:36,189 INFO [train.py:1198] (1/2) Epoch 45, batch 4350, loss[loss=0.2162, ctc_loss=0.1419, cr_loss=0.3712, over 20903.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1428, cr_loss=0.3673, over 4073561.79 frames. ], batch size: 54, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 19:47:36,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=809052.6666666666, ans=0.1 2024-09-18 19:48:00,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=809081.0, ans=0.0 2024-09-18 19:48:00,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=809081.0, ans=0.0 2024-09-18 19:48:42,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=809166.0, ans=0.125 2024-09-18 19:48:54,676 INFO [train.py:1198] (1/2) Epoch 45, batch 4400, loss[loss=0.1795, ctc_loss=0.1145, cr_loss=0.3248, over 20939.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1427, cr_loss=0.3671, over 4073941.37 frames. ], batch size: 49, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 19:49:08,521 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.965e+02 2.285e+02 2.400e+02 2.563e+02 6.544e+02, threshold=4.799e+02, percent-clipped=1.0 2024-09-18 19:49:58,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=809307.6666666666, ans=0.02 2024-09-18 19:50:06,037 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=809307.6666666666, ans=0.1 2024-09-18 19:50:10,270 INFO [train.py:1198] (1/2) Epoch 45, batch 4450, loss[loss=0.2177, ctc_loss=0.1457, cr_loss=0.3599, over 20328.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1428, cr_loss=0.3676, over 4081852.51 frames. 
], batch size: 74, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 19:50:37,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=809364.3333333334, ans=0.0 2024-09-18 19:50:39,603 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 19:51:02,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=809421.0, ans=0.125 2024-09-18 19:51:18,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=809449.3333333334, ans=0.025 2024-09-18 19:51:23,680 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.12 vs. limit=12.0 2024-09-18 19:51:26,098 INFO [train.py:1198] (1/2) Epoch 45, batch 4500, loss[loss=0.2089, ctc_loss=0.1366, cr_loss=0.3615, over 21067.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1427, cr_loss=0.3675, over 4081871.35 frames. ], batch size: 56, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 19:51:39,820 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.920e+02 2.214e+02 2.367e+02 2.533e+02 3.449e+02, threshold=4.733e+02, percent-clipped=0.0 2024-09-18 19:51:53,792 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=809506.0, ans=0.0 2024-09-18 19:52:31,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=809591.0, ans=0.125 2024-09-18 19:52:42,003 INFO [train.py:1198] (1/2) Epoch 45, batch 4550, loss[loss=0.2168, ctc_loss=0.1422, cr_loss=0.3732, over 20863.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1425, cr_loss=0.3677, over 4091768.64 frames. ], batch size: 65, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 19:53:04,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=809647.6666666666, ans=0.125 2024-09-18 19:53:22,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=809676.0, ans=0.04949747468305833 2024-09-18 19:53:25,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=809676.0, ans=0.0 2024-09-18 19:54:02,374 INFO [train.py:1198] (1/2) Epoch 45, batch 4600, loss[loss=0.2292, ctc_loss=0.1511, cr_loss=0.3904, over 20879.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.143, cr_loss=0.3682, over 4085458.11 frames. ], batch size: 57, lr: 1.82e-03, grad_scale: 16.0 2024-09-18 19:54:17,586 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.933e+02 2.254e+02 2.428e+02 2.643e+02 4.087e+02, threshold=4.856e+02, percent-clipped=0.0 2024-09-18 19:54:49,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=809846.0, ans=0.2 2024-09-18 19:55:01,874 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.03 vs. limit=22.5 2024-09-18 19:55:17,732 INFO [train.py:1198] (1/2) Epoch 45, batch 4650, loss[loss=0.2589, ctc_loss=0.1782, cr_loss=0.4038, over 14323.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1433, cr_loss=0.3687, over 4082228.94 frames. 
], batch size: 149, lr: 1.82e-03, grad_scale: 16.0 2024-09-18 19:55:19,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=809902.6666666666, ans=0.125 2024-09-18 19:55:25,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=809902.6666666666, ans=0.125 2024-09-18 19:56:21,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=810016.0, ans=0.0 2024-09-18 19:56:32,824 INFO [train.py:1198] (1/2) Epoch 45, batch 4700, loss[loss=0.2328, ctc_loss=0.1572, cr_loss=0.3781, over 20346.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1429, cr_loss=0.368, over 4086856.15 frames. ], batch size: 74, lr: 1.82e-03, grad_scale: 16.0 2024-09-18 19:56:47,761 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.020e+02 2.266e+02 2.415e+02 2.594e+02 3.766e+02, threshold=4.830e+02, percent-clipped=0.0 2024-09-18 19:56:51,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=810072.6666666666, ans=0.1 2024-09-18 19:57:20,371 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.68 vs. limit=15.0 2024-09-18 19:57:21,475 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 19:57:36,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=810157.6666666666, ans=0.125 2024-09-18 19:57:41,153 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.87 vs. limit=15.0 2024-09-18 19:57:48,035 INFO [train.py:1198] (1/2) Epoch 45, batch 4750, loss[loss=0.2284, ctc_loss=0.15, cr_loss=0.3921, over 20628.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1428, cr_loss=0.3671, over 4096021.85 frames. ], batch size: 71, lr: 1.82e-03, grad_scale: 16.0 2024-09-18 19:57:51,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=810186.0, ans=0.0 2024-09-18 19:58:02,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=810214.3333333334, ans=15.0 2024-09-18 19:58:12,452 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 19:58:33,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=810271.0, ans=15.0 2024-09-18 19:58:35,131 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.64 vs. limit=22.5 2024-09-18 19:58:40,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=810271.0, ans=0.0 2024-09-18 19:59:06,125 INFO [train.py:1198] (1/2) Epoch 45, batch 4800, loss[loss=0.2054, ctc_loss=0.1324, cr_loss=0.3647, over 20965.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1431, cr_loss=0.3677, over 4097998.80 frames. 
], batch size: 48, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 19:59:07,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=810327.6666666666, ans=0.0 2024-09-18 19:59:24,218 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.870e+02 2.260e+02 2.378e+02 2.546e+02 4.963e+02, threshold=4.756e+02, percent-clipped=1.0 2024-09-18 19:59:36,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=810356.0, ans=0.125 2024-09-18 19:59:55,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=810412.6666666666, ans=0.125 2024-09-18 19:59:56,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=810412.6666666666, ans=0.125 2024-09-18 19:59:56,535 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=810412.6666666666, ans=0.125 2024-09-18 20:00:00,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=810412.6666666666, ans=0.0 2024-09-18 20:00:02,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=810412.6666666666, ans=0.125 2024-09-18 20:00:18,222 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.28 vs. limit=15.0 2024-09-18 20:00:24,600 INFO [train.py:1198] (1/2) Epoch 45, batch 4850, loss[loss=0.234, ctc_loss=0.1548, cr_loss=0.3956, over 20983.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1432, cr_loss=0.3677, over 4091777.89 frames. ], batch size: 64, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:00:25,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=810469.3333333334, ans=0.0 2024-09-18 20:00:43,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=810497.6666666666, ans=0.0 2024-09-18 20:01:14,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=810554.3333333334, ans=0.025 2024-09-18 20:01:23,266 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.54 vs. limit=12.0 2024-09-18 20:01:24,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=810582.6666666666, ans=0.2 2024-09-18 20:01:39,092 INFO [train.py:1198] (1/2) Epoch 45, batch 4900, loss[loss=0.1891, ctc_loss=0.1248, cr_loss=0.3216, over 20766.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1435, cr_loss=0.3684, over 4091872.80 frames. 
], batch size: 53, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:01:40,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=810611.0, ans=0.125 2024-09-18 20:01:53,643 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.062e+02 2.234e+02 2.384e+02 2.527e+02 3.984e+02, threshold=4.768e+02, percent-clipped=0.0 2024-09-18 20:02:08,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=810667.6666666666, ans=0.035 2024-09-18 20:02:15,168 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 20:02:46,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=810724.3333333334, ans=0.0 2024-09-18 20:02:53,890 INFO [train.py:1198] (1/2) Epoch 45, batch 4950, loss[loss=0.2302, ctc_loss=0.1525, cr_loss=0.3886, over 21004.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1433, cr_loss=0.3686, over 4087201.14 frames. ], batch size: 61, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:03:34,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=810809.3333333334, ans=0.1 2024-09-18 20:03:53,585 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.98 vs. limit=12.0 2024-09-18 20:03:54,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=810866.0, ans=0.125 2024-09-18 20:04:07,720 INFO [train.py:1198] (1/2) Epoch 45, batch 5000, loss[loss=0.1884, ctc_loss=0.1252, cr_loss=0.3159, over 20299.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.1441, cr_loss=0.3695, over 4070091.82 frames. ], batch size: 45, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:04:22,512 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.919e+02 2.233e+02 2.353e+02 2.590e+02 4.290e+02, threshold=4.707e+02, percent-clipped=0.0 2024-09-18 20:04:42,954 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.70 vs. limit=8.0 2024-09-18 20:04:45,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=810951.0, ans=0.125 2024-09-18 20:05:03,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=810979.3333333334, ans=0.05 2024-09-18 20:05:12,555 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.92 vs. limit=15.0 2024-09-18 20:05:22,206 INFO [train.py:1198] (1/2) Epoch 45, batch 5050, loss[loss=0.1764, ctc_loss=0.113, cr_loss=0.3171, over 20984.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1436, cr_loss=0.369, over 4075275.07 frames. 
], batch size: 48, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:05:39,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=811064.3333333334, ans=0.05 2024-09-18 20:05:55,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=811092.6666666666, ans=0.5 2024-09-18 20:06:31,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=811149.3333333334, ans=0.0 2024-09-18 20:06:32,862 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=811149.3333333334, ans=10.0 2024-09-18 20:06:37,106 INFO [train.py:1198] (1/2) Epoch 45, batch 5100, loss[loss=0.2435, ctc_loss=0.1621, cr_loss=0.4066, over 20672.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1432, cr_loss=0.3691, over 4082835.67 frames. ], batch size: 66, lr: 1.82e-03, grad_scale: 16.0 2024-09-18 20:06:41,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=811177.6666666666, ans=0.0 2024-09-18 20:06:53,216 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.914e+02 2.268e+02 2.408e+02 2.549e+02 3.406e+02, threshold=4.815e+02, percent-clipped=0.0 2024-09-18 20:06:56,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=811206.0, ans=0.0 2024-09-18 20:07:07,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=811234.3333333334, ans=0.1 2024-09-18 20:07:09,005 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.95 vs. limit=6.0 2024-09-18 20:07:24,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=811262.6666666666, ans=0.0 2024-09-18 20:07:53,669 INFO [train.py:1198] (1/2) Epoch 45, batch 5150, loss[loss=0.2068, ctc_loss=0.1338, cr_loss=0.3649, over 20887.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.143, cr_loss=0.3688, over 4087772.80 frames. ], batch size: 54, lr: 1.82e-03, grad_scale: 16.0 2024-09-18 20:07:53,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=811319.3333333334, ans=0.1 2024-09-18 20:07:53,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=811319.3333333334, ans=10.0 2024-09-18 20:08:42,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=811404.3333333334, ans=0.1 2024-09-18 20:09:01,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=811432.6666666666, ans=0.125 2024-09-18 20:09:07,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=811432.6666666666, ans=0.125 2024-09-18 20:09:09,985 INFO [train.py:1198] (1/2) Epoch 45, batch 5200, loss[loss=0.2005, ctc_loss=0.1319, cr_loss=0.3428, over 20966.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1422, cr_loss=0.3673, over 4096966.53 frames. 
], batch size: 50, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:09:26,180 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.965e+02 2.256e+02 2.414e+02 2.581e+02 3.470e+02, threshold=4.828e+02, percent-clipped=0.0 2024-09-18 20:10:21,832 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.74 vs. limit=15.0 2024-09-18 20:10:24,217 INFO [train.py:1198] (1/2) Epoch 45, batch 5250, loss[loss=0.2144, ctc_loss=0.1422, cr_loss=0.3611, over 21003.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1424, cr_loss=0.3677, over 4095151.27 frames. ], batch size: 55, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:10:49,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=811631.0, ans=0.04949747468305833 2024-09-18 20:10:53,308 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.42 vs. limit=22.5 2024-09-18 20:10:55,014 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.46 vs. limit=15.0 2024-09-18 20:10:56,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=811659.3333333334, ans=0.125 2024-09-18 20:11:02,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=811659.3333333334, ans=0.125 2024-09-18 20:11:27,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=811716.0, ans=0.2 2024-09-18 20:11:39,064 INFO [train.py:1198] (1/2) Epoch 45, batch 5300, loss[loss=0.2569, ctc_loss=0.177, cr_loss=0.3992, over 13947.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1415, cr_loss=0.3664, over 4095323.55 frames. ], batch size: 151, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:11:39,693 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.54 vs. limit=15.0 2024-09-18 20:11:55,659 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.943e+02 2.221e+02 2.355e+02 2.478e+02 3.018e+02, threshold=4.710e+02, percent-clipped=0.0 2024-09-18 20:11:56,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=811772.6666666666, ans=0.125 2024-09-18 20:12:22,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=811829.3333333334, ans=10.0 2024-09-18 20:12:30,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=811829.3333333334, ans=0.0 2024-09-18 20:12:39,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=811857.6666666666, ans=0.125 2024-09-18 20:12:53,888 INFO [train.py:1198] (1/2) Epoch 45, batch 5350, loss[loss=0.2227, ctc_loss=0.1488, cr_loss=0.3698, over 21068.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1423, cr_loss=0.367, over 4089454.96 frames. 
], batch size: 59, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:12:57,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=811886.0, ans=0.0 2024-09-18 20:12:59,221 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.39 vs. limit=15.0 2024-09-18 20:13:41,189 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.71 vs. limit=10.0 2024-09-18 20:14:08,687 INFO [train.py:1198] (1/2) Epoch 45, batch 5400, loss[loss=0.2189, ctc_loss=0.1449, cr_loss=0.3697, over 20830.00 frames. ], tot_loss[loss=0.2145, ctc_loss=0.1414, cr_loss=0.3659, over 4092591.79 frames. ], batch size: 59, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:14:20,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=812027.6666666666, ans=0.125 2024-09-18 20:14:24,857 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.989e+02 2.291e+02 2.455e+02 2.611e+02 3.469e+02, threshold=4.910e+02, percent-clipped=0.0 2024-09-18 20:14:35,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=812056.0, ans=0.125 2024-09-18 20:14:44,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=812084.3333333334, ans=0.125 2024-09-18 20:15:22,568 INFO [train.py:1198] (1/2) Epoch 45, batch 5450, loss[loss=0.2246, ctc_loss=0.15, cr_loss=0.373, over 21069.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1425, cr_loss=0.3681, over 4097144.01 frames. ], batch size: 59, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:15:24,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=812169.3333333334, ans=0.1 2024-09-18 20:15:33,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=812169.3333333334, ans=0.125 2024-09-18 20:15:54,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=812226.0, ans=0.04949747468305833 2024-09-18 20:16:01,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=812226.0, ans=0.0 2024-09-18 20:16:08,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=812254.3333333334, ans=0.0 2024-09-18 20:16:27,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=812282.6666666666, ans=0.125 2024-09-18 20:16:28,092 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.77 vs. limit=15.0 2024-09-18 20:16:36,189 INFO [train.py:1198] (1/2) Epoch 45, batch 5500, loss[loss=0.2311, ctc_loss=0.1555, cr_loss=0.3778, over 19966.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1433, cr_loss=0.3694, over 4089187.07 frames. 
], batch size: 80, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:16:49,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=812339.3333333334, ans=0.125 2024-09-18 20:16:52,073 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.791e+02 2.240e+02 2.378e+02 2.585e+02 3.655e+02, threshold=4.755e+02, percent-clipped=0.0 2024-09-18 20:17:15,775 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0 2024-09-18 20:17:25,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=812396.0, ans=0.0 2024-09-18 20:17:44,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=812424.3333333334, ans=0.125 2024-09-18 20:17:47,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=812424.3333333334, ans=0.0 2024-09-18 20:17:54,274 INFO [train.py:1198] (1/2) Epoch 45, batch 5550, loss[loss=0.2154, ctc_loss=0.1411, cr_loss=0.3718, over 21047.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.1439, cr_loss=0.3701, over 4075965.90 frames. ], batch size: 62, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:17:57,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=812452.6666666666, ans=0.0 2024-09-18 20:18:40,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=812537.6666666666, ans=0.09899494936611666 2024-09-18 20:18:47,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=812537.6666666666, ans=0.125 2024-09-18 20:18:50,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=812537.6666666666, ans=0.1 2024-09-18 20:18:59,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=812566.0, ans=0.1 2024-09-18 20:19:02,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=812566.0, ans=0.95 2024-09-18 20:19:08,123 INFO [train.py:1198] (1/2) Epoch 45, batch 5600, loss[loss=0.2232, ctc_loss=0.1467, cr_loss=0.3825, over 20967.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1434, cr_loss=0.3695, over 4087725.87 frames. ], batch size: 58, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:19:24,966 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.869e+02 2.261e+02 2.398e+02 2.555e+02 3.909e+02, threshold=4.796e+02, percent-clipped=0.0 2024-09-18 20:19:56,037 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.02 vs. limit=15.0 2024-09-18 20:20:21,895 INFO [train.py:1198] (1/2) Epoch 45, batch 5650, loss[loss=0.2222, ctc_loss=0.1499, cr_loss=0.3614, over 20269.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1437, cr_loss=0.3686, over 4066048.89 frames. 
], batch size: 74, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:20:29,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=812736.0, ans=0.0 2024-09-18 20:20:46,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=812764.3333333334, ans=0.0 2024-09-18 20:20:53,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=812792.6666666666, ans=0.1 2024-09-18 20:20:56,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=812792.6666666666, ans=0.1 2024-09-18 20:20:58,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=812792.6666666666, ans=0.125 2024-09-18 20:21:20,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=812849.3333333334, ans=0.0 2024-09-18 20:21:28,505 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.67 vs. limit=5.0 2024-09-18 20:21:36,223 INFO [train.py:1198] (1/2) Epoch 45, batch 5700, loss[loss=0.236, ctc_loss=0.1561, cr_loss=0.3993, over 20699.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.144, cr_loss=0.3694, over 4080169.81 frames. ], batch size: 71, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:21:43,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=812877.6666666666, ans=0.125 2024-09-18 20:21:52,489 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.998e+02 2.280e+02 2.407e+02 2.556e+02 3.733e+02, threshold=4.814e+02, percent-clipped=0.0 2024-09-18 20:22:06,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=812934.3333333334, ans=0.0 2024-09-18 20:22:28,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=812962.6666666666, ans=0.0 2024-09-18 20:22:43,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=812991.0, ans=0.1 2024-09-18 20:22:50,893 INFO [train.py:1198] (1/2) Epoch 45, batch 5750, loss[loss=0.225, ctc_loss=0.1489, cr_loss=0.3804, over 21003.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.144, cr_loss=0.3694, over 4099177.38 frames. ], batch size: 55, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:23:28,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=813076.0, ans=0.125 2024-09-18 20:23:38,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=813104.3333333334, ans=0.0 2024-09-18 20:23:44,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=813104.3333333334, ans=0.125 2024-09-18 20:23:56,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=813132.6666666666, ans=0.125 2024-09-18 20:24:05,229 INFO [train.py:1198] (1/2) Epoch 45, batch 5800, loss[loss=0.1866, ctc_loss=0.12, cr_loss=0.3331, over 20322.00 frames. 
], tot_loss[loss=0.2176, ctc_loss=0.1437, cr_loss=0.3696, over 4093212.81 frames. ], batch size: 45, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:24:08,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=813161.0, ans=0.125 2024-09-18 20:24:21,681 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.859e+02 2.265e+02 2.380e+02 2.521e+02 3.588e+02, threshold=4.761e+02, percent-clipped=0.0 2024-09-18 20:24:29,885 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.27 vs. limit=15.0 2024-09-18 20:24:35,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=813217.6666666666, ans=0.0 2024-09-18 20:24:48,582 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=813246.0, ans=0.125 2024-09-18 20:24:51,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=813246.0, ans=0.0 2024-09-18 20:25:07,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=813274.3333333334, ans=0.0 2024-09-18 20:25:08,196 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. limit=6.0 2024-09-18 20:25:08,350 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. limit=6.0 2024-09-18 20:25:11,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=813274.3333333334, ans=0.2 2024-09-18 20:25:19,079 INFO [train.py:1198] (1/2) Epoch 45, batch 5850, loss[loss=0.2079, ctc_loss=0.1373, cr_loss=0.3529, over 20894.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1426, cr_loss=0.3676, over 4099428.55 frames. ], batch size: 54, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:25:35,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=813331.0, ans=0.1 2024-09-18 20:25:38,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=813331.0, ans=0.02 2024-09-18 20:25:40,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=813331.0, ans=0.125 2024-09-18 20:26:10,209 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.90 vs. limit=6.0 2024-09-18 20:26:37,317 INFO [train.py:1198] (1/2) Epoch 45, batch 5900, loss[loss=0.1865, ctc_loss=0.1193, cr_loss=0.3358, over 21058.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1425, cr_loss=0.3672, over 4097986.09 frames. 
], batch size: 53, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:26:46,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=813444.3333333334, ans=0.04949747468305833 2024-09-18 20:26:53,914 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.001e+02 2.229e+02 2.390e+02 2.560e+02 6.063e+02, threshold=4.781e+02, percent-clipped=1.0 2024-09-18 20:26:57,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=813472.6666666666, ans=0.0 2024-09-18 20:27:01,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=813472.6666666666, ans=0.125 2024-09-18 20:27:05,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=813472.6666666666, ans=15.0 2024-09-18 20:27:07,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=813501.0, ans=0.0 2024-09-18 20:27:27,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=813529.3333333334, ans=0.025 2024-09-18 20:27:37,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=813557.6666666666, ans=0.0 2024-09-18 20:27:52,153 INFO [train.py:1198] (1/2) Epoch 45, batch 5950, loss[loss=0.2096, ctc_loss=0.1378, cr_loss=0.3591, over 20884.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.1421, cr_loss=0.3669, over 4103668.70 frames. ], batch size: 54, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:27:53,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=813586.0, ans=0.125 2024-09-18 20:27:54,237 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.93 vs. limit=6.0 2024-09-18 20:27:55,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=813586.0, ans=0.125 2024-09-18 20:27:59,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=813586.0, ans=0.2 2024-09-18 20:28:01,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=813586.0, ans=0.2 2024-09-18 20:29:06,646 INFO [train.py:1198] (1/2) Epoch 45, batch 6000, loss[loss=0.1843, ctc_loss=0.1212, cr_loss=0.3155, over 20961.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1428, cr_loss=0.3682, over 4099815.64 frames. ], batch size: 49, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:29:06,646 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 20:29:26,292 INFO [train.py:1230] (1/2) Epoch 45, validation: loss=0.03935, ctc_loss=0.03935, cr_loss=1.582e-14, over 944034.00 frames. 
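The numeric fields in the records above are internally consistent and can be sanity-checked offline. In every tot_loss record, loss is approximately ctc_loss + 0.2 * cr_loss (e.g. 0.1422 + 0.2 * 0.368 = 0.2158 for Epoch 45, batch 3650), and in every grad-norm warning the printed threshold appears to be Clipping_scale times the median quartile (e.g. 2.0 * 2.367e+02 ~ 4.733e+02). Below is a minimal sketch of the first check, run on one record copied from this log; the parsing regex and the 0.2 weight are inferred from the log text itself, not taken from train.py:

import re

# A per-batch record copied verbatim from the log above
# (Epoch 45, batch 3650).
record = "tot_loss[loss=0.2158, ctc_loss=0.1422, cr_loss=0.368, over 4092939.78 frames. ]"

# Extract the numeric fields. The regex is an assumption about the
# printed format, not code from train.py.
fields = {k: float(v) for k, v in re.findall(r"(\w+)=([0-9.eE+-]+)", record)}

# The records are consistent with the total loss being a weighted sum
#   loss = ctc_loss + 0.2 * cr_loss,
# with the 0.2 weight inferred from the logged numbers themselves.
assert abs(fields["loss"] - (fields["ctc_loss"] + 0.2 * fields["cr_loss"])) < 2e-3

The same check passes for the other tot_loss records in this section; it is a quick way to confirm that a copied or truncated log line has not lost a digit.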
2024-09-18 20:29:26,292 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-18 20:29:42,660 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.865e+02 2.282e+02 2.411e+02 2.557e+02 5.115e+02, threshold=4.821e+02, percent-clipped=1.0 2024-09-18 20:29:44,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=813756.0, ans=0.1 2024-09-18 20:30:34,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=813841.0, ans=0.125 2024-09-18 20:30:41,113 INFO [train.py:1198] (1/2) Epoch 45, batch 6050, loss[loss=0.2412, ctc_loss=0.1599, cr_loss=0.4063, over 20075.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1429, cr_loss=0.369, over 4091795.56 frames. ], batch size: 80, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:30:41,805 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.21 vs. limit=10.0 2024-09-18 20:30:44,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=813869.3333333334, ans=0.0 2024-09-18 20:31:04,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=813897.6666666666, ans=0.05 2024-09-18 20:31:13,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=813926.0, ans=0.0 2024-09-18 20:31:50,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=813982.6666666666, ans=0.2 2024-09-18 20:31:54,547 INFO [train.py:1198] (1/2) Epoch 45, batch 6100, loss[loss=0.2111, ctc_loss=0.1377, cr_loss=0.3668, over 21058.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1427, cr_loss=0.369, over 4093877.19 frames. ], batch size: 56, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:31:54,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=814011.0, ans=0.0 2024-09-18 20:31:54,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=814011.0, ans=0.025 2024-09-18 20:32:10,832 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.933e+02 2.295e+02 2.406e+02 2.594e+02 3.498e+02, threshold=4.813e+02, percent-clipped=0.0 2024-09-18 20:33:08,157 INFO [train.py:1198] (1/2) Epoch 45, batch 6150, loss[loss=0.2452, ctc_loss=0.17, cr_loss=0.3762, over 13933.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1438, cr_loss=0.3698, over 4078285.78 frames. ], batch size: 149, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:33:17,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=814152.6666666666, ans=0.125 2024-09-18 20:33:18,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=814152.6666666666, ans=0.1 2024-09-18 20:33:24,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=814181.0, ans=0.125 2024-09-18 20:33:39,081 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.23 vs. 
limit=10.0 2024-09-18 20:34:23,561 INFO [train.py:1198] (1/2) Epoch 45, batch 6200, loss[loss=0.2104, ctc_loss=0.1395, cr_loss=0.3545, over 21058.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.1449, cr_loss=0.3708, over 4055780.22 frames. ], batch size: 56, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:34:24,638 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.40 vs. limit=15.0 2024-09-18 20:34:40,093 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.066e+02 2.309e+02 2.445e+02 2.676e+02 3.869e+02, threshold=4.890e+02, percent-clipped=0.0 2024-09-18 20:34:41,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=814322.6666666666, ans=0.2 2024-09-18 20:34:43,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=814322.6666666666, ans=0.125 2024-09-18 20:35:09,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=814379.3333333334, ans=0.2 2024-09-18 20:35:29,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=814407.6666666666, ans=0.125 2024-09-18 20:35:38,124 INFO [train.py:1198] (1/2) Epoch 45, batch 6250, loss[loss=0.233, ctc_loss=0.1535, cr_loss=0.3975, over 21049.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.145, cr_loss=0.371, over 4034038.65 frames. ], batch size: 59, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:35:40,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=814436.0, ans=0.2 2024-09-18 20:36:11,048 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 20:36:24,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=814521.0, ans=0.0 2024-09-18 20:36:51,896 INFO [train.py:1198] (1/2) Epoch 45, batch 6300, loss[loss=0.2492, ctc_loss=0.1705, cr_loss=0.3939, over 14156.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.1453, cr_loss=0.3704, over 4002745.60 frames. ], batch size: 149, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:37:08,491 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.067e+02 2.338e+02 2.474e+02 2.662e+02 4.819e+02, threshold=4.947e+02, percent-clipped=0.0 2024-09-18 20:37:11,312 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.04 vs. limit=15.0 2024-09-18 20:37:12,351 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=814606.0, ans=0.1 2024-09-18 20:37:12,933 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.84 vs. 
limit=10.0 2024-09-18 20:37:27,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=814634.3333333334, ans=0.125 2024-09-18 20:37:35,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=814634.3333333334, ans=0.125 2024-09-18 20:37:39,160 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.61 vs. limit=15.0 2024-09-18 20:38:07,090 INFO [train.py:1198] (1/2) Epoch 45, batch 6350, loss[loss=0.2415, ctc_loss=0.1665, cr_loss=0.3753, over 14407.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1466, cr_loss=0.3716, over 3956148.12 frames. ], batch size: 149, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:38:12,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=814719.3333333334, ans=0.2 2024-09-18 20:38:12,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=814719.3333333334, ans=0.0 2024-09-18 20:38:18,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=814719.3333333334, ans=0.1 2024-09-18 20:38:25,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=814747.6666666666, ans=0.125 2024-09-18 20:38:55,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=814804.3333333334, ans=10.0 2024-09-18 20:38:56,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=814804.3333333334, ans=0.125 2024-09-18 20:39:51,726 INFO [train.py:1198] (1/2) Epoch 46, batch 0, loss[loss=0.2336, ctc_loss=0.1536, cr_loss=0.3999, over 20919.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1536, cr_loss=0.3999, over 20919.00 frames. ], batch size: 60, lr: 1.80e-03, grad_scale: 32.0 2024-09-18 20:39:51,726 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 20:40:09,787 INFO [train.py:1230] (1/2) Epoch 46, validation: loss=0.03873, ctc_loss=0.03873, cr_loss=1.576e-14, over 944034.00 frames. 2024-09-18 20:40:09,788 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-18 20:40:28,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=814863.8333333334, ans=0.1 2024-09-18 20:40:31,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=814863.8333333334, ans=0.1 2024-09-18 20:40:33,606 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.79 vs. 
limit=12.0 2024-09-18 20:40:34,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=814863.8333333334, ans=0.125 2024-09-18 20:40:44,899 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.010e+02 2.444e+02 2.730e+02 2.944e+02 3.654e+02, threshold=5.460e+02, percent-clipped=0.0 2024-09-18 20:40:55,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=814892.1666666666, ans=0.0 2024-09-18 20:41:03,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=814920.5, ans=0.5 2024-09-18 20:41:15,938 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.96 vs. limit=15.0 2024-09-18 20:41:30,193 INFO [train.py:1198] (1/2) Epoch 46, batch 50, loss[loss=0.2383, ctc_loss=0.163, cr_loss=0.3765, over 19421.00 frames. ], tot_loss[loss=0.2188, ctc_loss=0.1451, cr_loss=0.3686, over 917796.94 frames. ], batch size: 90, lr: 1.80e-03, grad_scale: 32.0 2024-09-18 20:42:21,011 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=5.93 vs. limit=22.5 2024-09-18 20:42:30,257 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.11 vs. limit=15.0 2024-09-18 20:42:35,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=815090.5, ans=0.2 2024-09-18 20:42:45,876 INFO [train.py:1198] (1/2) Epoch 46, batch 100, loss[loss=0.2299, ctc_loss=0.1515, cr_loss=0.3918, over 20942.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.143, cr_loss=0.3674, over 1611194.05 frames. ], batch size: 60, lr: 1.80e-03, grad_scale: 32.0 2024-09-18 20:42:55,897 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.63 vs. limit=15.0 2024-09-18 20:42:59,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=815147.1666666666, ans=0.125 2024-09-18 20:43:02,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=815147.1666666666, ans=0.025 2024-09-18 20:43:16,029 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.964e+02 2.193e+02 2.400e+02 2.544e+02 3.395e+02, threshold=4.800e+02, percent-clipped=0.0 2024-09-18 20:43:20,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=815175.5, ans=0.0 2024-09-18 20:43:45,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=815232.1666666666, ans=0.0 2024-09-18 20:44:01,122 INFO [train.py:1198] (1/2) Epoch 46, batch 150, loss[loss=0.2012, ctc_loss=0.1313, cr_loss=0.3496, over 20871.00 frames. ], tot_loss[loss=0.2143, ctc_loss=0.1414, cr_loss=0.3646, over 2161732.53 frames. 
], batch size: 54, lr: 1.80e-03, grad_scale: 32.0 2024-09-18 20:44:07,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=815260.5, ans=0.1 2024-09-18 20:44:18,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=815288.8333333334, ans=0.0 2024-09-18 20:44:32,026 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.22 vs. limit=10.0 2024-09-18 20:44:37,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=815317.1666666666, ans=0.125 2024-09-18 20:45:16,395 INFO [train.py:1198] (1/2) Epoch 46, batch 200, loss[loss=0.2139, ctc_loss=0.1422, cr_loss=0.3582, over 20253.00 frames. ], tot_loss[loss=0.2153, ctc_loss=0.1421, cr_loss=0.3663, over 2591338.45 frames. ], batch size: 74, lr: 1.80e-03, grad_scale: 32.0 2024-09-18 20:45:16,785 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=815402.1666666666, ans=0.125 2024-09-18 20:45:28,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=815402.1666666666, ans=0.09899494936611666 2024-09-18 20:45:46,303 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.942e+02 2.198e+02 2.358e+02 2.505e+02 4.198e+02, threshold=4.716e+02, percent-clipped=0.0 2024-09-18 20:45:54,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=815458.8333333334, ans=0.0 2024-09-18 20:46:36,432 INFO [train.py:1198] (1/2) Epoch 46, batch 250, loss[loss=0.2572, ctc_loss=0.169, cr_loss=0.4408, over 20979.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1418, cr_loss=0.3658, over 2935355.95 frames. ], batch size: 64, lr: 1.80e-03, grad_scale: 32.0 2024-09-18 20:46:38,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=815543.8333333334, ans=0.125 2024-09-18 20:46:57,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=815572.1666666666, ans=0.025 2024-09-18 20:47:12,904 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.11 vs. limit=22.5 2024-09-18 20:47:18,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=815600.5, ans=0.125 2024-09-18 20:47:18,740 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.04 vs. limit=15.0 2024-09-18 20:47:26,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=815628.8333333334, ans=0.1 2024-09-18 20:47:51,346 INFO [train.py:1198] (1/2) Epoch 46, batch 300, loss[loss=0.228, ctc_loss=0.1505, cr_loss=0.3877, over 20686.00 frames. ], tot_loss[loss=0.2136, ctc_loss=0.1407, cr_loss=0.3645, over 3200955.84 frames. 
], batch size: 66, lr: 1.80e-03, grad_scale: 32.0 2024-09-18 20:47:51,820 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 20:47:56,488 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.43 vs. limit=15.0 2024-09-18 20:47:59,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=815685.5, ans=0.0 2024-09-18 20:48:21,445 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.872e+02 2.253e+02 2.352e+02 2.520e+02 3.176e+02, threshold=4.704e+02, percent-clipped=0.0 2024-09-18 20:48:25,154 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. limit=6.0 2024-09-18 20:48:29,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=815742.1666666666, ans=0.0 2024-09-18 20:48:38,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=815770.5, ans=0.0 2024-09-18 20:48:55,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=815798.8333333334, ans=0.0 2024-09-18 20:49:02,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=815798.8333333334, ans=0.0 2024-09-18 20:49:06,767 INFO [train.py:1198] (1/2) Epoch 46, batch 350, loss[loss=0.2569, ctc_loss=0.1734, cr_loss=0.4175, over 18174.00 frames. ], tot_loss[loss=0.2139, ctc_loss=0.1408, cr_loss=0.3652, over 3395416.35 frames. ], batch size: 108, lr: 1.80e-03, grad_scale: 32.0 2024-09-18 20:49:09,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=815827.1666666666, ans=0.0 2024-09-18 20:49:14,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=815827.1666666666, ans=0.125 2024-09-18 20:49:50,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=815912.1666666666, ans=0.125 2024-09-18 20:49:51,255 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.10 vs. limit=15.0 2024-09-18 20:50:05,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=815940.5, ans=0.0 2024-09-18 20:50:13,701 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.97 vs. limit=15.0 2024-09-18 20:50:22,118 INFO [train.py:1198] (1/2) Epoch 46, batch 400, loss[loss=0.2175, ctc_loss=0.1426, cr_loss=0.3749, over 20650.00 frames. ], tot_loss[loss=0.2127, ctc_loss=0.1398, cr_loss=0.3641, over 3564117.53 frames. 
], batch size: 66, lr: 1.80e-03, grad_scale: 32.0 2024-09-18 20:50:30,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=815968.8333333334, ans=0.2 2024-09-18 20:50:41,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=815997.1666666666, ans=0.0 2024-09-18 20:50:42,287 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.67 vs. limit=22.5 2024-09-18 20:50:53,616 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.961e+02 2.215e+02 2.369e+02 2.519e+02 3.261e+02, threshold=4.738e+02, percent-clipped=0.0 2024-09-18 20:50:55,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=816025.5, ans=0.125 2024-09-18 20:51:31,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=816082.1666666666, ans=0.125 2024-09-18 20:51:39,043 INFO [train.py:1198] (1/2) Epoch 46, batch 450, loss[loss=0.233, ctc_loss=0.1591, cr_loss=0.3697, over 21020.00 frames. ], tot_loss[loss=0.2133, ctc_loss=0.1404, cr_loss=0.3648, over 3690313.02 frames. ], batch size: 61, lr: 1.80e-03, grad_scale: 32.0 2024-09-18 20:51:52,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=816110.5, ans=0.125 2024-09-18 20:52:02,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=816138.8333333334, ans=0.1 2024-09-18 20:52:33,513 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.72 vs. limit=15.0 2024-09-18 20:52:41,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=816195.5, ans=0.125 2024-09-18 20:52:59,646 INFO [train.py:1198] (1/2) Epoch 46, batch 500, loss[loss=0.2163, ctc_loss=0.1405, cr_loss=0.3791, over 20837.00 frames. ], tot_loss[loss=0.2137, ctc_loss=0.1407, cr_loss=0.365, over 3789548.76 frames. ], batch size: 65, lr: 1.80e-03, grad_scale: 32.0 2024-09-18 20:53:29,693 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.919e+02 2.255e+02 2.347e+02 2.535e+02 3.411e+02, threshold=4.694e+02, percent-clipped=0.0 2024-09-18 20:53:40,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=816308.8333333334, ans=0.0 2024-09-18 20:54:14,664 INFO [train.py:1198] (1/2) Epoch 46, batch 550, loss[loss=0.2229, ctc_loss=0.1478, cr_loss=0.3755, over 20973.00 frames. ], tot_loss[loss=0.2147, ctc_loss=0.1414, cr_loss=0.3666, over 3849540.53 frames. ], batch size: 55, lr: 1.80e-03, grad_scale: 32.0 2024-09-18 20:54:32,297 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.76 vs. limit=10.0 2024-09-18 20:54:43,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=816450.5, ans=0.125 2024-09-18 20:54:44,009 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.25 vs. 
limit=15.0 2024-09-18 20:54:49,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=816450.5, ans=0.2 2024-09-18 20:55:02,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=816478.8333333334, ans=0.0 2024-09-18 20:55:24,505 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.40 vs. limit=15.0 2024-09-18 20:55:29,698 INFO [train.py:1198] (1/2) Epoch 46, batch 600, loss[loss=0.2422, ctc_loss=0.16, cr_loss=0.4113, over 20974.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1421, cr_loss=0.3684, over 3913767.46 frames. ], batch size: 64, lr: 1.80e-03, grad_scale: 32.0 2024-09-18 20:55:59,258 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.920e+02 2.252e+02 2.397e+02 2.572e+02 4.425e+02, threshold=4.794e+02, percent-clipped=0.0 2024-09-18 20:56:19,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=816620.5, ans=0.5 2024-09-18 20:56:32,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=816648.8333333334, ans=0.125 2024-09-18 20:56:42,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=816648.8333333334, ans=0.125 2024-09-18 20:56:44,595 INFO [train.py:1198] (1/2) Epoch 46, batch 650, loss[loss=0.2372, ctc_loss=0.1582, cr_loss=0.3951, over 20844.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1429, cr_loss=0.369, over 3936314.85 frames. ], batch size: 65, lr: 1.80e-03, grad_scale: 32.0 2024-09-18 20:57:01,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=816705.5, ans=0.1 2024-09-18 20:57:23,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=816733.8333333334, ans=0.025 2024-09-18 20:57:34,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=816762.1666666666, ans=0.09899494936611666 2024-09-18 20:57:36,550 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.11 vs. limit=15.0 2024-09-18 20:58:05,800 INFO [train.py:1198] (1/2) Epoch 46, batch 700, loss[loss=0.2051, ctc_loss=0.1354, cr_loss=0.3487, over 20866.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1427, cr_loss=0.369, over 3985078.69 frames. 
], batch size: 57, lr: 1.80e-03, grad_scale: 64.0 2024-09-18 20:58:07,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=816818.8333333334, ans=0.025 2024-09-18 20:58:13,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=816818.8333333334, ans=0.0 2024-09-18 20:58:32,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=816847.1666666666, ans=0.125 2024-09-18 20:58:35,522 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.998e+02 2.241e+02 2.365e+02 2.538e+02 3.589e+02, threshold=4.730e+02, percent-clipped=0.0 2024-09-18 20:59:20,789 INFO [train.py:1198] (1/2) Epoch 46, batch 750, loss[loss=0.1724, ctc_loss=0.1109, cr_loss=0.3073, over 20943.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1428, cr_loss=0.369, over 4014605.09 frames. ], batch size: 50, lr: 1.80e-03, grad_scale: 64.0 2024-09-18 21:00:35,782 INFO [train.py:1198] (1/2) Epoch 46, batch 800, loss[loss=0.1971, ctc_loss=0.1277, cr_loss=0.3472, over 20964.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1432, cr_loss=0.3694, over 4041668.40 frames. ], batch size: 50, lr: 1.80e-03, grad_scale: 64.0 2024-09-18 21:00:36,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=817102.1666666666, ans=0.0 2024-09-18 21:00:52,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=817130.5, ans=0.125 2024-09-18 21:00:56,911 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=817130.5, ans=0.2 2024-09-18 21:00:57,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=817130.5, ans=0.2 2024-09-18 21:01:02,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=817130.5, ans=0.07 2024-09-18 21:01:05,509 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.010e+02 2.311e+02 2.454e+02 2.599e+02 3.298e+02, threshold=4.908e+02, percent-clipped=0.0 2024-09-18 21:01:27,236 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 21:01:47,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=817215.5, ans=0.0 2024-09-18 21:01:50,354 INFO [train.py:1198] (1/2) Epoch 46, batch 850, loss[loss=0.2535, ctc_loss=0.1745, cr_loss=0.3948, over 13971.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1443, cr_loss=0.3712, over 4038347.81 frames. 
], batch size: 149, lr: 1.80e-03, grad_scale: 64.0 2024-09-18 21:01:50,785 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=817243.8333333334, ans=0.0 2024-09-18 21:01:56,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=817243.8333333334, ans=0.125 2024-09-18 21:01:56,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=817243.8333333334, ans=0.125 2024-09-18 21:02:04,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=817272.1666666666, ans=0.5 2024-09-18 21:02:08,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=817272.1666666666, ans=0.125 2024-09-18 21:02:12,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=817272.1666666666, ans=0.0 2024-09-18 21:02:13,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=817272.1666666666, ans=0.125 2024-09-18 21:02:16,776 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. limit=6.0 2024-09-18 21:02:17,175 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.95 vs. limit=12.0 2024-09-18 21:02:20,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=817300.5, ans=0.0 2024-09-18 21:03:08,561 INFO [train.py:1198] (1/2) Epoch 46, batch 900, loss[loss=0.2048, ctc_loss=0.1359, cr_loss=0.3443, over 21054.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1437, cr_loss=0.3701, over 4053518.70 frames. ], batch size: 56, lr: 1.80e-03, grad_scale: 64.0 2024-09-18 21:03:29,911 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=817413.8333333334, ans=0.04949747468305833 2024-09-18 21:03:30,160 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.14 vs. limit=22.5 2024-09-18 21:03:41,441 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.076e+02 2.294e+02 2.436e+02 2.536e+02 3.738e+02, threshold=4.872e+02, percent-clipped=0.0 2024-09-18 21:04:04,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=817470.5, ans=0.1 2024-09-18 21:04:05,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=817470.5, ans=0.2 2024-09-18 21:04:17,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=817498.8333333334, ans=0.125 2024-09-18 21:04:25,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=817527.1666666666, ans=0.0 2024-09-18 21:04:26,339 INFO [train.py:1198] (1/2) Epoch 46, batch 950, loss[loss=0.2269, ctc_loss=0.149, cr_loss=0.3894, over 21054.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1434, cr_loss=0.37, over 4061856.73 frames. 
], batch size: 59, lr: 1.80e-03, grad_scale: 64.0 2024-09-18 21:04:32,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=817527.1666666666, ans=0.025 2024-09-18 21:04:53,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=817555.5, ans=0.0 2024-09-18 21:04:57,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=817583.8333333334, ans=0.125 2024-09-18 21:05:30,303 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.39 vs. limit=15.0 2024-09-18 21:05:41,723 INFO [train.py:1198] (1/2) Epoch 46, batch 1000, loss[loss=0.2122, ctc_loss=0.1386, cr_loss=0.3683, over 21065.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1439, cr_loss=0.3708, over 4066387.43 frames. ], batch size: 56, lr: 1.80e-03, grad_scale: 64.0 2024-09-18 21:06:11,914 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.059e+02 2.291e+02 2.408e+02 2.588e+02 4.290e+02, threshold=4.816e+02, percent-clipped=0.0 2024-09-18 21:06:18,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=817725.5, ans=0.0 2024-09-18 21:06:24,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=817725.5, ans=0.125 2024-09-18 21:06:36,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=817753.8333333334, ans=0.0 2024-09-18 21:06:36,215 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=817753.8333333334, ans=0.2 2024-09-18 21:06:48,926 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.33 vs. limit=15.0 2024-09-18 21:06:56,990 INFO [train.py:1198] (1/2) Epoch 46, batch 1050, loss[loss=0.2316, ctc_loss=0.1526, cr_loss=0.3953, over 20668.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1439, cr_loss=0.3709, over 4074072.09 frames. ], batch size: 68, lr: 1.80e-03, grad_scale: 64.0 2024-09-18 21:07:04,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=817810.5, ans=0.125 2024-09-18 21:07:34,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=817867.1666666666, ans=0.05 2024-09-18 21:07:47,263 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.64 vs. limit=15.0 2024-09-18 21:07:52,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=817895.5, ans=0.125 2024-09-18 21:08:11,877 INFO [train.py:1198] (1/2) Epoch 46, batch 1100, loss[loss=0.232, ctc_loss=0.1566, cr_loss=0.377, over 18498.00 frames. ], tot_loss[loss=0.2187, ctc_loss=0.1443, cr_loss=0.372, over 4072615.13 frames. 
], batch size: 108, lr: 1.80e-03, grad_scale: 64.0 2024-09-18 21:08:23,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=817952.1666666666, ans=0.0 2024-09-18 21:08:31,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=817980.5, ans=0.0 2024-09-18 21:08:44,931 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.036e+02 2.310e+02 2.456e+02 2.641e+02 3.685e+02, threshold=4.912e+02, percent-clipped=0.0 2024-09-18 21:08:46,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=818008.8333333334, ans=0.125 2024-09-18 21:09:01,908 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.29 vs. limit=12.0 2024-09-18 21:09:28,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=818065.5, ans=0.125 2024-09-18 21:09:33,030 INFO [train.py:1198] (1/2) Epoch 46, batch 1150, loss[loss=0.2297, ctc_loss=0.1506, cr_loss=0.3955, over 20970.00 frames. ], tot_loss[loss=0.2183, ctc_loss=0.144, cr_loss=0.3715, over 4076071.60 frames. ], batch size: 58, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 21:09:33,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=818093.8333333334, ans=0.04949747468305833 2024-09-18 21:09:49,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=818122.1666666666, ans=0.1 2024-09-18 21:10:25,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=818178.8333333334, ans=0.125 2024-09-18 21:10:29,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=818178.8333333334, ans=0.125 2024-09-18 21:10:48,794 INFO [train.py:1198] (1/2) Epoch 46, batch 1200, loss[loss=0.2112, ctc_loss=0.1384, cr_loss=0.364, over 20834.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1441, cr_loss=0.3716, over 4083434.62 frames. ], batch size: 59, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 21:11:00,177 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.06 vs. limit=15.0 2024-09-18 21:11:20,675 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.947e+02 2.264e+02 2.384e+02 2.544e+02 3.200e+02, threshold=4.767e+02, percent-clipped=0.0 2024-09-18 21:11:32,300 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.23 vs. 
limit=15.0 2024-09-18 21:11:39,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=818320.5, ans=0.5 2024-09-18 21:11:44,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=818320.5, ans=0.125 2024-09-18 21:11:47,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=818320.5, ans=0.2 2024-09-18 21:12:04,764 INFO [train.py:1198] (1/2) Epoch 46, batch 1250, loss[loss=0.2029, ctc_loss=0.1304, cr_loss=0.3625, over 20942.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.143, cr_loss=0.3691, over 4091872.36 frames. ], batch size: 49, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 21:12:08,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=818377.1666666666, ans=0.0 2024-09-18 21:12:12,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=818377.1666666666, ans=0.125 2024-09-18 21:12:28,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=818405.5, ans=0.125 2024-09-18 21:12:42,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=818433.8333333334, ans=0.125 2024-09-18 21:12:51,876 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.33 vs. limit=22.5 2024-09-18 21:13:01,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=818462.1666666666, ans=0.0 2024-09-18 21:13:19,706 INFO [train.py:1198] (1/2) Epoch 46, batch 1300, loss[loss=0.1879, ctc_loss=0.1208, cr_loss=0.3355, over 20954.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1432, cr_loss=0.3695, over 4081978.37 frames. ], batch size: 50, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 21:13:46,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=818547.1666666666, ans=0.125 2024-09-18 21:13:47,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=818547.1666666666, ans=0.0 2024-09-18 21:13:51,738 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.011e+02 2.225e+02 2.348e+02 2.571e+02 4.045e+02, threshold=4.697e+02, percent-clipped=0.0 2024-09-18 21:13:51,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=818575.5, ans=0.125 2024-09-18 21:14:35,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=818632.1666666666, ans=0.1 2024-09-18 21:14:37,967 INFO [train.py:1198] (1/2) Epoch 46, batch 1350, loss[loss=0.2128, ctc_loss=0.1409, cr_loss=0.3595, over 20639.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1427, cr_loss=0.3689, over 4083152.75 frames. 
], batch size: 68, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 21:15:07,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=818688.8333333334, ans=0.1 2024-09-18 21:15:14,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=818717.1666666666, ans=0.125 2024-09-18 21:15:51,597 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.12 vs. limit=15.0 2024-09-18 21:15:56,772 INFO [train.py:1198] (1/2) Epoch 46, batch 1400, loss[loss=0.2584, ctc_loss=0.1777, cr_loss=0.4035, over 18221.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.1419, cr_loss=0.3676, over 4079679.35 frames. ], batch size: 108, lr: 1.79e-03, grad_scale: 16.0 2024-09-18 21:16:01,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=818802.1666666666, ans=0.125 2024-09-18 21:16:10,832 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.43 vs. limit=12.0 2024-09-18 21:16:18,663 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.18 vs. limit=15.0 2024-09-18 21:16:30,304 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.016e+02 2.209e+02 2.340e+02 2.531e+02 4.612e+02, threshold=4.680e+02, percent-clipped=0.0 2024-09-18 21:17:08,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=818915.5, ans=0.125 2024-09-18 21:17:13,039 INFO [train.py:1198] (1/2) Epoch 46, batch 1450, loss[loss=0.2256, ctc_loss=0.15, cr_loss=0.378, over 20861.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1425, cr_loss=0.3691, over 4082255.81 frames. ], batch size: 65, lr: 1.79e-03, grad_scale: 16.0 2024-09-18 21:17:28,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=818972.1666666666, ans=0.125 2024-09-18 21:18:09,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=819028.8333333334, ans=0.125 2024-09-18 21:18:12,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=819057.1666666666, ans=0.125 2024-09-18 21:18:13,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=819057.1666666666, ans=0.0 2024-09-18 21:18:25,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=819057.1666666666, ans=0.0 2024-09-18 21:18:28,585 INFO [train.py:1198] (1/2) Epoch 46, batch 1500, loss[loss=0.2014, ctc_loss=0.1329, cr_loss=0.3426, over 20789.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1425, cr_loss=0.3685, over 4075624.89 frames. 
], batch size: 53, lr: 1.79e-03, grad_scale: 16.0 2024-09-18 21:18:46,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=819113.8333333334, ans=0.1 2024-09-18 21:19:01,506 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.967e+02 2.253e+02 2.389e+02 2.516e+02 6.989e+02, threshold=4.778e+02, percent-clipped=1.0 2024-09-18 21:19:43,778 INFO [train.py:1198] (1/2) Epoch 46, batch 1550, loss[loss=0.1926, ctc_loss=0.1216, cr_loss=0.3552, over 20950.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1426, cr_loss=0.3684, over 4071775.30 frames. ], batch size: 48, lr: 1.79e-03, grad_scale: 16.0 2024-09-18 21:21:04,466 INFO [train.py:1198] (1/2) Epoch 46, batch 1600, loss[loss=0.2728, ctc_loss=0.1843, cr_loss=0.4422, over 20959.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.1418, cr_loss=0.3668, over 4088368.45 frames. ], batch size: 64, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 21:21:29,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=819397.1666666666, ans=15.0 2024-09-18 21:21:37,830 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.926e+02 2.223e+02 2.381e+02 2.505e+02 3.293e+02, threshold=4.761e+02, percent-clipped=0.0 2024-09-18 21:21:51,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=819453.8333333334, ans=0.0 2024-09-18 21:22:20,561 INFO [train.py:1198] (1/2) Epoch 46, batch 1650, loss[loss=0.1778, ctc_loss=0.1163, cr_loss=0.3075, over 19911.00 frames. ], tot_loss[loss=0.2153, ctc_loss=0.1419, cr_loss=0.3669, over 4072273.44 frames. ], batch size: 44, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 21:23:06,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=819595.5, ans=0.125 2024-09-18 21:23:12,398 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.66 vs. limit=10.0 2024-09-18 21:23:19,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=819623.8333333334, ans=0.125 2024-09-18 21:23:35,985 INFO [train.py:1198] (1/2) Epoch 46, batch 1700, loss[loss=0.2677, ctc_loss=0.1796, cr_loss=0.4403, over 21055.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1427, cr_loss=0.3683, over 4071983.20 frames. ], batch size: 59, lr: 1.79e-03, grad_scale: 16.0 2024-09-18 21:24:03,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=819680.5, ans=0.125 2024-09-18 21:24:10,517 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.968e+02 2.242e+02 2.353e+02 2.530e+02 3.299e+02, threshold=4.706e+02, percent-clipped=0.0 2024-09-18 21:24:51,125 INFO [train.py:1198] (1/2) Epoch 46, batch 1750, loss[loss=0.2476, ctc_loss=0.1642, cr_loss=0.417, over 20959.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1426, cr_loss=0.3684, over 4081559.47 frames. ], batch size: 64, lr: 1.79e-03, grad_scale: 16.0 2024-09-18 21:25:05,699 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.30 vs. 
limit=22.5 2024-09-18 21:25:15,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=819822.1666666666, ans=0.0 2024-09-18 21:25:48,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=819878.8333333334, ans=0.1 2024-09-18 21:25:51,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=819878.8333333334, ans=0.125 2024-09-18 21:25:56,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=819907.1666666666, ans=0.2 2024-09-18 21:26:02,513 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.97 vs. limit=15.0 2024-09-18 21:26:04,078 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.16 vs. limit=15.0 2024-09-18 21:26:09,376 INFO [train.py:1198] (1/2) Epoch 46, batch 1800, loss[loss=0.232, ctc_loss=0.1541, cr_loss=0.3895, over 20977.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1424, cr_loss=0.3682, over 4084179.42 frames. ], batch size: 58, lr: 1.79e-03, grad_scale: 16.0 2024-09-18 21:26:21,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=819935.5, ans=0.1 2024-09-18 21:26:47,378 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.980e+02 2.211e+02 2.352e+02 2.538e+02 3.477e+02, threshold=4.704e+02, percent-clipped=0.0 2024-09-18 21:27:12,419 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.01 vs. limit=15.0 2024-09-18 21:27:28,100 INFO [train.py:1198] (1/2) Epoch 46, batch 1850, loss[loss=0.2432, ctc_loss=0.1666, cr_loss=0.3831, over 14709.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1428, cr_loss=0.3688, over 4075447.60 frames. ], batch size: 149, lr: 1.79e-03, grad_scale: 16.0 2024-09-18 21:28:43,474 INFO [train.py:1198] (1/2) Epoch 46, batch 1900, loss[loss=0.2239, ctc_loss=0.1467, cr_loss=0.386, over 20888.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.143, cr_loss=0.3693, over 4090952.56 frames. ], batch size: 54, lr: 1.79e-03, grad_scale: 16.0 2024-09-18 21:28:46,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=820218.8333333334, ans=0.125 2024-09-18 21:29:07,508 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.84 vs. limit=15.0 2024-09-18 21:29:11,591 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.98 vs. limit=15.0 2024-09-18 21:29:18,309 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.004e+02 2.229e+02 2.377e+02 2.550e+02 3.271e+02, threshold=4.755e+02, percent-clipped=0.0 2024-09-18 21:29:18,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=820275.5, ans=0.025 2024-09-18 21:29:58,786 INFO [train.py:1198] (1/2) Epoch 46, batch 1950, loss[loss=0.2332, ctc_loss=0.1561, cr_loss=0.3857, over 20963.00 frames. 
], tot_loss[loss=0.2163, ctc_loss=0.1425, cr_loss=0.3689, over 4101414.80 frames. ], batch size: 64, lr: 1.79e-03, grad_scale: 16.0 2024-09-18 21:29:59,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=820360.5, ans=0.07 2024-09-18 21:30:08,046 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 21:30:16,301 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.39 vs. limit=22.5 2024-09-18 21:30:43,187 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.27 vs. limit=15.0 2024-09-18 21:30:48,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=820445.5, ans=0.0 2024-09-18 21:30:52,264 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.34 vs. limit=15.0 2024-09-18 21:31:10,340 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.25 vs. limit=15.0 2024-09-18 21:31:14,216 INFO [train.py:1198] (1/2) Epoch 46, batch 2000, loss[loss=0.2056, ctc_loss=0.1363, cr_loss=0.3468, over 20786.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1422, cr_loss=0.3685, over 4092844.45 frames. ], batch size: 53, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 21:31:14,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=820502.1666666666, ans=0.125 2024-09-18 21:31:51,819 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.035e+02 2.259e+02 2.419e+02 2.580e+02 7.892e+02, threshold=4.838e+02, percent-clipped=2.0 2024-09-18 21:31:52,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=820558.8333333334, ans=0.07 2024-09-18 21:32:03,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=820587.1666666666, ans=0.1 2024-09-18 21:32:13,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=820587.1666666666, ans=0.0 2024-09-18 21:32:26,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=820615.5, ans=0.125 2024-09-18 21:32:28,700 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.07 vs. limit=6.0 2024-09-18 21:32:35,652 INFO [train.py:1198] (1/2) Epoch 46, batch 2050, loss[loss=0.2452, ctc_loss=0.1678, cr_loss=0.3868, over 14387.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1416, cr_loss=0.3674, over 4095240.55 frames. 
], batch size: 149, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 21:32:42,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=820643.8333333334, ans=0.2 2024-09-18 21:32:46,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=820643.8333333334, ans=0.2 2024-09-18 21:32:54,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=820672.1666666666, ans=0.0 2024-09-18 21:33:38,984 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.69 vs. limit=15.0 2024-09-18 21:33:51,673 INFO [train.py:1198] (1/2) Epoch 46, batch 2100, loss[loss=0.2358, ctc_loss=0.157, cr_loss=0.3941, over 20356.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.1417, cr_loss=0.3676, over 4088204.80 frames. ], batch size: 74, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 21:34:02,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=820785.5, ans=0.125 2024-09-18 21:34:26,603 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.021e+02 2.267e+02 2.383e+02 2.565e+02 5.256e+02, threshold=4.767e+02, percent-clipped=1.0 2024-09-18 21:34:34,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=820842.1666666666, ans=0.125 2024-09-18 21:34:34,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=820842.1666666666, ans=0.2 2024-09-18 21:35:07,450 INFO [train.py:1198] (1/2) Epoch 46, batch 2150, loss[loss=0.2138, ctc_loss=0.1385, cr_loss=0.3769, over 21001.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1426, cr_loss=0.3698, over 4094029.64 frames. ], batch size: 61, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 21:35:30,701 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.03 vs. limit=10.0 2024-09-18 21:35:31,582 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=820955.5, ans=0.1 2024-09-18 21:35:48,455 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.64 vs. limit=15.0 2024-09-18 21:36:19,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=821040.5, ans=0.05 2024-09-18 21:36:22,200 INFO [train.py:1198] (1/2) Epoch 46, batch 2200, loss[loss=0.219, ctc_loss=0.1449, cr_loss=0.3703, over 20846.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.1428, cr_loss=0.3698, over 4101953.98 frames. 
], batch size: 65, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 21:36:31,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=821068.8333333334, ans=0.125 2024-09-18 21:36:45,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=821097.1666666666, ans=0.0 2024-09-18 21:36:57,419 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.902e+02 2.236e+02 2.409e+02 2.555e+02 3.681e+02, threshold=4.817e+02, percent-clipped=0.0 2024-09-18 21:36:57,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=821125.5, ans=0.0 2024-09-18 21:37:10,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=821153.8333333334, ans=0.125 2024-09-18 21:37:41,379 INFO [train.py:1198] (1/2) Epoch 46, batch 2250, loss[loss=0.2043, ctc_loss=0.1341, cr_loss=0.3508, over 21007.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1431, cr_loss=0.3701, over 4102812.24 frames. ], batch size: 61, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 21:38:07,461 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.17 vs. limit=22.5 2024-09-18 21:38:26,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=821267.1666666666, ans=0.1 2024-09-18 21:38:39,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=821295.5, ans=0.125 2024-09-18 21:38:45,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=821323.8333333334, ans=0.0 2024-09-18 21:38:59,026 INFO [train.py:1198] (1/2) Epoch 46, batch 2300, loss[loss=0.2064, ctc_loss=0.1336, cr_loss=0.364, over 20971.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1438, cr_loss=0.3718, over 4102604.06 frames. ], batch size: 51, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 21:39:16,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=821380.5, ans=0.09899494936611666 2024-09-18 21:39:19,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=821380.5, ans=0.025 2024-09-18 21:39:33,519 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.000e+02 2.231e+02 2.396e+02 2.524e+02 3.614e+02, threshold=4.793e+02, percent-clipped=0.0 2024-09-18 21:39:38,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=821408.8333333334, ans=0.025 2024-09-18 21:40:04,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=821465.5, ans=0.2 2024-09-18 21:40:14,314 INFO [train.py:1198] (1/2) Epoch 46, batch 2350, loss[loss=0.2409, ctc_loss=0.1598, cr_loss=0.4054, over 19285.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1433, cr_loss=0.3706, over 4099526.06 frames. 
], batch size: 90, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 21:40:23,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=821493.8333333334, ans=0.0 2024-09-18 21:40:37,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=821522.1666666666, ans=0.125 2024-09-18 21:41:09,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=821578.8333333334, ans=0.09899494936611666 2024-09-18 21:41:30,302 INFO [train.py:1198] (1/2) Epoch 46, batch 2400, loss[loss=0.1857, ctc_loss=0.1207, cr_loss=0.3248, over 19842.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.1417, cr_loss=0.3676, over 4102040.06 frames. ], batch size: 44, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 21:41:32,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=821635.5, ans=0.125 2024-09-18 21:42:04,803 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.967e+02 2.233e+02 2.365e+02 2.515e+02 3.615e+02, threshold=4.730e+02, percent-clipped=0.0 2024-09-18 21:42:45,486 INFO [train.py:1198] (1/2) Epoch 46, batch 2450, loss[loss=0.2051, ctc_loss=0.1357, cr_loss=0.3468, over 21008.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1422, cr_loss=0.3687, over 4110700.11 frames. ], batch size: 61, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 21:42:54,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=821777.1666666666, ans=0.2 2024-09-18 21:43:32,774 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.64 vs. limit=15.0 2024-09-18 21:44:06,039 INFO [train.py:1198] (1/2) Epoch 46, batch 2500, loss[loss=0.2226, ctc_loss=0.1473, cr_loss=0.3765, over 20943.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1427, cr_loss=0.3689, over 4099757.65 frames. ], batch size: 64, lr: 1.79e-03, grad_scale: 16.0 2024-09-18 21:44:15,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=821918.8333333334, ans=0.1 2024-09-18 21:44:18,515 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.55 vs. limit=10.0 2024-09-18 21:44:33,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=821947.1666666666, ans=0.05 2024-09-18 21:44:34,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=821975.5, ans=0.1 2024-09-18 21:44:42,285 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.975e+02 2.281e+02 2.378e+02 2.521e+02 3.111e+02, threshold=4.757e+02, percent-clipped=0.0 2024-09-18 21:44:46,267 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.24 vs. limit=15.0 2024-09-18 21:44:47,774 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.22 vs. limit=12.0 2024-09-18 21:45:21,230 INFO [train.py:1198] (1/2) Epoch 46, batch 2550, loss[loss=0.1781, ctc_loss=0.1151, cr_loss=0.3151, over 20350.00 frames. 
], tot_loss[loss=0.216, ctc_loss=0.1424, cr_loss=0.3682, over 4102928.90 frames. ], batch size: 45, lr: 1.79e-03, grad_scale: 16.0 2024-09-18 21:45:51,800 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.41 vs. limit=15.0 2024-09-18 21:46:07,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=822145.5, ans=0.125 2024-09-18 21:46:35,782 INFO [train.py:1198] (1/2) Epoch 46, batch 2600, loss[loss=0.2684, ctc_loss=0.1858, cr_loss=0.4127, over 14354.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1429, cr_loss=0.3685, over 4096124.13 frames. ], batch size: 149, lr: 1.79e-03, grad_scale: 16.0 2024-09-18 21:46:36,181 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 21:46:48,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=822202.1666666666, ans=0.125 2024-09-18 21:46:51,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=822230.5, ans=0.0 2024-09-18 21:46:52,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=822230.5, ans=0.0 2024-09-18 21:47:01,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=822230.5, ans=0.05 2024-09-18 21:47:12,273 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.017e+02 2.268e+02 2.382e+02 2.565e+02 3.672e+02, threshold=4.764e+02, percent-clipped=0.0 2024-09-18 21:47:48,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=822315.5, ans=0.2 2024-09-18 21:47:51,381 INFO [train.py:1198] (1/2) Epoch 46, batch 2650, loss[loss=0.2178, ctc_loss=0.1415, cr_loss=0.3816, over 21066.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.1421, cr_loss=0.3675, over 4101118.67 frames. ], batch size: 56, lr: 1.79e-03, grad_scale: 16.0 2024-09-18 21:48:49,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=822428.8333333334, ans=0.0 2024-09-18 21:48:57,921 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 21:49:08,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=822485.5, ans=0.2 2024-09-18 21:49:09,692 INFO [train.py:1198] (1/2) Epoch 46, batch 2700, loss[loss=0.2191, ctc_loss=0.1462, cr_loss=0.3648, over 20970.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1421, cr_loss=0.3678, over 4101493.79 frames. ], batch size: 58, lr: 1.79e-03, grad_scale: 16.0 2024-09-18 21:49:29,958 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.50 vs. limit=15.0 2024-09-18 21:49:48,749 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.989e+02 2.250e+02 2.391e+02 2.532e+02 3.748e+02, threshold=4.783e+02, percent-clipped=0.0 2024-09-18 21:50:20,394 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.16 vs. 
limit=15.0 2024-09-18 21:50:28,731 INFO [train.py:1198] (1/2) Epoch 46, batch 2750, loss[loss=0.2074, ctc_loss=0.1334, cr_loss=0.3702, over 20973.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1426, cr_loss=0.3688, over 4096911.47 frames. ], batch size: 58, lr: 1.79e-03, grad_scale: 16.0 2024-09-18 21:50:29,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.whiten.whitening_limit, batch_count=822627.1666666666, ans=12.0 2024-09-18 21:50:35,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=822627.1666666666, ans=0.2 2024-09-18 21:50:38,285 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 21:50:40,395 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.25 vs. limit=15.0 2024-09-18 21:51:17,501 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.98 vs. limit=12.0 2024-09-18 21:51:44,399 INFO [train.py:1198] (1/2) Epoch 46, batch 2800, loss[loss=0.2358, ctc_loss=0.1547, cr_loss=0.4054, over 20824.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.143, cr_loss=0.3696, over 4092727.62 frames. ], batch size: 59, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 21:51:49,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=822768.8333333334, ans=0.125 2024-09-18 21:51:49,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=822768.8333333334, ans=0.0 2024-09-18 21:52:07,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=822797.1666666666, ans=0.0 2024-09-18 21:52:20,962 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.817e+02 2.317e+02 2.449e+02 2.604e+02 5.763e+02, threshold=4.898e+02, percent-clipped=1.0 2024-09-18 21:52:36,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=822853.8333333334, ans=0.125 2024-09-18 21:52:51,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=822882.1666666666, ans=0.0 2024-09-18 21:52:51,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=822882.1666666666, ans=0.125 2024-09-18 21:52:59,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=822910.5, ans=0.1 2024-09-18 21:53:00,347 INFO [train.py:1198] (1/2) Epoch 46, batch 2850, loss[loss=0.2311, ctc_loss=0.1547, cr_loss=0.3821, over 20973.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1431, cr_loss=0.3696, over 4097495.88 frames. 
], batch size: 58, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 21:53:03,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=822910.5, ans=0.125 2024-09-18 21:53:05,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=822910.5, ans=0.125 2024-09-18 21:53:21,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=822938.8333333334, ans=0.125 2024-09-18 21:53:39,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=822967.1666666666, ans=0.125 2024-09-18 21:54:14,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=823052.1666666666, ans=0.0 2024-09-18 21:54:15,772 INFO [train.py:1198] (1/2) Epoch 46, batch 2900, loss[loss=0.2242, ctc_loss=0.1474, cr_loss=0.3843, over 20888.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1428, cr_loss=0.3691, over 4099123.02 frames. ], batch size: 54, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 21:54:22,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=823052.1666666666, ans=0.0 2024-09-18 21:54:22,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=823052.1666666666, ans=0.1 2024-09-18 21:54:44,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=823080.5, ans=0.125 2024-09-18 21:54:52,623 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.55 vs. limit=15.0 2024-09-18 21:54:55,044 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.947e+02 2.266e+02 2.392e+02 2.544e+02 4.341e+02, threshold=4.785e+02, percent-clipped=0.0 2024-09-18 21:55:34,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=823165.5, ans=0.025 2024-09-18 21:55:37,303 INFO [train.py:1198] (1/2) Epoch 46, batch 2950, loss[loss=0.1857, ctc_loss=0.1225, cr_loss=0.3161, over 20972.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1426, cr_loss=0.3685, over 4090723.33 frames. ], batch size: 49, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 21:56:06,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=823250.5, ans=0.2 2024-09-18 21:56:32,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=823278.8333333334, ans=0.2 2024-09-18 21:56:43,027 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.98 vs. limit=22.5 2024-09-18 21:56:48,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=823307.1666666666, ans=0.1 2024-09-18 21:56:52,763 INFO [train.py:1198] (1/2) Epoch 46, batch 3000, loss[loss=0.2113, ctc_loss=0.1377, cr_loss=0.3677, over 20982.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.143, cr_loss=0.3694, over 4080437.13 frames. 
], batch size: 55, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 21:56:52,763 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 21:57:02,830 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.3882, 5.5772, 4.9830, 5.2274], device='cuda:1') 2024-09-18 21:57:11,208 INFO [train.py:1230] (1/2) Epoch 46, validation: loss=0.03911, ctc_loss=0.03911, cr_loss=1.552e-14, over 944034.00 frames. 2024-09-18 21:57:11,208 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-18 21:57:26,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=823363.8333333334, ans=0.2 2024-09-18 21:57:28,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=823363.8333333334, ans=0.125 2024-09-18 21:57:47,589 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.920e+02 2.264e+02 2.377e+02 2.507e+02 9.092e+02, threshold=4.753e+02, percent-clipped=1.0 2024-09-18 21:58:02,129 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0 2024-09-18 21:58:18,612 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.47 vs. limit=15.0 2024-09-18 21:58:26,769 INFO [train.py:1198] (1/2) Epoch 46, batch 3050, loss[loss=0.181, ctc_loss=0.1169, cr_loss=0.3202, over 20963.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1433, cr_loss=0.3696, over 4075875.14 frames. ], batch size: 50, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 21:58:41,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=823505.5, ans=0.125 2024-09-18 21:58:44,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=823505.5, ans=0.125 2024-09-18 21:59:13,197 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.77 vs. limit=15.0 2024-09-18 21:59:17,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=823562.1666666666, ans=0.0 2024-09-18 21:59:42,528 INFO [train.py:1198] (1/2) Epoch 46, batch 3100, loss[loss=0.1957, ctc_loss=0.1271, cr_loss=0.343, over 20945.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1431, cr_loss=0.3694, over 4087550.94 frames. ], batch size: 48, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 22:00:09,204 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.37 vs. limit=15.0 2024-09-18 22:00:11,651 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=823647.1666666666, ans=0.2 2024-09-18 22:00:21,626 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.956e+02 2.274e+02 2.370e+02 2.562e+02 3.431e+02, threshold=4.740e+02, percent-clipped=0.0 2024-09-18 22:00:54,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=823732.1666666666, ans=0.125 2024-09-18 22:01:04,336 INFO [train.py:1198] (1/2) Epoch 46, batch 3150, loss[loss=0.1789, ctc_loss=0.1158, cr_loss=0.3153, over 20990.00 frames. 
], tot_loss[loss=0.2157, ctc_loss=0.1422, cr_loss=0.3677, over 4092821.10 frames. ], batch size: 49, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 22:01:54,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=823845.5, ans=0.0 2024-09-18 22:02:07,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=823873.8333333334, ans=0.0 2024-09-18 22:02:07,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=823873.8333333334, ans=0.2 2024-09-18 22:02:19,988 INFO [train.py:1198] (1/2) Epoch 46, batch 3200, loss[loss=0.2407, ctc_loss=0.1623, cr_loss=0.3923, over 18443.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1423, cr_loss=0.3678, over 4096315.49 frames. ], batch size: 108, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 22:02:56,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=823958.8333333334, ans=0.0 2024-09-18 22:02:57,879 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.995e+02 2.295e+02 2.427e+02 2.609e+02 3.768e+02, threshold=4.855e+02, percent-clipped=0.0 2024-09-18 22:03:03,276 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.80 vs. limit=15.0 2024-09-18 22:03:08,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=823987.1666666666, ans=0.04949747468305833 2024-09-18 22:03:28,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=824015.5, ans=0.1 2024-09-18 22:03:35,288 INFO [train.py:1198] (1/2) Epoch 46, batch 3250, loss[loss=0.2245, ctc_loss=0.1472, cr_loss=0.3865, over 20969.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1426, cr_loss=0.3681, over 4097444.92 frames. ], batch size: 64, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 22:03:49,820 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=3.95 vs. limit=15.0 2024-09-18 22:04:01,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=824072.1666666666, ans=0.125 2024-09-18 22:04:30,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=824128.8333333334, ans=0.0 2024-09-18 22:04:32,555 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.45 vs. limit=5.0 2024-09-18 22:04:37,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=824157.1666666666, ans=0.125 2024-09-18 22:04:40,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=824157.1666666666, ans=0.0 2024-09-18 22:04:48,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=824157.1666666666, ans=0.04949747468305833 2024-09-18 22:04:51,046 INFO [train.py:1198] (1/2) Epoch 46, batch 3300, loss[loss=0.2537, ctc_loss=0.1679, cr_loss=0.429, over 20079.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1424, cr_loss=0.3682, over 4099583.97 frames. 
], batch size: 80, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 22:05:28,696 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.036e+02 2.245e+02 2.375e+02 2.573e+02 3.336e+02, threshold=4.750e+02, percent-clipped=0.0 2024-09-18 22:05:33,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=824242.1666666666, ans=0.1 2024-09-18 22:06:02,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=824298.8333333334, ans=0.0 2024-09-18 22:06:05,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=824298.8333333334, ans=0.0 2024-09-18 22:06:09,669 INFO [train.py:1198] (1/2) Epoch 46, batch 3350, loss[loss=0.2164, ctc_loss=0.142, cr_loss=0.3723, over 20958.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1436, cr_loss=0.3699, over 4094311.31 frames. ], batch size: 51, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 22:06:09,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=824327.1666666666, ans=0.125 2024-09-18 22:06:26,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=824355.5, ans=0.025 2024-09-18 22:07:12,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=824440.5, ans=0.125 2024-09-18 22:07:15,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=824440.5, ans=0.2 2024-09-18 22:07:23,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=824440.5, ans=0.125 2024-09-18 22:07:25,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=824440.5, ans=0.125 2024-09-18 22:07:27,778 INFO [train.py:1198] (1/2) Epoch 46, batch 3400, loss[loss=0.2287, ctc_loss=0.1517, cr_loss=0.3854, over 20881.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1433, cr_loss=0.3694, over 4087138.49 frames. ], batch size: 57, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 22:07:52,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=824497.1666666666, ans=0.0 2024-09-18 22:08:06,000 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.931e+02 2.287e+02 2.442e+02 2.582e+02 5.299e+02, threshold=4.883e+02, percent-clipped=1.0 2024-09-18 22:08:09,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=824525.5, ans=0.125 2024-09-18 22:08:43,622 INFO [train.py:1198] (1/2) Epoch 46, batch 3450, loss[loss=0.2545, ctc_loss=0.1694, cr_loss=0.4254, over 21004.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1428, cr_loss=0.3686, over 4092635.52 frames. ], batch size: 61, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 22:08:56,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=824610.5, ans=0.125 2024-09-18 22:09:05,795 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. 
limit=6.0 2024-09-18 22:09:43,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=824723.8333333334, ans=0.2 2024-09-18 22:09:45,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=824723.8333333334, ans=10.0 2024-09-18 22:09:51,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=824723.8333333334, ans=0.125 2024-09-18 22:09:58,566 INFO [train.py:1198] (1/2) Epoch 46, batch 3500, loss[loss=0.1882, ctc_loss=0.1218, cr_loss=0.3321, over 20991.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1426, cr_loss=0.3678, over 4087709.39 frames. ], batch size: 52, lr: 1.79e-03, grad_scale: 16.0 2024-09-18 22:10:37,677 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.967e+02 2.261e+02 2.364e+02 2.578e+02 3.983e+02, threshold=4.728e+02, percent-clipped=0.0 2024-09-18 22:10:54,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=824837.1666666666, ans=0.2 2024-09-18 22:11:03,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=824865.5, ans=0.125 2024-09-18 22:11:03,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=824865.5, ans=0.2 2024-09-18 22:11:16,413 INFO [train.py:1198] (1/2) Epoch 46, batch 3550, loss[loss=0.258, ctc_loss=0.1759, cr_loss=0.4106, over 18213.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1426, cr_loss=0.3677, over 4087612.00 frames. ], batch size: 108, lr: 1.79e-03, grad_scale: 16.0 2024-09-18 22:11:16,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=824893.8333333334, ans=0.125 2024-09-18 22:12:32,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=825007.1666666666, ans=0.125 2024-09-18 22:12:34,648 INFO [train.py:1198] (1/2) Epoch 46, batch 3600, loss[loss=0.2104, ctc_loss=0.1415, cr_loss=0.3443, over 20979.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1423, cr_loss=0.367, over 4086081.75 frames. ], batch size: 50, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 22:12:34,938 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=825035.5, ans=0.04949747468305833 2024-09-18 22:13:10,432 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.83 vs. limit=15.0 2024-09-18 22:13:13,999 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.019e+02 2.231e+02 2.343e+02 2.514e+02 3.903e+02, threshold=4.687e+02, percent-clipped=0.0 2024-09-18 22:13:50,485 INFO [train.py:1198] (1/2) Epoch 46, batch 3650, loss[loss=0.1793, ctc_loss=0.1138, cr_loss=0.3274, over 20982.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.1421, cr_loss=0.3676, over 4092704.82 frames. 
], batch size: 49, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 22:14:02,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=825177.1666666666, ans=0.0 2024-09-18 22:14:19,243 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=825233.8333333334, ans=0.2 2024-09-18 22:14:22,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=825233.8333333334, ans=0.125 2024-09-18 22:14:26,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=825233.8333333334, ans=0.125 2024-09-18 22:14:40,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=825262.1666666666, ans=0.0 2024-09-18 22:14:54,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=825290.5, ans=0.125 2024-09-18 22:15:01,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=825290.5, ans=0.125 2024-09-18 22:15:06,139 INFO [train.py:1198] (1/2) Epoch 46, batch 3700, loss[loss=0.2314, ctc_loss=0.154, cr_loss=0.3869, over 21078.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1426, cr_loss=0.3685, over 4092154.34 frames. ], batch size: 59, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 22:15:06,443 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=825318.8333333334, ans=0.2 2024-09-18 22:15:15,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=825318.8333333334, ans=0.125 2024-09-18 22:15:16,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=825318.8333333334, ans=0.025 2024-09-18 22:15:45,162 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.041e+02 2.311e+02 2.412e+02 2.580e+02 4.323e+02, threshold=4.824e+02, percent-clipped=0.0 2024-09-18 22:16:05,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=825432.1666666666, ans=0.125 2024-09-18 22:16:10,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=825432.1666666666, ans=0.05 2024-09-18 22:16:21,044 INFO [train.py:1198] (1/2) Epoch 46, batch 3750, loss[loss=0.2183, ctc_loss=0.1456, cr_loss=0.3637, over 19496.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.142, cr_loss=0.367, over 4098660.62 frames. ], batch size: 90, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 22:16:39,953 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.72 vs. limit=6.0 2024-09-18 22:17:02,118 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 22:17:08,535 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=5.70 vs. 
limit=22.5 2024-09-18 22:17:12,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=825545.5, ans=0.125 2024-09-18 22:17:23,915 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.19 vs. limit=15.0 2024-09-18 22:17:29,212 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 22:17:30,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=825573.8333333334, ans=0.125 2024-09-18 22:17:33,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=825573.8333333334, ans=0.2 2024-09-18 22:17:33,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=825573.8333333334, ans=0.0 2024-09-18 22:17:39,555 INFO [train.py:1198] (1/2) Epoch 46, batch 3800, loss[loss=0.1735, ctc_loss=0.1107, cr_loss=0.3143, over 20969.00 frames. ], tot_loss[loss=0.2145, ctc_loss=0.1413, cr_loss=0.366, over 4095890.38 frames. ], batch size: 50, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 22:17:48,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=825602.1666666666, ans=0.0 2024-09-18 22:17:57,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=825630.5, ans=0.2 2024-09-18 22:17:59,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=825630.5, ans=0.1 2024-09-18 22:18:00,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=825630.5, ans=0.1 2024-09-18 22:18:02,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=825630.5, ans=0.2 2024-09-18 22:18:21,483 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.964e+02 2.247e+02 2.356e+02 2.504e+02 3.127e+02, threshold=4.713e+02, percent-clipped=0.0 2024-09-18 22:18:30,129 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.43 vs. limit=15.0 2024-09-18 22:18:41,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=825715.5, ans=0.2 2024-09-18 22:18:53,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=825715.5, ans=0.0 2024-09-18 22:18:57,834 INFO [train.py:1198] (1/2) Epoch 46, batch 3850, loss[loss=0.1849, ctc_loss=0.1213, cr_loss=0.3181, over 20955.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1415, cr_loss=0.3665, over 4096820.77 frames. ], batch size: 49, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 22:19:18,612 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.37 vs. limit=22.5 2024-09-18 22:19:27,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=825800.5, ans=0.125 2024-09-18 22:19:28,836 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.16 vs. 
limit=15.0 2024-09-18 22:20:13,270 INFO [train.py:1198] (1/2) Epoch 46, batch 3900, loss[loss=0.2154, ctc_loss=0.1422, cr_loss=0.3658, over 20258.00 frames. ], tot_loss[loss=0.2139, ctc_loss=0.1408, cr_loss=0.3652, over 4104027.42 frames. ], batch size: 74, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 22:20:26,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=825913.8333333334, ans=0.125 2024-09-18 22:20:48,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=825942.1666666666, ans=0.1 2024-09-18 22:20:52,646 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.943e+02 2.227e+02 2.394e+02 2.545e+02 3.122e+02, threshold=4.789e+02, percent-clipped=0.0 2024-09-18 22:20:59,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=825970.5, ans=0.025 2024-09-18 22:21:08,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=825970.5, ans=10.0 2024-09-18 22:21:19,730 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.49 vs. limit=15.0 2024-09-18 22:21:20,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=825998.8333333334, ans=0.125 2024-09-18 22:21:20,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=825998.8333333334, ans=0.125 2024-09-18 22:21:22,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=825998.8333333334, ans=0.0 2024-09-18 22:21:29,285 INFO [train.py:1198] (1/2) Epoch 46, batch 3950, loss[loss=0.2702, ctc_loss=0.1813, cr_loss=0.4449, over 19944.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1415, cr_loss=0.3669, over 4111955.27 frames. ], batch size: 80, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 22:21:52,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=826055.5, ans=0.125 2024-09-18 22:22:19,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=826112.1666666666, ans=0.125 2024-09-18 22:22:44,853 INFO [train.py:1198] (1/2) Epoch 46, batch 4000, loss[loss=0.222, ctc_loss=0.1464, cr_loss=0.3785, over 21008.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.1419, cr_loss=0.3675, over 4104870.66 frames. ], batch size: 63, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 22:22:51,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=826168.8333333334, ans=15.0 2024-09-18 22:22:57,718 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.72 vs. limit=15.0 2024-09-18 22:23:27,125 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.003e+02 2.305e+02 2.402e+02 2.540e+02 6.910e+02, threshold=4.804e+02, percent-clipped=1.0 2024-09-18 22:23:34,087 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.17 vs. 
limit=15.0 2024-09-18 22:23:58,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=826282.1666666666, ans=0.2 2024-09-18 22:24:05,876 INFO [train.py:1198] (1/2) Epoch 46, batch 4050, loss[loss=0.2011, ctc_loss=0.1302, cr_loss=0.3547, over 20776.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1425, cr_loss=0.3678, over 4086932.78 frames. ], batch size: 53, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 22:25:00,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=826395.5, ans=0.125 2024-09-18 22:25:10,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=826423.8333333334, ans=0.0 2024-09-18 22:25:20,884 INFO [train.py:1198] (1/2) Epoch 46, batch 4100, loss[loss=0.2341, ctc_loss=0.1538, cr_loss=0.4016, over 20971.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.1431, cr_loss=0.3688, over 4075530.39 frames. ], batch size: 67, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 22:25:22,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=826452.1666666666, ans=0.2 2024-09-18 22:26:00,240 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.893e+02 2.271e+02 2.414e+02 2.615e+02 4.860e+02, threshold=4.827e+02, percent-clipped=1.0 2024-09-18 22:26:35,865 INFO [train.py:1198] (1/2) Epoch 46, batch 4150, loss[loss=0.2229, ctc_loss=0.1465, cr_loss=0.3823, over 20942.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1431, cr_loss=0.3689, over 4079763.72 frames. ], batch size: 60, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 22:27:20,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=826678.8333333334, ans=0.125 2024-09-18 22:27:25,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=826678.8333333334, ans=0.025 2024-09-18 22:27:31,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=826678.8333333334, ans=0.1 2024-09-18 22:27:43,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=826707.1666666666, ans=0.0 2024-09-18 22:27:52,126 INFO [train.py:1198] (1/2) Epoch 46, batch 4200, loss[loss=0.2231, ctc_loss=0.1482, cr_loss=0.3746, over 20943.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1428, cr_loss=0.3691, over 4091933.79 frames. ], batch size: 64, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 22:28:34,623 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.932e+02 2.304e+02 2.426e+02 2.594e+02 3.750e+02, threshold=4.852e+02, percent-clipped=0.0 2024-09-18 22:28:35,542 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.82 vs. limit=10.0 2024-09-18 22:28:47,699 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.29 vs. limit=15.0 2024-09-18 22:28:55,096 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=7.96 vs. limit=22.5 2024-09-18 22:29:11,251 INFO [train.py:1198] (1/2) Epoch 46, batch 4250, loss[loss=0.2367, ctc_loss=0.1603, cr_loss=0.3821, over 20684.00 frames. 
], tot_loss[loss=0.2172, ctc_loss=0.1433, cr_loss=0.3693, over 4081108.56 frames. ], batch size: 68, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 22:29:28,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=826905.5, ans=0.2 2024-09-18 22:29:35,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=826905.5, ans=0.04949747468305833 2024-09-18 22:29:40,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=826905.5, ans=0.0 2024-09-18 22:29:47,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=826933.8333333334, ans=0.1 2024-09-18 22:30:14,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=826990.5, ans=0.0 2024-09-18 22:30:29,572 INFO [train.py:1198] (1/2) Epoch 46, batch 4300, loss[loss=0.1786, ctc_loss=0.1144, cr_loss=0.3209, over 20956.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1429, cr_loss=0.3685, over 4081536.00 frames. ], batch size: 49, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 22:31:08,789 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.998e+02 2.293e+02 2.404e+02 2.618e+02 4.801e+02, threshold=4.807e+02, percent-clipped=0.0 2024-09-18 22:31:27,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=827103.8333333334, ans=0.1 2024-09-18 22:31:27,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=827103.8333333334, ans=0.125 2024-09-18 22:31:45,347 INFO [train.py:1198] (1/2) Epoch 46, batch 4350, loss[loss=0.2024, ctc_loss=0.1323, cr_loss=0.3508, over 20621.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1426, cr_loss=0.368, over 4091038.06 frames. ], batch size: 68, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 22:32:22,329 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. limit=6.0 2024-09-18 22:32:30,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=827245.5, ans=0.2 2024-09-18 22:32:35,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=827245.5, ans=0.125 2024-09-18 22:32:39,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=827245.5, ans=0.04949747468305833 2024-09-18 22:32:44,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=827273.8333333334, ans=0.2 2024-09-18 22:32:45,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=827273.8333333334, ans=0.125 2024-09-18 22:33:00,653 INFO [train.py:1198] (1/2) Epoch 46, batch 4400, loss[loss=0.1682, ctc_loss=0.1078, cr_loss=0.3019, over 21084.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1423, cr_loss=0.3682, over 4102220.93 frames. 
], batch size: 53, lr: 1.78e-03, grad_scale: 32.0 2024-09-18 22:33:05,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=827302.1666666666, ans=0.1 2024-09-18 22:33:40,934 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.978e+02 2.279e+02 2.378e+02 2.503e+02 3.348e+02, threshold=4.756e+02, percent-clipped=0.0 2024-09-18 22:33:45,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=827387.1666666666, ans=0.125 2024-09-18 22:33:53,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=827387.1666666666, ans=0.0 2024-09-18 22:33:56,919 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.24 vs. limit=15.0 2024-09-18 22:34:16,734 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.17 vs. limit=10.0 2024-09-18 22:34:20,251 INFO [train.py:1198] (1/2) Epoch 46, batch 4450, loss[loss=0.2328, ctc_loss=0.1616, cr_loss=0.356, over 13828.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1433, cr_loss=0.3698, over 4101634.58 frames. ], batch size: 149, lr: 1.78e-03, grad_scale: 32.0 2024-09-18 22:34:47,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=827472.1666666666, ans=0.125 2024-09-18 22:34:54,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=827500.5, ans=0.0 2024-09-18 22:35:14,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=827528.8333333334, ans=0.125 2024-09-18 22:35:18,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=827528.8333333334, ans=0.2 2024-09-18 22:35:38,952 INFO [train.py:1198] (1/2) Epoch 46, batch 4500, loss[loss=0.2175, ctc_loss=0.1409, cr_loss=0.3832, over 21058.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1424, cr_loss=0.368, over 4107817.50 frames. ], batch size: 62, lr: 1.78e-03, grad_scale: 16.0 2024-09-18 22:36:09,522 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.70 vs. limit=6.0 2024-09-18 22:36:19,269 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.948e+02 2.229e+02 2.370e+02 2.521e+02 3.571e+02, threshold=4.739e+02, percent-clipped=0.0 2024-09-18 22:36:19,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=827642.1666666666, ans=0.2 2024-09-18 22:36:26,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=827670.5, ans=15.0 2024-09-18 22:36:54,138 INFO [train.py:1198] (1/2) Epoch 46, batch 4550, loss[loss=0.1862, ctc_loss=0.1202, cr_loss=0.3301, over 20974.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1428, cr_loss=0.3692, over 4102694.54 frames. 
], batch size: 48, lr: 1.78e-03, grad_scale: 16.0 2024-09-18 22:37:21,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=827755.5, ans=0.05 2024-09-18 22:37:23,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=827783.8333333334, ans=0.0 2024-09-18 22:37:54,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=827840.5, ans=0.1 2024-09-18 22:38:08,636 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 22:38:09,640 INFO [train.py:1198] (1/2) Epoch 46, batch 4600, loss[loss=0.2039, ctc_loss=0.1332, cr_loss=0.3532, over 20859.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1424, cr_loss=0.3684, over 4100600.05 frames. ], batch size: 57, lr: 1.78e-03, grad_scale: 16.0 2024-09-18 22:38:50,898 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.938e+02 2.260e+02 2.443e+02 2.616e+02 3.291e+02, threshold=4.886e+02, percent-clipped=0.0 2024-09-18 22:38:52,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=827925.5, ans=0.0 2024-09-18 22:39:25,270 INFO [train.py:1198] (1/2) Epoch 46, batch 4650, loss[loss=0.2428, ctc_loss=0.1608, cr_loss=0.4102, over 20298.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1424, cr_loss=0.3679, over 4091739.78 frames. ], batch size: 74, lr: 1.78e-03, grad_scale: 16.0 2024-09-18 22:39:39,482 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.51 vs. limit=15.0 2024-09-18 22:39:58,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=828067.1666666666, ans=0.2 2024-09-18 22:40:40,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=828123.8333333334, ans=0.2 2024-09-18 22:40:46,637 INFO [train.py:1198] (1/2) Epoch 46, batch 4700, loss[loss=0.1675, ctc_loss=0.1071, cr_loss=0.302, over 20268.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.1422, cr_loss=0.367, over 4080313.36 frames. ], batch size: 45, lr: 1.78e-03, grad_scale: 16.0 2024-09-18 22:40:56,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=828152.1666666666, ans=0.0 2024-09-18 22:40:57,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=828152.1666666666, ans=0.07 2024-09-18 22:41:05,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=828180.5, ans=0.125 2024-09-18 22:41:27,329 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.007e+02 2.277e+02 2.392e+02 2.504e+02 5.616e+02, threshold=4.785e+02, percent-clipped=1.0 2024-09-18 22:42:02,565 INFO [train.py:1198] (1/2) Epoch 46, batch 4750, loss[loss=0.2728, ctc_loss=0.183, cr_loss=0.4487, over 18490.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1428, cr_loss=0.3683, over 4071264.38 frames. 
], batch size: 108, lr: 1.78e-03, grad_scale: 16.0 2024-09-18 22:42:17,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=828322.1666666666, ans=0.125 2024-09-18 22:42:26,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=828322.1666666666, ans=0.1 2024-09-18 22:42:52,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=828378.8333333334, ans=0.2 2024-09-18 22:43:03,786 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.06 vs. limit=15.0 2024-09-18 22:43:12,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=828407.1666666666, ans=0.07 2024-09-18 22:43:18,004 INFO [train.py:1198] (1/2) Epoch 46, batch 4800, loss[loss=0.1727, ctc_loss=0.113, cr_loss=0.2986, over 20973.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1429, cr_loss=0.3683, over 4070774.89 frames. ], batch size: 48, lr: 1.78e-03, grad_scale: 32.0 2024-09-18 22:43:26,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=828435.5, ans=0.0 2024-09-18 22:43:59,246 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.051e+02 2.285e+02 2.408e+02 2.560e+02 3.296e+02, threshold=4.817e+02, percent-clipped=0.0 2024-09-18 22:44:31,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=828548.8333333334, ans=0.125 2024-09-18 22:44:34,231 INFO [train.py:1198] (1/2) Epoch 46, batch 4850, loss[loss=0.2494, ctc_loss=0.1679, cr_loss=0.4075, over 20061.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.143, cr_loss=0.3696, over 4085175.73 frames. ], batch size: 80, lr: 1.78e-03, grad_scale: 32.0 2024-09-18 22:45:06,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=828633.8333333334, ans=0.125 2024-09-18 22:45:17,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=828633.8333333334, ans=0.125 2024-09-18 22:45:37,662 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=7.92 vs. limit=22.5 2024-09-18 22:45:53,743 INFO [train.py:1198] (1/2) Epoch 46, batch 4900, loss[loss=0.1878, ctc_loss=0.1221, cr_loss=0.3283, over 20967.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.1419, cr_loss=0.3671, over 4088675.76 frames. 
], batch size: 49, lr: 1.78e-03, grad_scale: 32.0 2024-09-18 22:46:04,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=828718.8333333334, ans=0.125 2024-09-18 22:46:14,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=828747.1666666666, ans=0.0 2024-09-18 22:46:27,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=828775.5, ans=0.04949747468305833 2024-09-18 22:46:32,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=828775.5, ans=15.0 2024-09-18 22:46:34,773 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.891e+02 2.208e+02 2.346e+02 2.474e+02 3.972e+02, threshold=4.692e+02, percent-clipped=0.0 2024-09-18 22:46:44,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=828803.8333333334, ans=0.1 2024-09-18 22:47:12,545 INFO [train.py:1198] (1/2) Epoch 46, batch 4950, loss[loss=0.2313, ctc_loss=0.1533, cr_loss=0.3901, over 20684.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1423, cr_loss=0.3675, over 4102791.04 frames. ], batch size: 66, lr: 1.78e-03, grad_scale: 32.0 2024-09-18 22:47:15,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=828860.5, ans=0.125 2024-09-18 22:47:31,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=828888.8333333334, ans=0.1 2024-09-18 22:47:57,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=828945.5, ans=0.0 2024-09-18 22:48:03,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=828945.5, ans=0.0 2024-09-18 22:48:15,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=828973.8333333334, ans=0.125 2024-09-18 22:48:27,348 INFO [train.py:1198] (1/2) Epoch 46, batch 5000, loss[loss=0.1967, ctc_loss=0.1297, cr_loss=0.3349, over 20980.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1417, cr_loss=0.367, over 4110591.23 frames. 
], batch size: 52, lr: 1.78e-03, grad_scale: 32.0 2024-09-18 22:48:33,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=829002.1666666666, ans=0.125 2024-09-18 22:48:41,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=829030.5, ans=0.125 2024-09-18 22:48:41,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=829030.5, ans=0.125 2024-09-18 22:49:06,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=829058.8333333334, ans=0.125 2024-09-18 22:49:07,668 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.881e+02 2.269e+02 2.385e+02 2.525e+02 7.112e+02, threshold=4.771e+02, percent-clipped=1.0 2024-09-18 22:49:16,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=829087.1666666666, ans=0.0 2024-09-18 22:49:31,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=829115.5, ans=0.1 2024-09-18 22:49:34,345 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=829115.5, ans=0.0 2024-09-18 22:49:35,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=829115.5, ans=0.0 2024-09-18 22:49:41,293 INFO [train.py:1198] (1/2) Epoch 46, batch 5050, loss[loss=0.2527, ctc_loss=0.1693, cr_loss=0.4169, over 18461.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1424, cr_loss=0.3683, over 4101518.92 frames. ], batch size: 108, lr: 1.78e-03, grad_scale: 32.0 2024-09-18 22:49:52,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=829143.8333333334, ans=0.95 2024-09-18 22:50:05,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=829172.1666666666, ans=0.2 2024-09-18 22:50:21,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=829200.5, ans=0.125 2024-09-18 22:50:23,009 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 22:50:28,785 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=829228.8333333334, ans=0.0 2024-09-18 22:50:47,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=829257.1666666666, ans=0.2 2024-09-18 22:50:55,083 INFO [train.py:1198] (1/2) Epoch 46, batch 5100, loss[loss=0.2247, ctc_loss=0.1495, cr_loss=0.3761, over 19498.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.1431, cr_loss=0.3689, over 4098854.99 frames. ], batch size: 90, lr: 1.78e-03, grad_scale: 32.0 2024-09-18 22:51:15,109 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.48 vs. 
limit=15.0 2024-09-18 22:51:35,595 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.046e+02 2.263e+02 2.420e+02 2.545e+02 3.069e+02, threshold=4.840e+02, percent-clipped=0.0 2024-09-18 22:52:10,008 INFO [train.py:1198] (1/2) Epoch 46, batch 5150, loss[loss=0.2257, ctc_loss=0.1476, cr_loss=0.3905, over 20796.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.144, cr_loss=0.3704, over 4099714.35 frames. ], batch size: 53, lr: 1.78e-03, grad_scale: 32.0 2024-09-18 22:52:23,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=829455.5, ans=0.0 2024-09-18 22:52:56,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=829512.1666666666, ans=0.125 2024-09-18 22:53:25,162 INFO [train.py:1198] (1/2) Epoch 46, batch 5200, loss[loss=0.2136, ctc_loss=0.1404, cr_loss=0.3656, over 20772.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1435, cr_loss=0.3698, over 4106625.77 frames. ], batch size: 53, lr: 1.78e-03, grad_scale: 32.0 2024-09-18 22:54:05,800 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.978e+02 2.230e+02 2.346e+02 2.494e+02 3.637e+02, threshold=4.691e+02, percent-clipped=0.0 2024-09-18 22:54:10,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=829653.8333333334, ans=0.015 2024-09-18 22:54:10,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=829653.8333333334, ans=0.125 2024-09-18 22:54:21,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=829653.8333333334, ans=0.125 2024-09-18 22:54:21,521 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.10 vs. limit=15.0 2024-09-18 22:54:34,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=829682.1666666666, ans=0.2 2024-09-18 22:54:40,168 INFO [train.py:1198] (1/2) Epoch 46, batch 5250, loss[loss=0.1985, ctc_loss=0.1287, cr_loss=0.349, over 20890.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1425, cr_loss=0.3684, over 4115022.36 frames. ], batch size: 54, lr: 1.78e-03, grad_scale: 32.0 2024-09-18 22:54:40,625 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 22:54:43,403 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 22:54:49,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=829710.5, ans=0.1 2024-09-18 22:54:50,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=829710.5, ans=0.0 2024-09-18 22:54:55,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=829738.8333333334, ans=0.1 2024-09-18 22:55:56,819 INFO [train.py:1198] (1/2) Epoch 46, batch 5300, loss[loss=0.2185, ctc_loss=0.1456, cr_loss=0.3646, over 20927.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1424, cr_loss=0.3684, over 4105238.13 frames. 
], batch size: 60, lr: 1.78e-03, grad_scale: 32.0 2024-09-18 22:55:57,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=829852.1666666666, ans=0.125 2024-09-18 22:56:03,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=829852.1666666666, ans=0.125 2024-09-18 22:56:37,741 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=5.93 vs. limit=22.5 2024-09-18 22:56:38,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=829908.8333333334, ans=0.125 2024-09-18 22:56:41,474 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.999e+02 2.277e+02 2.415e+02 2.568e+02 4.148e+02, threshold=4.830e+02, percent-clipped=0.0 2024-09-18 22:57:02,874 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.06 vs. limit=15.0 2024-09-18 22:57:14,037 INFO [train.py:1198] (1/2) Epoch 46, batch 5350, loss[loss=0.2509, ctc_loss=0.1734, cr_loss=0.3874, over 14094.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1414, cr_loss=0.3671, over 4096553.45 frames. ], batch size: 149, lr: 1.78e-03, grad_scale: 16.0 2024-09-18 22:57:36,827 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.88 vs. limit=15.0 2024-09-18 22:58:06,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=830078.8333333334, ans=0.2 2024-09-18 22:58:22,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=830107.1666666666, ans=0.0 2024-09-18 22:58:22,835 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.06 vs. limit=15.0 2024-09-18 22:58:23,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=830107.1666666666, ans=0.1 2024-09-18 22:58:25,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=830107.1666666666, ans=0.2 2024-09-18 22:58:27,972 INFO [train.py:1198] (1/2) Epoch 46, batch 5400, loss[loss=0.2137, ctc_loss=0.1399, cr_loss=0.3692, over 20767.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1415, cr_loss=0.3664, over 4088574.44 frames. ], batch size: 56, lr: 1.78e-03, grad_scale: 16.0 2024-09-18 22:58:40,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=830135.5, ans=0.125 2024-09-18 22:58:47,882 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.13 vs. 
limit=12.0 2024-09-18 22:59:09,880 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.961e+02 2.265e+02 2.377e+02 2.529e+02 3.871e+02, threshold=4.754e+02, percent-clipped=0.0 2024-09-18 22:59:39,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=830248.8333333334, ans=0.125 2024-09-18 22:59:42,816 INFO [train.py:1198] (1/2) Epoch 46, batch 5450, loss[loss=0.2489, ctc_loss=0.1671, cr_loss=0.4087, over 18464.00 frames. ], tot_loss[loss=0.2153, ctc_loss=0.1418, cr_loss=0.3676, over 4084900.62 frames. ], batch size: 108, lr: 1.78e-03, grad_scale: 16.0 2024-09-18 22:59:52,640 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.55 vs. limit=12.0 2024-09-18 22:59:58,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=830305.5, ans=0.5 2024-09-18 22:59:58,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=830305.5, ans=0.125 2024-09-18 23:00:09,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=830305.5, ans=0.1 2024-09-18 23:00:29,426 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.91 vs. limit=15.0 2024-09-18 23:00:31,229 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.62 vs. limit=10.0 2024-09-18 23:00:56,889 INFO [train.py:1198] (1/2) Epoch 46, batch 5500, loss[loss=0.2326, ctc_loss=0.1549, cr_loss=0.3888, over 20815.00 frames. ], tot_loss[loss=0.2142, ctc_loss=0.141, cr_loss=0.366, over 4096100.85 frames. ], batch size: 65, lr: 1.78e-03, grad_scale: 16.0 2024-09-18 23:01:01,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=830418.8333333334, ans=0.1 2024-09-18 23:01:06,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=830418.8333333334, ans=0.125 2024-09-18 23:01:38,695 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.988e+02 2.271e+02 2.381e+02 2.540e+02 3.416e+02, threshold=4.761e+02, percent-clipped=0.0 2024-09-18 23:01:50,858 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:02:11,213 INFO [train.py:1198] (1/2) Epoch 46, batch 5550, loss[loss=0.216, ctc_loss=0.143, cr_loss=0.3648, over 20799.00 frames. ], tot_loss[loss=0.2139, ctc_loss=0.1408, cr_loss=0.365, over 4094268.17 frames. ], batch size: 53, lr: 1.78e-03, grad_scale: 16.0 2024-09-18 23:02:11,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=830560.5, ans=0.07 2024-09-18 23:02:14,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=830560.5, ans=0.125 2024-09-18 23:02:22,297 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.48 vs. 
limit=22.5 2024-09-18 23:02:23,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=830560.5, ans=0.2 2024-09-18 23:02:43,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=830617.1666666666, ans=0.125 2024-09-18 23:03:25,178 INFO [train.py:1198] (1/2) Epoch 46, batch 5600, loss[loss=0.2047, ctc_loss=0.1338, cr_loss=0.3543, over 20798.00 frames. ], tot_loss[loss=0.2141, ctc_loss=0.141, cr_loss=0.3655, over 4095804.98 frames. ], batch size: 53, lr: 1.78e-03, grad_scale: 32.0 2024-09-18 23:03:37,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=830702.1666666666, ans=0.2 2024-09-18 23:03:44,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=830730.5, ans=0.025 2024-09-18 23:03:49,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=830730.5, ans=0.125 2024-09-18 23:03:55,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=830758.8333333334, ans=0.0 2024-09-18 23:04:01,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=830758.8333333334, ans=0.1 2024-09-18 23:04:06,710 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.018e+02 2.276e+02 2.421e+02 2.574e+02 4.473e+02, threshold=4.842e+02, percent-clipped=0.0 2024-09-18 23:04:10,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=830787.1666666666, ans=0.2 2024-09-18 23:04:18,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=830787.1666666666, ans=0.125 2024-09-18 23:04:41,661 INFO [train.py:1198] (1/2) Epoch 46, batch 5650, loss[loss=0.2476, ctc_loss=0.1645, cr_loss=0.4153, over 20985.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1421, cr_loss=0.3681, over 4104856.44 frames. ], batch size: 64, lr: 1.78e-03, grad_scale: 32.0 2024-09-18 23:04:52,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=830843.8333333334, ans=0.1 2024-09-18 23:05:11,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=830900.5, ans=0.125 2024-09-18 23:05:49,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=830957.1666666666, ans=0.1 2024-09-18 23:05:58,333 INFO [train.py:1198] (1/2) Epoch 46, batch 5700, loss[loss=0.2077, ctc_loss=0.1349, cr_loss=0.3643, over 21058.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1429, cr_loss=0.3696, over 4089922.16 frames. ], batch size: 56, lr: 1.78e-03, grad_scale: 32.0 2024-09-18 23:06:02,131 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.76 vs. limit=10.0 2024-09-18 23:06:05,290 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.39 vs. 
limit=8.0 2024-09-18 23:06:13,649 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.33 vs. limit=22.5 2024-09-18 23:06:39,567 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.946e+02 2.244e+02 2.366e+02 2.522e+02 4.217e+02, threshold=4.732e+02, percent-clipped=0.0 2024-09-18 23:06:57,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=831098.8333333334, ans=0.5 2024-09-18 23:07:11,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=831127.1666666666, ans=0.1 2024-09-18 23:07:12,363 INFO [train.py:1198] (1/2) Epoch 46, batch 5750, loss[loss=0.2269, ctc_loss=0.1487, cr_loss=0.391, over 21045.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1431, cr_loss=0.3694, over 4099481.24 frames. ], batch size: 63, lr: 1.78e-03, grad_scale: 32.0 2024-09-18 23:07:17,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=831127.1666666666, ans=0.025 2024-09-18 23:07:20,153 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:08:26,776 INFO [train.py:1198] (1/2) Epoch 46, batch 5800, loss[loss=0.186, ctc_loss=0.1216, cr_loss=0.3218, over 20984.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1436, cr_loss=0.37, over 4093770.37 frames. ], batch size: 52, lr: 1.78e-03, grad_scale: 32.0 2024-09-18 23:08:31,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=831268.8333333334, ans=0.025 2024-09-18 23:08:52,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=831297.1666666666, ans=0.1 2024-09-18 23:08:57,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=831325.5, ans=0.125 2024-09-18 23:09:08,688 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.986e+02 2.225e+02 2.338e+02 2.574e+02 5.613e+02, threshold=4.675e+02, percent-clipped=1.0 2024-09-18 23:09:37,488 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.02 vs. limit=12.0 2024-09-18 23:09:41,284 INFO [train.py:1198] (1/2) Epoch 46, batch 5850, loss[loss=0.1848, ctc_loss=0.1183, cr_loss=0.3324, over 20950.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1433, cr_loss=0.37, over 4100509.74 frames. ], batch size: 50, lr: 1.78e-03, grad_scale: 32.0 2024-09-18 23:09:41,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=831410.5, ans=0.025 2024-09-18 23:09:47,660 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:10:08,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=831438.8333333334, ans=0.0 2024-09-18 23:10:55,553 INFO [train.py:1198] (1/2) Epoch 46, batch 5900, loss[loss=0.2205, ctc_loss=0.1434, cr_loss=0.3855, over 20982.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1435, cr_loss=0.37, over 4097754.38 frames. 
], batch size: 55, lr: 1.78e-03, grad_scale: 32.0 2024-09-18 23:11:34,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=831608.8333333334, ans=0.0 2024-09-18 23:11:37,201 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.860e+02 2.284e+02 2.401e+02 2.557e+02 3.628e+02, threshold=4.803e+02, percent-clipped=0.0 2024-09-18 23:12:09,955 INFO [train.py:1198] (1/2) Epoch 46, batch 5950, loss[loss=0.1881, ctc_loss=0.1196, cr_loss=0.342, over 20958.00 frames. ], tot_loss[loss=0.2153, ctc_loss=0.1418, cr_loss=0.3673, over 4110119.05 frames. ], batch size: 49, lr: 1.78e-03, grad_scale: 32.0 2024-09-18 23:12:11,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=831693.8333333334, ans=0.1 2024-09-18 23:12:11,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=831693.8333333334, ans=0.0 2024-09-18 23:12:22,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=831693.8333333334, ans=0.125 2024-09-18 23:12:23,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=831722.1666666666, ans=0.125 2024-09-18 23:12:29,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=831722.1666666666, ans=0.125 2024-09-18 23:12:59,252 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0 2024-09-18 23:13:06,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=831778.8333333334, ans=0.125 2024-09-18 23:13:26,523 INFO [train.py:1198] (1/2) Epoch 46, batch 6000, loss[loss=0.1843, ctc_loss=0.1196, cr_loss=0.3233, over 20769.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1422, cr_loss=0.3678, over 4110798.07 frames. ], batch size: 56, lr: 1.78e-03, grad_scale: 32.0 2024-09-18 23:13:26,523 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 23:13:38,209 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.9951, 4.7143, 4.5286, 4.1417], device='cuda:1') 2024-09-18 23:13:45,368 INFO [train.py:1230] (1/2) Epoch 46, validation: loss=0.039, ctc_loss=0.039, cr_loss=1.565e-14, over 944034.00 frames. 
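Across these per-batch records the printed total is consistent with loss = ctc_loss + 0.2 * cr_loss (for example, at epoch 46 batch 5300: 0.1424 + 0.2 * 0.3684 = 0.2161), and the tot_loss[... over N frames] figure behaves like a frame-weighted aggregate over recent batches. A minimal sketch of that bookkeeping follows, assuming only the 0.2 factor inferred from the logged numbers; the names and the exact windowing are illustrative, not icefall's train.py.

# Hypothetical reconstruction of how the logged "loss" relates to
# "ctc_loss" and "cr_loss" in these records. The 0.2 factor is inferred
# from the printed values (0.1424 + 0.2 * 0.3684 == 0.2161 at batch 5300);
# names below are illustrative, not icefall's API.

CR_LOSS_SCALE = 0.2  # inferred from the logged numbers


def combine_losses(ctc_loss: float, cr_loss: float) -> float:
    """Total loss as it appears in the per-batch log records."""
    return ctc_loss + CR_LOSS_SCALE * cr_loss


class RunningLoss:
    """Frame-weighted aggregate, matching how tot_loss[...] is reported
    'over N frames'; the real train.py windowing may differ."""

    def __init__(self) -> None:
        self.weighted_sum = 0.0
        self.num_frames = 0.0

    def update(self, loss: float, frames: float) -> None:
        self.weighted_sum += loss * frames
        self.num_frames += frames

    @property
    def value(self) -> float:
        return self.weighted_sum / max(self.num_frames, 1.0)


# sanity check against the epoch 46, batch 5300 record above:
assert abs(combine_losses(0.1424, 0.3684) - 0.2161) < 5e-4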
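The scaling.py:214 records print a named ScheduledFloat together with the current batch_count and its value (ans=...). In the zipformer code these are values scheduled on batch count; a minimal piecewise-linear sketch follows, with made-up breakpoints, since the real schedules are defined per module in the model code.

# Minimal sketch of a piecewise-linear scheduled value like the
# ScheduledFloat entries logged by scaling.py (name=..., batch_count=...,
# ans=...). The (batch_count, value) breakpoints here are invented.
import bisect


class ScheduledFloatSketch:
    def __init__(self, *points: tuple[float, float]) -> None:
        # points: (batch_count, value) pairs, sorted by batch_count
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def value_at(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        t = (batch_count - x0) / (x1 - x0)
        return y0 + t * (y1 - y0)


# e.g. a skip-rate decaying from 0.5 to 0.0 over the first 20k batches
# (illustrative numbers only):
skip_rate = ScheduledFloatSketch((0.0, 0.5), (20000.0, 0.0))
print(skip_rate.value_at(829852.0))  # -> 0.0 this late in training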
2024-09-18 23:13:45,368 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-18 23:13:56,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=831835.5, ans=0.0 2024-09-18 23:14:09,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=831863.8333333334, ans=0.125 2024-09-18 23:14:26,837 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.890e+02 2.266e+02 2.435e+02 2.603e+02 3.470e+02, threshold=4.869e+02, percent-clipped=0.0 2024-09-18 23:14:31,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=831920.5, ans=0.1 2024-09-18 23:14:38,179 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.93 vs. limit=6.0 2024-09-18 23:14:41,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=831920.5, ans=0.125 2024-09-18 23:14:52,834 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.43 vs. limit=12.0 2024-09-18 23:14:59,702 INFO [train.py:1198] (1/2) Epoch 46, batch 6050, loss[loss=0.2205, ctc_loss=0.1449, cr_loss=0.3783, over 20654.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.142, cr_loss=0.3676, over 4114132.79 frames. ], batch size: 68, lr: 1.78e-03, grad_scale: 32.0 2024-09-18 23:15:06,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=831977.1666666666, ans=0.125 2024-09-18 23:15:18,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=832005.5, ans=0.2 2024-09-18 23:15:24,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=832005.5, ans=0.1 2024-09-18 23:15:52,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=832062.1666666666, ans=0.0 2024-09-18 23:16:04,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=832090.5, ans=0.125 2024-09-18 23:16:14,965 INFO [train.py:1198] (1/2) Epoch 46, batch 6100, loss[loss=0.2074, ctc_loss=0.1339, cr_loss=0.3675, over 21028.00 frames. ], tot_loss[loss=0.2138, ctc_loss=0.1409, cr_loss=0.3648, over 4101814.71 frames. ], batch size: 62, lr: 1.78e-03, grad_scale: 16.0 2024-09-18 23:16:19,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=832118.8333333334, ans=0.0 2024-09-18 23:16:58,072 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.964e+02 2.260e+02 2.376e+02 2.543e+02 5.489e+02, threshold=4.751e+02, percent-clipped=2.0 2024-09-18 23:17:03,242 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.38 vs. 
limit=15.0 2024-09-18 23:17:26,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=832232.1666666666, ans=0.125 2024-09-18 23:17:28,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=832260.5, ans=0.04949747468305833 2024-09-18 23:17:29,420 INFO [train.py:1198] (1/2) Epoch 46, batch 6150, loss[loss=0.259, ctc_loss=0.1737, cr_loss=0.4262, over 20088.00 frames. ], tot_loss[loss=0.2144, ctc_loss=0.1413, cr_loss=0.3656, over 4096507.74 frames. ], batch size: 80, lr: 1.78e-03, grad_scale: 16.0 2024-09-18 23:17:33,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=832260.5, ans=0.125 2024-09-18 23:17:35,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=832260.5, ans=0.0 2024-09-18 23:18:05,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=832317.1666666666, ans=0.125 2024-09-18 23:18:43,724 INFO [train.py:1198] (1/2) Epoch 46, batch 6200, loss[loss=0.2684, ctc_loss=0.1801, cr_loss=0.4417, over 20672.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1425, cr_loss=0.3681, over 4078720.51 frames. ], batch size: 71, lr: 1.78e-03, grad_scale: 16.0 2024-09-18 23:19:13,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=832458.8333333334, ans=0.1 2024-09-18 23:19:26,671 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.069e+02 2.290e+02 2.419e+02 2.581e+02 5.817e+02, threshold=4.837e+02, percent-clipped=1.0 2024-09-18 23:19:33,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=832487.1666666666, ans=0.0 2024-09-18 23:19:46,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=832515.5, ans=0.125 2024-09-18 23:19:48,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=832515.5, ans=0.125 2024-09-18 23:19:50,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=832515.5, ans=0.125 2024-09-18 23:19:57,874 INFO [train.py:1198] (1/2) Epoch 46, batch 6250, loss[loss=0.2527, ctc_loss=0.1746, cr_loss=0.3907, over 18058.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.1439, cr_loss=0.3699, over 4055824.64 frames. ], batch size: 108, lr: 1.78e-03, grad_scale: 16.0 2024-09-18 23:19:59,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=832543.8333333334, ans=0.0 2024-09-18 23:20:45,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=832628.8333333334, ans=0.1 2024-09-18 23:21:06,354 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.03 vs. limit=15.0 2024-09-18 23:21:12,927 INFO [train.py:1198] (1/2) Epoch 46, batch 6300, loss[loss=0.1972, ctc_loss=0.1266, cr_loss=0.3534, over 19968.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.144, cr_loss=0.3682, over 4004503.36 frames. 
], batch size: 44, lr: 1.78e-03, grad_scale: 16.0 2024-09-18 23:21:55,150 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.136e+02 2.359e+02 2.520e+02 2.688e+02 4.055e+02, threshold=5.041e+02, percent-clipped=0.0 2024-09-18 23:21:58,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=832770.5, ans=0.2 2024-09-18 23:22:24,612 INFO [train.py:1198] (1/2) Epoch 46, batch 6350, loss[loss=0.2747, ctc_loss=0.1889, cr_loss=0.4286, over 14457.00 frames. ], tot_loss[loss=0.2239, ctc_loss=0.1492, cr_loss=0.3734, over 3832194.13 frames. ], batch size: 149, lr: 1.78e-03, grad_scale: 16.0 2024-09-18 23:22:50,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=832855.5, ans=0.0 2024-09-18 23:22:55,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=832883.8333333334, ans=0.125 2024-09-18 23:23:00,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=832883.8333333334, ans=0.1 2024-09-18 23:23:20,995 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.72 vs. limit=15.0 2024-09-18 23:24:13,794 INFO [train.py:1198] (1/2) Epoch 47, batch 0, loss[loss=0.2419, ctc_loss=0.1615, cr_loss=0.4024, over 19597.00 frames. ], tot_loss[loss=0.2419, ctc_loss=0.1615, cr_loss=0.4024, over 19597.00 frames. ], batch size: 90, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:24:13,795 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 23:24:29,391 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.2265, 3.5079, 3.5150, 4.0034, 4.5452, 4.2981, 3.8008, 4.1549], device='cuda:1') 2024-09-18 23:24:34,761 INFO [train.py:1230] (1/2) Epoch 47, validation: loss=0.03863, ctc_loss=0.03863, cr_loss=1.546e-14, over 944034.00 frames. 2024-09-18 23:24:34,761 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-18 23:25:25,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=833028.3333333334, ans=0.04949747468305833 2024-09-18 23:25:32,212 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.998e+02 2.461e+02 2.757e+02 3.058e+02 4.288e+02, threshold=5.514e+02, percent-clipped=0.0 2024-09-18 23:25:50,250 INFO [train.py:1198] (1/2) Epoch 47, batch 50, loss[loss=0.1968, ctc_loss=0.129, cr_loss=0.339, over 20976.00 frames. ], tot_loss[loss=0.2144, ctc_loss=0.1414, cr_loss=0.365, over 930478.18 frames. 
], batch size: 50, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:25:52,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=833085.0, ans=0.125 2024-09-18 23:26:17,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=833113.3333333334, ans=0.2 2024-09-18 23:26:35,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=833170.0, ans=15.0 2024-09-18 23:26:40,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=833170.0, ans=0.125 2024-09-18 23:26:48,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=833170.0, ans=0.125 2024-09-18 23:27:05,911 INFO [train.py:1198] (1/2) Epoch 47, batch 100, loss[loss=0.2017, ctc_loss=0.1311, cr_loss=0.353, over 21056.00 frames. ], tot_loss[loss=0.2117, ctc_loss=0.1391, cr_loss=0.3627, over 1647179.17 frames. ], batch size: 56, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:27:38,391 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. limit=6.0 2024-09-18 23:27:49,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=833311.6666666666, ans=0.2 2024-09-18 23:28:03,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=833311.6666666666, ans=0.0 2024-09-18 23:28:05,965 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.056e+02 2.287e+02 2.413e+02 2.555e+02 3.150e+02, threshold=4.826e+02, percent-clipped=0.0 2024-09-18 23:28:15,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=833340.0, ans=0.125 2024-09-18 23:28:24,219 INFO [train.py:1198] (1/2) Epoch 47, batch 150, loss[loss=0.2352, ctc_loss=0.1568, cr_loss=0.3922, over 20932.00 frames. ], tot_loss[loss=0.2142, ctc_loss=0.1409, cr_loss=0.3664, over 2188737.73 frames. ], batch size: 60, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:28:27,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=833368.3333333334, ans=0.0 2024-09-18 23:29:00,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=833425.0, ans=0.125 2024-09-18 23:29:09,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=833453.3333333334, ans=0.0 2024-09-18 23:29:13,261 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.03 vs. limit=15.0 2024-09-18 23:29:15,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=833453.3333333334, ans=0.125 2024-09-18 23:29:39,813 INFO [train.py:1198] (1/2) Epoch 47, batch 200, loss[loss=0.2192, ctc_loss=0.1428, cr_loss=0.382, over 20966.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1431, cr_loss=0.3708, over 2608469.83 frames. 
], batch size: 58, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:30:40,725 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.993e+02 2.264e+02 2.382e+02 2.575e+02 3.254e+02, threshold=4.764e+02, percent-clipped=0.0 2024-09-18 23:30:58,759 INFO [train.py:1198] (1/2) Epoch 47, batch 250, loss[loss=0.2322, ctc_loss=0.1517, cr_loss=0.4025, over 21028.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1431, cr_loss=0.3701, over 2942433.06 frames. ], batch size: 63, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:31:02,572 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.12 vs. limit=15.0 2024-09-18 23:31:06,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=833651.6666666666, ans=0.0 2024-09-18 23:31:09,621 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:31:12,024 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.27 vs. limit=15.0 2024-09-18 23:31:12,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=833680.0, ans=0.0 2024-09-18 23:31:32,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=833708.3333333334, ans=0.0 2024-09-18 23:31:56,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=833736.6666666666, ans=0.1 2024-09-18 23:31:59,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=833765.0, ans=0.1 2024-09-18 23:32:13,960 INFO [train.py:1198] (1/2) Epoch 47, batch 300, loss[loss=0.2315, ctc_loss=0.1501, cr_loss=0.4073, over 20890.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1422, cr_loss=0.3689, over 3211260.28 frames. ], batch size: 54, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:32:21,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=833793.3333333334, ans=0.2 2024-09-18 23:32:21,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=833793.3333333334, ans=0.2 2024-09-18 23:32:30,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=833821.6666666666, ans=0.125 2024-09-18 23:32:31,261 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.23 vs. limit=22.5 2024-09-18 23:32:46,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=833850.0, ans=0.125 2024-09-18 23:33:11,436 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.956e+02 2.230e+02 2.339e+02 2.499e+02 8.658e+02, threshold=4.679e+02, percent-clipped=0.0 2024-09-18 23:33:29,889 INFO [train.py:1198] (1/2) Epoch 47, batch 350, loss[loss=0.2232, ctc_loss=0.1447, cr_loss=0.3924, over 21055.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1423, cr_loss=0.3688, over 3419023.16 frames. 
], batch size: 59, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:33:39,393 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.87 vs. limit=15.0 2024-09-18 23:33:56,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=833963.3333333334, ans=0.125 2024-09-18 23:34:48,004 INFO [train.py:1198] (1/2) Epoch 47, batch 400, loss[loss=0.2175, ctc_loss=0.1401, cr_loss=0.387, over 20785.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.1418, cr_loss=0.3682, over 3575471.72 frames. ], batch size: 53, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:34:49,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=834076.6666666666, ans=0.0 2024-09-18 23:35:14,488 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.19 vs. limit=22.5 2024-09-18 23:35:23,554 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.64 vs. limit=15.0 2024-09-18 23:35:30,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=834133.3333333334, ans=0.025 2024-09-18 23:35:42,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=834161.6666666666, ans=0.5 2024-09-18 23:35:45,494 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.055e+02 2.236e+02 2.401e+02 2.534e+02 4.806e+02, threshold=4.802e+02, percent-clipped=2.0 2024-09-18 23:35:50,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=834190.0, ans=0.125 2024-09-18 23:36:06,470 INFO [train.py:1198] (1/2) Epoch 47, batch 450, loss[loss=0.2256, ctc_loss=0.1481, cr_loss=0.3873, over 21026.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.1419, cr_loss=0.3679, over 3695150.99 frames. ], batch size: 63, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:36:17,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=834218.3333333334, ans=0.1 2024-09-18 23:36:29,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=834246.6666666666, ans=0.0 2024-09-18 23:36:50,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=834303.3333333334, ans=0.1 2024-09-18 23:36:59,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=834303.3333333334, ans=0.125 2024-09-18 23:37:12,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=834331.6666666666, ans=0.125 2024-09-18 23:37:18,637 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=834331.6666666666, ans=0.125 2024-09-18 23:37:21,200 INFO [train.py:1198] (1/2) Epoch 47, batch 500, loss[loss=0.2074, ctc_loss=0.1367, cr_loss=0.3537, over 21054.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.1419, cr_loss=0.3676, over 3783437.04 frames. 
], batch size: 56, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:37:48,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=834388.3333333334, ans=0.125 2024-09-18 23:38:16,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=834445.0, ans=0.0 2024-09-18 23:38:18,915 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.979e+02 2.250e+02 2.368e+02 2.551e+02 3.301e+02, threshold=4.737e+02, percent-clipped=0.0 2024-09-18 23:38:24,128 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.20 vs. limit=15.0 2024-09-18 23:38:31,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=834473.3333333334, ans=0.125 2024-09-18 23:38:37,096 INFO [train.py:1198] (1/2) Epoch 47, batch 550, loss[loss=0.2245, ctc_loss=0.1479, cr_loss=0.3832, over 21016.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1421, cr_loss=0.3683, over 3858811.97 frames. ], batch size: 61, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:39:09,253 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.95 vs. limit=15.0 2024-09-18 23:39:38,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=834615.0, ans=0.2 2024-09-18 23:39:54,869 INFO [train.py:1198] (1/2) Epoch 47, batch 600, loss[loss=0.2108, ctc_loss=0.1419, cr_loss=0.3446, over 20927.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.1417, cr_loss=0.3673, over 3906473.31 frames. ], batch size: 60, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:40:07,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=834643.3333333334, ans=0.025 2024-09-18 23:40:15,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=834671.6666666666, ans=0.125 2024-09-18 23:40:16,650 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:40:52,381 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.936e+02 2.241e+02 2.402e+02 2.571e+02 5.153e+02, threshold=4.804e+02, percent-clipped=1.0 2024-09-18 23:41:00,668 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.18 vs. limit=15.0 2024-09-18 23:41:10,541 INFO [train.py:1198] (1/2) Epoch 47, batch 650, loss[loss=0.2004, ctc_loss=0.1311, cr_loss=0.3462, over 21073.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1421, cr_loss=0.3685, over 3957748.83 frames. ], batch size: 53, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:41:21,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=834785.0, ans=0.125 2024-09-18 23:41:37,028 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.30 vs. limit=22.5 2024-09-18 23:41:40,049 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.19 vs. 
limit=15.0 2024-09-18 23:42:10,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=834870.0, ans=0.1 2024-09-18 23:42:17,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=834898.3333333334, ans=0.0 2024-09-18 23:42:25,718 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.97 vs. limit=8.0 2024-09-18 23:42:26,652 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=7.95 vs. limit=22.5 2024-09-18 23:42:29,017 INFO [train.py:1198] (1/2) Epoch 47, batch 700, loss[loss=0.1945, ctc_loss=0.1258, cr_loss=0.3435, over 20990.00 frames. ], tot_loss[loss=0.2153, ctc_loss=0.1418, cr_loss=0.3679, over 3990790.48 frames. ], batch size: 55, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:43:03,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=834983.3333333334, ans=0.125 2024-09-18 23:43:03,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=834983.3333333334, ans=0.1 2024-09-18 23:43:09,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=834983.3333333334, ans=0.125 2024-09-18 23:43:15,920 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:43:25,881 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.994e+02 2.323e+02 2.458e+02 2.598e+02 5.259e+02, threshold=4.917e+02, percent-clipped=1.0 2024-09-18 23:43:35,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=835040.0, ans=0.125 2024-09-18 23:43:35,678 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.42 vs. limit=12.0 2024-09-18 23:43:43,785 INFO [train.py:1198] (1/2) Epoch 47, batch 750, loss[loss=0.2463, ctc_loss=0.1604, cr_loss=0.4299, over 20809.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1424, cr_loss=0.3695, over 4013317.68 frames. ], batch size: 65, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:44:37,861 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.60 vs. limit=15.0 2024-09-18 23:44:51,132 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.71 vs. limit=15.0 2024-09-18 23:44:56,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=835181.6666666666, ans=0.1 2024-09-18 23:44:59,551 INFO [train.py:1198] (1/2) Epoch 47, batch 800, loss[loss=0.2223, ctc_loss=0.1448, cr_loss=0.3872, over 21015.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1415, cr_loss=0.3677, over 4039654.34 frames. 
], batch size: 63, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:45:12,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=835210.0, ans=0.125 2024-09-18 23:45:39,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=835266.6666666666, ans=0.1 2024-09-18 23:45:59,703 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.996e+02 2.234e+02 2.340e+02 2.471e+02 3.114e+02, threshold=4.680e+02, percent-clipped=0.0 2024-09-18 23:46:08,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=835323.3333333334, ans=0.95 2024-09-18 23:46:17,397 INFO [train.py:1198] (1/2) Epoch 47, batch 850, loss[loss=0.2159, ctc_loss=0.1425, cr_loss=0.3672, over 21036.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.142, cr_loss=0.369, over 4051473.66 frames. ], batch size: 62, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:46:20,938 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=835351.6666666666, ans=0.0 2024-09-18 23:46:37,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=835380.0, ans=0.125 2024-09-18 23:46:37,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=835380.0, ans=15.0 2024-09-18 23:46:43,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=835380.0, ans=0.0 2024-09-18 23:46:52,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=835408.3333333334, ans=0.1 2024-09-18 23:47:12,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=835436.6666666666, ans=0.125 2024-09-18 23:47:35,851 INFO [train.py:1198] (1/2) Epoch 47, batch 900, loss[loss=0.1678, ctc_loss=0.1087, cr_loss=0.2955, over 20978.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.1417, cr_loss=0.3685, over 4070394.94 frames. ], batch size: 51, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:47:57,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=835521.6666666666, ans=0.125 2024-09-18 23:48:33,149 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.939e+02 2.249e+02 2.397e+02 2.521e+02 3.038e+02, threshold=4.794e+02, percent-clipped=0.0 2024-09-18 23:48:41,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=835606.6666666666, ans=0.125 2024-09-18 23:48:45,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=835606.6666666666, ans=0.1 2024-09-18 23:48:51,190 INFO [train.py:1198] (1/2) Epoch 47, batch 950, loss[loss=0.1749, ctc_loss=0.1112, cr_loss=0.3183, over 21001.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1422, cr_loss=0.3689, over 4069773.98 frames. 
], batch size: 48, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:49:03,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=835635.0, ans=0.035 2024-09-18 23:49:19,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=835691.6666666666, ans=0.125 2024-09-18 23:50:06,089 INFO [train.py:1198] (1/2) Epoch 47, batch 1000, loss[loss=0.1947, ctc_loss=0.1267, cr_loss=0.3402, over 20788.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1422, cr_loss=0.3687, over 4063148.49 frames. ], batch size: 53, lr: 1.76e-03, grad_scale: 16.0 2024-09-18 23:51:00,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=835861.6666666666, ans=0.125 2024-09-18 23:51:07,967 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.954e+02 2.275e+02 2.405e+02 2.540e+02 4.645e+02, threshold=4.809e+02, percent-clipped=0.0 2024-09-18 23:51:14,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=835890.0, ans=0.125 2024-09-18 23:51:24,656 INFO [train.py:1198] (1/2) Epoch 47, batch 1050, loss[loss=0.2227, ctc_loss=0.1444, cr_loss=0.3912, over 21004.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1426, cr_loss=0.3695, over 4067137.43 frames. ], batch size: 61, lr: 1.76e-03, grad_scale: 16.0 2024-09-18 23:51:26,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=835918.3333333334, ans=0.0 2024-09-18 23:51:50,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=835946.6666666666, ans=0.1 2024-09-18 23:52:39,737 INFO [train.py:1198] (1/2) Epoch 47, batch 1100, loss[loss=0.2045, ctc_loss=0.1343, cr_loss=0.3512, over 20800.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1425, cr_loss=0.369, over 4053797.32 frames. ], batch size: 53, lr: 1.76e-03, grad_scale: 16.0 2024-09-18 23:52:56,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=836088.3333333334, ans=0.125 2024-09-18 23:53:17,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=836116.6666666666, ans=0.2 2024-09-18 23:53:32,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=836145.0, ans=0.09899494936611666 2024-09-18 23:53:38,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=836145.0, ans=0.0 2024-09-18 23:53:41,085 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.860e+02 2.275e+02 2.364e+02 2.538e+02 5.104e+02, threshold=4.729e+02, percent-clipped=1.0 2024-09-18 23:53:57,447 INFO [train.py:1198] (1/2) Epoch 47, batch 1150, loss[loss=0.2341, ctc_loss=0.1576, cr_loss=0.3824, over 21060.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1425, cr_loss=0.3694, over 4058221.47 frames. 
], batch size: 56, lr: 1.76e-03, grad_scale: 16.0 2024-09-18 23:54:05,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=836201.6666666666, ans=0.125 2024-09-18 23:54:10,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=836201.6666666666, ans=0.1 2024-09-18 23:54:40,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=836258.3333333334, ans=0.125 2024-09-18 23:55:13,751 INFO [train.py:1198] (1/2) Epoch 47, batch 1200, loss[loss=0.2442, ctc_loss=0.164, cr_loss=0.4011, over 19442.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1423, cr_loss=0.3692, over 4075104.90 frames. ], batch size: 90, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:55:36,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=836371.6666666666, ans=0.1 2024-09-18 23:55:41,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=836371.6666666666, ans=0.0 2024-09-18 23:55:51,049 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.87 vs. limit=10.0 2024-09-18 23:55:59,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=836428.3333333334, ans=0.0 2024-09-18 23:56:08,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=836428.3333333334, ans=0.125 2024-09-18 23:56:12,934 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.073e+02 2.286e+02 2.448e+02 2.685e+02 6.865e+02, threshold=4.896e+02, percent-clipped=1.0 2024-09-18 23:56:16,958 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.36 vs. limit=15.0 2024-09-18 23:56:32,313 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.60 vs. limit=15.0 2024-09-18 23:56:32,898 INFO [train.py:1198] (1/2) Epoch 47, batch 1250, loss[loss=0.2449, ctc_loss=0.1678, cr_loss=0.3854, over 18203.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.1419, cr_loss=0.3679, over 4076908.09 frames. ], batch size: 108, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:56:42,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=836485.0, ans=0.125 2024-09-18 23:57:49,054 INFO [train.py:1198] (1/2) Epoch 47, batch 1300, loss[loss=0.2516, ctc_loss=0.1669, cr_loss=0.4237, over 20955.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.1416, cr_loss=0.3668, over 4078374.03 frames. 
], batch size: 67, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:57:50,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=836626.6666666666, ans=0.125 2024-09-18 23:58:01,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=836626.6666666666, ans=0.125 2024-09-18 23:58:10,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=836655.0, ans=0.125 2024-09-18 23:58:50,527 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.991e+02 2.253e+02 2.367e+02 2.483e+02 4.662e+02, threshold=4.734e+02, percent-clipped=0.0 2024-09-18 23:59:01,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=836740.0, ans=0.0 2024-09-18 23:59:07,231 INFO [train.py:1198] (1/2) Epoch 47, batch 1350, loss[loss=0.1877, ctc_loss=0.1208, cr_loss=0.3342, over 21051.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1417, cr_loss=0.367, over 4075704.38 frames. ], batch size: 56, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:59:15,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=836768.3333333334, ans=0.125 2024-09-18 23:59:53,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=836853.3333333334, ans=0.1 2024-09-18 23:59:56,495 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.17 vs. limit=22.5 2024-09-19 00:00:23,293 INFO [train.py:1198] (1/2) Epoch 47, batch 1400, loss[loss=0.1985, ctc_loss=0.1288, cr_loss=0.3486, over 20977.00 frames. ], tot_loss[loss=0.2144, ctc_loss=0.1412, cr_loss=0.3663, over 4086480.67 frames. ], batch size: 48, lr: 1.76e-03, grad_scale: 32.0 2024-09-19 00:00:26,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=836910.0, ans=0.1 2024-09-19 00:01:22,983 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.936e+02 2.296e+02 2.414e+02 2.581e+02 3.367e+02, threshold=4.828e+02, percent-clipped=0.0 2024-09-19 00:01:23,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=837023.3333333334, ans=0.1 2024-09-19 00:01:39,745 INFO [train.py:1198] (1/2) Epoch 47, batch 1450, loss[loss=0.2008, ctc_loss=0.1317, cr_loss=0.3451, over 20801.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.1417, cr_loss=0.3673, over 4074895.07 frames. ], batch size: 53, lr: 1.76e-03, grad_scale: 32.0 2024-09-19 00:01:42,014 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.01 vs. 
limit=12.0 2024-09-19 00:01:44,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=837051.6666666666, ans=0.125 2024-09-19 00:01:44,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=837051.6666666666, ans=0.2 2024-09-19 00:02:50,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=837165.0, ans=0.125 2024-09-19 00:02:57,527 INFO [train.py:1198] (1/2) Epoch 47, batch 1500, loss[loss=0.2453, ctc_loss=0.1638, cr_loss=0.4077, over 20643.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1429, cr_loss=0.3689, over 4074931.27 frames. ], batch size: 66, lr: 1.76e-03, grad_scale: 32.0 2024-09-19 00:03:06,294 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.11 vs. limit=15.0 2024-09-19 00:03:28,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=837250.0, ans=0.1 2024-09-19 00:03:37,757 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.82 vs. limit=15.0 2024-09-19 00:03:56,609 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.007e+02 2.253e+02 2.430e+02 2.579e+02 3.421e+02, threshold=4.859e+02, percent-clipped=0.0 2024-09-19 00:04:13,207 INFO [train.py:1198] (1/2) Epoch 47, batch 1550, loss[loss=0.2105, ctc_loss=0.1376, cr_loss=0.3645, over 21073.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1433, cr_loss=0.3696, over 4084137.18 frames. ], batch size: 59, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:04:16,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=837335.0, ans=0.1 2024-09-19 00:04:19,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=837335.0, ans=0.125 2024-09-19 00:04:21,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=837335.0, ans=0.125 2024-09-19 00:04:37,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=837363.3333333334, ans=0.2 2024-09-19 00:05:31,510 INFO [train.py:1198] (1/2) Epoch 47, batch 1600, loss[loss=0.2227, ctc_loss=0.148, cr_loss=0.3732, over 19692.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1437, cr_loss=0.37, over 4079091.56 frames. 
], batch size: 90, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:05:31,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=837476.6666666666, ans=0.125 2024-09-19 00:06:19,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=837561.6666666666, ans=0.1 2024-09-19 00:06:30,654 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.033e+02 2.295e+02 2.387e+02 2.585e+02 3.288e+02, threshold=4.775e+02, percent-clipped=0.0 2024-09-19 00:06:34,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=837590.0, ans=0.1 2024-09-19 00:06:47,138 INFO [train.py:1198] (1/2) Epoch 47, batch 1650, loss[loss=0.2148, ctc_loss=0.1402, cr_loss=0.3731, over 20882.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.1439, cr_loss=0.37, over 4081097.68 frames. ], batch size: 57, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:07:04,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=837646.6666666666, ans=0.125 2024-09-19 00:07:05,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=837646.6666666666, ans=0.1 2024-09-19 00:07:49,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=837731.6666666666, ans=0.125 2024-09-19 00:08:00,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=837731.6666666666, ans=0.125 2024-09-19 00:08:06,070 INFO [train.py:1198] (1/2) Epoch 47, batch 1700, loss[loss=0.1848, ctc_loss=0.1169, cr_loss=0.3395, over 20981.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.1419, cr_loss=0.3665, over 4088570.70 frames. ], batch size: 51, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:08:14,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=837760.0, ans=0.0 2024-09-19 00:08:27,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=837788.3333333334, ans=0.0 2024-09-19 00:08:40,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=837816.6666666666, ans=0.1 2024-09-19 00:09:04,360 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.009e+02 2.220e+02 2.390e+02 2.523e+02 4.047e+02, threshold=4.780e+02, percent-clipped=0.0 2024-09-19 00:09:20,048 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.81 vs. limit=15.0 2024-09-19 00:09:20,850 INFO [train.py:1198] (1/2) Epoch 47, batch 1750, loss[loss=0.249, ctc_loss=0.1671, cr_loss=0.4094, over 20686.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.1418, cr_loss=0.3667, over 4089165.89 frames. 
], batch size: 66, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:09:31,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=837901.6666666666, ans=0.125 2024-09-19 00:09:39,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=837930.0, ans=0.0 2024-09-19 00:10:33,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=838015.0, ans=0.0 2024-09-19 00:10:35,719 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.50 vs. limit=15.0 2024-09-19 00:10:38,021 INFO [train.py:1198] (1/2) Epoch 47, batch 1800, loss[loss=0.2198, ctc_loss=0.1432, cr_loss=0.383, over 20988.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1429, cr_loss=0.3686, over 4094468.73 frames. ], batch size: 55, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:10:54,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=838071.6666666666, ans=0.0 2024-09-19 00:11:31,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=838128.3333333334, ans=0.125 2024-09-19 00:11:33,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=838128.3333333334, ans=0.125 2024-09-19 00:11:37,358 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.998e+02 2.253e+02 2.391e+02 2.616e+02 3.539e+02, threshold=4.781e+02, percent-clipped=0.0 2024-09-19 00:11:53,847 INFO [train.py:1198] (1/2) Epoch 47, batch 1850, loss[loss=0.1752, ctc_loss=0.1124, cr_loss=0.3137, over 20982.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1431, cr_loss=0.3692, over 4095716.19 frames. ], batch size: 52, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:12:07,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=838213.3333333334, ans=0.0 2024-09-19 00:12:30,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=838241.6666666666, ans=0.0 2024-09-19 00:13:09,719 INFO [train.py:1198] (1/2) Epoch 47, batch 1900, loss[loss=0.2292, ctc_loss=0.1502, cr_loss=0.3946, over 20655.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1428, cr_loss=0.3685, over 4094952.11 frames. 
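
Note: in these optim.py warnings the five grad-norm numbers read as min / 25% / 50% / 75% / max, and the reported threshold is consistently twice the middle value (2 * 2.391e+02 ≈ 4.781e+02 in the warning just above), matching Clipping_scale=2.0. That suggests gradients are clipped against a threshold derived from a running median of recent gradient norms; with the quartiles this tight, the threshold almost never fires, hence percent-clipped=0.0 throughout. A hedged sketch of that behaviour (class name, window size and bookkeeping are assumptions, not the optim.py implementation):

    import statistics
    from collections import deque

    import torch

    class MedianGradClipper:
        """Clip the global grad norm to clipping_scale * median(recent norms)."""

        def __init__(self, clipping_scale: float = 2.0, window: int = 1024):
            self.clipping_scale = clipping_scale
            self.recent_norms = deque(maxlen=window)

        def __call__(self, parameters) -> tuple[float, float]:
            grads = [p.grad for p in parameters if p.grad is not None]
            total_norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
            self.recent_norms.append(total_norm)
            threshold = self.clipping_scale * statistics.median(self.recent_norms)
            if total_norm > threshold:
                for g in grads:
                    g.mul_(threshold / total_norm)  # scale grads down in place
            return total_norm, threshold
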
], batch size: 66, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:13:11,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=838326.6666666666, ans=0.0 2024-09-19 00:13:36,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=838355.0, ans=0.125 2024-09-19 00:13:57,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=838411.6666666666, ans=0.2 2024-09-19 00:14:03,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=838411.6666666666, ans=0.125 2024-09-19 00:14:10,749 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.815e+02 2.235e+02 2.407e+02 2.574e+02 4.651e+02, threshold=4.814e+02, percent-clipped=0.0 2024-09-19 00:14:15,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=838440.0, ans=0.0 2024-09-19 00:14:20,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=838440.0, ans=0.0 2024-09-19 00:14:23,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=838440.0, ans=0.125 2024-09-19 00:14:27,664 INFO [train.py:1198] (1/2) Epoch 47, batch 1950, loss[loss=0.2414, ctc_loss=0.1655, cr_loss=0.3793, over 19241.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1435, cr_loss=0.3696, over 4087944.63 frames. ], batch size: 90, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:14:33,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=838468.3333333334, ans=0.0 2024-09-19 00:14:37,243 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=838468.3333333334, ans=0.0 2024-09-19 00:15:27,197 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.57 vs. limit=15.0 2024-09-19 00:15:42,981 INFO [train.py:1198] (1/2) Epoch 47, batch 2000, loss[loss=0.2059, ctc_loss=0.133, cr_loss=0.3642, over 20876.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1427, cr_loss=0.3683, over 4080885.05 frames. ], batch size: 54, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:16:07,597 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.18 vs. limit=10.0 2024-09-19 00:16:23,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=838666.6666666666, ans=0.025 2024-09-19 00:16:46,990 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.012e+02 2.253e+02 2.394e+02 2.559e+02 3.952e+02, threshold=4.788e+02, percent-clipped=0.0 2024-09-19 00:17:00,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=838751.6666666666, ans=0.025 2024-09-19 00:17:01,841 INFO [train.py:1198] (1/2) Epoch 47, batch 2050, loss[loss=0.2238, ctc_loss=0.1484, cr_loss=0.3768, over 20850.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.1416, cr_loss=0.3663, over 4087853.92 frames. 
], batch size: 59, lr: 1.75e-03, grad_scale: 16.0 2024-09-19 00:17:38,947 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.59 vs. limit=15.0 2024-09-19 00:17:56,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=838836.6666666666, ans=0.0 2024-09-19 00:17:56,950 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.88 vs. limit=22.5 2024-09-19 00:18:17,160 INFO [train.py:1198] (1/2) Epoch 47, batch 2100, loss[loss=0.222, ctc_loss=0.1475, cr_loss=0.3725, over 21068.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1424, cr_loss=0.3671, over 4076749.26 frames. ], batch size: 56, lr: 1.75e-03, grad_scale: 16.0 2024-09-19 00:18:27,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=838893.3333333334, ans=0.125 2024-09-19 00:18:49,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=838950.0, ans=0.2 2024-09-19 00:19:20,631 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.032e+02 2.293e+02 2.407e+02 2.603e+02 3.621e+02, threshold=4.813e+02, percent-clipped=0.0 2024-09-19 00:19:35,724 INFO [train.py:1198] (1/2) Epoch 47, batch 2150, loss[loss=0.2214, ctc_loss=0.1468, cr_loss=0.3731, over 21062.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1422, cr_loss=0.3674, over 4081229.73 frames. ], batch size: 59, lr: 1.75e-03, grad_scale: 16.0 2024-09-19 00:20:51,790 INFO [train.py:1198] (1/2) Epoch 47, batch 2200, loss[loss=0.2167, ctc_loss=0.1414, cr_loss=0.3764, over 21015.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.1416, cr_loss=0.3665, over 4087327.03 frames. ], batch size: 63, lr: 1.75e-03, grad_scale: 16.0 2024-09-19 00:21:54,509 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.968e+02 2.250e+02 2.374e+02 2.571e+02 4.795e+02, threshold=4.748e+02, percent-clipped=0.0 2024-09-19 00:22:09,676 INFO [train.py:1198] (1/2) Epoch 47, batch 2250, loss[loss=0.2408, ctc_loss=0.1595, cr_loss=0.4067, over 20278.00 frames. ], tot_loss[loss=0.214, ctc_loss=0.1409, cr_loss=0.3654, over 4092841.27 frames. 
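
Note: the grad_scale field in these records is the dynamic AMP loss scale, not a model hyperparameter. With float16 training it is halved when a scaled gradient step overflows and grown back after a run of clean steps, which is why it drops from 32.0 to 16.0 at batch 2050 here and is back at 32.0 by batch 2400. The generic PyTorch recipe behind this behaviour looks roughly like the following (model/optimizer/batch are placeholders; this is the standard pattern, not the actual train.py loop):

    import torch

    scaler = torch.cuda.amp.GradScaler()  # owns the dynamic grad_scale

    def train_step(model, optimizer, batch):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(batch)         # forward in reduced precision
        scaler.scale(loss).backward()   # backprop the scaled loss
        scaler.step(optimizer)          # skips the update if grads overflowed
        scaler.update()                 # halves the scale on overflow, else grows it
        return loss.detach()
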
], batch size: 74, lr: 1.75e-03, grad_scale: 16.0 2024-09-19 00:22:28,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=839346.6666666666, ans=0.125 2024-09-19 00:22:42,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=839375.0, ans=0.0 2024-09-19 00:22:54,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=839403.3333333334, ans=0.0 2024-09-19 00:22:57,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=839403.3333333334, ans=0.1 2024-09-19 00:23:15,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=839431.6666666666, ans=0.1 2024-09-19 00:23:24,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=839460.0, ans=0.0 2024-09-19 00:23:25,288 INFO [train.py:1198] (1/2) Epoch 47, batch 2300, loss[loss=0.182, ctc_loss=0.1183, cr_loss=0.3185, over 20962.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1415, cr_loss=0.3666, over 4096672.08 frames. ], batch size: 48, lr: 1.75e-03, grad_scale: 16.0 2024-09-19 00:24:10,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=839545.0, ans=0.07 2024-09-19 00:24:25,354 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.014e+02 2.258e+02 2.391e+02 2.525e+02 3.714e+02, threshold=4.782e+02, percent-clipped=0.0 2024-09-19 00:24:35,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=839573.3333333334, ans=0.04949747468305833 2024-09-19 00:24:40,178 INFO [train.py:1198] (1/2) Epoch 47, batch 2350, loss[loss=0.1782, ctc_loss=0.1164, cr_loss=0.3091, over 20982.00 frames. ], tot_loss[loss=0.2139, ctc_loss=0.1408, cr_loss=0.3654, over 4101359.35 frames. ], batch size: 48, lr: 1.75e-03, grad_scale: 16.0 2024-09-19 00:24:52,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=839601.6666666666, ans=0.025 2024-09-19 00:25:04,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=839630.0, ans=0.125 2024-09-19 00:25:29,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=839686.6666666666, ans=0.125 2024-09-19 00:25:47,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=839715.0, ans=0.125 2024-09-19 00:25:49,768 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.35 vs. limit=15.0 2024-09-19 00:25:57,793 INFO [train.py:1198] (1/2) Epoch 47, batch 2400, loss[loss=0.2307, ctc_loss=0.1517, cr_loss=0.3949, over 21067.00 frames. ], tot_loss[loss=0.2144, ctc_loss=0.1412, cr_loss=0.3661, over 4100052.59 frames. 
], batch size: 56, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:25:58,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=839743.3333333334, ans=0.05 2024-09-19 00:26:13,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=839771.6666666666, ans=0.125 2024-09-19 00:26:58,590 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.007e+02 2.222e+02 2.397e+02 2.581e+02 3.490e+02, threshold=4.794e+02, percent-clipped=0.0 2024-09-19 00:27:05,152 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 00:27:12,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=839885.0, ans=0.125 2024-09-19 00:27:13,782 INFO [train.py:1198] (1/2) Epoch 47, batch 2450, loss[loss=0.252, ctc_loss=0.1677, cr_loss=0.4216, over 20973.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.1422, cr_loss=0.3671, over 4071233.86 frames. ], batch size: 64, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:27:39,911 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=839913.3333333334, ans=0.0 2024-09-19 00:27:51,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=839941.6666666666, ans=0.0 2024-09-19 00:27:58,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=839941.6666666666, ans=0.125 2024-09-19 00:28:13,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=839970.0, ans=0.125 2024-09-19 00:28:32,578 INFO [train.py:1198] (1/2) Epoch 47, batch 2500, loss[loss=0.179, ctc_loss=0.1161, cr_loss=0.3144, over 20957.00 frames. ], tot_loss[loss=0.2145, ctc_loss=0.1413, cr_loss=0.366, over 4088908.60 frames. ], batch size: 49, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:28:43,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=840026.6666666666, ans=10.0 2024-09-19 00:29:10,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=840083.3333333334, ans=0.125 2024-09-19 00:29:10,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=840083.3333333334, ans=0.125 2024-09-19 00:29:12,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=840083.3333333334, ans=0.0 2024-09-19 00:29:14,003 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.80 vs. limit=15.0 2024-09-19 00:29:32,920 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.947e+02 2.260e+02 2.361e+02 2.506e+02 3.338e+02, threshold=4.722e+02, percent-clipped=0.0 2024-09-19 00:29:47,906 INFO [train.py:1198] (1/2) Epoch 47, batch 2550, loss[loss=0.2146, ctc_loss=0.1407, cr_loss=0.3693, over 20985.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.1416, cr_loss=0.3664, over 4084207.25 frames. 
], batch size: 55, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:29:48,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=840168.3333333334, ans=0.1 2024-09-19 00:30:00,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=840168.3333333334, ans=0.0 2024-09-19 00:30:32,265 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.53 vs. limit=15.0 2024-09-19 00:30:34,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=840253.3333333334, ans=0.125 2024-09-19 00:30:53,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=840281.6666666666, ans=0.125 2024-09-19 00:30:56,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=840281.6666666666, ans=0.0 2024-09-19 00:31:06,118 INFO [train.py:1198] (1/2) Epoch 47, batch 2600, loss[loss=0.2132, ctc_loss=0.1386, cr_loss=0.3729, over 20966.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.1419, cr_loss=0.3676, over 4098806.03 frames. ], batch size: 55, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:31:12,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=840310.0, ans=0.0 2024-09-19 00:31:20,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=840338.3333333334, ans=0.125 2024-09-19 00:31:23,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=840338.3333333334, ans=0.0 2024-09-19 00:31:41,935 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.02 vs. limit=15.0 2024-09-19 00:32:02,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=840395.0, ans=0.0 2024-09-19 00:32:06,801 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.934e+02 2.293e+02 2.446e+02 2.585e+02 3.438e+02, threshold=4.891e+02, percent-clipped=0.0 2024-09-19 00:32:11,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=840423.3333333334, ans=0.125 2024-09-19 00:32:22,166 INFO [train.py:1198] (1/2) Epoch 47, batch 2650, loss[loss=0.2141, ctc_loss=0.1396, cr_loss=0.3728, over 20795.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1433, cr_loss=0.3697, over 4092812.15 frames. ], batch size: 53, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:33:01,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=840508.3333333334, ans=0.125 2024-09-19 00:33:31,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=840565.0, ans=0.0 2024-09-19 00:33:40,704 INFO [train.py:1198] (1/2) Epoch 47, batch 2700, loss[loss=0.2041, ctc_loss=0.1335, cr_loss=0.353, over 20798.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1431, cr_loss=0.369, over 4084730.59 frames. 
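
Note: the Whitening records compare a measured statistic of a module's activations against a limit; the metric is 1.0 when the channel covariance is proportional to the identity and grows as variance concentrates in fewer directions, with a corrective gradient presumably applied when it drifts above the limit. One plausible metric of that kind, as a simplified sketch (the actual scaling.py module works on grouped channels and fuses any penalty into backward):

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        """x: (num_frames, num_channels). Returns >= 1.0, with equality
        exactly when the channel covariance is a multiple of the identity."""
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.t() @ x) / x.shape[0]          # (C, C) channel covariance
        d = cov.shape[0]
        return (cov ** 2).sum() / (d * cov.diagonal().mean() ** 2)

Read this way, metric=13.53 vs. limit=15.0 above means those activations are fairly far from white but still under the limit.
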
], batch size: 53, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:34:17,659 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 00:34:40,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=840706.6666666666, ans=0.0 2024-09-19 00:34:41,320 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.012e+02 2.246e+02 2.332e+02 2.535e+02 3.753e+02, threshold=4.664e+02, percent-clipped=0.0 2024-09-19 00:34:49,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=840706.6666666666, ans=0.125 2024-09-19 00:34:56,464 INFO [train.py:1198] (1/2) Epoch 47, batch 2750, loss[loss=0.2326, ctc_loss=0.1543, cr_loss=0.3915, over 19867.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1432, cr_loss=0.3689, over 4079056.31 frames. ], batch size: 80, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:35:40,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=840820.0, ans=0.04949747468305833 2024-09-19 00:36:01,959 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.41 vs. limit=15.0 2024-09-19 00:36:10,911 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.97 vs. limit=6.0 2024-09-19 00:36:14,602 INFO [train.py:1198] (1/2) Epoch 47, batch 2800, loss[loss=0.1741, ctc_loss=0.1097, cr_loss=0.3219, over 20228.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1426, cr_loss=0.3688, over 4094496.95 frames. ], batch size: 45, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:36:23,320 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.10 vs. limit=15.0 2024-09-19 00:36:27,238 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 00:36:48,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=840933.3333333334, ans=0.125 2024-09-19 00:37:15,265 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.980e+02 2.263e+02 2.383e+02 2.536e+02 3.961e+02, threshold=4.767e+02, percent-clipped=0.0 2024-09-19 00:37:30,422 INFO [train.py:1198] (1/2) Epoch 47, batch 2850, loss[loss=0.2051, ctc_loss=0.1331, cr_loss=0.3603, over 20972.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.1419, cr_loss=0.3678, over 4104918.34 frames. ], batch size: 55, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:37:31,170 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.91 vs. limit=15.0 2024-09-19 00:37:32,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=841018.3333333334, ans=0.0 2024-09-19 00:38:12,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=841075.0, ans=15.0 2024-09-19 00:38:44,929 INFO [train.py:1198] (1/2) Epoch 47, batch 2900, loss[loss=0.1774, ctc_loss=0.1135, cr_loss=0.3195, over 19945.00 frames. 
], tot_loss[loss=0.2153, ctc_loss=0.1418, cr_loss=0.3675, over 4102125.75 frames. ], batch size: 44, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:38:48,636 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.19 vs. limit=15.0 2024-09-19 00:38:50,415 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.02 vs. limit=22.5 2024-09-19 00:39:48,368 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.997e+02 2.287e+02 2.434e+02 2.608e+02 4.093e+02, threshold=4.867e+02, percent-clipped=0.0 2024-09-19 00:40:03,663 INFO [train.py:1198] (1/2) Epoch 47, batch 2950, loss[loss=0.2111, ctc_loss=0.1391, cr_loss=0.3603, over 21033.00 frames. ], tot_loss[loss=0.2147, ctc_loss=0.1414, cr_loss=0.367, over 4115268.93 frames. ], batch size: 63, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:40:11,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=841301.6666666666, ans=0.1 2024-09-19 00:40:53,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=841386.6666666666, ans=0.0 2024-09-19 00:41:00,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=841386.6666666666, ans=0.125 2024-09-19 00:41:19,078 INFO [train.py:1198] (1/2) Epoch 47, batch 3000, loss[loss=0.2148, ctc_loss=0.1444, cr_loss=0.3523, over 20676.00 frames. ], tot_loss[loss=0.2147, ctc_loss=0.1413, cr_loss=0.3668, over 4109288.07 frames. ], batch size: 71, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:41:19,079 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 00:41:38,019 INFO [train.py:1230] (1/2) Epoch 47, validation: loss=0.03886, ctc_loss=0.03886, cr_loss=1.566e-14, over 944034.00 frames. 2024-09-19 00:41:38,019 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-19 00:41:45,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=841443.3333333334, ans=0.125 2024-09-19 00:42:01,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=841471.6666666666, ans=0.125 2024-09-19 00:42:40,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=841556.6666666666, ans=0.125 2024-09-19 00:42:41,488 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.079e+02 2.289e+02 2.426e+02 2.578e+02 4.191e+02, threshold=4.852e+02, percent-clipped=0.0 2024-09-19 00:42:56,353 INFO [train.py:1198] (1/2) Epoch 47, batch 3050, loss[loss=0.2267, ctc_loss=0.1516, cr_loss=0.3752, over 20775.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.1419, cr_loss=0.368, over 4114925.08 frames. ], batch size: 56, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:43:00,215 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.45 vs. 
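
Note: in the validation record above the consistency term vanishes (cr_loss=1.566e-14, so loss equals ctc_loss alone), which is what one would expect if cr_loss compares the CTC posteriors of two differently augmented copies of each utterance: with no augmentation at validation time, the two views coincide. A sketch of a symmetric consistency term with exactly that property (an illustration of the idea, not the train.py implementation):

    import torch
    import torch.nn.functional as F

    def consistency_loss(log_probs_a: torch.Tensor,
                         log_probs_b: torch.Tensor) -> torch.Tensor:
        """Symmetric KL between the frame-level CTC posteriors of two
        augmented views, each of shape (T, N, vocab). Identically zero
        when the two views produce the same distributions."""
        kl_ab = F.kl_div(log_probs_b, log_probs_a.exp(), reduction="batchmean")
        kl_ba = F.kl_div(log_probs_a, log_probs_b.exp(), reduction="batchmean")
        return 0.5 * (kl_ab + kl_ba)
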
limit=15.0 2024-09-19 00:43:23,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=841613.3333333334, ans=0.1 2024-09-19 00:43:24,769 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.08 vs. limit=15.0 2024-09-19 00:43:27,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=841641.6666666666, ans=0.125 2024-09-19 00:43:38,153 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.02 vs. limit=15.0 2024-09-19 00:44:02,127 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.10 vs. limit=15.0 2024-09-19 00:44:11,676 INFO [train.py:1198] (1/2) Epoch 47, batch 3100, loss[loss=0.1911, ctc_loss=0.1232, cr_loss=0.3397, over 21055.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.1415, cr_loss=0.3669, over 4091389.37 frames. ], batch size: 53, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:44:13,883 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.23 vs. limit=12.0 2024-09-19 00:44:42,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=841755.0, ans=0.2 2024-09-19 00:44:53,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=841783.3333333334, ans=0.2 2024-09-19 00:45:06,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=841811.6666666666, ans=0.2 2024-09-19 00:45:13,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=841840.0, ans=0.125 2024-09-19 00:45:14,949 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.937e+02 2.246e+02 2.409e+02 2.625e+02 4.109e+02, threshold=4.817e+02, percent-clipped=0.0 2024-09-19 00:45:30,129 INFO [train.py:1198] (1/2) Epoch 47, batch 3150, loss[loss=0.228, ctc_loss=0.1533, cr_loss=0.3735, over 19588.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.142, cr_loss=0.3675, over 4092420.88 frames. ], batch size: 90, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:45:34,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=841868.3333333334, ans=0.04949747468305833 2024-09-19 00:45:41,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=841868.3333333334, ans=0.0 2024-09-19 00:45:42,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=841868.3333333334, ans=0.1 2024-09-19 00:46:20,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=841953.3333333334, ans=0.025 2024-09-19 00:46:20,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=841953.3333333334, ans=0.125 2024-09-19 00:46:45,679 INFO [train.py:1198] (1/2) Epoch 47, batch 3200, loss[loss=0.2491, ctc_loss=0.1646, cr_loss=0.4224, over 19411.00 frames. 
], tot_loss[loss=0.2153, ctc_loss=0.1419, cr_loss=0.3672, over 4096374.40 frames. ], batch size: 90, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:47:39,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=842095.0, ans=0.125 2024-09-19 00:47:49,491 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.012e+02 2.237e+02 2.388e+02 2.562e+02 3.736e+02, threshold=4.775e+02, percent-clipped=0.0 2024-09-19 00:48:01,305 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.52 vs. limit=12.0 2024-09-19 00:48:04,948 INFO [train.py:1198] (1/2) Epoch 47, batch 3250, loss[loss=0.2211, ctc_loss=0.1485, cr_loss=0.363, over 20977.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1422, cr_loss=0.3678, over 4096850.56 frames. ], batch size: 58, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:48:09,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=842151.6666666666, ans=0.1 2024-09-19 00:48:15,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=842151.6666666666, ans=0.0 2024-09-19 00:49:17,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=842265.0, ans=0.0 2024-09-19 00:49:20,596 INFO [train.py:1198] (1/2) Epoch 47, batch 3300, loss[loss=0.1913, ctc_loss=0.1246, cr_loss=0.3332, over 20985.00 frames. ], tot_loss[loss=0.2139, ctc_loss=0.1407, cr_loss=0.3656, over 4105459.15 frames. ], batch size: 50, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:49:25,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=842293.3333333334, ans=0.5 2024-09-19 00:49:32,397 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.95 vs. limit=15.0 2024-09-19 00:49:44,404 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0 2024-09-19 00:50:24,652 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.960e+02 2.249e+02 2.399e+02 2.589e+02 3.401e+02, threshold=4.799e+02, percent-clipped=0.0 2024-09-19 00:50:32,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=842406.6666666666, ans=0.1 2024-09-19 00:50:39,710 INFO [train.py:1198] (1/2) Epoch 47, batch 3350, loss[loss=0.2311, ctc_loss=0.155, cr_loss=0.3804, over 19458.00 frames. ], tot_loss[loss=0.2139, ctc_loss=0.1408, cr_loss=0.3659, over 4115472.86 frames. 
], batch size: 90, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:50:44,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=842435.0, ans=0.0 2024-09-19 00:50:56,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=842463.3333333334, ans=0.1 2024-09-19 00:51:22,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=842491.6666666666, ans=0.0 2024-09-19 00:51:39,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=842548.3333333334, ans=0.04949747468305833 2024-09-19 00:51:46,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=842548.3333333334, ans=0.125 2024-09-19 00:51:55,390 INFO [train.py:1198] (1/2) Epoch 47, batch 3400, loss[loss=0.2009, ctc_loss=0.1289, cr_loss=0.3601, over 20996.00 frames. ], tot_loss[loss=0.213, ctc_loss=0.14, cr_loss=0.3646, over 4117374.27 frames. ], batch size: 52, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:52:12,837 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.89 vs. limit=22.5 2024-09-19 00:52:15,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=842605.0, ans=0.0 2024-09-19 00:52:32,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=842633.3333333334, ans=0.125 2024-09-19 00:52:56,043 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.979e+02 2.239e+02 2.371e+02 2.561e+02 3.572e+02, threshold=4.742e+02, percent-clipped=0.0 2024-09-19 00:53:11,275 INFO [train.py:1198] (1/2) Epoch 47, batch 3450, loss[loss=0.2308, ctc_loss=0.1511, cr_loss=0.3988, over 21064.00 frames. ], tot_loss[loss=0.214, ctc_loss=0.1409, cr_loss=0.3655, over 4115788.63 frames. ], batch size: 59, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:53:28,618 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.06 vs. limit=15.0 2024-09-19 00:53:44,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=842775.0, ans=0.2 2024-09-19 00:53:48,369 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.99 vs. limit=15.0 2024-09-19 00:54:18,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=842831.6666666666, ans=0.125 2024-09-19 00:54:18,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=842831.6666666666, ans=0.2 2024-09-19 00:54:29,736 INFO [train.py:1198] (1/2) Epoch 47, batch 3500, loss[loss=0.2062, ctc_loss=0.1384, cr_loss=0.3389, over 21081.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1416, cr_loss=0.3663, over 4098041.47 frames. 
], batch size: 59, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:54:31,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=842860.0, ans=0.2 2024-09-19 00:54:50,032 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.33 vs. limit=22.5 2024-09-19 00:54:52,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=842888.3333333334, ans=0.0 2024-09-19 00:55:12,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=842916.6666666666, ans=0.125 2024-09-19 00:55:16,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=842945.0, ans=0.1 2024-09-19 00:55:27,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=842945.0, ans=0.0 2024-09-19 00:55:29,950 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.060e+02 2.306e+02 2.433e+02 2.594e+02 5.603e+02, threshold=4.866e+02, percent-clipped=1.0 2024-09-19 00:55:34,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=842973.3333333334, ans=0.1 2024-09-19 00:55:42,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=842973.3333333334, ans=0.125 2024-09-19 00:55:45,188 INFO [train.py:1198] (1/2) Epoch 47, batch 3550, loss[loss=0.1757, ctc_loss=0.1108, cr_loss=0.3245, over 20967.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1416, cr_loss=0.3662, over 4078019.12 frames. ], batch size: 50, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:56:23,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=843058.3333333334, ans=0.125 2024-09-19 00:56:26,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=843058.3333333334, ans=0.125 2024-09-19 00:57:02,745 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 00:57:03,822 INFO [train.py:1198] (1/2) Epoch 47, batch 3600, loss[loss=0.1991, ctc_loss=0.1274, cr_loss=0.3582, over 20867.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1415, cr_loss=0.3664, over 4084114.97 frames. ], batch size: 57, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:57:04,228 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 00:57:13,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=843143.3333333334, ans=0.125 2024-09-19 00:57:28,928 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.69 vs. 
limit=15.0 2024-09-19 00:57:34,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=843200.0, ans=0.125 2024-09-19 00:57:37,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=843200.0, ans=0.125 2024-09-19 00:57:48,763 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.49 vs. limit=12.0 2024-09-19 00:58:04,350 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.064e+02 2.279e+02 2.386e+02 2.544e+02 3.394e+02, threshold=4.772e+02, percent-clipped=0.0 2024-09-19 00:58:08,213 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.40 vs. limit=15.0 2024-09-19 00:58:19,349 INFO [train.py:1198] (1/2) Epoch 47, batch 3650, loss[loss=0.2191, ctc_loss=0.145, cr_loss=0.3703, over 20708.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1416, cr_loss=0.3668, over 4085742.62 frames. ], batch size: 71, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:59:09,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=843370.0, ans=0.0 2024-09-19 00:59:38,732 INFO [train.py:1198] (1/2) Epoch 47, batch 3700, loss[loss=0.2215, ctc_loss=0.1464, cr_loss=0.3755, over 20960.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1416, cr_loss=0.3668, over 4092744.24 frames. ], batch size: 55, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:59:59,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=843455.0, ans=0.025 2024-09-19 01:00:01,563 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.54 vs. limit=15.0 2024-09-19 01:00:22,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=843511.6666666666, ans=0.2 2024-09-19 01:00:36,345 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=843511.6666666666, ans=0.05 2024-09-19 01:00:39,007 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.002e+02 2.298e+02 2.401e+02 2.614e+02 4.058e+02, threshold=4.801e+02, percent-clipped=0.0 2024-09-19 01:00:52,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=843568.3333333334, ans=0.2 2024-09-19 01:00:54,020 INFO [train.py:1198] (1/2) Epoch 47, batch 3750, loss[loss=0.2399, ctc_loss=0.1597, cr_loss=0.401, over 18312.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1425, cr_loss=0.3684, over 4096579.69 frames. 
], batch size: 108, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 01:01:05,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=843568.3333333334, ans=0.95 2024-09-19 01:01:08,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=843596.6666666666, ans=0.1 2024-09-19 01:01:13,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=843596.6666666666, ans=0.125 2024-09-19 01:01:45,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=843653.3333333334, ans=0.07 2024-09-19 01:02:03,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=843681.6666666666, ans=0.125 2024-09-19 01:02:06,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=843681.6666666666, ans=0.1 2024-09-19 01:02:11,801 INFO [train.py:1198] (1/2) Epoch 47, batch 3800, loss[loss=0.1715, ctc_loss=0.1111, cr_loss=0.3022, over 20035.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1429, cr_loss=0.3685, over 4079358.67 frames. ], batch size: 44, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 01:02:25,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=843738.3333333334, ans=0.1 2024-09-19 01:02:31,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=843738.3333333334, ans=0.1 2024-09-19 01:02:39,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=843738.3333333334, ans=0.125 2024-09-19 01:02:40,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=843766.6666666666, ans=0.1 2024-09-19 01:02:57,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=843795.0, ans=0.125 2024-09-19 01:03:02,638 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.63 vs. limit=22.5 2024-09-19 01:03:12,116 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.050e+02 2.250e+02 2.380e+02 2.563e+02 3.540e+02, threshold=4.760e+02, percent-clipped=0.0 2024-09-19 01:03:26,890 INFO [train.py:1198] (1/2) Epoch 47, batch 3850, loss[loss=0.2294, ctc_loss=0.1526, cr_loss=0.3841, over 20876.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.1439, cr_loss=0.3701, over 4082890.92 frames. ], batch size: 65, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 01:03:28,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=843851.6666666666, ans=0.0 2024-09-19 01:04:45,679 INFO [train.py:1198] (1/2) Epoch 47, batch 3900, loss[loss=0.2298, ctc_loss=0.1522, cr_loss=0.388, over 20310.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1435, cr_loss=0.3696, over 4090366.54 frames. 
], batch size: 74, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 01:05:32,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=844078.3333333334, ans=0.1 2024-09-19 01:05:45,667 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.045e+02 2.300e+02 2.390e+02 2.560e+02 4.831e+02, threshold=4.780e+02, percent-clipped=1.0 2024-09-19 01:05:46,372 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.67 vs. limit=15.0 2024-09-19 01:06:00,691 INFO [train.py:1198] (1/2) Epoch 47, batch 3950, loss[loss=0.2263, ctc_loss=0.1527, cr_loss=0.3681, over 20993.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.1431, cr_loss=0.3684, over 4084913.26 frames. ], batch size: 67, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 01:06:11,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=844135.0, ans=0.0 2024-09-19 01:06:17,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=844163.3333333334, ans=0.125 2024-09-19 01:06:28,266 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.35 vs. limit=15.0 2024-09-19 01:06:35,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=844191.6666666666, ans=0.125 2024-09-19 01:06:42,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=844191.6666666666, ans=0.125 2024-09-19 01:06:56,637 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.81 vs. limit=15.0 2024-09-19 01:07:02,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=844248.3333333334, ans=0.0 2024-09-19 01:07:08,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=844248.3333333334, ans=0.125 2024-09-19 01:07:15,609 INFO [train.py:1198] (1/2) Epoch 47, batch 4000, loss[loss=0.2195, ctc_loss=0.1408, cr_loss=0.3939, over 20764.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1435, cr_loss=0.3702, over 4092103.36 frames. ], batch size: 56, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 01:08:19,278 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.993e+02 2.285e+02 2.372e+02 2.493e+02 3.081e+02, threshold=4.743e+02, percent-clipped=0.0 2024-09-19 01:08:31,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=844390.0, ans=0.0 2024-09-19 01:08:33,856 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=3.98 vs. limit=15.0 2024-09-19 01:08:34,378 INFO [train.py:1198] (1/2) Epoch 47, batch 4050, loss[loss=0.2199, ctc_loss=0.1421, cr_loss=0.389, over 20688.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1427, cr_loss=0.3689, over 4098583.34 frames. 
], batch size: 66, lr: 1.75e-03, grad_scale: 64.0 2024-09-19 01:08:49,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=844446.6666666666, ans=10.0 2024-09-19 01:09:04,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=844475.0, ans=0.125 2024-09-19 01:09:49,386 INFO [train.py:1198] (1/2) Epoch 47, batch 4100, loss[loss=0.2187, ctc_loss=0.1498, cr_loss=0.3441, over 20935.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1435, cr_loss=0.3698, over 4087495.42 frames. ], batch size: 60, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 01:10:00,583 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.17 vs. limit=10.0 2024-09-19 01:10:54,387 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.019e+02 2.254e+02 2.350e+02 2.481e+02 3.420e+02, threshold=4.699e+02, percent-clipped=0.0 2024-09-19 01:10:54,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=844673.3333333334, ans=0.0 2024-09-19 01:10:54,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=844673.3333333334, ans=0.125 2024-09-19 01:11:07,751 INFO [train.py:1198] (1/2) Epoch 47, batch 4150, loss[loss=0.2338, ctc_loss=0.1568, cr_loss=0.3847, over 20979.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1427, cr_loss=0.3683, over 4098473.71 frames. ], batch size: 64, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 01:12:16,544 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.23 vs. limit=15.0 2024-09-19 01:12:23,365 INFO [train.py:1198] (1/2) Epoch 47, batch 4200, loss[loss=0.2099, ctc_loss=0.1377, cr_loss=0.361, over 20793.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1425, cr_loss=0.3687, over 4093418.06 frames. ], batch size: 53, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 01:12:23,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=844843.3333333334, ans=0.125 2024-09-19 01:12:25,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=844843.3333333334, ans=0.0 2024-09-19 01:12:28,084 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 01:12:32,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=844843.3333333334, ans=0.125 2024-09-19 01:12:54,329 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.10 vs. limit=15.0 2024-09-19 01:12:55,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=844900.0, ans=0.125 2024-09-19 01:13:28,197 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.916e+02 2.299e+02 2.454e+02 2.631e+02 3.444e+02, threshold=4.907e+02, percent-clipped=0.0 2024-09-19 01:13:39,582 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.97 vs. 
limit=22.5 2024-09-19 01:13:41,571 INFO [train.py:1198] (1/2) Epoch 47, batch 4250, loss[loss=0.2012, ctc_loss=0.1317, cr_loss=0.3473, over 21039.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1428, cr_loss=0.3689, over 4102152.90 frames. ], batch size: 56, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 01:13:56,798 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 01:14:13,828 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 01:14:22,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=845041.6666666666, ans=0.125 2024-09-19 01:14:45,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=845098.3333333334, ans=0.125 2024-09-19 01:14:57,676 INFO [train.py:1198] (1/2) Epoch 47, batch 4300, loss[loss=0.2622, ctc_loss=0.1825, cr_loss=0.3983, over 14629.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1434, cr_loss=0.3693, over 4088807.06 frames. ], batch size: 149, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 01:15:16,646 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.34 vs. limit=22.5 2024-09-19 01:15:20,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=845155.0, ans=0.2 2024-09-19 01:15:31,629 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.14 vs. limit=12.0 2024-09-19 01:15:52,487 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.12 vs. limit=15.0 2024-09-19 01:16:02,029 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.006e+02 2.329e+02 2.471e+02 2.593e+02 4.269e+02, threshold=4.941e+02, percent-clipped=0.0 2024-09-19 01:16:15,562 INFO [train.py:1198] (1/2) Epoch 47, batch 4350, loss[loss=0.2587, ctc_loss=0.1771, cr_loss=0.4078, over 14901.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1429, cr_loss=0.3681, over 4077174.37 frames. ], batch size: 150, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 01:16:31,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=845296.6666666666, ans=0.125 2024-09-19 01:16:37,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=845296.6666666666, ans=15.0 2024-09-19 01:16:50,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=845325.0, ans=0.0 2024-09-19 01:16:52,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=845325.0, ans=0.125 2024-09-19 01:16:55,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=845325.0, ans=0.125 2024-09-19 01:17:31,458 INFO [train.py:1198] (1/2) Epoch 47, batch 4400, loss[loss=0.2205, ctc_loss=0.1456, cr_loss=0.3745, over 20955.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1432, cr_loss=0.3695, over 4090438.70 frames. 
], batch size: 55, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 01:17:39,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=845410.0, ans=0.0 2024-09-19 01:17:39,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=845410.0, ans=0.2 2024-09-19 01:17:50,466 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.96 vs. limit=15.0 2024-09-19 01:18:33,597 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.864e+02 2.273e+02 2.423e+02 2.576e+02 6.541e+02, threshold=4.846e+02, percent-clipped=2.0 2024-09-19 01:18:43,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=845523.3333333334, ans=0.0 2024-09-19 01:18:47,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=845523.3333333334, ans=0.125 2024-09-19 01:18:47,977 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.69 vs. limit=22.5 2024-09-19 01:18:49,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=845551.6666666666, ans=0.025 2024-09-19 01:18:50,294 INFO [train.py:1198] (1/2) Epoch 47, batch 4450, loss[loss=0.2178, ctc_loss=0.1416, cr_loss=0.3809, over 21073.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.1439, cr_loss=0.3704, over 4080281.95 frames. ], batch size: 59, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 01:19:12,331 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.94 vs. limit=15.0 2024-09-19 01:20:05,590 INFO [train.py:1198] (1/2) Epoch 47, batch 4500, loss[loss=0.1967, ctc_loss=0.1272, cr_loss=0.3476, over 20889.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1431, cr_loss=0.3689, over 4080586.20 frames. ], batch size: 54, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 01:20:09,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=845693.3333333334, ans=0.125 2024-09-19 01:20:28,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=845721.6666666666, ans=0.025 2024-09-19 01:20:40,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=845750.0, ans=0.125 2024-09-19 01:20:54,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=845778.3333333334, ans=0.125 2024-09-19 01:20:58,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=845778.3333333334, ans=0.125 2024-09-19 01:21:07,446 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.961e+02 2.280e+02 2.435e+02 2.608e+02 3.118e+02, threshold=4.870e+02, percent-clipped=0.0 2024-09-19 01:21:21,236 INFO [train.py:1198] (1/2) Epoch 47, batch 4550, loss[loss=0.2303, ctc_loss=0.154, cr_loss=0.3816, over 20052.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1428, cr_loss=0.3684, over 4087203.59 frames. 
], batch size: 80, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 01:21:36,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=845835.0, ans=0.2 2024-09-19 01:22:32,560 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.27 vs. limit=22.5 2024-09-19 01:22:39,676 INFO [train.py:1198] (1/2) Epoch 47, batch 4600, loss[loss=0.185, ctc_loss=0.1225, cr_loss=0.3124, over 21067.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1429, cr_loss=0.369, over 4092700.02 frames. ], batch size: 53, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 01:23:14,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=846033.3333333334, ans=0.0 2024-09-19 01:23:27,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=846061.6666666666, ans=0.2 2024-09-19 01:23:42,303 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.994e+02 2.276e+02 2.409e+02 2.600e+02 4.253e+02, threshold=4.818e+02, percent-clipped=0.0 2024-09-19 01:23:42,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=846090.0, ans=0.125 2024-09-19 01:23:42,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=846090.0, ans=0.0 2024-09-19 01:23:55,991 INFO [train.py:1198] (1/2) Epoch 47, batch 4650, loss[loss=0.1815, ctc_loss=0.1182, cr_loss=0.3168, over 20871.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.143, cr_loss=0.3687, over 4092420.72 frames. ], batch size: 54, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 01:24:32,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=846175.0, ans=0.0 2024-09-19 01:24:44,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=846203.3333333334, ans=0.125 2024-09-19 01:24:44,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=846203.3333333334, ans=0.1 2024-09-19 01:24:46,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=846203.3333333334, ans=0.0 2024-09-19 01:24:53,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=846203.3333333334, ans=0.125 2024-09-19 01:24:54,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=846203.3333333334, ans=0.0 2024-09-19 01:25:05,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=846231.6666666666, ans=0.2 2024-09-19 01:25:14,002 INFO [train.py:1198] (1/2) Epoch 47, batch 4700, loss[loss=0.2531, ctc_loss=0.1682, cr_loss=0.4246, over 20712.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1432, cr_loss=0.369, over 4089125.49 frames. 
], batch size: 68, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 01:25:15,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=846260.0, ans=0.125 2024-09-19 01:26:16,882 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.991e+02 2.278e+02 2.423e+02 2.538e+02 3.189e+02, threshold=4.845e+02, percent-clipped=0.0 2024-09-19 01:26:30,221 INFO [train.py:1198] (1/2) Epoch 47, batch 4750, loss[loss=0.2097, ctc_loss=0.1378, cr_loss=0.3593, over 20782.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1433, cr_loss=0.3687, over 4091784.73 frames. ], batch size: 56, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 01:26:30,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=846401.6666666666, ans=0.125 2024-09-19 01:26:38,468 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.89 vs. limit=15.0 2024-09-19 01:26:38,572 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.82 vs. limit=15.0 2024-09-19 01:26:42,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=846401.6666666666, ans=0.1 2024-09-19 01:27:28,107 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.35 vs. limit=15.0 2024-09-19 01:27:42,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=846515.0, ans=0.125 2024-09-19 01:27:48,515 INFO [train.py:1198] (1/2) Epoch 47, batch 4800, loss[loss=0.2443, ctc_loss=0.1616, cr_loss=0.4136, over 19445.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1423, cr_loss=0.3671, over 4106074.59 frames. ], batch size: 90, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 01:27:59,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=846543.3333333334, ans=0.125 2024-09-19 01:28:05,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=846571.6666666666, ans=0.0 2024-09-19 01:28:27,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=846600.0, ans=0.125 2024-09-19 01:28:50,664 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.046e+02 2.264e+02 2.373e+02 2.525e+02 3.424e+02, threshold=4.745e+02, percent-clipped=0.0 2024-09-19 01:29:01,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=846656.6666666666, ans=0.125 2024-09-19 01:29:04,102 INFO [train.py:1198] (1/2) Epoch 47, batch 4850, loss[loss=0.2257, ctc_loss=0.1488, cr_loss=0.3846, over 20119.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1426, cr_loss=0.3674, over 4111274.63 frames. 
], batch size: 80, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 01:29:21,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=846713.3333333334, ans=0.0 2024-09-19 01:29:36,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=846741.6666666666, ans=0.1 2024-09-19 01:29:47,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=846741.6666666666, ans=0.125 2024-09-19 01:29:51,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=846770.0, ans=0.1 2024-09-19 01:30:22,376 INFO [train.py:1198] (1/2) Epoch 47, batch 4900, loss[loss=0.1939, ctc_loss=0.127, cr_loss=0.3347, over 20869.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1429, cr_loss=0.3675, over 4097578.42 frames. ], batch size: 57, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 01:30:32,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=846826.6666666666, ans=0.0 2024-09-19 01:30:57,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=846883.3333333334, ans=0.0 2024-09-19 01:30:58,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=846883.3333333334, ans=0.0 2024-09-19 01:31:10,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=846911.6666666666, ans=0.5 2024-09-19 01:31:23,344 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.984e+02 2.275e+02 2.390e+02 2.530e+02 4.993e+02, threshold=4.779e+02, percent-clipped=1.0 2024-09-19 01:31:36,833 INFO [train.py:1198] (1/2) Epoch 47, batch 4950, loss[loss=0.2221, ctc_loss=0.1486, cr_loss=0.3677, over 20991.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1428, cr_loss=0.3678, over 4097196.36 frames. ], batch size: 63, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 01:31:57,862 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=846996.6666666666, ans=0.125 2024-09-19 01:32:11,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=847025.0, ans=0.125 2024-09-19 01:32:39,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=847081.6666666666, ans=0.04949747468305833 2024-09-19 01:32:50,710 INFO [train.py:1198] (1/2) Epoch 47, batch 5000, loss[loss=0.1946, ctc_loss=0.127, cr_loss=0.3378, over 20769.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1428, cr_loss=0.3682, over 4098091.28 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 01:32:58,924 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.52 vs. 
limit=15.0 2024-09-19 01:33:09,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=847138.3333333334, ans=0.1 2024-09-19 01:33:27,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=847166.6666666666, ans=0.1 2024-09-19 01:33:51,609 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.005e+02 2.274e+02 2.407e+02 2.593e+02 3.251e+02, threshold=4.813e+02, percent-clipped=0.0 2024-09-19 01:34:04,902 INFO [train.py:1198] (1/2) Epoch 47, batch 5050, loss[loss=0.2341, ctc_loss=0.1569, cr_loss=0.3858, over 20263.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.1432, cr_loss=0.3682, over 4098069.76 frames. ], batch size: 74, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 01:34:20,388 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.98 vs. limit=6.0 2024-09-19 01:34:45,318 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.70 vs. limit=22.5 2024-09-19 01:35:03,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=847336.6666666666, ans=0.125 2024-09-19 01:35:21,092 INFO [train.py:1198] (1/2) Epoch 47, batch 5100, loss[loss=0.1771, ctc_loss=0.114, cr_loss=0.3157, over 20221.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1429, cr_loss=0.3677, over 4100069.23 frames. ], batch size: 45, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 01:35:39,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=847421.6666666666, ans=0.0 2024-09-19 01:36:21,746 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.033e+02 2.293e+02 2.411e+02 2.627e+02 4.162e+02, threshold=4.822e+02, percent-clipped=0.0 2024-09-19 01:36:35,180 INFO [train.py:1198] (1/2) Epoch 47, batch 5150, loss[loss=0.194, ctc_loss=0.1243, cr_loss=0.3489, over 20991.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.1422, cr_loss=0.3666, over 4097699.54 frames. ], batch size: 52, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 01:37:04,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=847591.6666666666, ans=0.1 2024-09-19 01:37:05,900 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.600e-03 2024-09-19 01:37:49,773 INFO [train.py:1198] (1/2) Epoch 47, batch 5200, loss[loss=0.2238, ctc_loss=0.144, cr_loss=0.399, over 20868.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1425, cr_loss=0.3671, over 4069500.57 frames. ], batch size: 65, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 01:37:53,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=847676.6666666666, ans=10.0 2024-09-19 01:38:09,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=847705.0, ans=0.1 2024-09-19 01:38:19,021 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.00 vs. 
limit=15.0 2024-09-19 01:38:34,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=847761.6666666666, ans=0.125 2024-09-19 01:38:44,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=847761.6666666666, ans=0.0 2024-09-19 01:38:51,259 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.956e+02 2.265e+02 2.390e+02 2.554e+02 3.857e+02, threshold=4.779e+02, percent-clipped=0.0 2024-09-19 01:38:59,689 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.93 vs. limit=15.0 2024-09-19 01:39:04,378 INFO [train.py:1198] (1/2) Epoch 47, batch 5250, loss[loss=0.2366, ctc_loss=0.1555, cr_loss=0.4055, over 20984.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1424, cr_loss=0.368, over 4083289.09 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 01:39:34,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=847875.0, ans=0.2 2024-09-19 01:39:44,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=847875.0, ans=0.2 2024-09-19 01:40:18,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=847931.6666666666, ans=0.125 2024-09-19 01:40:20,750 INFO [train.py:1198] (1/2) Epoch 47, batch 5300, loss[loss=0.2098, ctc_loss=0.1366, cr_loss=0.3658, over 20771.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1424, cr_loss=0.3683, over 4092113.41 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 01:40:37,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=847988.3333333334, ans=0.025 2024-09-19 01:40:57,462 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.02 vs. limit=6.0 2024-09-19 01:41:10,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=848045.0, ans=0.2 2024-09-19 01:41:13,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=848045.0, ans=0.95 2024-09-19 01:41:22,478 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.002e+02 2.281e+02 2.400e+02 2.592e+02 4.852e+02, threshold=4.800e+02, percent-clipped=1.0 2024-09-19 01:41:36,063 INFO [train.py:1198] (1/2) Epoch 47, batch 5350, loss[loss=0.1941, ctc_loss=0.1254, cr_loss=0.3436, over 21053.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1427, cr_loss=0.3692, over 4096180.94 frames. ], batch size: 53, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 01:41:56,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=848130.0, ans=0.0 2024-09-19 01:42:41,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=848215.0, ans=0.125 2024-09-19 01:42:49,843 INFO [train.py:1198] (1/2) Epoch 47, batch 5400, loss[loss=0.2118, ctc_loss=0.139, cr_loss=0.3638, over 21052.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1425, cr_loss=0.369, over 4107981.58 frames. 
], batch size: 56, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 01:42:54,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=848243.3333333334, ans=0.2 2024-09-19 01:42:59,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=848243.3333333334, ans=0.07 2024-09-19 01:43:06,748 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 01:43:28,182 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.16 vs. limit=15.0 2024-09-19 01:43:29,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=848300.0, ans=0.125 2024-09-19 01:43:38,559 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.36 vs. limit=15.0 2024-09-19 01:43:50,971 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.933e+02 2.247e+02 2.380e+02 2.538e+02 4.427e+02, threshold=4.760e+02, percent-clipped=0.0 2024-09-19 01:43:54,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=848356.6666666666, ans=0.025 2024-09-19 01:44:04,424 INFO [train.py:1198] (1/2) Epoch 47, batch 5450, loss[loss=0.1782, ctc_loss=0.1133, cr_loss=0.3243, over 20941.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1423, cr_loss=0.3685, over 4089229.44 frames. ], batch size: 49, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 01:44:06,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=848385.0, ans=0.2 2024-09-19 01:44:28,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=848413.3333333334, ans=0.125 2024-09-19 01:45:16,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=848498.3333333334, ans=0.1 2024-09-19 01:45:18,814 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.22 vs. limit=22.5 2024-09-19 01:45:20,914 INFO [train.py:1198] (1/2) Epoch 47, batch 5500, loss[loss=0.2418, ctc_loss=0.1572, cr_loss=0.4232, over 20661.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1423, cr_loss=0.3693, over 4104059.77 frames. ], batch size: 66, lr: 1.74e-03, grad_scale: 16.0 2024-09-19 01:46:23,384 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.051e+02 2.326e+02 2.476e+02 2.674e+02 5.105e+02, threshold=4.951e+02, percent-clipped=2.0 2024-09-19 01:46:35,185 INFO [train.py:1198] (1/2) Epoch 47, batch 5550, loss[loss=0.2207, ctc_loss=0.1457, cr_loss=0.3753, over 20952.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1425, cr_loss=0.3689, over 4103515.87 frames. 
], batch size: 64, lr: 1.74e-03, grad_scale: 16.0 2024-09-19 01:46:54,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=848696.6666666666, ans=0.125 2024-09-19 01:46:59,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=848696.6666666666, ans=0.0 2024-09-19 01:47:08,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=848725.0, ans=0.125 2024-09-19 01:47:25,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=848753.3333333334, ans=0.125 2024-09-19 01:47:42,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=848781.6666666666, ans=0.125 2024-09-19 01:47:49,454 INFO [train.py:1198] (1/2) Epoch 47, batch 5600, loss[loss=0.2633, ctc_loss=0.1766, cr_loss=0.4336, over 19560.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.1428, cr_loss=0.3698, over 4095056.51 frames. ], batch size: 90, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 01:47:51,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=848810.0, ans=0.05 2024-09-19 01:48:07,637 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=848838.3333333334, ans=0.125 2024-09-19 01:48:17,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=848866.6666666666, ans=0.0 2024-09-19 01:48:51,840 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.022e+02 2.281e+02 2.404e+02 2.553e+02 6.736e+02, threshold=4.807e+02, percent-clipped=1.0 2024-09-19 01:48:56,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=848923.3333333334, ans=0.0 2024-09-19 01:49:05,836 INFO [train.py:1198] (1/2) Epoch 47, batch 5650, loss[loss=0.1841, ctc_loss=0.1176, cr_loss=0.3323, over 20943.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.1418, cr_loss=0.3679, over 4098149.63 frames. ], batch size: 49, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 01:50:06,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=849065.0, ans=0.2 2024-09-19 01:50:19,628 INFO [train.py:1198] (1/2) Epoch 47, batch 5700, loss[loss=0.1681, ctc_loss=0.1067, cr_loss=0.3067, over 20955.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1417, cr_loss=0.3673, over 4102657.77 frames. 
], batch size: 51, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 01:50:30,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=849093.3333333334, ans=0.125 2024-09-19 01:50:59,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=849150.0, ans=0.0 2024-09-19 01:51:14,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=849178.3333333334, ans=0.025 2024-09-19 01:51:21,375 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.002e+02 2.257e+02 2.398e+02 2.543e+02 4.347e+02, threshold=4.797e+02, percent-clipped=0.0 2024-09-19 01:51:29,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=849206.6666666666, ans=0.0 2024-09-19 01:51:33,123 INFO [train.py:1198] (1/2) Epoch 47, batch 5750, loss[loss=0.2388, ctc_loss=0.1578, cr_loss=0.4053, over 20963.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1416, cr_loss=0.3676, over 4104115.04 frames. ], batch size: 64, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 01:51:49,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=849263.3333333334, ans=0.025 2024-09-19 01:52:06,361 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.18 vs. limit=10.0 2024-09-19 01:52:19,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=849320.0, ans=0.125 2024-09-19 01:52:40,387 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.22 vs. limit=15.0 2024-09-19 01:52:47,314 INFO [train.py:1198] (1/2) Epoch 47, batch 5800, loss[loss=0.2627, ctc_loss=0.1793, cr_loss=0.4167, over 18219.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.1419, cr_loss=0.3678, over 4099452.44 frames. ], batch size: 108, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 01:52:57,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=849376.6666666666, ans=0.0 2024-09-19 01:53:20,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=849433.3333333334, ans=0.1 2024-09-19 01:53:53,491 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.971e+02 2.270e+02 2.397e+02 2.536e+02 3.007e+02, threshold=4.793e+02, percent-clipped=0.0 2024-09-19 01:53:53,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=849490.0, ans=0.0 2024-09-19 01:54:03,704 INFO [train.py:1198] (1/2) Epoch 47, batch 5850, loss[loss=0.1797, ctc_loss=0.1193, cr_loss=0.302, over 19880.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1417, cr_loss=0.3672, over 4094249.62 frames. ], batch size: 44, lr: 1.74e-03, grad_scale: 16.0 2024-09-19 01:54:06,408 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.12 vs. 
limit=8.0 2024-09-19 01:54:20,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=849546.6666666666, ans=0.09899494936611666 2024-09-19 01:54:36,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=849575.0, ans=0.125 2024-09-19 01:54:47,630 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.32 vs. limit=22.5 2024-09-19 01:54:57,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=849603.3333333334, ans=0.1 2024-09-19 01:55:18,630 INFO [train.py:1198] (1/2) Epoch 47, batch 5900, loss[loss=0.2324, ctc_loss=0.1554, cr_loss=0.3846, over 20980.00 frames. ], tot_loss[loss=0.2147, ctc_loss=0.1414, cr_loss=0.3663, over 4093515.55 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 16.0 2024-09-19 01:55:20,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=849660.0, ans=0.1 2024-09-19 01:56:01,311 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.70 vs. limit=15.0 2024-09-19 01:56:06,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=849745.0, ans=0.125 2024-09-19 01:56:09,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=849745.0, ans=0.0 2024-09-19 01:56:17,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=849773.3333333334, ans=0.0 2024-09-19 01:56:22,867 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.054e+02 2.249e+02 2.392e+02 2.565e+02 5.866e+02, threshold=4.784e+02, percent-clipped=1.0 2024-09-19 01:56:33,247 INFO [train.py:1198] (1/2) Epoch 47, batch 5950, loss[loss=0.2162, ctc_loss=0.1397, cr_loss=0.3827, over 20626.00 frames. ], tot_loss[loss=0.2153, ctc_loss=0.1419, cr_loss=0.3671, over 4102468.73 frames. ], batch size: 71, lr: 1.74e-03, grad_scale: 16.0 2024-09-19 01:56:48,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=849830.0, ans=0.125 2024-09-19 01:57:17,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=849886.6666666666, ans=0.125 2024-09-19 01:57:48,922 INFO [train.py:1198] (1/2) Epoch 47, batch 6000, loss[loss=0.2352, ctc_loss=0.1556, cr_loss=0.398, over 20717.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.1421, cr_loss=0.3677, over 4092726.11 frames. ], batch size: 71, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 01:57:48,923 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 01:58:08,164 INFO [train.py:1230] (1/2) Epoch 47, validation: loss=0.03884, ctc_loss=0.03884, cr_loss=1.564e-14, over 944034.00 frames. 
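Note on the loss fields in these entries: throughout this section the reported loss is consistent with ctc_loss plus 0.2 times cr_loss (e.g. 0.1421 + 0.2 * 0.3677 = 0.2156 for the batch 6000 tot_loss, and the validation loss above is effectively pure CTC since its cr_loss is ~1.6e-14). A minimal sketch of that combination, inferred from the logged numbers rather than taken from train.py; the function name and the 0.2 default are assumptions:

    def combined_loss(ctc_loss: float, cr_loss: float, cr_loss_scale: float = 0.2) -> float:
        # Weighted sum matching the logged fields: loss = ctc_loss + cr_loss_scale * cr_loss.
        return ctc_loss + cr_loss_scale * cr_loss

    # Batch 6000 training entry above: 0.1421 + 0.2 * 0.3677 ~= 0.2156 (logged loss=0.2156).
    assert abs(combined_loss(0.1421, 0.3677) - 0.2156) < 5e-4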
2024-09-19 01:58:08,165 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-19 01:58:14,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=849943.3333333334, ans=0.0 2024-09-19 01:59:12,440 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.961e+02 2.238e+02 2.392e+02 2.505e+02 3.423e+02, threshold=4.784e+02, percent-clipped=0.0 2024-09-19 01:59:18,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=850056.6666666666, ans=0.125 2024-09-19 01:59:23,725 INFO [train.py:1198] (1/2) Epoch 47, batch 6050, loss[loss=0.2008, ctc_loss=0.1342, cr_loss=0.3334, over 20901.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1425, cr_loss=0.3682, over 4093479.16 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 01:59:37,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=850113.3333333334, ans=0.0 2024-09-19 01:59:39,156 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.23 vs. limit=15.0 2024-09-19 02:00:08,155 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 02:00:37,812 INFO [train.py:1198] (1/2) Epoch 47, batch 6100, loss[loss=0.1815, ctc_loss=0.1162, cr_loss=0.3266, over 20996.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1423, cr_loss=0.368, over 4087254.14 frames. ], batch size: 48, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 02:01:28,351 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=850311.6666666666, ans=0.0 2024-09-19 02:01:40,620 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.51 vs. limit=15.0 2024-09-19 02:01:42,912 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.021e+02 2.282e+02 2.395e+02 2.596e+02 4.212e+02, threshold=4.790e+02, percent-clipped=0.0 2024-09-19 02:01:53,510 INFO [train.py:1198] (1/2) Epoch 47, batch 6150, loss[loss=0.2147, ctc_loss=0.1427, cr_loss=0.3601, over 20967.00 frames. ], tot_loss[loss=0.2138, ctc_loss=0.1408, cr_loss=0.3653, over 4091591.49 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 02:02:52,787 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.42 vs. limit=15.0 2024-09-19 02:02:53,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=850481.6666666666, ans=0.125 2024-09-19 02:03:08,101 INFO [train.py:1198] (1/2) Epoch 47, batch 6200, loss[loss=0.2537, ctc_loss=0.1725, cr_loss=0.4061, over 18016.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.142, cr_loss=0.3666, over 4055807.94 frames. 
], batch size: 108, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 02:04:02,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=850595.0, ans=0.1 2024-09-19 02:04:12,284 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.974e+02 2.267e+02 2.460e+02 2.715e+02 5.797e+02, threshold=4.919e+02, percent-clipped=2.0 2024-09-19 02:04:22,867 INFO [train.py:1198] (1/2) Epoch 47, batch 6250, loss[loss=0.1918, ctc_loss=0.1236, cr_loss=0.3408, over 20328.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1429, cr_loss=0.3676, over 4028601.30 frames. ], batch size: 45, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 02:04:23,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=850651.6666666666, ans=0.125 2024-09-19 02:04:23,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=850651.6666666666, ans=0.1 2024-09-19 02:04:36,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=850680.0, ans=0.0 2024-09-19 02:04:36,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=850680.0, ans=0.125 2024-09-19 02:05:37,903 INFO [train.py:1198] (1/2) Epoch 47, batch 6300, loss[loss=0.2488, ctc_loss=0.1707, cr_loss=0.391, over 14299.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.143, cr_loss=0.3674, over 4000057.89 frames. ], batch size: 149, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 02:06:36,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=850906.6666666666, ans=0.125 2024-09-19 02:06:40,709 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.042e+02 2.351e+02 2.619e+02 2.877e+02 4.452e+02, threshold=5.237e+02, percent-clipped=0.0 2024-09-19 02:06:48,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=850906.6666666666, ans=0.125 2024-09-19 02:06:50,787 INFO [train.py:1198] (1/2) Epoch 47, batch 6350, loss[loss=0.2452, ctc_loss=0.1698, cr_loss=0.377, over 14207.00 frames. ], tot_loss[loss=0.2197, ctc_loss=0.1458, cr_loss=0.3691, over 3858424.17 frames. ], batch size: 149, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 02:06:56,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=850935.0, ans=0.2 2024-09-19 02:07:09,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=850963.3333333334, ans=0.125 2024-09-19 02:07:40,198 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.35 vs. limit=10.0 2024-09-19 02:08:37,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=851051.1666666666, ans=0.125 2024-09-19 02:08:38,365 INFO [train.py:1198] (1/2) Epoch 48, batch 0, loss[loss=0.2369, ctc_loss=0.1601, cr_loss=0.3838, over 19613.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1601, cr_loss=0.3838, over 19613.00 frames. 
], batch size: 90, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:08:38,366 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 02:08:56,381 INFO [train.py:1230] (1/2) Epoch 48, validation: loss=0.03874, ctc_loss=0.03874, cr_loss=1.596e-14, over 944034.00 frames. 2024-09-19 02:08:56,382 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-19 02:09:11,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=851079.5, ans=0.0 2024-09-19 02:09:22,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=851079.5, ans=0.1 2024-09-19 02:09:28,714 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.45 vs. limit=15.0 2024-09-19 02:09:40,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=851136.1666666666, ans=0.125 2024-09-19 02:10:12,171 INFO [train.py:1198] (1/2) Epoch 48, batch 50, loss[loss=0.2189, ctc_loss=0.1432, cr_loss=0.3786, over 20751.00 frames. ], tot_loss[loss=0.2131, ctc_loss=0.14, cr_loss=0.3653, over 936939.38 frames. ], batch size: 56, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:10:12,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=851192.8333333334, ans=0.125 2024-09-19 02:10:16,835 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.986e+02 2.313e+02 2.640e+02 2.936e+02 3.540e+02, threshold=5.280e+02, percent-clipped=0.0 2024-09-19 02:10:23,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=851192.8333333334, ans=0.125 2024-09-19 02:10:47,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=851249.5, ans=0.125 2024-09-19 02:11:09,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=851277.8333333334, ans=0.125 2024-09-19 02:11:27,137 INFO [train.py:1198] (1/2) Epoch 48, batch 100, loss[loss=0.1849, ctc_loss=0.1215, cr_loss=0.3169, over 20952.00 frames. ], tot_loss[loss=0.2145, ctc_loss=0.1414, cr_loss=0.3654, over 1635240.15 frames. ], batch size: 51, lr: 1.72e-03, grad_scale: 16.0 2024-09-19 02:11:42,929 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.21 vs. limit=6.0 2024-09-19 02:11:50,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=851362.8333333334, ans=0.95 2024-09-19 02:11:53,820 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.00 vs. limit=15.0 2024-09-19 02:12:18,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=851419.5, ans=0.0 2024-09-19 02:12:24,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=851419.5, ans=0.125 2024-09-19 02:12:42,242 INFO [train.py:1198] (1/2) Epoch 48, batch 150, loss[loss=0.2546, ctc_loss=0.1695, cr_loss=0.4251, over 20869.00 frames. 
], tot_loss[loss=0.2155, ctc_loss=0.1421, cr_loss=0.3668, over 2168682.21 frames. ], batch size: 65, lr: 1.72e-03, grad_scale: 16.0 2024-09-19 02:12:48,474 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.991e+02 2.249e+02 2.369e+02 2.580e+02 1.034e+03, threshold=4.738e+02, percent-clipped=1.0 2024-09-19 02:12:56,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=851504.5, ans=0.125 2024-09-19 02:13:29,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=851561.1666666666, ans=0.1 2024-09-19 02:14:03,671 INFO [train.py:1198] (1/2) Epoch 48, batch 200, loss[loss=0.2235, ctc_loss=0.1483, cr_loss=0.376, over 20832.00 frames. ], tot_loss[loss=0.2147, ctc_loss=0.1415, cr_loss=0.3661, over 2601773.66 frames. ], batch size: 59, lr: 1.72e-03, grad_scale: 16.0 2024-09-19 02:14:13,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=851617.8333333334, ans=0.2 2024-09-19 02:14:30,132 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 02:14:30,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=851646.1666666666, ans=10.0 2024-09-19 02:15:05,861 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.39 vs. limit=8.0 2024-09-19 02:15:19,757 INFO [train.py:1198] (1/2) Epoch 48, batch 250, loss[loss=0.241, ctc_loss=0.1606, cr_loss=0.4019, over 20981.00 frames. ], tot_loss[loss=0.2142, ctc_loss=0.1411, cr_loss=0.3656, over 2932341.38 frames. ], batch size: 58, lr: 1.72e-03, grad_scale: 16.0 2024-09-19 02:15:21,909 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.32 vs. limit=10.0 2024-09-19 02:15:25,826 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.051e+02 2.256e+02 2.366e+02 2.529e+02 4.135e+02, threshold=4.732e+02, percent-clipped=0.0 2024-09-19 02:15:33,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=851787.8333333334, ans=0.1 2024-09-19 02:15:38,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=851787.8333333334, ans=0.0 2024-09-19 02:15:55,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=851816.1666666666, ans=0.1 2024-09-19 02:16:09,370 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.53 vs. limit=15.0 2024-09-19 02:16:20,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=851872.8333333334, ans=0.0 2024-09-19 02:16:35,308 INFO [train.py:1198] (1/2) Epoch 48, batch 300, loss[loss=0.2694, ctc_loss=0.1807, cr_loss=0.4433, over 18457.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.1421, cr_loss=0.3674, over 3185022.08 frames. 
], batch size: 108, lr: 1.72e-03, grad_scale: 16.0 2024-09-19 02:16:35,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=851901.1666666666, ans=0.1 2024-09-19 02:16:37,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=851901.1666666666, ans=0.125 2024-09-19 02:16:43,667 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.21 vs. limit=22.5 2024-09-19 02:16:49,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=851929.5, ans=0.125 2024-09-19 02:16:49,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=851929.5, ans=0.1 2024-09-19 02:17:01,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=851929.5, ans=0.125 2024-09-19 02:17:06,178 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.78 vs. limit=15.0 2024-09-19 02:17:50,602 INFO [train.py:1198] (1/2) Epoch 48, batch 350, loss[loss=0.2036, ctc_loss=0.1306, cr_loss=0.3649, over 20968.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.1417, cr_loss=0.3671, over 3389785.24 frames. ], batch size: 48, lr: 1.72e-03, grad_scale: 16.0 2024-09-19 02:17:56,469 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.073e+02 2.240e+02 2.381e+02 2.503e+02 4.082e+02, threshold=4.763e+02, percent-clipped=0.0 2024-09-19 02:18:38,505 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.95 vs. limit=15.0 2024-09-19 02:19:09,099 INFO [train.py:1198] (1/2) Epoch 48, batch 400, loss[loss=0.1899, ctc_loss=0.1247, cr_loss=0.3258, over 20979.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1426, cr_loss=0.3686, over 3542675.25 frames. ], batch size: 48, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:19:16,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=852184.5, ans=0.125 2024-09-19 02:19:30,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=852212.8333333334, ans=0.0 2024-09-19 02:19:32,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=852212.8333333334, ans=0.125 2024-09-19 02:19:50,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=852241.1666666666, ans=0.125 2024-09-19 02:20:01,406 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.86 vs. 
limit=15.0 2024-09-19 02:20:05,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=852269.5, ans=0.1 2024-09-19 02:20:21,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=852297.8333333334, ans=0.125 2024-09-19 02:20:27,572 INFO [train.py:1198] (1/2) Epoch 48, batch 450, loss[loss=0.2133, ctc_loss=0.1403, cr_loss=0.3653, over 20230.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1427, cr_loss=0.3691, over 3656581.14 frames. ], batch size: 74, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:20:33,454 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.998e+02 2.278e+02 2.381e+02 2.515e+02 3.550e+02, threshold=4.761e+02, percent-clipped=0.0 2024-09-19 02:20:48,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=852354.5, ans=0.0 2024-09-19 02:21:04,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=852382.8333333334, ans=0.125 2024-09-19 02:21:11,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=852411.1666666666, ans=0.2 2024-09-19 02:21:22,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=852411.1666666666, ans=0.125 2024-09-19 02:21:25,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.whiten.whitening_limit, batch_count=852411.1666666666, ans=12.0 2024-09-19 02:21:40,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=852439.5, ans=0.125 2024-09-19 02:21:43,049 INFO [train.py:1198] (1/2) Epoch 48, batch 500, loss[loss=0.1796, ctc_loss=0.116, cr_loss=0.3179, over 20996.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.1419, cr_loss=0.3674, over 3756736.49 frames. ], batch size: 52, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:21:59,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=852496.1666666666, ans=0.04949747468305833 2024-09-19 02:22:37,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=852552.8333333334, ans=0.0 2024-09-19 02:22:52,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=852581.1666666666, ans=0.0 2024-09-19 02:22:58,198 INFO [train.py:1198] (1/2) Epoch 48, batch 550, loss[loss=0.2727, ctc_loss=0.1851, cr_loss=0.4378, over 18273.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1422, cr_loss=0.3686, over 3836170.89 frames. ], batch size: 108, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:23:04,147 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.882e+02 2.207e+02 2.361e+02 2.535e+02 5.293e+02, threshold=4.723e+02, percent-clipped=1.0 2024-09-19 02:23:17,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=852637.8333333334, ans=0.125 2024-09-19 02:23:42,301 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.91 vs. 
limit=15.0 2024-09-19 02:24:12,632 INFO [train.py:1198] (1/2) Epoch 48, batch 600, loss[loss=0.2086, ctc_loss=0.136, cr_loss=0.3631, over 20869.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1422, cr_loss=0.3686, over 3896661.93 frames. ], batch size: 57, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:24:20,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=852751.1666666666, ans=0.07 2024-09-19 02:24:25,444 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.28 vs. limit=15.0 2024-09-19 02:24:55,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=852807.8333333334, ans=0.04949747468305833 2024-09-19 02:25:02,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=852836.1666666666, ans=0.025 2024-09-19 02:25:06,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=852836.1666666666, ans=0.0 2024-09-19 02:25:31,633 INFO [train.py:1198] (1/2) Epoch 48, batch 650, loss[loss=0.2186, ctc_loss=0.1446, cr_loss=0.3701, over 21025.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.1417, cr_loss=0.3688, over 3955124.30 frames. ], batch size: 62, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:25:40,623 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.774e+02 2.271e+02 2.376e+02 2.518e+02 3.814e+02, threshold=4.751e+02, percent-clipped=0.0 2024-09-19 02:26:07,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=852949.5, ans=0.025 2024-09-19 02:26:50,930 INFO [train.py:1198] (1/2) Epoch 48, batch 700, loss[loss=0.2291, ctc_loss=0.148, cr_loss=0.4055, over 20886.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1413, cr_loss=0.3677, over 3986985.24 frames. ], batch size: 54, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:27:50,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=853147.8333333334, ans=0.1 2024-09-19 02:28:02,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=853147.8333333334, ans=0.125 2024-09-19 02:28:04,435 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.22 vs. limit=10.0 2024-09-19 02:28:06,800 INFO [train.py:1198] (1/2) Epoch 48, batch 750, loss[loss=0.2173, ctc_loss=0.1414, cr_loss=0.3793, over 20988.00 frames. ], tot_loss[loss=0.2144, ctc_loss=0.141, cr_loss=0.3672, over 4020364.69 frames. ], batch size: 67, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:28:12,752 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.992e+02 2.312e+02 2.445e+02 2.580e+02 3.410e+02, threshold=4.891e+02, percent-clipped=0.0 2024-09-19 02:28:18,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=853176.1666666666, ans=0.125 2024-09-19 02:29:21,650 INFO [train.py:1198] (1/2) Epoch 48, batch 800, loss[loss=0.223, ctc_loss=0.1456, cr_loss=0.3869, over 21023.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.1413, cr_loss=0.3682, over 4044935.10 frames. 
], batch size: 63, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:29:26,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=853317.8333333334, ans=0.2 2024-09-19 02:29:37,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=853346.1666666666, ans=0.125 2024-09-19 02:29:47,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=853346.1666666666, ans=0.125 2024-09-19 02:29:54,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=853374.5, ans=0.0 2024-09-19 02:29:57,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=853374.5, ans=0.1 2024-09-19 02:29:59,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=853374.5, ans=0.0 2024-09-19 02:30:06,267 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.69 vs. limit=22.5 2024-09-19 02:30:07,343 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.72 vs. limit=15.0 2024-09-19 02:30:39,932 INFO [train.py:1198] (1/2) Epoch 48, batch 850, loss[loss=0.2444, ctc_loss=0.1657, cr_loss=0.3933, over 20682.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1418, cr_loss=0.3691, over 4067620.40 frames. ], batch size: 68, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:30:47,415 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.006e+02 2.256e+02 2.398e+02 2.561e+02 3.164e+02, threshold=4.796e+02, percent-clipped=0.0 2024-09-19 02:31:45,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=853572.8333333334, ans=0.125 2024-09-19 02:31:58,691 INFO [train.py:1198] (1/2) Epoch 48, batch 900, loss[loss=0.2122, ctc_loss=0.1401, cr_loss=0.3604, over 20969.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1421, cr_loss=0.3687, over 4060862.89 frames. ], batch size: 58, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:32:08,559 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.28 vs. 
limit=15.0 2024-09-19 02:32:17,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=853629.5, ans=0.0 2024-09-19 02:32:27,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=853657.8333333334, ans=0.0 2024-09-19 02:32:27,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=853657.8333333334, ans=0.07 2024-09-19 02:32:28,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=853657.8333333334, ans=0.125 2024-09-19 02:32:29,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=853657.8333333334, ans=0.05 2024-09-19 02:32:35,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=853657.8333333334, ans=0.0 2024-09-19 02:32:38,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=853657.8333333334, ans=0.025 2024-09-19 02:32:48,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=853686.1666666666, ans=0.025 2024-09-19 02:32:50,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=853686.1666666666, ans=0.1 2024-09-19 02:33:03,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=853714.5, ans=0.0 2024-09-19 02:33:06,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=853714.5, ans=0.125 2024-09-19 02:33:14,335 INFO [train.py:1198] (1/2) Epoch 48, batch 950, loss[loss=0.218, ctc_loss=0.1439, cr_loss=0.3705, over 20830.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.1419, cr_loss=0.3685, over 4069607.74 frames. ], batch size: 59, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:33:21,819 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.987e+02 2.247e+02 2.334e+02 2.480e+02 3.525e+02, threshold=4.669e+02, percent-clipped=0.0 2024-09-19 02:33:54,647 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.26 vs. limit=12.0 2024-09-19 02:34:04,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=853827.8333333334, ans=0.125 2024-09-19 02:34:25,766 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.11 vs. limit=15.0 2024-09-19 02:34:29,488 INFO [train.py:1198] (1/2) Epoch 48, batch 1000, loss[loss=0.2593, ctc_loss=0.1746, cr_loss=0.4237, over 18298.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.142, cr_loss=0.3687, over 4079193.47 frames. 
], batch size: 108, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:34:29,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=853884.5, ans=0.1 2024-09-19 02:34:37,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=853884.5, ans=0.125 2024-09-19 02:34:45,904 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.65 vs. limit=15.0 2024-09-19 02:34:52,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=853912.8333333334, ans=0.0 2024-09-19 02:35:22,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=853969.5, ans=0.2 2024-09-19 02:35:39,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=853997.8333333334, ans=0.125 2024-09-19 02:35:41,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=853997.8333333334, ans=0.125 2024-09-19 02:35:45,330 INFO [train.py:1198] (1/2) Epoch 48, batch 1050, loss[loss=0.2023, ctc_loss=0.1302, cr_loss=0.3608, over 21052.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1411, cr_loss=0.3674, over 4086647.11 frames. ], batch size: 56, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:35:51,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=854026.1666666666, ans=0.0 2024-09-19 02:35:51,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=854026.1666666666, ans=0.125 2024-09-19 02:35:52,813 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.022e+02 2.278e+02 2.387e+02 2.529e+02 3.134e+02, threshold=4.774e+02, percent-clipped=0.0 2024-09-19 02:36:41,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=854111.1666666666, ans=0.05 2024-09-19 02:37:06,969 INFO [train.py:1198] (1/2) Epoch 48, batch 1100, loss[loss=0.2391, ctc_loss=0.1577, cr_loss=0.4072, over 20657.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1413, cr_loss=0.3674, over 4085793.98 frames. ], batch size: 71, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:37:10,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=854167.8333333334, ans=0.1 2024-09-19 02:37:16,488 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=854167.8333333334, ans=0.125 2024-09-19 02:37:36,098 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.76 vs. limit=22.5 2024-09-19 02:37:38,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=854224.5, ans=0.04949747468305833 2024-09-19 02:38:12,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=854281.1666666666, ans=0.125 2024-09-19 02:38:22,304 INFO [train.py:1198] (1/2) Epoch 48, batch 1150, loss[loss=0.2244, ctc_loss=0.1502, cr_loss=0.3711, over 20835.00 frames. 
], tot_loss[loss=0.216, ctc_loss=0.1422, cr_loss=0.3691, over 4097482.87 frames. ], batch size: 65, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:38:24,347 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 02:38:30,023 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.999e+02 2.276e+02 2.373e+02 2.575e+02 3.097e+02, threshold=4.746e+02, percent-clipped=0.0 2024-09-19 02:38:30,776 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.20 vs. limit=10.0 2024-09-19 02:38:36,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=854337.8333333334, ans=0.04949747468305833 2024-09-19 02:39:05,273 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.45 vs. limit=15.0 2024-09-19 02:39:36,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=854451.1666666666, ans=0.2 2024-09-19 02:39:37,994 INFO [train.py:1198] (1/2) Epoch 48, batch 1200, loss[loss=0.2013, ctc_loss=0.133, cr_loss=0.3416, over 20834.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1422, cr_loss=0.3685, over 4092794.33 frames. ], batch size: 59, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:39:47,592 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 02:40:54,043 INFO [train.py:1198] (1/2) Epoch 48, batch 1250, loss[loss=0.1913, ctc_loss=0.1248, cr_loss=0.3324, over 21042.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.1418, cr_loss=0.368, over 4093393.59 frames. ], batch size: 53, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:41:01,654 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.026e+02 2.279e+02 2.417e+02 2.564e+02 4.180e+02, threshold=4.834e+02, percent-clipped=0.0 2024-09-19 02:41:12,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=854621.1666666666, ans=0.125 2024-09-19 02:41:39,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=854677.8333333334, ans=0.125 2024-09-19 02:41:50,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=854677.8333333334, ans=0.125 2024-09-19 02:41:53,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=854677.8333333334, ans=0.0 2024-09-19 02:42:08,518 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.13 vs. limit=15.0 2024-09-19 02:42:12,459 INFO [train.py:1198] (1/2) Epoch 48, batch 1300, loss[loss=0.2349, ctc_loss=0.153, cr_loss=0.4095, over 20987.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1423, cr_loss=0.3687, over 4096806.67 frames. 
], batch size: 67, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:42:33,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=854762.8333333334, ans=0.125 2024-09-19 02:42:42,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=854791.1666666666, ans=0.05 2024-09-19 02:43:04,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=854819.5, ans=0.125 2024-09-19 02:43:17,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=854847.8333333334, ans=0.0 2024-09-19 02:43:26,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=854847.8333333334, ans=0.2 2024-09-19 02:43:26,948 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.08 vs. limit=22.5 2024-09-19 02:43:30,809 INFO [train.py:1198] (1/2) Epoch 48, batch 1350, loss[loss=0.1991, ctc_loss=0.1318, cr_loss=0.3364, over 20827.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.142, cr_loss=0.3681, over 4103617.60 frames. ], batch size: 65, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:43:38,444 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.044e+02 2.258e+02 2.428e+02 2.599e+02 6.040e+02, threshold=4.857e+02, percent-clipped=1.0 2024-09-19 02:43:52,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=854904.5, ans=0.2 2024-09-19 02:43:54,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=854904.5, ans=0.2 2024-09-19 02:44:47,131 INFO [train.py:1198] (1/2) Epoch 48, batch 1400, loss[loss=0.1726, ctc_loss=0.1116, cr_loss=0.3052, over 20962.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.1416, cr_loss=0.3679, over 4108342.69 frames. ], batch size: 49, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:45:08,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=855046.1666666666, ans=0.0 2024-09-19 02:45:32,885 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.96 vs. limit=22.5 2024-09-19 02:45:44,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=855102.8333333334, ans=0.0 2024-09-19 02:45:46,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=855131.1666666666, ans=0.125 2024-09-19 02:45:55,037 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=855131.1666666666, ans=0.09899494936611666 2024-09-19 02:46:02,225 INFO [train.py:1198] (1/2) Epoch 48, batch 1450, loss[loss=0.2026, ctc_loss=0.1328, cr_loss=0.3492, over 20888.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.1419, cr_loss=0.3684, over 4112678.78 frames. ], batch size: 54, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:46:09,113 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.51 vs. 
limit=22.5 2024-09-19 02:46:09,909 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.064e+02 2.284e+02 2.416e+02 2.562e+02 3.091e+02, threshold=4.833e+02, percent-clipped=0.0 2024-09-19 02:46:21,311 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.51 vs. limit=15.0 2024-09-19 02:46:23,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=855187.8333333334, ans=0.0 2024-09-19 02:46:25,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=855187.8333333334, ans=0.0 2024-09-19 02:46:47,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=855244.5, ans=0.125 2024-09-19 02:47:17,317 INFO [train.py:1198] (1/2) Epoch 48, batch 1500, loss[loss=0.229, ctc_loss=0.1523, cr_loss=0.3836, over 19467.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1422, cr_loss=0.3685, over 4112351.84 frames. ], batch size: 90, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:47:23,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=855301.1666666666, ans=0.0 2024-09-19 02:47:44,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=855329.5, ans=0.125 2024-09-19 02:47:52,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=855357.8333333334, ans=0.2 2024-09-19 02:48:35,888 INFO [train.py:1198] (1/2) Epoch 48, batch 1550, loss[loss=0.2285, ctc_loss=0.1503, cr_loss=0.3909, over 20961.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1417, cr_loss=0.3673, over 4110583.09 frames. ], batch size: 58, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:48:46,139 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.977e+02 2.222e+02 2.350e+02 2.459e+02 8.607e+02, threshold=4.700e+02, percent-clipped=1.0 2024-09-19 02:48:48,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=855442.8333333334, ans=0.125 2024-09-19 02:49:01,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=855471.1666666666, ans=0.125 2024-09-19 02:49:13,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=855499.5, ans=0.2 2024-09-19 02:49:43,942 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 02:49:54,377 INFO [train.py:1198] (1/2) Epoch 48, batch 1600, loss[loss=0.1924, ctc_loss=0.1244, cr_loss=0.34, over 21009.00 frames. ], tot_loss[loss=0.2141, ctc_loss=0.1409, cr_loss=0.3659, over 4113903.47 frames. ], batch size: 52, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:49:56,807 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.05 vs. 
limit=15.0 2024-09-19 02:49:59,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=855584.5, ans=0.0 2024-09-19 02:50:05,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=855584.5, ans=0.95 2024-09-19 02:50:05,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=855584.5, ans=0.125 2024-09-19 02:50:06,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=855584.5, ans=0.0 2024-09-19 02:50:40,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=855669.5, ans=0.2 2024-09-19 02:50:52,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=855669.5, ans=0.0 2024-09-19 02:50:52,841 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.89 vs. limit=22.5 2024-09-19 02:51:10,354 INFO [train.py:1198] (1/2) Epoch 48, batch 1650, loss[loss=0.2498, ctc_loss=0.1666, cr_loss=0.4158, over 21016.00 frames. ], tot_loss[loss=0.214, ctc_loss=0.1409, cr_loss=0.3658, over 4103126.77 frames. ], batch size: 63, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:51:17,703 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.012e+02 2.214e+02 2.364e+02 2.505e+02 4.946e+02, threshold=4.728e+02, percent-clipped=1.0 2024-09-19 02:51:19,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=855726.1666666666, ans=0.05 2024-09-19 02:51:45,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=855782.8333333334, ans=0.0 2024-09-19 02:52:00,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=855811.1666666666, ans=0.125 2024-09-19 02:52:24,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=855867.8333333334, ans=0.2 2024-09-19 02:52:25,296 INFO [train.py:1198] (1/2) Epoch 48, batch 1700, loss[loss=0.1963, ctc_loss=0.1256, cr_loss=0.3533, over 20870.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1415, cr_loss=0.3674, over 4097838.98 frames. ], batch size: 57, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:53:15,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=855952.8333333334, ans=0.0 2024-09-19 02:53:22,423 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.45 vs. limit=15.0 2024-09-19 02:53:43,976 INFO [train.py:1198] (1/2) Epoch 48, batch 1750, loss[loss=0.1953, ctc_loss=0.1254, cr_loss=0.3497, over 20792.00 frames. ], tot_loss[loss=0.2153, ctc_loss=0.1418, cr_loss=0.3677, over 4110002.73 frames. 
], batch size: 56, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:53:51,536 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.840e+02 2.260e+02 2.406e+02 2.570e+02 4.239e+02, threshold=4.812e+02, percent-clipped=0.0 2024-09-19 02:54:06,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=856037.8333333334, ans=0.0 2024-09-19 02:54:20,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=856066.1666666666, ans=0.2 2024-09-19 02:54:24,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=856066.1666666666, ans=0.0 2024-09-19 02:55:02,787 INFO [train.py:1198] (1/2) Epoch 48, batch 1800, loss[loss=0.2549, ctc_loss=0.1757, cr_loss=0.396, over 18457.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1424, cr_loss=0.3682, over 4101342.29 frames. ], batch size: 108, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:55:19,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=856179.5, ans=0.125 2024-09-19 02:56:14,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=856264.5, ans=0.125 2024-09-19 02:56:18,205 INFO [train.py:1198] (1/2) Epoch 48, batch 1850, loss[loss=0.2441, ctc_loss=0.158, cr_loss=0.4307, over 19938.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1425, cr_loss=0.3684, over 4105268.71 frames. ], batch size: 80, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:56:25,743 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.999e+02 2.285e+02 2.402e+02 2.576e+02 3.154e+02, threshold=4.804e+02, percent-clipped=0.0 2024-09-19 02:56:38,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=856321.1666666666, ans=0.125 2024-09-19 02:56:45,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=856321.1666666666, ans=0.125 2024-09-19 02:57:27,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=856406.1666666666, ans=0.025 2024-09-19 02:57:33,424 INFO [train.py:1198] (1/2) Epoch 48, batch 1900, loss[loss=0.2063, ctc_loss=0.1342, cr_loss=0.3604, over 20703.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.1419, cr_loss=0.3678, over 4105829.71 frames. ], batch size: 71, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:58:48,641 INFO [train.py:1198] (1/2) Epoch 48, batch 1950, loss[loss=0.19, ctc_loss=0.1266, cr_loss=0.317, over 20783.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1425, cr_loss=0.3682, over 4111357.04 frames. ], batch size: 53, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:58:59,337 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.942e+02 2.283e+02 2.415e+02 2.594e+02 7.163e+02, threshold=4.830e+02, percent-clipped=2.0 2024-09-19 02:59:06,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=856604.5, ans=0.1 2024-09-19 02:59:51,442 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.90 vs. limit=22.5 2024-09-19 03:00:10,227 INFO [train.py:1198] (1/2) Epoch 48, batch 2000, loss[loss=0.2613, ctc_loss=0.1748, cr_loss=0.4327, over 20053.00 frames. 
], tot_loss[loss=0.2157, ctc_loss=0.1422, cr_loss=0.3678, over 4120306.64 frames. ], batch size: 80, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 03:00:24,473 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.18 vs. limit=22.5 2024-09-19 03:00:27,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=856746.1666666666, ans=0.125 2024-09-19 03:00:37,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=856746.1666666666, ans=0.025 2024-09-19 03:00:42,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=856774.5, ans=0.125 2024-09-19 03:00:48,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=856774.5, ans=0.125 2024-09-19 03:00:55,199 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.20 vs. limit=15.0 2024-09-19 03:01:07,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=856802.8333333334, ans=0.0 2024-09-19 03:01:08,456 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.37 vs. limit=15.0 2024-09-19 03:01:25,374 INFO [train.py:1198] (1/2) Epoch 48, batch 2050, loss[loss=0.2067, ctc_loss=0.1371, cr_loss=0.3477, over 21014.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1421, cr_loss=0.368, over 4106685.10 frames. ], batch size: 63, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 03:01:27,391 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 03:01:32,850 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.904e+02 2.302e+02 2.419e+02 2.649e+02 3.552e+02, threshold=4.838e+02, percent-clipped=0.0 2024-09-19 03:02:11,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=856944.5, ans=0.0 2024-09-19 03:02:28,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=856972.8333333334, ans=0.0 2024-09-19 03:02:41,446 INFO [train.py:1198] (1/2) Epoch 48, batch 2100, loss[loss=0.1857, ctc_loss=0.121, cr_loss=0.3239, over 20973.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.142, cr_loss=0.3682, over 4108274.12 frames. ], batch size: 51, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 03:02:45,069 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.02 vs. 
limit=6.0 2024-09-19 03:02:47,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=857001.1666666666, ans=0.0 2024-09-19 03:02:52,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=857001.1666666666, ans=0.2 2024-09-19 03:03:21,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=857057.8333333334, ans=0.0 2024-09-19 03:03:21,336 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.30 vs. limit=15.0 2024-09-19 03:03:41,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=857114.5, ans=0.95 2024-09-19 03:03:48,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=857114.5, ans=0.1 2024-09-19 03:03:54,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=857114.5, ans=0.125 2024-09-19 03:03:54,701 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.01 vs. limit=15.0 2024-09-19 03:03:56,892 INFO [train.py:1198] (1/2) Epoch 48, batch 2150, loss[loss=0.2119, ctc_loss=0.1386, cr_loss=0.3667, over 20778.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.142, cr_loss=0.3679, over 4102375.44 frames. ], batch size: 56, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 03:04:04,452 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.984e+02 2.281e+02 2.415e+02 2.543e+02 3.843e+02, threshold=4.831e+02, percent-clipped=0.0 2024-09-19 03:04:26,657 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.09 vs. limit=15.0 2024-09-19 03:05:00,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=857256.1666666666, ans=0.2 2024-09-19 03:05:00,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=857256.1666666666, ans=0.125 2024-09-19 03:05:15,662 INFO [train.py:1198] (1/2) Epoch 48, batch 2200, loss[loss=0.2297, ctc_loss=0.1514, cr_loss=0.3913, over 20698.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1421, cr_loss=0.368, over 4095816.07 frames. ], batch size: 71, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 03:05:32,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=857312.8333333334, ans=0.125 2024-09-19 03:05:44,919 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.80 vs. limit=15.0 2024-09-19 03:05:57,060 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.30 vs. 
limit=15.0 2024-09-19 03:06:20,968 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 03:06:22,464 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 03:06:29,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=857397.8333333334, ans=0.125 2024-09-19 03:06:34,005 INFO [train.py:1198] (1/2) Epoch 48, batch 2250, loss[loss=0.1805, ctc_loss=0.114, cr_loss=0.3325, over 20974.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1426, cr_loss=0.3684, over 4088964.48 frames. ], batch size: 48, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 03:06:41,559 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.934e+02 2.294e+02 2.466e+02 2.611e+02 3.349e+02, threshold=4.932e+02, percent-clipped=0.0 2024-09-19 03:06:45,678 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.09 vs. limit=22.5 2024-09-19 03:07:00,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=857454.5, ans=0.2 2024-09-19 03:07:08,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=857482.8333333334, ans=0.125 2024-09-19 03:07:09,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=857482.8333333334, ans=0.0 2024-09-19 03:07:36,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=857539.5, ans=0.035 2024-09-19 03:07:50,007 INFO [train.py:1198] (1/2) Epoch 48, batch 2300, loss[loss=0.2174, ctc_loss=0.1411, cr_loss=0.3815, over 21067.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1424, cr_loss=0.3682, over 4089504.95 frames. ], batch size: 59, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 03:08:13,449 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.74 vs. limit=22.5 2024-09-19 03:08:48,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=857652.8333333334, ans=0.1 2024-09-19 03:09:06,058 INFO [train.py:1198] (1/2) Epoch 48, batch 2350, loss[loss=0.233, ctc_loss=0.1561, cr_loss=0.3845, over 20960.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.143, cr_loss=0.3691, over 4088110.09 frames. ], batch size: 64, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 03:09:12,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=857709.5, ans=15.0 2024-09-19 03:09:13,539 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.927e+02 2.266e+02 2.430e+02 2.610e+02 3.250e+02, threshold=4.860e+02, percent-clipped=0.0 2024-09-19 03:09:20,282 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=66.36 vs. 
limit=15.0 2024-09-19 03:09:21,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=857737.8333333334, ans=0.125 2024-09-19 03:09:36,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=857766.1666666666, ans=0.125 2024-09-19 03:09:41,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=857766.1666666666, ans=0.125 2024-09-19 03:09:55,187 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.78 vs. limit=15.0 2024-09-19 03:10:03,658 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 03:10:21,259 INFO [train.py:1198] (1/2) Epoch 48, batch 2400, loss[loss=0.2175, ctc_loss=0.1443, cr_loss=0.3658, over 20978.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1433, cr_loss=0.3692, over 4085272.57 frames. ], batch size: 58, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 03:10:44,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=857879.5, ans=0.1 2024-09-19 03:11:21,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=857936.1666666666, ans=0.0 2024-09-19 03:11:28,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=857964.5, ans=15.0 2024-09-19 03:11:30,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=857964.5, ans=0.125 2024-09-19 03:11:42,564 INFO [train.py:1198] (1/2) Epoch 48, batch 2450, loss[loss=0.1923, ctc_loss=0.124, cr_loss=0.3415, over 19839.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1429, cr_loss=0.3685, over 4079471.02 frames. ], batch size: 44, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 03:11:50,004 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.975e+02 2.253e+02 2.414e+02 2.569e+02 3.182e+02, threshold=4.827e+02, percent-clipped=0.0 2024-09-19 03:12:24,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=858049.5, ans=0.0 2024-09-19 03:12:57,448 INFO [train.py:1198] (1/2) Epoch 48, batch 2500, loss[loss=0.1947, ctc_loss=0.1251, cr_loss=0.3477, over 21028.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1428, cr_loss=0.3689, over 4095831.75 frames. ], batch size: 63, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 03:13:26,523 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 03:13:53,850 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0 2024-09-19 03:14:12,599 INFO [train.py:1198] (1/2) Epoch 48, batch 2550, loss[loss=0.2003, ctc_loss=0.1295, cr_loss=0.3541, over 20793.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1429, cr_loss=0.3687, over 4095665.75 frames. 
], batch size: 53, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:14:20,033 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.982e+02 2.289e+02 2.379e+02 2.553e+02 3.056e+02, threshold=4.758e+02, percent-clipped=0.0 2024-09-19 03:14:57,227 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.78 vs. limit=15.0 2024-09-19 03:15:07,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=858361.1666666666, ans=0.09899494936611666 2024-09-19 03:15:23,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=858389.5, ans=0.0 2024-09-19 03:15:27,910 INFO [train.py:1198] (1/2) Epoch 48, batch 2600, loss[loss=0.256, ctc_loss=0.172, cr_loss=0.4202, over 18395.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1428, cr_loss=0.3689, over 4096119.04 frames. ], batch size: 108, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:15:50,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=858446.1666666666, ans=0.125 2024-09-19 03:16:46,233 INFO [train.py:1198] (1/2) Epoch 48, batch 2650, loss[loss=0.1847, ctc_loss=0.119, cr_loss=0.3285, over 20969.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1422, cr_loss=0.3682, over 4095088.96 frames. ], batch size: 51, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:16:53,808 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.913e+02 2.268e+02 2.414e+02 2.540e+02 6.222e+02, threshold=4.829e+02, percent-clipped=1.0 2024-09-19 03:17:15,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=858587.8333333334, ans=0.1 2024-09-19 03:18:04,335 INFO [train.py:1198] (1/2) Epoch 48, batch 2700, loss[loss=0.2213, ctc_loss=0.1463, cr_loss=0.3751, over 20697.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1427, cr_loss=0.3689, over 4100357.41 frames. ], batch size: 71, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:18:33,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=858757.8333333334, ans=0.0 2024-09-19 03:19:19,552 INFO [train.py:1198] (1/2) Epoch 48, batch 2750, loss[loss=0.2317, ctc_loss=0.1498, cr_loss=0.4095, over 20836.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1425, cr_loss=0.3689, over 4100634.87 frames. ], batch size: 59, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:19:27,154 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.025e+02 2.314e+02 2.423e+02 2.576e+02 5.267e+02, threshold=4.845e+02, percent-clipped=1.0 2024-09-19 03:20:18,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=858956.1666666666, ans=0.125 2024-09-19 03:20:33,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=858984.5, ans=0.2 2024-09-19 03:20:34,133 INFO [train.py:1198] (1/2) Epoch 48, batch 2800, loss[loss=0.1756, ctc_loss=0.1146, cr_loss=0.3051, over 20269.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1427, cr_loss=0.3693, over 4093808.20 frames. 
], batch size: 45, lr: 1.71e-03, grad_scale: 64.0 2024-09-19 03:21:05,358 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.21 vs. limit=15.0 2024-09-19 03:21:17,673 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.02 vs. limit=15.0 2024-09-19 03:21:27,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=859069.5, ans=0.05 2024-09-19 03:21:50,213 INFO [train.py:1198] (1/2) Epoch 48, batch 2850, loss[loss=0.219, ctc_loss=0.1432, cr_loss=0.379, over 20657.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.142, cr_loss=0.3681, over 4108772.17 frames. ], batch size: 71, lr: 1.71e-03, grad_scale: 64.0 2024-09-19 03:22:00,559 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.064e+02 2.254e+02 2.381e+02 2.510e+02 2.916e+02, threshold=4.762e+02, percent-clipped=0.0 2024-09-19 03:22:08,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=859154.5, ans=0.125 2024-09-19 03:22:10,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=859154.5, ans=0.025 2024-09-19 03:23:11,359 INFO [train.py:1198] (1/2) Epoch 48, batch 2900, loss[loss=0.1948, ctc_loss=0.1277, cr_loss=0.3352, over 21056.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.1418, cr_loss=0.368, over 4108577.61 frames. ], batch size: 56, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:23:32,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=859296.1666666666, ans=0.125 2024-09-19 03:23:59,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=859352.8333333334, ans=0.04949747468305833 2024-09-19 03:24:26,472 INFO [train.py:1198] (1/2) Epoch 48, batch 2950, loss[loss=0.1939, ctc_loss=0.1265, cr_loss=0.337, over 19832.00 frames. ], tot_loss[loss=0.2145, ctc_loss=0.1411, cr_loss=0.3671, over 4112069.76 frames. ], batch size: 44, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:24:35,531 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.061e+02 2.306e+02 2.439e+02 2.642e+02 5.256e+02, threshold=4.877e+02, percent-clipped=1.0 2024-09-19 03:25:42,282 INFO [train.py:1198] (1/2) Epoch 48, batch 3000, loss[loss=0.2275, ctc_loss=0.1505, cr_loss=0.385, over 20843.00 frames. ], tot_loss[loss=0.2153, ctc_loss=0.1417, cr_loss=0.3678, over 4101777.97 frames. ], batch size: 65, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:25:42,283 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 03:26:00,293 INFO [train.py:1230] (1/2) Epoch 48, validation: loss=0.03871, ctc_loss=0.03871, cr_loss=1.57e-14, over 944034.00 frames. 
2024-09-19 03:26:00,293 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-19 03:26:20,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=859579.5, ans=0.125 2024-09-19 03:26:41,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=859607.8333333334, ans=0.025 2024-09-19 03:26:58,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=859636.1666666666, ans=0.0 2024-09-19 03:27:00,034 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.16 vs. limit=10.0 2024-09-19 03:27:16,189 INFO [train.py:1198] (1/2) Epoch 48, batch 3050, loss[loss=0.2375, ctc_loss=0.1565, cr_loss=0.4051, over 20951.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1423, cr_loss=0.3697, over 4107608.75 frames. ], batch size: 64, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:27:28,501 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.91 vs. limit=22.5 2024-09-19 03:27:29,147 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.982e+02 2.256e+02 2.379e+02 2.592e+02 3.671e+02, threshold=4.757e+02, percent-clipped=0.0 2024-09-19 03:27:33,006 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.23 vs. limit=15.0 2024-09-19 03:27:53,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=859749.5, ans=0.95 2024-09-19 03:27:56,951 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.91 vs. limit=15.0 2024-09-19 03:28:23,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=859806.1666666666, ans=0.95 2024-09-19 03:28:38,148 INFO [train.py:1198] (1/2) Epoch 48, batch 3100, loss[loss=0.1948, ctc_loss=0.128, cr_loss=0.334, over 20957.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1424, cr_loss=0.3696, over 4117954.11 frames. ], batch size: 48, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:29:53,591 INFO [train.py:1198] (1/2) Epoch 48, batch 3150, loss[loss=0.1687, ctc_loss=0.1091, cr_loss=0.2981, over 21014.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1425, cr_loss=0.3694, over 4106107.04 frames. ], batch size: 52, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:30:02,664 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.017e+02 2.276e+02 2.418e+02 2.611e+02 6.707e+02, threshold=4.835e+02, percent-clipped=1.0 2024-09-19 03:30:37,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=860061.1666666666, ans=0.125 2024-09-19 03:30:53,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=860089.5, ans=0.1 2024-09-19 03:31:09,025 INFO [train.py:1198] (1/2) Epoch 48, batch 3200, loss[loss=0.2175, ctc_loss=0.1447, cr_loss=0.3636, over 20669.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1428, cr_loss=0.3695, over 4098763.38 frames. 
], batch size: 68, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:32:25,135 INFO [train.py:1198] (1/2) Epoch 48, batch 3250, loss[loss=0.1979, ctc_loss=0.1286, cr_loss=0.3463, over 21050.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1426, cr_loss=0.3692, over 4096182.21 frames. ], batch size: 53, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:32:34,240 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.030e+02 2.309e+02 2.430e+02 2.595e+02 3.227e+02, threshold=4.860e+02, percent-clipped=0.0 2024-09-19 03:32:36,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=860259.5, ans=0.0 2024-09-19 03:32:52,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=860287.8333333334, ans=0.95 2024-09-19 03:32:55,988 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 03:33:25,436 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.51 vs. limit=22.5 2024-09-19 03:33:44,295 INFO [train.py:1198] (1/2) Epoch 48, batch 3300, loss[loss=0.1843, ctc_loss=0.1192, cr_loss=0.3256, over 20985.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1426, cr_loss=0.3694, over 4105413.06 frames. ], batch size: 52, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:33:47,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=860401.1666666666, ans=0.125 2024-09-19 03:33:56,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=860401.1666666666, ans=0.0 2024-09-19 03:34:03,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=860429.5, ans=0.125 2024-09-19 03:34:10,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=860429.5, ans=0.1 2024-09-19 03:34:19,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=860457.8333333334, ans=0.0 2024-09-19 03:34:20,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=860457.8333333334, ans=0.1 2024-09-19 03:34:39,445 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.36 vs. limit=15.0 2024-09-19 03:35:02,870 INFO [train.py:1198] (1/2) Epoch 48, batch 3350, loss[loss=0.1808, ctc_loss=0.1172, cr_loss=0.3181, over 20023.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1425, cr_loss=0.3692, over 4104093.86 frames. 
], batch size: 44, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:35:11,907 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.981e+02 2.281e+02 2.389e+02 2.508e+02 3.043e+02, threshold=4.778e+02, percent-clipped=0.0 2024-09-19 03:35:18,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=860571.1666666666, ans=0.2 2024-09-19 03:35:19,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=860571.1666666666, ans=0.0 2024-09-19 03:35:42,968 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.42 vs. limit=15.0 2024-09-19 03:35:54,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=860627.8333333334, ans=0.125 2024-09-19 03:36:18,985 INFO [train.py:1198] (1/2) Epoch 48, batch 3400, loss[loss=0.2201, ctc_loss=0.1473, cr_loss=0.3642, over 20827.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1426, cr_loss=0.3694, over 4102442.79 frames. ], batch size: 59, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:36:37,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=860712.8333333334, ans=0.125 2024-09-19 03:37:11,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=860769.5, ans=0.125 2024-09-19 03:37:22,443 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=860797.8333333334, ans=0.125 2024-09-19 03:37:34,283 INFO [train.py:1198] (1/2) Epoch 48, batch 3450, loss[loss=0.2196, ctc_loss=0.1444, cr_loss=0.3755, over 21070.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1428, cr_loss=0.3693, over 4091176.24 frames. ], batch size: 59, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:37:43,122 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.068e+02 2.300e+02 2.426e+02 2.554e+02 6.184e+02, threshold=4.852e+02, percent-clipped=1.0 2024-09-19 03:38:01,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=860854.5, ans=0.125 2024-09-19 03:38:18,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=860911.1666666666, ans=0.0 2024-09-19 03:38:18,134 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 03:38:36,436 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 03:38:40,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=860939.5, ans=0.125 2024-09-19 03:38:49,504 INFO [train.py:1198] (1/2) Epoch 48, batch 3500, loss[loss=0.2585, ctc_loss=0.1745, cr_loss=0.4201, over 19937.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1432, cr_loss=0.3704, over 4087575.06 frames. 
], batch size: 80, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:38:49,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=860967.8333333334, ans=0.1 2024-09-19 03:38:52,772 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.65 vs. limit=15.0 2024-09-19 03:39:13,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=860996.1666666666, ans=0.0 2024-09-19 03:39:51,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=861081.1666666666, ans=0.125 2024-09-19 03:39:58,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=861081.1666666666, ans=0.1 2024-09-19 03:40:10,415 INFO [train.py:1198] (1/2) Epoch 48, batch 3550, loss[loss=0.208, ctc_loss=0.136, cr_loss=0.3602, over 20941.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1435, cr_loss=0.371, over 4088819.18 frames. ], batch size: 60, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:40:18,834 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.26 vs. limit=15.0 2024-09-19 03:40:19,616 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.933e+02 2.284e+02 2.398e+02 2.559e+02 3.127e+02, threshold=4.796e+02, percent-clipped=0.0 2024-09-19 03:40:27,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=861137.8333333334, ans=0.125 2024-09-19 03:40:43,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=861166.1666666666, ans=15.0 2024-09-19 03:41:14,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=861222.8333333334, ans=0.125 2024-09-19 03:41:26,480 INFO [train.py:1198] (1/2) Epoch 48, batch 3600, loss[loss=0.2377, ctc_loss=0.1591, cr_loss=0.3928, over 21014.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.1429, cr_loss=0.3699, over 4102160.55 frames. ], batch size: 61, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:41:49,684 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.97 vs. limit=6.0 2024-09-19 03:41:56,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=861307.8333333334, ans=0.0 2024-09-19 03:42:29,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=861364.5, ans=0.1 2024-09-19 03:42:42,416 INFO [train.py:1198] (1/2) Epoch 48, batch 3650, loss[loss=0.2071, ctc_loss=0.1341, cr_loss=0.3653, over 21005.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1431, cr_loss=0.3704, over 4108248.52 frames. 
], batch size: 61, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:42:50,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=861392.8333333334, ans=0.2 2024-09-19 03:42:51,532 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.996e+02 2.290e+02 2.415e+02 2.618e+02 3.744e+02, threshold=4.829e+02, percent-clipped=0.0 2024-09-19 03:43:29,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=861477.8333333334, ans=0.125 2024-09-19 03:43:56,154 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.59 vs. limit=15.0 2024-09-19 03:43:58,419 INFO [train.py:1198] (1/2) Epoch 48, batch 3700, loss[loss=0.2284, ctc_loss=0.1497, cr_loss=0.3931, over 20836.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.1429, cr_loss=0.3695, over 4102770.79 frames. ], batch size: 65, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:45:05,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=861647.8333333334, ans=0.2 2024-09-19 03:45:16,834 INFO [train.py:1198] (1/2) Epoch 48, batch 3750, loss[loss=0.2095, ctc_loss=0.1356, cr_loss=0.37, over 21050.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.142, cr_loss=0.3683, over 4085978.07 frames. ], batch size: 56, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:45:26,124 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.087e+02 2.284e+02 2.451e+02 2.639e+02 4.339e+02, threshold=4.902e+02, percent-clipped=0.0 2024-09-19 03:45:55,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=861732.8333333334, ans=0.0 2024-09-19 03:45:59,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=861732.8333333334, ans=0.125 2024-09-19 03:46:35,866 INFO [train.py:1198] (1/2) Epoch 48, batch 3800, loss[loss=0.2344, ctc_loss=0.1566, cr_loss=0.3892, over 20938.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1423, cr_loss=0.3685, over 4081695.67 frames. ], batch size: 60, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:46:36,737 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.83 vs. limit=22.5 2024-09-19 03:46:43,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=861817.8333333334, ans=0.1 2024-09-19 03:47:23,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=861902.8333333334, ans=0.125 2024-09-19 03:47:47,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=861931.1666666666, ans=0.0 2024-09-19 03:47:51,462 INFO [train.py:1198] (1/2) Epoch 48, batch 3850, loss[loss=0.2213, ctc_loss=0.1471, cr_loss=0.3712, over 21091.00 frames. ], tot_loss[loss=0.2145, ctc_loss=0.1412, cr_loss=0.3664, over 4075882.38 frames. 
], batch size: 59, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:47:51,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=861959.5, ans=0.0 2024-09-19 03:48:00,556 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.004e+02 2.244e+02 2.366e+02 2.611e+02 5.786e+02, threshold=4.732e+02, percent-clipped=1.0 2024-09-19 03:48:17,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=861987.8333333334, ans=0.125 2024-09-19 03:48:23,797 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 03:48:53,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=862072.8333333334, ans=0.0 2024-09-19 03:49:06,846 INFO [train.py:1198] (1/2) Epoch 48, batch 3900, loss[loss=0.2353, ctc_loss=0.1606, cr_loss=0.3735, over 20644.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1417, cr_loss=0.3667, over 4072825.68 frames. ], batch size: 71, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:49:42,937 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.63 vs. limit=15.0 2024-09-19 03:49:45,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=862157.8333333334, ans=0.125 2024-09-19 03:49:58,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=862186.1666666666, ans=0.125 2024-09-19 03:50:25,888 INFO [train.py:1198] (1/2) Epoch 48, batch 3950, loss[loss=0.1867, ctc_loss=0.1204, cr_loss=0.3317, over 20976.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1424, cr_loss=0.3671, over 4057479.42 frames. ], batch size: 55, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:50:30,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=862242.8333333334, ans=0.2 2024-09-19 03:50:32,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=862242.8333333334, ans=0.125 2024-09-19 03:50:35,015 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.028e+02 2.312e+02 2.455e+02 2.589e+02 4.106e+02, threshold=4.911e+02, percent-clipped=0.0 2024-09-19 03:50:58,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=862299.5, ans=0.1 2024-09-19 03:51:28,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=862356.1666666666, ans=0.125 2024-09-19 03:51:45,000 INFO [train.py:1198] (1/2) Epoch 48, batch 4000, loss[loss=0.2007, ctc_loss=0.1308, cr_loss=0.3495, over 20982.00 frames. ], tot_loss[loss=0.2143, ctc_loss=0.1413, cr_loss=0.3651, over 4062001.62 frames. ], batch size: 55, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:53:00,253 INFO [train.py:1198] (1/2) Epoch 48, batch 4050, loss[loss=0.2432, ctc_loss=0.1626, cr_loss=0.4031, over 20830.00 frames. ], tot_loss[loss=0.2142, ctc_loss=0.1412, cr_loss=0.3649, over 4080061.28 frames. 
2024-09-19 03:53:08,983 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.972e+02 2.240e+02 2.415e+02 2.538e+02 3.576e+02, threshold=4.829e+02, percent-clipped=0.0
2024-09-19 03:54:00,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=862639.5, ans=0.025
2024-09-19 03:54:15,330 INFO [train.py:1198] (1/2) Epoch 48, batch 4100, loss[loss=0.2465, ctc_loss=0.1617, cr_loss=0.4241, over 21034.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1415, cr_loss=0.3658, over 4077588.64 frames. ], batch size: 62, lr: 1.71e-03, grad_scale: 32.0
2024-09-19 03:54:44,504 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.42 vs. limit=15.0
2024-09-19 03:55:08,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=862752.8333333334, ans=0.0
2024-09-19 03:55:22,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=862781.1666666666, ans=0.0
2024-09-19 03:55:30,729 INFO [train.py:1198] (1/2) Epoch 48, batch 4150, loss[loss=0.2021, ctc_loss=0.1327, cr_loss=0.3469, over 21076.00 frames. ], tot_loss[loss=0.2145, ctc_loss=0.1413, cr_loss=0.3658, over 4078140.10 frames. ], batch size: 59, lr: 1.71e-03, grad_scale: 32.0
2024-09-19 03:55:31,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=862809.5, ans=0.125
2024-09-19 03:55:39,686 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.008e+02 2.224e+02 2.365e+02 2.504e+02 3.395e+02, threshold=4.731e+02, percent-clipped=0.0
2024-09-19 03:56:02,630 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.43 vs. limit=15.0
2024-09-19 03:56:08,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=862866.1666666666, ans=0.1
2024-09-19 03:56:12,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=862866.1666666666, ans=0.1
2024-09-19 03:56:32,827 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.95 vs. limit=6.0
2024-09-19 03:56:51,814 INFO [train.py:1198] (1/2) Epoch 48, batch 4200, loss[loss=0.1801, ctc_loss=0.1166, cr_loss=0.3176, over 21068.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1417, cr_loss=0.3665, over 4074798.68 frames. ], batch size: 53, lr: 1.71e-03, grad_scale: 32.0
2024-09-19 03:57:40,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=863036.1666666666, ans=0.125
2024-09-19 03:58:04,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=863064.5, ans=0.125
2024-09-19 03:58:07,144 INFO [train.py:1198] (1/2) Epoch 48, batch 4250, loss[loss=0.2301, ctc_loss=0.1557, cr_loss=0.3719, over 20076.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.1421, cr_loss=0.3671, over 4071046.57 frames. ], batch size: 80, lr: 1.71e-03, grad_scale: 32.0
2024-09-19 03:58:09,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=863092.8333333334, ans=0.125
2024-09-19 03:58:16,261 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.943e+02 2.324e+02 2.431e+02 2.611e+02 4.083e+02, threshold=4.862e+02, percent-clipped=0.0
2024-09-19 03:58:51,504 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.02 vs. limit=10.0
2024-09-19 03:59:02,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=863177.8333333334, ans=0.1
2024-09-19 03:59:21,981 INFO [train.py:1198] (1/2) Epoch 48, batch 4300, loss[loss=0.1975, ctc_loss=0.1273, cr_loss=0.3507, over 21048.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.1421, cr_loss=0.3671, over 4054547.51 frames. ], batch size: 56, lr: 1.71e-03, grad_scale: 32.0
2024-09-19 04:00:08,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=863319.5, ans=0.125
2024-09-19 04:00:08,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=863319.5, ans=0.1
2024-09-19 04:00:09,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=863319.5, ans=0.125
2024-09-19 04:00:17,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=863319.5, ans=0.125
2024-09-19 04:00:35,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=863347.8333333334, ans=0.125
2024-09-19 04:00:36,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=863376.1666666666, ans=0.1
2024-09-19 04:00:38,087 INFO [train.py:1198] (1/2) Epoch 48, batch 4350, loss[loss=0.1851, ctc_loss=0.1199, cr_loss=0.3259, over 21073.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.143, cr_loss=0.3691, over 4074665.81 frames. ], batch size: 53, lr: 1.71e-03, grad_scale: 32.0
2024-09-19 04:00:47,138 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.011e+02 2.255e+02 2.397e+02 2.540e+02 3.175e+02, threshold=4.793e+02, percent-clipped=0.0
2024-09-19 04:01:16,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=863432.8333333334, ans=0.1
2024-09-19 04:01:17,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=863432.8333333334, ans=0.1
2024-09-19 04:01:46,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=863489.5, ans=0.5
2024-09-19 04:01:56,772 INFO [train.py:1198] (1/2) Epoch 48, batch 4400, loss[loss=0.2181, ctc_loss=0.1437, cr_loss=0.372, over 21042.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1421, cr_loss=0.3676, over 4082869.72 frames. ], batch size: 56, lr: 1.71e-03, grad_scale: 32.0
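The recurring optim.py:487 warnings summarize recent gradient norms as five order statistics (min, 25%, median, 75%, max) plus a clipping threshold. Across every entry in this section the threshold equals Clipping_scale times the median, e.g. 2.0 * 2.431e+02 = 4.862e+02 in the warning above, and percent-clipped reports how often a gradient exceeded it. A sketch of that bookkeeping under those assumptions (the function name is hypothetical, not the optimizer's actual code):

    import statistics

    def clipping_report(grad_norms, clipping_scale=2.0):
        # Five-number summary of a recent window of gradient norms, plus the
        # clip threshold, which in the log tracks clipping_scale * median.
        q25, median, q75 = statistics.quantiles(sorted(grad_norms), n=4)
        quartiles = (min(grad_norms), q25, median, q75, max(grad_norms))
        threshold = clipping_scale * median
        pct = 100.0 * sum(n > threshold for n in grad_norms) / len(grad_norms)
        return quartiles, threshold, pct

With a 2x-median threshold, a percent-clipped of 0.0 (the usual value here) means the norm distribution stayed tight; the occasional 1.0 entries coincide with a max far above the 75% quartile.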
2024-09-19 04:01:59,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=863517.8333333334, ans=0.025
2024-09-19 04:02:24,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=863546.1666666666, ans=0.1
2024-09-19 04:02:29,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=863574.5, ans=0.125
2024-09-19 04:02:48,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=863602.8333333334, ans=0.025
2024-09-19 04:03:09,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=863631.1666666666, ans=0.125
2024-09-19 04:03:09,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=863631.1666666666, ans=0.125
2024-09-19 04:03:15,480 INFO [train.py:1198] (1/2) Epoch 48, batch 4450, loss[loss=0.1874, ctc_loss=0.1223, cr_loss=0.3255, over 20870.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1423, cr_loss=0.3683, over 4089097.52 frames. ], batch size: 57, lr: 1.71e-03, grad_scale: 32.0
2024-09-19 04:03:24,448 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.984e+02 2.278e+02 2.417e+02 2.577e+02 3.487e+02, threshold=4.834e+02, percent-clipped=0.0
2024-09-19 04:03:26,699 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.06 vs. limit=10.0
2024-09-19 04:03:31,533 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.92 vs. limit=15.0
2024-09-19 04:04:11,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=863744.5, ans=0.125
2024-09-19 04:04:31,267 INFO [train.py:1198] (1/2) Epoch 48, batch 4500, loss[loss=0.2284, ctc_loss=0.1535, cr_loss=0.3748, over 20379.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1428, cr_loss=0.369, over 4088825.35 frames. ], batch size: 74, lr: 1.71e-03, grad_scale: 32.0
2024-09-19 04:04:34,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=863801.1666666666, ans=0.125
2024-09-19 04:04:40,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=863801.1666666666, ans=0.09899494936611666
2024-09-19 04:05:27,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=863886.1666666666, ans=0.1
2024-09-19 04:05:47,270 INFO [train.py:1198] (1/2) Epoch 48, batch 4550, loss[loss=0.1994, ctc_loss=0.1279, cr_loss=0.3575, over 20997.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.1415, cr_loss=0.3667, over 4101379.59 frames. ], batch size: 52, lr: 1.71e-03, grad_scale: 32.0
2024-09-19 04:05:48,032 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.92 vs. limit=6.0
2024-09-19 04:05:56,189 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.970e+02 2.314e+02 2.436e+02 2.564e+02 3.850e+02, threshold=4.872e+02, percent-clipped=0.0
2024-09-19 04:06:02,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=863971.1666666666, ans=0.125
2024-09-19 04:07:02,845 INFO [train.py:1198] (1/2) Epoch 48, batch 4600, loss[loss=0.227, ctc_loss=0.1492, cr_loss=0.389, over 20970.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1423, cr_loss=0.3681, over 4090895.61 frames. ], batch size: 58, lr: 1.71e-03, grad_scale: 32.0
2024-09-19 04:07:06,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=864084.5, ans=0.05
2024-09-19 04:07:21,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=864112.8333333334, ans=0.0
2024-09-19 04:07:21,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=864112.8333333334, ans=0.125
2024-09-19 04:07:47,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=864141.1666666666, ans=0.125
2024-09-19 04:08:05,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=864197.8333333334, ans=0.1
2024-09-19 04:08:11,369 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.98 vs. limit=12.0
2024-09-19 04:08:15,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=864197.8333333334, ans=0.0
2024-09-19 04:08:24,289 INFO [train.py:1198] (1/2) Epoch 48, batch 4650, loss[loss=0.2226, ctc_loss=0.1453, cr_loss=0.3865, over 20311.00 frames. ], tot_loss[loss=0.2153, ctc_loss=0.1418, cr_loss=0.3674, over 4089298.48 frames. ], batch size: 74, lr: 1.71e-03, grad_scale: 32.0
2024-09-19 04:08:26,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=864226.1666666666, ans=0.0
2024-09-19 04:08:33,494 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.994e+02 2.287e+02 2.427e+02 2.595e+02 1.027e+03, threshold=4.854e+02, percent-clipped=1.0
2024-09-19 04:08:34,125 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.60 vs. limit=10.0
2024-09-19 04:09:23,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=864339.5, ans=0.02
2024-09-19 04:09:36,960 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 04:09:39,776 INFO [train.py:1198] (1/2) Epoch 48, batch 4700, loss[loss=0.2213, ctc_loss=0.1468, cr_loss=0.3722, over 20956.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1422, cr_loss=0.3682, over 4086158.93 frames. ], batch size: 58, lr: 1.71e-03, grad_scale: 32.0
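The scaling.py:214 entries dump ScheduledFloat values: module hyperparameters (dropout probabilities, skip rates, balancer limits) that are not fixed constants but functions of batch_count. The logged values here are the end-of-schedule plateaus, since the run is nearly 870k batches in. A piecewise-linear schedule over batch count reproduces the behavior suggested by the log; the breakpoints below are illustrative assumptions, not the run's actual schedule:

    def scheduled_float(batch_count: float, schedule) -> float:
        # Interpolate linearly between (batch_count, value) breakpoints,
        # clamping to the first/last value outside the schedule's range.
        x0, y0 = schedule[0]
        if batch_count <= x0:
            return y0
        for x1, y1 in schedule[1:]:
            if batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
            x0, y0 = x1, y1
        return y0

    # e.g. a skip rate decaying from 0.5 to 0.0 over the first 20k batches
    # would have long since reached its floor at batch_count ~ 864000:
    assert scheduled_float(864000.0, [(0.0, 0.5), (20000.0, 0.0)]) == 0.0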
2024-09-19 04:10:10,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=864424.5, ans=0.125
2024-09-19 04:10:31,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=864452.8333333334, ans=0.125
2024-09-19 04:10:32,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=864452.8333333334, ans=0.125
2024-09-19 04:10:55,110 INFO [train.py:1198] (1/2) Epoch 48, batch 4750, loss[loss=0.2007, ctc_loss=0.132, cr_loss=0.3431, over 21075.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.142, cr_loss=0.3671, over 4093241.56 frames. ], batch size: 56, lr: 1.71e-03, grad_scale: 32.0
2024-09-19 04:11:04,293 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.964e+02 2.276e+02 2.425e+02 2.615e+02 3.429e+02, threshold=4.850e+02, percent-clipped=0.0
2024-09-19 04:11:10,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=864537.8333333334, ans=0.125
2024-09-19 04:11:28,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=864566.1666666666, ans=0.1
2024-09-19 04:11:41,309 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.67 vs. limit=22.5
2024-09-19 04:11:42,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=864594.5, ans=0.2
2024-09-19 04:11:45,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=864594.5, ans=0.0
2024-09-19 04:12:10,791 INFO [train.py:1198] (1/2) Epoch 48, batch 4800, loss[loss=0.2251, ctc_loss=0.1479, cr_loss=0.3857, over 20827.00 frames. ], tot_loss[loss=0.2147, ctc_loss=0.1414, cr_loss=0.3665, over 4101060.19 frames. ], batch size: 59, lr: 1.71e-03, grad_scale: 32.0
2024-09-19 04:12:12,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=864651.1666666666, ans=0.2
2024-09-19 04:12:31,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=864679.5, ans=0.0
2024-09-19 04:12:37,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=864679.5, ans=0.0
2024-09-19 04:12:37,686 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.32 vs. limit=12.0
2024-09-19 04:12:40,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=864707.8333333334, ans=0.1
2024-09-19 04:13:27,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=864764.5, ans=0.0
2024-09-19 04:13:29,678 INFO [train.py:1198] (1/2) Epoch 48, batch 4850, loss[loss=0.2156, ctc_loss=0.1426, cr_loss=0.3654, over 21081.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.1421, cr_loss=0.3676, over 4090486.66 frames. ], batch size: 59, lr: 1.71e-03, grad_scale: 32.0
2024-09-19 04:13:29,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=864792.8333333334, ans=0.1
2024-09-19 04:13:40,152 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.939e+02 2.289e+02 2.411e+02 2.617e+02 4.852e+02, threshold=4.823e+02, percent-clipped=1.0
2024-09-19 04:14:24,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=864877.8333333334, ans=0.0
2024-09-19 04:14:24,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=864877.8333333334, ans=0.125
2024-09-19 04:14:27,623 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.66 vs. limit=15.0
2024-09-19 04:14:48,123 INFO [train.py:1198] (1/2) Epoch 48, batch 4900, loss[loss=0.2778, ctc_loss=0.2008, cr_loss=0.3851, over 14287.00 frames. ], tot_loss[loss=0.2147, ctc_loss=0.1414, cr_loss=0.3662, over 4088871.10 frames. ], batch size: 149, lr: 1.71e-03, grad_scale: 32.0
2024-09-19 04:14:48,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=864934.5, ans=0.0
2024-09-19 04:14:49,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=864934.5, ans=0.2
2024-09-19 04:15:38,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=865019.5, ans=0.125
2024-09-19 04:15:41,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=865019.5, ans=0.125
2024-09-19 04:16:03,344 INFO [train.py:1198] (1/2) Epoch 48, batch 4950, loss[loss=0.198, ctc_loss=0.13, cr_loss=0.3401, over 20945.00 frames. ], tot_loss[loss=0.2147, ctc_loss=0.1415, cr_loss=0.3664, over 4092101.97 frames. ], batch size: 50, lr: 1.71e-03, grad_scale: 32.0
2024-09-19 04:16:13,730 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.994e+02 2.276e+02 2.404e+02 2.529e+02 3.480e+02, threshold=4.808e+02, percent-clipped=0.0
2024-09-19 04:16:38,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=865132.8333333334, ans=0.125
2024-09-19 04:17:17,521 INFO [train.py:1198] (1/2) Epoch 48, batch 5000, loss[loss=0.2367, ctc_loss=0.1564, cr_loss=0.4016, over 20803.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.1419, cr_loss=0.3678, over 4100381.61 frames. ], batch size: 59, lr: 1.71e-03, grad_scale: 32.0
2024-09-19 04:17:28,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=865217.8333333334, ans=0.125
2024-09-19 04:18:11,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=865302.8333333334, ans=0.125
2024-09-19 04:18:31,783 INFO [train.py:1198] (1/2) Epoch 48, batch 5050, loss[loss=0.2538, ctc_loss=0.169, cr_loss=0.4241, over 18070.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1426, cr_loss=0.3687, over 4083295.74 frames. ], batch size: 108, lr: 1.71e-03, grad_scale: 32.0
2024-09-19 04:18:39,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=865359.5, ans=0.125
2024-09-19 04:18:42,334 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.062e+02 2.233e+02 2.415e+02 2.584e+02 4.916e+02, threshold=4.830e+02, percent-clipped=1.0
2024-09-19 04:19:07,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=865416.1666666666, ans=0.125
2024-09-19 04:19:18,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=865444.5, ans=0.125
2024-09-19 04:19:42,357 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.50 vs. limit=15.0
2024-09-19 04:19:46,446 INFO [train.py:1198] (1/2) Epoch 48, batch 5100, loss[loss=0.1946, ctc_loss=0.1274, cr_loss=0.3361, over 20941.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1424, cr_loss=0.3686, over 4083306.97 frames. ], batch size: 50, lr: 1.71e-03, grad_scale: 32.0
2024-09-19 04:20:06,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=865529.5, ans=0.0
2024-09-19 04:20:15,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=865557.8333333334, ans=0.0
2024-09-19 04:20:33,089 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 04:21:01,000 INFO [train.py:1198] (1/2) Epoch 48, batch 5150, loss[loss=0.2115, ctc_loss=0.1403, cr_loss=0.3559, over 20813.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.142, cr_loss=0.3673, over 4079395.83 frames. ], batch size: 56, lr: 1.71e-03, grad_scale: 16.0
2024-09-19 04:21:12,773 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.986e+02 2.256e+02 2.401e+02 2.512e+02 4.424e+02, threshold=4.802e+02, percent-clipped=0.0
2024-09-19 04:21:26,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=865671.1666666666, ans=0.0
2024-09-19 04:21:28,175 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0
2024-09-19 04:21:30,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=865671.1666666666, ans=0.125
2024-09-19 04:21:54,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=865727.8333333334, ans=0.0
2024-09-19 04:22:00,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=865727.8333333334, ans=0.1
2024-09-19 04:22:07,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=865756.1666666666, ans=0.05
2024-09-19 04:22:07,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=865756.1666666666, ans=0.1
2024-09-19 04:22:10,869 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.58 vs. limit=15.0
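The grad_scale field in the batch entries dropped from 32.0 to 16.0 at batch 5150 and returns to 32.0 a little later, the usual signature of dynamic loss scaling under fp16 mixed precision: the scale is halved when a scaled gradient overflows and grows back after a run of clean steps. A minimal sketch with the standard PyTorch API (the model and optimizer are stand-ins, and a CUDA device is assumed; this is not the recipe's training loop):

    import torch

    model = torch.nn.Linear(8, 8).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    # backoff_factor=0.5 halves the scale on overflow (32 -> 16);
    # growth_factor=2.0 doubles it back after growth_interval clean steps.
    scaler = torch.cuda.amp.GradScaler(
        init_scale=32.0, backoff_factor=0.5, growth_factor=2.0, growth_interval=2000
    )

    x = torch.randn(4, 8, device="cuda")
    with torch.cuda.amp.autocast():
        loss = model(x).square().mean()
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(optimizer)         # unscales grads, skips the step on inf/nan
    scaler.update()                # adjusts the scale for the next step
    print(scaler.get_scale())      # the quantity logged here as grad_scale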
2024-09-19 04:22:17,484 INFO [train.py:1198] (1/2) Epoch 48, batch 5200, loss[loss=0.1936, ctc_loss=0.1242, cr_loss=0.3466, over 20971.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1423, cr_loss=0.3676, over 4070421.70 frames. ], batch size: 48, lr: 1.71e-03, grad_scale: 32.0
2024-09-19 04:22:20,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=865784.5, ans=0.1
2024-09-19 04:22:52,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=865841.1666666666, ans=0.5
2024-09-19 04:23:05,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=865869.5, ans=0.0
2024-09-19 04:23:31,523 INFO [train.py:1198] (1/2) Epoch 48, batch 5250, loss[loss=0.2027, ctc_loss=0.1325, cr_loss=0.351, over 21052.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1428, cr_loss=0.3685, over 4064697.20 frames. ], batch size: 56, lr: 1.71e-03, grad_scale: 32.0
2024-09-19 04:23:33,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=865926.1666666666, ans=0.0
2024-09-19 04:23:43,737 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.885e+02 2.283e+02 2.408e+02 2.633e+02 5.231e+02, threshold=4.816e+02, percent-clipped=1.0
2024-09-19 04:24:07,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=865982.8333333334, ans=0.125
2024-09-19 04:24:11,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=865982.8333333334, ans=0.125
2024-09-19 04:24:17,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=866011.1666666666, ans=0.125
2024-09-19 04:24:48,724 INFO [train.py:1198] (1/2) Epoch 48, batch 5300, loss[loss=0.2213, ctc_loss=0.1459, cr_loss=0.3771, over 21070.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1434, cr_loss=0.3697, over 4064473.11 frames. ], batch size: 56, lr: 1.71e-03, grad_scale: 32.0
2024-09-19 04:25:57,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=866181.1666666666, ans=0.125
2024-09-19 04:26:03,256 INFO [train.py:1198] (1/2) Epoch 48, batch 5350, loss[loss=0.2397, ctc_loss=0.159, cr_loss=0.4031, over 20625.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1433, cr_loss=0.3692, over 4062167.03 frames. ], batch size: 68, lr: 1.71e-03, grad_scale: 32.0
2024-09-19 04:26:08,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=866209.5, ans=0.035
2024-09-19 04:26:15,076 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.947e+02 2.320e+02 2.437e+02 2.636e+02 3.633e+02, threshold=4.874e+02, percent-clipped=0.0
2024-09-19 04:26:46,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=866294.5, ans=0.0
2024-09-19 04:26:58,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=866294.5, ans=0.125
2024-09-19 04:27:03,222 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 04:27:17,599 INFO [train.py:1198] (1/2) Epoch 48, batch 5400, loss[loss=0.1906, ctc_loss=0.1225, cr_loss=0.3403, over 20957.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1436, cr_loss=0.3698, over 4056389.19 frames. ], batch size: 48, lr: 1.71e-03, grad_scale: 16.0
2024-09-19 04:27:47,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=866407.8333333334, ans=0.2
2024-09-19 04:28:04,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=866436.1666666666, ans=0.125
2024-09-19 04:28:09,087 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.16 vs. limit=12.0
2024-09-19 04:28:20,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=866464.5, ans=0.025
2024-09-19 04:28:32,124 INFO [train.py:1198] (1/2) Epoch 48, batch 5450, loss[loss=0.2259, ctc_loss=0.1495, cr_loss=0.382, over 20742.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1434, cr_loss=0.3698, over 4066355.69 frames. ], batch size: 71, lr: 1.71e-03, grad_scale: 16.0
2024-09-19 04:28:42,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=866492.8333333334, ans=0.1
2024-09-19 04:28:45,318 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.026e+02 2.254e+02 2.383e+02 2.550e+02 3.825e+02, threshold=4.765e+02, percent-clipped=0.0
2024-09-19 04:29:25,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=866577.8333333334, ans=0.125
2024-09-19 04:29:45,979 INFO [train.py:1198] (1/2) Epoch 48, batch 5500, loss[loss=0.1824, ctc_loss=0.1179, cr_loss=0.3223, over 21062.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.1437, cr_loss=0.3703, over 4055881.67 frames. ], batch size: 53, lr: 1.71e-03, grad_scale: 16.0
2024-09-19 04:30:22,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=866691.1666666666, ans=0.1
2024-09-19 04:30:45,109 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.30 vs. limit=15.0
2024-09-19 04:31:01,755 INFO [train.py:1198] (1/2) Epoch 48, batch 5550, loss[loss=0.2528, ctc_loss=0.1704, cr_loss=0.412, over 18157.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.143, cr_loss=0.3695, over 4068175.43 frames. ], batch size: 108, lr: 1.71e-03, grad_scale: 16.0
2024-09-19 04:31:07,520 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. limit=6.0
2024-09-19 04:31:15,188 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.035e+02 2.255e+02 2.403e+02 2.586e+02 4.346e+02, threshold=4.807e+02, percent-clipped=0.0
2024-09-19 04:31:30,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=866832.8333333334, ans=0.125
2024-09-19 04:32:16,351 INFO [train.py:1198] (1/2) Epoch 48, batch 5600, loss[loss=0.2118, ctc_loss=0.1365, cr_loss=0.3766, over 20880.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1428, cr_loss=0.3694, over 4074862.75 frames. ], batch size: 57, lr: 1.71e-03, grad_scale: 32.0
2024-09-19 04:32:43,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=866946.1666666666, ans=0.125
2024-09-19 04:33:04,502 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.23 vs. limit=22.5
2024-09-19 04:33:26,243 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=867031.1666666666, ans=0.05
2024-09-19 04:33:33,110 INFO [train.py:1198] (1/2) Epoch 48, batch 5650, loss[loss=0.2467, ctc_loss=0.165, cr_loss=0.4086, over 18418.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1416, cr_loss=0.3671, over 4077629.56 frames. ], batch size: 108, lr: 1.71e-03, grad_scale: 16.0
2024-09-19 04:33:36,691 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=26.90 vs. limit=15.0
2024-09-19 04:33:39,313 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 04:33:47,623 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.006e+02 2.233e+02 2.390e+02 2.509e+02 3.656e+02, threshold=4.779e+02, percent-clipped=0.0
2024-09-19 04:34:03,701 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.72 vs. limit=15.0
2024-09-19 04:34:06,306 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.68 vs. limit=15.0
2024-09-19 04:34:47,567 INFO [train.py:1198] (1/2) Epoch 48, batch 5700, loss[loss=0.2247, ctc_loss=0.1474, cr_loss=0.3869, over 20925.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.142, cr_loss=0.3679, over 4080160.15 frames. ], batch size: 60, lr: 1.71e-03, grad_scale: 16.0
2024-09-19 04:35:26,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=867257.8333333334, ans=0.0
2024-09-19 04:35:38,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=867286.1666666666, ans=0.09899494936611666
2024-09-19 04:36:02,071 INFO [train.py:1198] (1/2) Epoch 48, batch 5750, loss[loss=0.1718, ctc_loss=0.1116, cr_loss=0.3013, over 19819.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1414, cr_loss=0.3672, over 4084809.73 frames. ], batch size: 44, lr: 1.71e-03, grad_scale: 16.0
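The scaling.py:1024 lines compare a per-module whitening metric against a limit; almost all entries sit comfortably below their limit, with the nonlin_attention.whiten2 entry above (26.90 vs. limit=15.0) a rare exception. The metric appears to measure how far the channel covariance of a module's activations is from white, with 1.0 meaning perfectly white. One plausible formulation, assumed here and not necessarily the exact formula used by the run, is the ratio of the mean squared covariance eigenvalue to the squared mean eigenvalue:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # 1.0 when the channel covariance is a multiple of the identity;
        # grows as activation energy concentrates in a few directions.
        x = x.reshape(-1, x.shape[-1]).to(torch.float32)
        cov = (x.t() @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return float((eigs ** 2).mean() / eigs.mean() ** 2)

    # Close to 1 for white noise (sampling noise pushes it slightly above):
    print(whitening_metric(torch.randn(10000, 256)))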
2024-09-19 04:36:16,880 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.928e+02 2.236e+02 2.330e+02 2.492e+02 3.466e+02, threshold=4.661e+02, percent-clipped=0.0
2024-09-19 04:36:18,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=867371.1666666666, ans=0.125
2024-09-19 04:36:30,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=867399.5, ans=0.125
2024-09-19 04:36:33,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=867399.5, ans=0.2
2024-09-19 04:36:41,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=867399.5, ans=0.1
2024-09-19 04:36:41,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=867399.5, ans=0.09899494936611666
2024-09-19 04:36:50,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=867427.8333333334, ans=0.0
2024-09-19 04:37:16,497 INFO [train.py:1198] (1/2) Epoch 48, batch 5800, loss[loss=0.2057, ctc_loss=0.1334, cr_loss=0.3615, over 19827.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.1419, cr_loss=0.368, over 4091630.33 frames. ], batch size: 44, lr: 1.71e-03, grad_scale: 16.0
2024-09-19 04:37:27,580 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.64 vs. limit=15.0
2024-09-19 04:38:23,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=867597.8333333334, ans=0.125
2024-09-19 04:38:31,112 INFO [train.py:1198] (1/2) Epoch 48, batch 5850, loss[loss=0.1857, ctc_loss=0.1203, cr_loss=0.3271, over 20998.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1425, cr_loss=0.3693, over 4095487.22 frames. ], batch size: 52, lr: 1.71e-03, grad_scale: 16.0
2024-09-19 04:38:34,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=867626.1666666666, ans=0.0
2024-09-19 04:38:35,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=867626.1666666666, ans=0.125
2024-09-19 04:38:46,309 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.980e+02 2.317e+02 2.437e+02 2.557e+02 4.299e+02, threshold=4.874e+02, percent-clipped=0.0
2024-09-19 04:39:14,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=867682.8333333334, ans=0.07
2024-09-19 04:39:28,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=867711.1666666666, ans=0.125
2024-09-19 04:39:33,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=867739.5, ans=0.0
2024-09-19 04:39:47,816 INFO [train.py:1198] (1/2) Epoch 48, batch 5900, loss[loss=0.2149, ctc_loss=0.1388, cr_loss=0.3804, over 20954.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1421, cr_loss=0.3689, over 4094666.86 frames. ], batch size: 50, lr: 1.71e-03, grad_scale: 16.0
2024-09-19 04:39:56,464 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.95 vs. limit=15.0
2024-09-19 04:40:13,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=867796.1666666666, ans=0.1
2024-09-19 04:40:28,215 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=867824.5, ans=0.125
2024-09-19 04:40:35,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=867852.8333333334, ans=0.1
2024-09-19 04:41:01,824 INFO [train.py:1198] (1/2) Epoch 48, batch 5950, loss[loss=0.244, ctc_loss=0.1597, cr_loss=0.4212, over 20832.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1424, cr_loss=0.3694, over 4095193.46 frames. ], batch size: 65, lr: 1.71e-03, grad_scale: 16.0
2024-09-19 04:41:16,700 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.959e+02 2.293e+02 2.409e+02 2.557e+02 3.514e+02, threshold=4.817e+02, percent-clipped=0.0
2024-09-19 04:41:20,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=867937.8333333334, ans=0.125
2024-09-19 04:41:34,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=867966.1666666666, ans=0.125
2024-09-19 04:41:37,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=867966.1666666666, ans=0.125
2024-09-19 04:41:43,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=867966.1666666666, ans=0.125
2024-09-19 04:41:47,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=867994.5, ans=0.125
2024-09-19 04:41:48,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=867994.5, ans=0.1
2024-09-19 04:42:03,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=868022.8333333334, ans=0.125
2024-09-19 04:42:18,320 INFO [train.py:1198] (1/2) Epoch 48, batch 6000, loss[loss=0.2316, ctc_loss=0.156, cr_loss=0.3781, over 20844.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.1419, cr_loss=0.3676, over 4102634.16 frames. ], batch size: 65, lr: 1.71e-03, grad_scale: 32.0
2024-09-19 04:42:18,320 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-19 04:42:36,917 INFO [train.py:1230] (1/2) Epoch 48, validation: loss=0.03868, ctc_loss=0.03868, cr_loss=1.598e-14, over 944034.00 frames.
2024-09-19 04:42:36,917 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB
2024-09-19 04:42:38,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=868051.1666666666, ans=0.125
2024-09-19 04:42:46,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=868051.1666666666, ans=0.2
2024-09-19 04:42:53,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=868079.5, ans=0.125
2024-09-19 04:43:28,298 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.05 vs. limit=12.0
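At the batch-6000 validation pass above, the total loss equals ctc_loss and cr_loss is numerically zero (1.598e-14). That is what one would expect if the consistency-regularization term compares the outputs of two differently augmented passes over each utterance: with no augmentation at evaluation time, the two passes coincide and the term vanishes. An illustrative formulation of such a term as a symmetric KL divergence between frame-wise output distributions, assumed here rather than taken from the recipe's code:

    import torch
    import torch.nn.functional as F

    def cr_loss(log_probs_a: torch.Tensor, log_probs_b: torch.Tensor) -> torch.Tensor:
        # Symmetric KL between the per-frame distributions of two passes;
        # exactly 0 when the two passes produce identical outputs.
        kl_ab = F.kl_div(log_probs_a, log_probs_b, log_target=True, reduction="batchmean")
        kl_ba = F.kl_div(log_probs_b, log_probs_a, log_target=True, reduction="batchmean")
        return 0.5 * (kl_ab + kl_ba)

    p = torch.log_softmax(torch.randn(3, 10, 500), dim=-1)
    print(cr_loss(p, p))  # tensor(0.) for identical passes, as at validation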
2024-09-19 04:43:47,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=868164.5, ans=0.2
2024-09-19 04:43:51,435 INFO [train.py:1198] (1/2) Epoch 48, batch 6050, loss[loss=0.2152, ctc_loss=0.1415, cr_loss=0.3688, over 21012.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.142, cr_loss=0.3677, over 4095137.18 frames. ], batch size: 63, lr: 1.71e-03, grad_scale: 32.0
2024-09-19 04:44:06,494 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.966e+02 2.280e+02 2.421e+02 2.646e+02 3.730e+02, threshold=4.841e+02, percent-clipped=0.0
2024-09-19 04:44:13,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=868221.1666666666, ans=0.0
2024-09-19 04:44:19,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=868221.1666666666, ans=0.025
2024-09-19 04:44:35,768 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=868277.8333333334, ans=0.0
2024-09-19 04:44:55,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=868306.1666666666, ans=0.125
2024-09-19 04:44:58,530 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.00 vs. limit=22.5
2024-09-19 04:45:00,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=868306.1666666666, ans=0.035
2024-09-19 04:45:06,512 INFO [train.py:1198] (1/2) Epoch 48, batch 6100, loss[loss=0.2291, ctc_loss=0.15, cr_loss=0.3955, over 20888.00 frames. ], tot_loss[loss=0.2153, ctc_loss=0.1418, cr_loss=0.3678, over 4100040.89 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 32.0
2024-09-19 04:45:39,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=868391.1666666666, ans=0.125
2024-09-19 04:45:48,244 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=12.22 vs. limit=12.0
2024-09-19 04:46:20,309 INFO [train.py:1198] (1/2) Epoch 48, batch 6150, loss[loss=0.2561, ctc_loss=0.1794, cr_loss=0.3832, over 13792.00 frames. ], tot_loss[loss=0.2153, ctc_loss=0.1419, cr_loss=0.3672, over 4098123.11 frames. ], batch size: 150, lr: 1.70e-03, grad_scale: 32.0
2024-09-19 04:46:27,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=868476.1666666666, ans=0.1
2024-09-19 04:46:28,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=868476.1666666666, ans=0.1
2024-09-19 04:46:35,029 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.985e+02 2.274e+02 2.392e+02 2.563e+02 3.142e+02, threshold=4.783e+02, percent-clipped=0.0
2024-09-19 04:46:44,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=868504.5, ans=0.0
2024-09-19 04:47:28,475 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.02 vs. limit=15.0
2024-09-19 04:47:35,174 INFO [train.py:1198] (1/2) Epoch 48, batch 6200, loss[loss=0.2147, ctc_loss=0.1426, cr_loss=0.3605, over 20708.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1424, cr_loss=0.3675, over 4079000.49 frames. ], batch size: 71, lr: 1.70e-03, grad_scale: 32.0
2024-09-19 04:48:35,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=868731.1666666666, ans=0.035
2024-09-19 04:48:48,581 INFO [train.py:1198] (1/2) Epoch 48, batch 6250, loss[loss=0.1836, ctc_loss=0.1172, cr_loss=0.3322, over 20970.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.1422, cr_loss=0.3666, over 4072880.21 frames. ], batch size: 50, lr: 1.70e-03, grad_scale: 32.0
2024-09-19 04:49:03,067 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.908e+02 2.282e+02 2.464e+02 2.645e+02 5.276e+02, threshold=4.928e+02, percent-clipped=1.0
2024-09-19 04:49:26,770 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.24 vs. limit=22.5
2024-09-19 04:49:35,512 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.80 vs. limit=15.0
2024-09-19 04:49:44,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=868844.5, ans=0.125
2024-09-19 04:50:03,024 INFO [train.py:1198] (1/2) Epoch 48, batch 6300, loss[loss=0.2351, ctc_loss=0.1564, cr_loss=0.3939, over 20339.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.1441, cr_loss=0.3693, over 4031929.25 frames. ], batch size: 74, lr: 1.70e-03, grad_scale: 32.0
2024-09-19 04:50:51,322 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.40 vs. limit=12.0
2024-09-19 04:51:09,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=869014.5, ans=0.125
2024-09-19 04:51:16,108 INFO [train.py:1198] (1/2) Epoch 48, batch 6350, loss[loss=0.2654, ctc_loss=0.1857, cr_loss=0.3985, over 14969.00 frames. ], tot_loss[loss=0.2228, ctc_loss=0.1484, cr_loss=0.372, over 3834714.65 frames. ], batch size: 149, lr: 1.70e-03, grad_scale: 32.0
2024-09-19 04:51:30,935 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.989e+02 2.505e+02 2.792e+02 3.033e+02 6.518e+02, threshold=5.583e+02, percent-clipped=1.0
2024-09-19 04:53:04,383 INFO [train.py:1198] (1/2) Epoch 49, batch 0, loss[loss=0.2414, ctc_loss=0.1587, cr_loss=0.4133, over 21016.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1587, cr_loss=0.4133, over 21016.00 frames. ], batch size: 63, lr: 1.69e-03, grad_scale: 32.0
2024-09-19 04:53:04,384 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-19 04:53:22,701 INFO [train.py:1230] (1/2) Epoch 49, validation: loss=0.03846, ctc_loss=0.03846, cr_loss=1.688e-14, over 944034.00 frames.
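Two details of the tot_loss[...] aggregate are visible around the epoch boundary above: its frame counts are fractional (e.g. "over 3834714.65 frames"), which points to a decayed running sum rather than a plain total, and it restarts at each epoch (at epoch 49, batch 0, tot_loss equals the batch's own loss). A sketch under those assumptions; the 0.999 decay constant is a guess for illustration, not the recipe's actual value:

    class RunningLoss:
        # Decayed accumulation of (loss * frames, frames); the decay is what
        # makes the logged frame counts fractional. Re-created each epoch,
        # so at batch 0 the printed tot_loss equals the batch's own loss.
        def __init__(self, decay: float = 0.999):
            self.decay = decay
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss: float, batch_frames: float) -> float:
            self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
            self.frames = self.decay * self.frames + batch_frames
            return self.loss_sum / self.frames  # the value printed as tot_loss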
2024-09-19 04:53:22,702 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB
2024-09-19 04:53:35,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=869159.0, ans=0.07
2024-09-19 04:53:41,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=869187.3333333334, ans=0.125
2024-09-19 04:53:42,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=869187.3333333334, ans=0.125
2024-09-19 04:54:33,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=869272.3333333334, ans=0.125
2024-09-19 04:54:38,092 INFO [train.py:1198] (1/2) Epoch 49, batch 50, loss[loss=0.2071, ctc_loss=0.1364, cr_loss=0.3533, over 20930.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1437, cr_loss=0.3738, over 927226.00 frames. ], batch size: 60, lr: 1.69e-03, grad_scale: 32.0
2024-09-19 04:54:52,401 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.52 vs. limit=22.5
2024-09-19 04:55:08,574 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.949e+02 2.336e+02 2.495e+02 2.828e+02 4.684e+02, threshold=4.990e+02, percent-clipped=0.0
2024-09-19 04:55:53,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=869442.3333333334, ans=0.2
2024-09-19 04:55:54,102 INFO [train.py:1198] (1/2) Epoch 49, batch 100, loss[loss=0.2218, ctc_loss=0.1468, cr_loss=0.3746, over 20871.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1424, cr_loss=0.3702, over 1625266.36 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0
2024-09-19 04:55:58,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=869442.3333333334, ans=0.0
2024-09-19 04:55:58,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=869442.3333333334, ans=0.0
2024-09-19 04:57:08,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=869555.6666666666, ans=0.1
2024-09-19 04:57:11,521 INFO [train.py:1198] (1/2) Epoch 49, batch 150, loss[loss=0.1853, ctc_loss=0.119, cr_loss=0.3317, over 20868.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1425, cr_loss=0.3705, over 2167407.25 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0
2024-09-19 04:57:41,806 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.969e+02 2.225e+02 2.348e+02 2.491e+02 3.092e+02, threshold=4.696e+02, percent-clipped=0.0
2024-09-19 04:58:27,418 INFO [train.py:1198] (1/2) Epoch 49, batch 200, loss[loss=0.2194, ctc_loss=0.145, cr_loss=0.3722, over 20980.00 frames. ], tot_loss[loss=0.2145, ctc_loss=0.1411, cr_loss=0.3669, over 2594907.80 frames. ], batch size: 64, lr: 1.69e-03, grad_scale: 32.0
2024-09-19 04:58:27,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=869725.6666666666, ans=0.125
2024-09-19 04:58:30,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=869725.6666666666, ans=0.125
2024-09-19 04:59:13,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=869782.3333333334, ans=0.125
2024-09-19 04:59:27,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=869810.6666666666, ans=0.125
2024-09-19 04:59:37,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=869839.0, ans=0.0
2024-09-19 04:59:46,468 INFO [train.py:1198] (1/2) Epoch 49, batch 250, loss[loss=0.2086, ctc_loss=0.1353, cr_loss=0.3666, over 21064.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.1416, cr_loss=0.3684, over 2933255.60 frames. ], batch size: 62, lr: 1.69e-03, grad_scale: 32.0
2024-09-19 04:59:51,508 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.42 vs. limit=12.0
2024-09-19 05:00:16,651 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.936e+02 2.266e+02 2.387e+02 2.568e+02 3.879e+02, threshold=4.775e+02, percent-clipped=0.0
2024-09-19 05:00:31,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=869952.3333333334, ans=0.125
2024-09-19 05:01:01,808 INFO [train.py:1198] (1/2) Epoch 49, batch 300, loss[loss=0.2168, ctc_loss=0.1419, cr_loss=0.3744, over 20829.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.1416, cr_loss=0.3681, over 3186463.59 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 32.0
2024-09-19 05:01:02,469 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. limit=6.0
2024-09-19 05:02:16,910 INFO [train.py:1198] (1/2) Epoch 49, batch 350, loss[loss=0.2114, ctc_loss=0.1393, cr_loss=0.3608, over 20871.00 frames. ], tot_loss[loss=0.2147, ctc_loss=0.1413, cr_loss=0.3671, over 3384511.07 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0
2024-09-19 05:02:18,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=870150.6666666666, ans=0.125
2024-09-19 05:02:50,368 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.038e+02 2.257e+02 2.370e+02 2.569e+02 3.205e+02, threshold=4.740e+02, percent-clipped=0.0
2024-09-19 05:03:35,307 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.43 vs. limit=22.5
2024-09-19 05:03:35,925 INFO [train.py:1198] (1/2) Epoch 49, batch 400, loss[loss=0.2005, ctc_loss=0.1319, cr_loss=0.3428, over 20651.00 frames. ], tot_loss[loss=0.2138, ctc_loss=0.1406, cr_loss=0.3663, over 3557529.42 frames. ], batch size: 66, lr: 1.69e-03, grad_scale: 32.0
2024-09-19 05:03:48,300 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=8.980e-03
2024-09-19 05:04:05,237 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.32 vs. limit=12.0
2024-09-19 05:04:10,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=870349.0, ans=0.125
2024-09-19 05:04:54,246 INFO [train.py:1198] (1/2) Epoch 49, batch 450, loss[loss=0.1902, ctc_loss=0.1223, cr_loss=0.3395, over 21018.00 frames. ], tot_loss[loss=0.2137, ctc_loss=0.1404, cr_loss=0.3666, over 3685032.26 frames. ], batch size: 52, lr: 1.69e-03, grad_scale: 32.0
2024-09-19 05:05:17,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=870462.3333333334, ans=0.1
2024-09-19 05:05:23,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=870490.6666666666, ans=0.125
2024-09-19 05:05:24,589 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.029e+02 2.285e+02 2.393e+02 2.528e+02 6.540e+02, threshold=4.786e+02, percent-clipped=1.0
2024-09-19 05:05:26,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=870490.6666666666, ans=0.025
2024-09-19 05:05:40,455 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.80 vs. limit=22.5
2024-09-19 05:05:44,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=870519.0, ans=10.0
2024-09-19 05:05:52,224 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.86 vs. limit=15.0
2024-09-19 05:06:09,394 INFO [train.py:1198] (1/2) Epoch 49, batch 500, loss[loss=0.1922, ctc_loss=0.125, cr_loss=0.3363, over 20799.00 frames. ], tot_loss[loss=0.2127, ctc_loss=0.1398, cr_loss=0.3643, over 3780034.78 frames. ], batch size: 53, lr: 1.69e-03, grad_scale: 32.0
2024-09-19 05:06:33,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=870604.0, ans=0.2
2024-09-19 05:06:40,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=870632.3333333334, ans=0.5
2024-09-19 05:06:41,923 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.40 vs. limit=22.5
2024-09-19 05:06:42,135 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.63 vs. limit=15.0
2024-09-19 05:06:47,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=870632.3333333334, ans=0.125
2024-09-19 05:07:08,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=870689.0, ans=0.125
2024-09-19 05:07:24,734 INFO [train.py:1198] (1/2) Epoch 49, batch 550, loss[loss=0.2679, ctc_loss=0.1782, cr_loss=0.4488, over 19950.00 frames. ], tot_loss[loss=0.2126, ctc_loss=0.1398, cr_loss=0.3644, over 3859652.18 frames. ], batch size: 80, lr: 1.68e-03, grad_scale: 32.0
2024-09-19 05:07:28,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=870717.3333333334, ans=0.125
2024-09-19 05:07:54,631 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.958e+02 2.217e+02 2.376e+02 2.506e+02 3.935e+02, threshold=4.752e+02, percent-clipped=0.0
2024-09-19 05:07:59,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=870774.0, ans=0.125
2024-09-19 05:08:42,914 INFO [train.py:1198] (1/2) Epoch 49, batch 600, loss[loss=0.2094, ctc_loss=0.1346, cr_loss=0.374, over 20781.00 frames. ], tot_loss[loss=0.2126, ctc_loss=0.1397, cr_loss=0.3645, over 3907750.14 frames. ], batch size: 53, lr: 1.68e-03, grad_scale: 32.0
2024-09-19 05:08:58,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=870887.3333333334, ans=0.125
2024-09-19 05:09:26,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=870944.0, ans=0.2
2024-09-19 05:09:38,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=870944.0, ans=0.1
2024-09-19 05:09:41,689 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.66 vs. limit=6.0
2024-09-19 05:09:52,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=870972.3333333334, ans=0.0
2024-09-19 05:09:57,914 INFO [train.py:1198] (1/2) Epoch 49, batch 650, loss[loss=0.2396, ctc_loss=0.157, cr_loss=0.4127, over 18029.00 frames. ], tot_loss[loss=0.2131, ctc_loss=0.14, cr_loss=0.3655, over 3953591.29 frames. ], batch size: 108, lr: 1.68e-03, grad_scale: 32.0
2024-09-19 05:10:30,624 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.959e+02 2.267e+02 2.419e+02 2.577e+02 3.151e+02, threshold=4.838e+02, percent-clipped=0.0
2024-09-19 05:10:34,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=871057.3333333334, ans=0.0
2024-09-19 05:10:36,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=871057.3333333334, ans=0.2
2024-09-19 05:10:46,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=871085.6666666666, ans=0.09899494936611666
2024-09-19 05:11:03,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=871114.0, ans=0.025
2024-09-19 05:11:03,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=871114.0, ans=0.2
2024-09-19 05:11:15,235 INFO [train.py:1198] (1/2) Epoch 49, batch 700, loss[loss=0.2222, ctc_loss=0.147, cr_loss=0.3758, over 21038.00 frames. ], tot_loss[loss=0.2136, ctc_loss=0.1404, cr_loss=0.3659, over 3991648.02 frames. ], batch size: 62, lr: 1.68e-03, grad_scale: 32.0
2024-09-19 05:11:42,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=871170.6666666666, ans=0.0
2024-09-19 05:11:54,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=871199.0, ans=0.125
2024-09-19 05:11:55,161 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.31 vs. limit=22.5
2024-09-19 05:12:30,431 INFO [train.py:1198] (1/2) Epoch 49, batch 750, loss[loss=0.2123, ctc_loss=0.1382, cr_loss=0.3709, over 20791.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1413, cr_loss=0.3665, over 3992855.01 frames. ], batch size: 53, lr: 1.68e-03, grad_scale: 32.0
2024-09-19 05:12:32,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=871284.0, ans=0.125
2024-09-19 05:13:00,664 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.970e+02 2.271e+02 2.384e+02 2.575e+02 4.807e+02, threshold=4.767e+02, percent-clipped=0.0
2024-09-19 05:13:04,376 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.13 vs. limit=12.0
2024-09-19 05:13:36,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=871397.3333333334, ans=0.5
2024-09-19 05:13:46,022 INFO [train.py:1198] (1/2) Epoch 49, batch 800, loss[loss=0.2123, ctc_loss=0.1384, cr_loss=0.3697, over 20946.00 frames. ], tot_loss[loss=0.214, ctc_loss=0.1409, cr_loss=0.3654, over 4007433.55 frames. ], batch size: 60, lr: 1.68e-03, grad_scale: 32.0
2024-09-19 05:14:01,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=871454.0, ans=0.07
2024-09-19 05:14:02,235 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.76 vs. limit=15.0
2024-09-19 05:15:05,313 INFO [train.py:1198] (1/2) Epoch 49, batch 850, loss[loss=0.1704, ctc_loss=0.1067, cr_loss=0.3184, over 21056.00 frames. ], tot_loss[loss=0.2143, ctc_loss=0.1411, cr_loss=0.3659, over 4029041.94 frames. ], batch size: 53, lr: 1.68e-03, grad_scale: 32.0
2024-09-19 05:15:35,552 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.014e+02 2.360e+02 2.457e+02 2.650e+02 3.350e+02, threshold=4.914e+02, percent-clipped=0.0
2024-09-19 05:16:23,676 INFO [train.py:1198] (1/2) Epoch 49, batch 900, loss[loss=0.2069, ctc_loss=0.1331, cr_loss=0.3691, over 20928.00 frames. ], tot_loss[loss=0.2138, ctc_loss=0.1406, cr_loss=0.3659, over 4054776.06 frames. ], batch size: 49, lr: 1.68e-03, grad_scale: 32.0
2024-09-19 05:16:34,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=871709.0, ans=0.1
2024-09-19 05:16:59,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=871765.6666666666, ans=0.125
2024-09-19 05:17:13,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=871794.0, ans=0.05
2024-09-19 05:17:27,535 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=871822.3333333334, ans=0.0
2024-09-19 05:17:34,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=871822.3333333334, ans=0.1
2024-09-19 05:17:39,229 INFO [train.py:1198] (1/2) Epoch 49, batch 950, loss[loss=0.191, ctc_loss=0.1255, cr_loss=0.3275, over 20956.00 frames. ], tot_loss[loss=0.2135, ctc_loss=0.1406, cr_loss=0.3647, over 4055313.40 frames. ], batch size: 48, lr: 1.68e-03, grad_scale: 32.0
2024-09-19 05:17:40,117 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.56 vs. limit=22.5
2024-09-19 05:18:05,229 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.27 vs. limit=22.5
2024-09-19 05:18:09,300 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.927e+02 2.259e+02 2.375e+02 2.568e+02 3.161e+02, threshold=4.751e+02, percent-clipped=0.0
2024-09-19 05:18:26,490 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 05:18:51,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=871964.0, ans=0.0
2024-09-19 05:18:54,540 INFO [train.py:1198] (1/2) Epoch 49, batch 1000, loss[loss=0.2234, ctc_loss=0.1454, cr_loss=0.39, over 21079.00 frames. ], tot_loss[loss=0.2138, ctc_loss=0.1407, cr_loss=0.3653, over 4069324.07 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 32.0
2024-09-19 05:19:05,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=871992.3333333334, ans=0.0
2024-09-19 05:19:22,223 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.86 vs.
limit=15.0 2024-09-19 05:19:23,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=872049.0, ans=0.1 2024-09-19 05:19:49,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=872077.3333333334, ans=0.05 2024-09-19 05:20:13,322 INFO [train.py:1198] (1/2) Epoch 49, batch 1050, loss[loss=0.1834, ctc_loss=0.1192, cr_loss=0.3213, over 19942.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1414, cr_loss=0.3663, over 4073351.50 frames. ], batch size: 44, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:20:43,618 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.080e+02 2.304e+02 2.454e+02 2.606e+02 3.776e+02, threshold=4.908e+02, percent-clipped=0.0 2024-09-19 05:20:53,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=872190.6666666666, ans=0.2 2024-09-19 05:20:54,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=872190.6666666666, ans=0.1 2024-09-19 05:21:11,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=872219.0, ans=0.07 2024-09-19 05:21:16,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=872247.3333333334, ans=0.0 2024-09-19 05:21:29,597 INFO [train.py:1198] (1/2) Epoch 49, batch 1100, loss[loss=0.2048, ctc_loss=0.1364, cr_loss=0.3417, over 21050.00 frames. ], tot_loss[loss=0.2144, ctc_loss=0.1412, cr_loss=0.3659, over 4077537.57 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:21:34,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=872275.6666666666, ans=0.0 2024-09-19 05:22:06,568 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.44 vs. limit=15.0 2024-09-19 05:22:14,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=872332.3333333334, ans=0.0 2024-09-19 05:22:23,181 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 05:22:26,596 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.46 vs. limit=15.0 2024-09-19 05:22:38,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=872389.0, ans=0.025 2024-09-19 05:22:49,131 INFO [train.py:1198] (1/2) Epoch 49, batch 1150, loss[loss=0.2544, ctc_loss=0.1733, cr_loss=0.4058, over 18656.00 frames. ], tot_loss[loss=0.2138, ctc_loss=0.1408, cr_loss=0.365, over 4080096.99 frames. 
], batch size: 108, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:22:51,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=872417.3333333334, ans=0.0 2024-09-19 05:23:19,415 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.878e+02 2.289e+02 2.377e+02 2.497e+02 3.518e+02, threshold=4.753e+02, percent-clipped=0.0 2024-09-19 05:23:30,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=872474.0, ans=0.05 2024-09-19 05:24:04,830 INFO [train.py:1198] (1/2) Epoch 49, batch 1200, loss[loss=0.2193, ctc_loss=0.1468, cr_loss=0.3626, over 21092.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1417, cr_loss=0.3667, over 4084992.98 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:24:18,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=872587.3333333334, ans=0.125 2024-09-19 05:24:27,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=872587.3333333334, ans=0.0 2024-09-19 05:25:01,098 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 05:25:12,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=872672.3333333334, ans=0.125 2024-09-19 05:25:15,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=872672.3333333334, ans=0.2 2024-09-19 05:25:20,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=872700.6666666666, ans=0.07 2024-09-19 05:25:21,352 INFO [train.py:1198] (1/2) Epoch 49, batch 1250, loss[loss=0.2076, ctc_loss=0.1375, cr_loss=0.3508, over 20849.00 frames. ], tot_loss[loss=0.2147, ctc_loss=0.1413, cr_loss=0.367, over 4094040.46 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:25:31,402 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.87 vs. limit=10.0 2024-09-19 05:25:40,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=872729.0, ans=0.0 2024-09-19 05:25:52,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=872729.0, ans=0.125 2024-09-19 05:25:54,874 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.999e+02 2.274e+02 2.409e+02 2.576e+02 4.717e+02, threshold=4.817e+02, percent-clipped=0.0 2024-09-19 05:26:14,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=872785.6666666666, ans=0.0 2024-09-19 05:26:28,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=872814.0, ans=0.125 2024-09-19 05:26:30,662 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.97 vs. limit=22.5 2024-09-19 05:26:40,838 INFO [train.py:1198] (1/2) Epoch 49, batch 1300, loss[loss=0.2124, ctc_loss=0.14, cr_loss=0.362, over 20819.00 frames. ], tot_loss[loss=0.214, ctc_loss=0.1408, cr_loss=0.366, over 4094833.63 frames. 
], batch size: 59, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:27:24,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=872899.0, ans=0.125 2024-09-19 05:27:36,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=872927.3333333334, ans=0.2 2024-09-19 05:27:46,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=872955.6666666666, ans=0.0 2024-09-19 05:27:58,675 INFO [train.py:1198] (1/2) Epoch 49, batch 1350, loss[loss=0.1871, ctc_loss=0.1197, cr_loss=0.3367, over 21054.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.1418, cr_loss=0.3671, over 4075726.73 frames. ], batch size: 53, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:28:05,807 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.79 vs. limit=15.0 2024-09-19 05:28:29,236 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.965e+02 2.209e+02 2.364e+02 2.594e+02 5.305e+02, threshold=4.728e+02, percent-clipped=1.0 2024-09-19 05:28:40,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=873040.6666666666, ans=10.0 2024-09-19 05:29:07,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=873097.3333333334, ans=0.125 2024-09-19 05:29:10,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=873097.3333333334, ans=0.5 2024-09-19 05:29:15,015 INFO [train.py:1198] (1/2) Epoch 49, batch 1400, loss[loss=0.2353, ctc_loss=0.1545, cr_loss=0.404, over 21025.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1417, cr_loss=0.367, over 4088803.94 frames. ], batch size: 63, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:29:19,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=873125.6666666666, ans=0.125 2024-09-19 05:29:19,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=873125.6666666666, ans=0.07 2024-09-19 05:29:33,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=873154.0, ans=0.025 2024-09-19 05:29:36,391 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=873154.0, ans=0.125 2024-09-19 05:29:57,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=873182.3333333334, ans=0.5 2024-09-19 05:30:07,119 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.75 vs. limit=15.0 2024-09-19 05:30:23,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=873239.0, ans=0.2 2024-09-19 05:30:30,649 INFO [train.py:1198] (1/2) Epoch 49, batch 1450, loss[loss=0.2223, ctc_loss=0.1444, cr_loss=0.3897, over 21018.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1416, cr_loss=0.3675, over 4097276.44 frames. 
], batch size: 63, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:30:38,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=873267.3333333334, ans=0.125 2024-09-19 05:31:01,730 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.015e+02 2.243e+02 2.369e+02 2.539e+02 3.109e+02, threshold=4.738e+02, percent-clipped=0.0 2024-09-19 05:31:05,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=873324.0, ans=0.0 2024-09-19 05:31:23,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=873352.3333333334, ans=0.1 2024-09-19 05:31:38,677 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.59 vs. limit=22.5 2024-09-19 05:31:50,056 INFO [train.py:1198] (1/2) Epoch 49, batch 1500, loss[loss=0.2516, ctc_loss=0.1697, cr_loss=0.4098, over 20803.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1414, cr_loss=0.3667, over 4088902.41 frames. ], batch size: 65, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:32:02,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=873409.0, ans=0.125 2024-09-19 05:32:04,003 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 05:32:08,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=873437.3333333334, ans=0.0 2024-09-19 05:32:14,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=873437.3333333334, ans=0.125 2024-09-19 05:32:29,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=873465.6666666666, ans=0.0 2024-09-19 05:32:40,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=873494.0, ans=0.04949747468305833 2024-09-19 05:33:06,016 INFO [train.py:1198] (1/2) Epoch 49, batch 1550, loss[loss=0.1998, ctc_loss=0.1291, cr_loss=0.3536, over 21072.00 frames. ], tot_loss[loss=0.2139, ctc_loss=0.1408, cr_loss=0.3656, over 4088067.73 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:33:15,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=873550.6666666666, ans=0.1 2024-09-19 05:33:31,317 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=14.05 vs. limit=15.0 2024-09-19 05:33:39,111 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.914e+02 2.256e+02 2.388e+02 2.544e+02 3.857e+02, threshold=4.776e+02, percent-clipped=0.0 2024-09-19 05:34:02,338 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.76 vs. 
limit=10.0 2024-09-19 05:34:18,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=873664.0, ans=0.125 2024-09-19 05:34:23,979 INFO [train.py:1198] (1/2) Epoch 49, batch 1600, loss[loss=0.2107, ctc_loss=0.1386, cr_loss=0.3602, over 20661.00 frames. ], tot_loss[loss=0.2133, ctc_loss=0.1403, cr_loss=0.3648, over 4086378.55 frames. ], batch size: 68, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:34:27,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=873692.3333333334, ans=0.04949747468305833 2024-09-19 05:34:46,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=873720.6666666666, ans=0.125 2024-09-19 05:34:47,609 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.37 vs. limit=15.0 2024-09-19 05:35:08,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=873777.3333333334, ans=0.125 2024-09-19 05:35:39,988 INFO [train.py:1198] (1/2) Epoch 49, batch 1650, loss[loss=0.2302, ctc_loss=0.1522, cr_loss=0.3897, over 19944.00 frames. ], tot_loss[loss=0.213, ctc_loss=0.1401, cr_loss=0.3645, over 4094856.26 frames. ], batch size: 80, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:35:50,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=873834.0, ans=0.1 2024-09-19 05:36:10,024 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.887e+02 2.253e+02 2.363e+02 2.502e+02 3.911e+02, threshold=4.727e+02, percent-clipped=0.0 2024-09-19 05:36:30,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=873919.0, ans=0.125 2024-09-19 05:36:47,377 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.08 vs. limit=15.0 2024-09-19 05:36:55,620 INFO [train.py:1198] (1/2) Epoch 49, batch 1700, loss[loss=0.226, ctc_loss=0.148, cr_loss=0.3903, over 20990.00 frames. ], tot_loss[loss=0.213, ctc_loss=0.1401, cr_loss=0.3645, over 4106314.50 frames. ], batch size: 64, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:37:00,782 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.04 vs. limit=15.0 2024-09-19 05:37:15,355 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=874004.0, ans=0.125 2024-09-19 05:37:17,544 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.89 vs. limit=15.0 2024-09-19 05:37:39,727 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.30 vs. 
limit=22.5 2024-09-19 05:37:47,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=874060.6666666666, ans=0.1 2024-09-19 05:38:08,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=874089.0, ans=0.125 2024-09-19 05:38:09,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=874089.0, ans=0.125 2024-09-19 05:38:11,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=874089.0, ans=0.125 2024-09-19 05:38:13,849 INFO [train.py:1198] (1/2) Epoch 49, batch 1750, loss[loss=0.2457, ctc_loss=0.1704, cr_loss=0.3762, over 14712.00 frames. ], tot_loss[loss=0.2145, ctc_loss=0.1412, cr_loss=0.3667, over 4095742.24 frames. ], batch size: 150, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:38:45,433 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.017e+02 2.335e+02 2.473e+02 2.656e+02 3.671e+02, threshold=4.946e+02, percent-clipped=0.0 2024-09-19 05:38:50,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=874174.0, ans=0.125 2024-09-19 05:39:16,764 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.99 vs. limit=15.0 2024-09-19 05:39:33,115 INFO [train.py:1198] (1/2) Epoch 49, batch 1800, loss[loss=0.1791, ctc_loss=0.1158, cr_loss=0.3166, over 19888.00 frames. ], tot_loss[loss=0.2147, ctc_loss=0.1413, cr_loss=0.3673, over 4106139.38 frames. ], batch size: 44, lr: 1.68e-03, grad_scale: 16.0 2024-09-19 05:40:29,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=874344.0, ans=0.2 2024-09-19 05:40:48,697 INFO [train.py:1198] (1/2) Epoch 49, batch 1850, loss[loss=0.2143, ctc_loss=0.1399, cr_loss=0.3715, over 20871.00 frames. ], tot_loss[loss=0.214, ctc_loss=0.1408, cr_loss=0.3663, over 4111533.31 frames. ], batch size: 54, lr: 1.68e-03, grad_scale: 16.0 2024-09-19 05:40:56,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=874400.6666666666, ans=0.125 2024-09-19 05:41:20,418 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.929e+02 2.286e+02 2.400e+02 2.579e+02 4.218e+02, threshold=4.799e+02, percent-clipped=0.0 2024-09-19 05:41:25,148 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=874457.3333333334, ans=0.125 2024-09-19 05:41:26,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=874457.3333333334, ans=0.07 2024-09-19 05:41:52,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=874514.0, ans=0.125 2024-09-19 05:42:04,190 INFO [train.py:1198] (1/2) Epoch 49, batch 1900, loss[loss=0.198, ctc_loss=0.1297, cr_loss=0.3416, over 20785.00 frames. ], tot_loss[loss=0.2134, ctc_loss=0.1404, cr_loss=0.3649, over 4104241.91 frames. 
], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2024-09-19 05:42:28,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=874570.6666666666, ans=0.2 2024-09-19 05:42:43,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=874599.0, ans=0.125 2024-09-19 05:42:48,642 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.18 vs. limit=15.0 2024-09-19 05:42:52,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=874627.3333333334, ans=0.2 2024-09-19 05:43:02,481 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.02 vs. limit=22.5 2024-09-19 05:43:09,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=874655.6666666666, ans=0.07 2024-09-19 05:43:22,702 INFO [train.py:1198] (1/2) Epoch 49, batch 1950, loss[loss=0.2427, ctc_loss=0.1614, cr_loss=0.4066, over 20319.00 frames. ], tot_loss[loss=0.2125, ctc_loss=0.1397, cr_loss=0.3638, over 4108863.41 frames. ], batch size: 74, lr: 1.68e-03, grad_scale: 16.0 2024-09-19 05:43:35,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=874684.0, ans=0.125 2024-09-19 05:43:41,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=874712.3333333334, ans=0.0 2024-09-19 05:43:42,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=874712.3333333334, ans=0.1 2024-09-19 05:43:42,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=874712.3333333334, ans=0.125 2024-09-19 05:43:54,904 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.983e+02 2.253e+02 2.406e+02 2.618e+02 3.664e+02, threshold=4.813e+02, percent-clipped=0.0 2024-09-19 05:44:01,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=874740.6666666666, ans=0.0 2024-09-19 05:44:31,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=874797.3333333334, ans=0.125 2024-09-19 05:44:41,326 INFO [train.py:1198] (1/2) Epoch 49, batch 2000, loss[loss=0.2072, ctc_loss=0.1347, cr_loss=0.3623, over 21067.00 frames. ], tot_loss[loss=0.2139, ctc_loss=0.1408, cr_loss=0.3655, over 4093868.84 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:45:01,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=874854.0, ans=0.025 2024-09-19 05:45:09,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=874854.0, ans=0.025 2024-09-19 05:45:57,241 INFO [train.py:1198] (1/2) Epoch 49, batch 2050, loss[loss=0.234, ctc_loss=0.1553, cr_loss=0.3935, over 20962.00 frames. ], tot_loss[loss=0.2138, ctc_loss=0.1406, cr_loss=0.3656, over 4106030.92 frames. 
], batch size: 58, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:46:26,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=875024.0, ans=0.125 2024-09-19 05:46:29,032 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.034e+02 2.262e+02 2.437e+02 2.521e+02 3.820e+02, threshold=4.874e+02, percent-clipped=0.0 2024-09-19 05:46:45,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=875052.3333333334, ans=0.025 2024-09-19 05:47:00,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=875080.6666666666, ans=0.125 2024-09-19 05:47:12,688 INFO [train.py:1198] (1/2) Epoch 49, batch 2100, loss[loss=0.2512, ctc_loss=0.1681, cr_loss=0.4159, over 20722.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1413, cr_loss=0.3667, over 4098528.84 frames. ], batch size: 71, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:48:12,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=875222.3333333334, ans=0.025 2024-09-19 05:48:19,936 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.41 vs. limit=15.0 2024-09-19 05:48:24,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=875222.3333333334, ans=0.1 2024-09-19 05:48:28,560 INFO [train.py:1198] (1/2) Epoch 49, batch 2150, loss[loss=0.2058, ctc_loss=0.1336, cr_loss=0.3609, over 20973.00 frames. ], tot_loss[loss=0.2136, ctc_loss=0.1406, cr_loss=0.3652, over 4086331.22 frames. ], batch size: 51, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:48:54,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=875279.0, ans=0.0 2024-09-19 05:48:57,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=875279.0, ans=0.05 2024-09-19 05:49:03,025 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.998e+02 2.237e+02 2.459e+02 2.612e+02 3.793e+02, threshold=4.918e+02, percent-clipped=0.0 2024-09-19 05:49:03,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=875307.3333333334, ans=0.2 2024-09-19 05:49:46,298 INFO [train.py:1198] (1/2) Epoch 49, batch 2200, loss[loss=0.2065, ctc_loss=0.1364, cr_loss=0.3505, over 20890.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1416, cr_loss=0.3672, over 4083178.87 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:49:56,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=875392.3333333334, ans=0.025 2024-09-19 05:50:05,934 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.95 vs. limit=6.0 2024-09-19 05:51:05,259 INFO [train.py:1198] (1/2) Epoch 49, batch 2250, loss[loss=0.183, ctc_loss=0.1178, cr_loss=0.3258, over 20011.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.1418, cr_loss=0.3682, over 4089091.77 frames. 
], batch size: 44, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:51:37,156 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.976e+02 2.293e+02 2.440e+02 2.592e+02 4.236e+02, threshold=4.879e+02, percent-clipped=0.0 2024-09-19 05:51:57,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=875619.0, ans=0.0 2024-09-19 05:52:18,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=875647.3333333334, ans=0.0 2024-09-19 05:52:18,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=875647.3333333334, ans=0.125 2024-09-19 05:52:20,910 INFO [train.py:1198] (1/2) Epoch 49, batch 2300, loss[loss=0.2467, ctc_loss=0.1629, cr_loss=0.4189, over 21005.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1423, cr_loss=0.369, over 4064493.38 frames. ], batch size: 63, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:52:25,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=875675.6666666666, ans=0.125 2024-09-19 05:52:32,296 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.95 vs. limit=6.0 2024-09-19 05:52:49,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=875732.3333333334, ans=0.125 2024-09-19 05:53:11,243 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.45 vs. limit=15.0 2024-09-19 05:53:36,112 INFO [train.py:1198] (1/2) Epoch 49, batch 2350, loss[loss=0.2279, ctc_loss=0.152, cr_loss=0.3795, over 21064.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1434, cr_loss=0.3709, over 4079872.46 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:53:42,821 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.74 vs. limit=22.5 2024-09-19 05:53:49,895 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 05:54:07,564 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.900e+02 2.283e+02 2.395e+02 2.536e+02 3.291e+02, threshold=4.790e+02, percent-clipped=0.0 2024-09-19 05:54:53,945 INFO [train.py:1198] (1/2) Epoch 49, batch 2400, loss[loss=0.2273, ctc_loss=0.1521, cr_loss=0.3759, over 21028.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.143, cr_loss=0.369, over 4074846.56 frames. ], batch size: 61, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:54:58,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=875959.0, ans=0.125 2024-09-19 05:55:08,321 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.40 vs. 
limit=15.0 2024-09-19 05:55:18,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=875987.3333333334, ans=0.0 2024-09-19 05:55:22,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=876015.6666666666, ans=0.0 2024-09-19 05:55:34,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=876015.6666666666, ans=0.125 2024-09-19 05:55:34,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=876015.6666666666, ans=0.0 2024-09-19 05:56:11,715 INFO [train.py:1198] (1/2) Epoch 49, batch 2450, loss[loss=0.1858, ctc_loss=0.1194, cr_loss=0.3321, over 20990.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1424, cr_loss=0.3681, over 4081277.54 frames. ], batch size: 51, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:56:18,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=876100.6666666666, ans=0.025 2024-09-19 05:56:24,309 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 05:56:33,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=876129.0, ans=0.125 2024-09-19 05:56:43,557 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.932e+02 2.206e+02 2.361e+02 2.528e+02 4.866e+02, threshold=4.722e+02, percent-clipped=1.0 2024-09-19 05:57:06,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=876185.6666666666, ans=0.125 2024-09-19 05:57:27,403 INFO [train.py:1198] (1/2) Epoch 49, batch 2500, loss[loss=0.1969, ctc_loss=0.1283, cr_loss=0.3433, over 21066.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.1421, cr_loss=0.3674, over 4082091.25 frames. ], batch size: 53, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:57:52,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=876270.6666666666, ans=0.125 2024-09-19 05:58:42,631 INFO [train.py:1198] (1/2) Epoch 49, batch 2550, loss[loss=0.2299, ctc_loss=0.1564, cr_loss=0.3674, over 19582.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1416, cr_loss=0.3668, over 4091000.73 frames. ], batch size: 90, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:58:45,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=876384.0, ans=0.1 2024-09-19 05:59:14,360 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.041e+02 2.260e+02 2.421e+02 2.630e+02 4.143e+02, threshold=4.843e+02, percent-clipped=0.0 2024-09-19 05:59:28,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=876469.0, ans=0.125 2024-09-19 05:59:30,682 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.67 vs. 
limit=15.0 2024-09-19 05:59:40,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=876469.0, ans=0.0 2024-09-19 05:59:47,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=876497.3333333334, ans=0.0 2024-09-19 05:59:58,004 INFO [train.py:1198] (1/2) Epoch 49, batch 2600, loss[loss=0.1854, ctc_loss=0.121, cr_loss=0.3216, over 20977.00 frames. ], tot_loss[loss=0.2144, ctc_loss=0.1412, cr_loss=0.366, over 4095989.99 frames. ], batch size: 51, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:00:23,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=876554.0, ans=0.125 2024-09-19 06:00:31,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=876582.3333333334, ans=0.125 2024-09-19 06:00:44,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=876610.6666666666, ans=0.125 2024-09-19 06:00:49,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=876610.6666666666, ans=0.125 2024-09-19 06:01:06,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=876639.0, ans=0.1 2024-09-19 06:01:13,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=876639.0, ans=0.2 2024-09-19 06:01:16,427 INFO [train.py:1198] (1/2) Epoch 49, batch 2650, loss[loss=0.223, ctc_loss=0.1491, cr_loss=0.3695, over 20318.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1413, cr_loss=0.3663, over 4100196.29 frames. ], batch size: 74, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:01:16,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=876667.3333333334, ans=0.125 2024-09-19 06:01:18,522 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.65 vs. limit=15.0 2024-09-19 06:01:41,654 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.13 vs. limit=15.0 2024-09-19 06:01:51,436 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.980e+02 2.246e+02 2.382e+02 2.567e+02 4.659e+02, threshold=4.764e+02, percent-clipped=0.0 2024-09-19 06:02:17,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=876752.3333333334, ans=0.125 2024-09-19 06:02:25,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=876780.6666666666, ans=0.1 2024-09-19 06:02:33,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=876780.6666666666, ans=0.0 2024-09-19 06:02:34,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=876809.0, ans=0.125 2024-09-19 06:02:35,747 INFO [train.py:1198] (1/2) Epoch 49, batch 2700, loss[loss=0.2089, ctc_loss=0.14, cr_loss=0.3449, over 20643.00 frames. ], tot_loss[loss=0.2142, ctc_loss=0.141, cr_loss=0.3657, over 4090547.88 frames. 
], batch size: 66, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:02:53,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=876837.3333333334, ans=0.025 2024-09-19 06:03:48,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=876922.3333333334, ans=0.0 2024-09-19 06:03:50,964 INFO [train.py:1198] (1/2) Epoch 49, batch 2750, loss[loss=0.2027, ctc_loss=0.1307, cr_loss=0.3598, over 20975.00 frames. ], tot_loss[loss=0.2128, ctc_loss=0.14, cr_loss=0.3641, over 4091948.97 frames. ], batch size: 48, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:04:03,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=876950.6666666666, ans=0.0 2024-09-19 06:04:22,338 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.70 vs. limit=15.0 2024-09-19 06:04:22,900 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.884e+02 2.269e+02 2.435e+02 2.609e+02 3.166e+02, threshold=4.870e+02, percent-clipped=0.0 2024-09-19 06:04:38,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=877035.6666666666, ans=0.0 2024-09-19 06:04:41,415 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.99 vs. limit=15.0 2024-09-19 06:04:51,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=877064.0, ans=0.1 2024-09-19 06:05:06,634 INFO [train.py:1198] (1/2) Epoch 49, batch 2800, loss[loss=0.2326, ctc_loss=0.1542, cr_loss=0.3919, over 20022.00 frames. ], tot_loss[loss=0.212, ctc_loss=0.1394, cr_loss=0.3632, over 4104397.67 frames. ], batch size: 80, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:05:18,017 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.46 vs. limit=15.0 2024-09-19 06:05:43,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=877149.0, ans=0.2 2024-09-19 06:05:43,376 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.64 vs. limit=6.0 2024-09-19 06:06:25,229 INFO [train.py:1198] (1/2) Epoch 49, batch 2850, loss[loss=0.2268, ctc_loss=0.1501, cr_loss=0.3838, over 21018.00 frames. ], tot_loss[loss=0.2118, ctc_loss=0.1392, cr_loss=0.3628, over 4111649.40 frames. ], batch size: 63, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:06:31,906 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.54 vs. limit=15.0 2024-09-19 06:06:35,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=877234.0, ans=0.015 2024-09-19 06:06:39,242 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.44 vs. 
limit=15.0 2024-09-19 06:06:56,754 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.922e+02 2.265e+02 2.353e+02 2.511e+02 6.871e+02, threshold=4.706e+02, percent-clipped=1.0 2024-09-19 06:06:58,900 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.95 vs. limit=15.0 2024-09-19 06:07:01,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=877290.6666666666, ans=0.125 2024-09-19 06:07:07,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=877290.6666666666, ans=0.125 2024-09-19 06:07:42,888 INFO [train.py:1198] (1/2) Epoch 49, batch 2900, loss[loss=0.2061, ctc_loss=0.1337, cr_loss=0.3619, over 21052.00 frames. ], tot_loss[loss=0.2142, ctc_loss=0.1409, cr_loss=0.3665, over 4105690.42 frames. ], batch size: 62, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:08:10,934 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.84 vs. limit=12.0 2024-09-19 06:08:43,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=877489.0, ans=0.125 2024-09-19 06:08:58,499 INFO [train.py:1198] (1/2) Epoch 49, batch 2950, loss[loss=0.2367, ctc_loss=0.1575, cr_loss=0.396, over 20961.00 frames. ], tot_loss[loss=0.2145, ctc_loss=0.1411, cr_loss=0.3672, over 4115053.05 frames. ], batch size: 64, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:09:07,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=877517.3333333334, ans=0.1 2024-09-19 06:09:21,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=877545.6666666666, ans=0.0 2024-09-19 06:09:29,888 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.026e+02 2.260e+02 2.394e+02 2.539e+02 2.981e+02, threshold=4.788e+02, percent-clipped=0.0 2024-09-19 06:09:36,701 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.30 vs. limit=15.0 2024-09-19 06:09:59,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=877630.6666666666, ans=0.125 2024-09-19 06:10:12,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=877659.0, ans=0.2 2024-09-19 06:10:13,865 INFO [train.py:1198] (1/2) Epoch 49, batch 3000, loss[loss=0.2025, ctc_loss=0.1312, cr_loss=0.3561, over 20877.00 frames. ], tot_loss[loss=0.2145, ctc_loss=0.1411, cr_loss=0.367, over 4113732.67 frames. ], batch size: 54, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:10:13,865 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 06:10:24,748 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.1747, 3.1224, 2.5515, 2.9503], device='cuda:1') 2024-09-19 06:10:31,741 INFO [train.py:1230] (1/2) Epoch 49, validation: loss=0.03905, ctc_loss=0.03905, cr_loss=1.593e-14, over 944034.00 frames. 
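A note on how the three loss fields combine: every tot_loss[...] entry in this epoch satisfies loss = ctc_loss + 0.2 * cr_loss (for the batch-3000 average above, (0.2145 - 0.1411) / 0.367 = 0.200), so the consistency-regularization term is evidently weighted at about 0.2 relative to the CTC term. The validation report just above logs cr_loss=1.593e-14, effectively zero, which is consistent with the two branches compared by the CR loss coinciding once augmentation and dropout are disabled in eval mode. Below is a minimal sketch of that decomposition, checked against the logged averages; the 0.2 weighting is inferred from the entries themselves, not read from the recipe's configuration:

```python
# (loss, ctc_loss, cr_loss) triples copied from tot_loss[...] lines
# in this epoch; the 0.2 scale below is inferred, not a known config value.
entries = [
    (0.2145, 0.1411, 0.3670),  # epoch 49, batch 3000
    (0.2156, 0.1420, 0.3680),  # epoch 49, batch 3150
]
for loss, ctc, cr in entries:
    inferred_scale = (loss - ctc) / cr
    assert abs(inferred_scale - 0.2) < 1e-3, inferred_scale
print("loss = ctc_loss + 0.2 * cr_loss holds for the sampled entries")
```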
2024-09-19 06:10:31,741 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-19 06:10:42,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=877659.0, ans=0.125 2024-09-19 06:10:46,785 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.99 vs. limit=12.0 2024-09-19 06:11:37,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=877772.3333333334, ans=0.1 2024-09-19 06:11:50,771 INFO [train.py:1198] (1/2) Epoch 49, batch 3050, loss[loss=0.1872, ctc_loss=0.1228, cr_loss=0.3222, over 20984.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.1414, cr_loss=0.3675, over 4116964.57 frames. ], batch size: 55, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:12:10,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=877829.0, ans=0.125 2024-09-19 06:12:22,939 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.991e+02 2.331e+02 2.450e+02 2.618e+02 3.633e+02, threshold=4.901e+02, percent-clipped=0.0 2024-09-19 06:12:33,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=877857.3333333334, ans=0.0 2024-09-19 06:12:38,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=877885.6666666666, ans=0.2 2024-09-19 06:12:46,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=877885.6666666666, ans=0.125 2024-09-19 06:12:56,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=877914.0, ans=0.125 2024-09-19 06:13:10,094 INFO [train.py:1198] (1/2) Epoch 49, batch 3100, loss[loss=0.2727, ctc_loss=0.1882, cr_loss=0.4224, over 14697.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1413, cr_loss=0.3674, over 4110920.51 frames. ], batch size: 150, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:13:19,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=877942.3333333334, ans=0.1 2024-09-19 06:13:36,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=877970.6666666666, ans=0.125 2024-09-19 06:13:46,114 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.53 vs. limit=22.5 2024-09-19 06:14:02,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=878027.3333333334, ans=0.125 2024-09-19 06:14:20,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=878055.6666666666, ans=0.1 2024-09-19 06:14:20,716 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.83 vs. limit=6.0 2024-09-19 06:14:25,885 INFO [train.py:1198] (1/2) Epoch 49, batch 3150, loss[loss=0.2435, ctc_loss=0.1645, cr_loss=0.3953, over 20043.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.142, cr_loss=0.368, over 4100588.82 frames. 
], batch size: 80, lr: 1.68e-03, grad_scale: 16.0 2024-09-19 06:14:41,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=878112.3333333334, ans=0.0 2024-09-19 06:14:43,256 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.02 vs. limit=12.0 2024-09-19 06:14:59,116 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.023e+02 2.293e+02 2.399e+02 2.527e+02 8.432e+02, threshold=4.798e+02, percent-clipped=1.0 2024-09-19 06:15:41,636 INFO [train.py:1198] (1/2) Epoch 49, batch 3200, loss[loss=0.2375, ctc_loss=0.1551, cr_loss=0.4121, over 20303.00 frames. ], tot_loss[loss=0.2137, ctc_loss=0.1406, cr_loss=0.3656, over 4110099.03 frames. ], batch size: 74, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:15:44,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=878225.6666666666, ans=0.125 2024-09-19 06:16:24,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=878282.3333333334, ans=0.1 2024-09-19 06:16:57,213 INFO [train.py:1198] (1/2) Epoch 49, batch 3250, loss[loss=0.2464, ctc_loss=0.1702, cr_loss=0.3812, over 14374.00 frames. ], tot_loss[loss=0.2132, ctc_loss=0.1402, cr_loss=0.3649, over 4110802.44 frames. ], batch size: 149, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:17:33,588 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.990e+02 2.268e+02 2.398e+02 2.600e+02 3.640e+02, threshold=4.797e+02, percent-clipped=0.0 2024-09-19 06:17:38,600 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.14 vs. limit=22.5 2024-09-19 06:17:58,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=878452.3333333334, ans=0.125 2024-09-19 06:18:13,679 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.59 vs. limit=15.0 2024-09-19 06:18:16,014 INFO [train.py:1198] (1/2) Epoch 49, batch 3300, loss[loss=0.2192, ctc_loss=0.1437, cr_loss=0.3776, over 20781.00 frames. ], tot_loss[loss=0.213, ctc_loss=0.1401, cr_loss=0.3646, over 4121155.33 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:19:22,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=878622.3333333334, ans=0.125 2024-09-19 06:19:34,738 INFO [train.py:1198] (1/2) Epoch 49, batch 3350, loss[loss=0.2083, ctc_loss=0.1369, cr_loss=0.357, over 20939.00 frames. ], tot_loss[loss=0.2135, ctc_loss=0.1404, cr_loss=0.3652, over 4117886.11 frames. ], batch size: 60, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:20:08,228 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.063e+02 2.294e+02 2.391e+02 2.564e+02 4.774e+02, threshold=4.781e+02, percent-clipped=0.0 2024-09-19 06:20:20,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=878735.6666666666, ans=0.125 2024-09-19 06:20:20,951 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.58 vs. 
limit=15.0 2024-09-19 06:20:22,305 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 06:20:25,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=878735.6666666666, ans=0.125 2024-09-19 06:20:49,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=878792.3333333334, ans=0.125 2024-09-19 06:20:50,622 INFO [train.py:1198] (1/2) Epoch 49, batch 3400, loss[loss=0.203, ctc_loss=0.1319, cr_loss=0.3553, over 21004.00 frames. ], tot_loss[loss=0.2133, ctc_loss=0.1403, cr_loss=0.3647, over 4114925.62 frames. ], batch size: 52, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:21:10,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=878820.6666666666, ans=0.125 2024-09-19 06:21:29,100 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.75 vs. limit=15.0 2024-09-19 06:22:06,312 INFO [train.py:1198] (1/2) Epoch 49, batch 3450, loss[loss=0.1844, ctc_loss=0.1196, cr_loss=0.3241, over 21058.00 frames. ], tot_loss[loss=0.2126, ctc_loss=0.1398, cr_loss=0.3638, over 4114886.01 frames. ], batch size: 53, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:22:07,589 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.83 vs. limit=6.0 2024-09-19 06:22:09,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=878934.0, ans=0.0 2024-09-19 06:22:23,971 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=4.92 vs. limit=15.0 2024-09-19 06:22:39,475 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.033e+02 2.282e+02 2.423e+02 2.611e+02 3.875e+02, threshold=4.846e+02, percent-clipped=0.0 2024-09-19 06:22:47,663 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.51 vs. limit=15.0 2024-09-19 06:22:53,662 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.94 vs. limit=15.0 2024-09-19 06:23:17,987 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.99 vs. limit=15.0 2024-09-19 06:23:24,847 INFO [train.py:1198] (1/2) Epoch 49, batch 3500, loss[loss=0.2003, ctc_loss=0.1291, cr_loss=0.3559, over 21049.00 frames. ], tot_loss[loss=0.2128, ctc_loss=0.1399, cr_loss=0.3648, over 4116059.71 frames. ], batch size: 62, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:23:41,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=879104.0, ans=0.125 2024-09-19 06:23:44,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=879104.0, ans=0.1 2024-09-19 06:24:43,638 INFO [train.py:1198] (1/2) Epoch 49, batch 3550, loss[loss=0.2081, ctc_loss=0.1368, cr_loss=0.3565, over 21034.00 frames. ], tot_loss[loss=0.2139, ctc_loss=0.1407, cr_loss=0.3659, over 4116065.82 frames. 
], batch size: 62, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:24:58,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=879245.6666666666, ans=0.0 2024-09-19 06:25:16,680 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.999e+02 2.268e+02 2.388e+02 2.557e+02 3.888e+02, threshold=4.776e+02, percent-clipped=0.0 2024-09-19 06:25:25,160 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.86 vs. limit=15.0 2024-09-19 06:25:27,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=879302.3333333334, ans=0.1 2024-09-19 06:25:57,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=879359.0, ans=0.0 2024-09-19 06:25:58,667 INFO [train.py:1198] (1/2) Epoch 49, batch 3600, loss[loss=0.1715, ctc_loss=0.1092, cr_loss=0.3119, over 20931.00 frames. ], tot_loss[loss=0.2145, ctc_loss=0.1411, cr_loss=0.3671, over 4117407.77 frames. ], batch size: 48, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:26:08,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=879359.0, ans=0.125 2024-09-19 06:26:20,436 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.21 vs. limit=10.0 2024-09-19 06:26:54,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=879444.0, ans=0.125 2024-09-19 06:27:14,429 INFO [train.py:1198] (1/2) Epoch 49, batch 3650, loss[loss=0.2306, ctc_loss=0.1526, cr_loss=0.3904, over 19956.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1413, cr_loss=0.3667, over 4105495.39 frames. ], batch size: 80, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:27:25,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=879500.6666666666, ans=0.125 2024-09-19 06:27:40,677 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.87 vs. limit=15.0 2024-09-19 06:27:47,757 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.983e+02 2.233e+02 2.351e+02 2.545e+02 5.315e+02, threshold=4.701e+02, percent-clipped=1.0 2024-09-19 06:28:23,338 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.12 vs. limit=15.0 2024-09-19 06:28:30,070 INFO [train.py:1198] (1/2) Epoch 49, batch 3700, loss[loss=0.2088, ctc_loss=0.1347, cr_loss=0.3703, over 21056.00 frames. ], tot_loss[loss=0.2143, ctc_loss=0.1411, cr_loss=0.3663, over 4091157.62 frames. 
], batch size: 56, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:29:00,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=879670.6666666666, ans=0.1 2024-09-19 06:29:09,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=879699.0, ans=0.0 2024-09-19 06:29:24,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=879727.3333333334, ans=0.2 2024-09-19 06:29:40,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=879755.6666666666, ans=0.125 2024-09-19 06:29:48,135 INFO [train.py:1198] (1/2) Epoch 49, batch 3750, loss[loss=0.199, ctc_loss=0.1298, cr_loss=0.3459, over 20973.00 frames. ], tot_loss[loss=0.214, ctc_loss=0.1408, cr_loss=0.3661, over 4093371.27 frames. ], batch size: 51, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:30:00,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=879784.0, ans=0.025 2024-09-19 06:30:07,904 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 06:30:25,390 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.058e+02 2.220e+02 2.318e+02 2.478e+02 2.994e+02, threshold=4.637e+02, percent-clipped=0.0 2024-09-19 06:30:43,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=879869.0, ans=0.0 2024-09-19 06:30:49,785 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=879897.3333333334, ans=0.2 2024-09-19 06:31:06,101 INFO [train.py:1198] (1/2) Epoch 49, batch 3800, loss[loss=0.2367, ctc_loss=0.1567, cr_loss=0.4, over 20868.00 frames. ], tot_loss[loss=0.2141, ctc_loss=0.1409, cr_loss=0.3661, over 4083106.39 frames. ], batch size: 65, lr: 1.68e-03, grad_scale: 16.0 2024-09-19 06:32:21,717 INFO [train.py:1198] (1/2) Epoch 49, batch 3850, loss[loss=0.2559, ctc_loss=0.1727, cr_loss=0.4162, over 20649.00 frames. ], tot_loss[loss=0.2143, ctc_loss=0.1409, cr_loss=0.3669, over 4081781.43 frames. ], batch size: 66, lr: 1.68e-03, grad_scale: 16.0 2024-09-19 06:32:27,061 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.33 vs. limit=15.0 2024-09-19 06:32:56,739 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.906e+02 2.261e+02 2.383e+02 2.507e+02 3.054e+02, threshold=4.766e+02, percent-clipped=0.0 2024-09-19 06:32:58,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=880124.0, ans=0.2 2024-09-19 06:33:13,785 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=880152.3333333334, ans=0.025 2024-09-19 06:33:32,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=880180.6666666666, ans=0.125 2024-09-19 06:33:37,911 INFO [train.py:1198] (1/2) Epoch 49, batch 3900, loss[loss=0.2113, ctc_loss=0.1386, cr_loss=0.3638, over 21059.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1422, cr_loss=0.369, over 4077165.97 frames. 
], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2024-09-19 06:33:52,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=880237.3333333334, ans=0.125 2024-09-19 06:33:52,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=880237.3333333334, ans=0.2 2024-09-19 06:34:02,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=880237.3333333334, ans=0.1 2024-09-19 06:34:27,674 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.35 vs. limit=15.0 2024-09-19 06:34:28,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=880294.0, ans=0.1 2024-09-19 06:34:56,590 INFO [train.py:1198] (1/2) Epoch 49, batch 3950, loss[loss=0.1962, ctc_loss=0.1278, cr_loss=0.3419, over 20873.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1424, cr_loss=0.3695, over 4080509.62 frames. ], batch size: 54, lr: 1.68e-03, grad_scale: 16.0 2024-09-19 06:34:56,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=880350.6666666666, ans=0.025 2024-09-19 06:35:31,516 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.973e+02 2.290e+02 2.471e+02 2.656e+02 3.794e+02, threshold=4.941e+02, percent-clipped=0.0 2024-09-19 06:35:54,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=880435.6666666666, ans=0.0 2024-09-19 06:36:07,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=880464.0, ans=0.125 2024-09-19 06:36:14,956 INFO [train.py:1198] (1/2) Epoch 49, batch 4000, loss[loss=0.1896, ctc_loss=0.1222, cr_loss=0.3372, over 20992.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1422, cr_loss=0.3698, over 4092451.88 frames. ], batch size: 52, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:36:25,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=880492.3333333334, ans=0.125 2024-09-19 06:37:31,307 INFO [train.py:1198] (1/2) Epoch 49, batch 4050, loss[loss=0.2378, ctc_loss=0.1559, cr_loss=0.4099, over 20596.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1413, cr_loss=0.3685, over 4104538.62 frames. ], batch size: 71, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:37:47,017 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.56 vs. limit=15.0 2024-09-19 06:37:57,095 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=880662.3333333334, ans=0.0 2024-09-19 06:37:57,946 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.21 vs. 
limit=15.0 2024-09-19 06:38:06,195 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.978e+02 2.291e+02 2.397e+02 2.530e+02 3.380e+02, threshold=4.794e+02, percent-clipped=0.0 2024-09-19 06:38:14,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=880690.6666666666, ans=0.125 2024-09-19 06:38:15,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=880719.0, ans=0.125 2024-09-19 06:38:23,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=880719.0, ans=0.0 2024-09-19 06:38:47,374 INFO [train.py:1198] (1/2) Epoch 49, batch 4100, loss[loss=0.2084, ctc_loss=0.1356, cr_loss=0.3642, over 21058.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1415, cr_loss=0.368, over 4095559.33 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:38:47,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=880775.6666666666, ans=0.0 2024-09-19 06:38:58,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=880775.6666666666, ans=0.025 2024-09-19 06:39:19,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=880832.3333333334, ans=0.0 2024-09-19 06:39:22,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=880832.3333333334, ans=0.125 2024-09-19 06:39:48,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=880889.0, ans=0.2 2024-09-19 06:40:03,588 INFO [train.py:1198] (1/2) Epoch 49, batch 4150, loss[loss=0.2138, ctc_loss=0.1384, cr_loss=0.3772, over 21012.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1415, cr_loss=0.3677, over 4096234.52 frames. ], batch size: 61, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:40:03,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=880917.3333333334, ans=0.125 2024-09-19 06:40:06,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=880917.3333333334, ans=0.0 2024-09-19 06:40:13,248 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.16 vs. 
limit=22.5 2024-09-19 06:40:33,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=880945.6666666666, ans=0.125 2024-09-19 06:40:41,032 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.026e+02 2.294e+02 2.413e+02 2.622e+02 3.774e+02, threshold=4.826e+02, percent-clipped=0.0 2024-09-19 06:40:44,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=880974.0, ans=0.125 2024-09-19 06:40:57,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=881002.3333333334, ans=0.125 2024-09-19 06:41:13,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=881030.6666666666, ans=0.125 2024-09-19 06:41:21,784 INFO [train.py:1198] (1/2) Epoch 49, batch 4200, loss[loss=0.2355, ctc_loss=0.1582, cr_loss=0.3865, over 20416.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.1418, cr_loss=0.369, over 4100703.11 frames. ], batch size: 74, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 06:41:25,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=881059.0, ans=0.1 2024-09-19 06:42:11,102 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.93 vs. limit=15.0 2024-09-19 06:42:18,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=881144.0, ans=0.025 2024-09-19 06:42:22,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=881144.0, ans=0.015 2024-09-19 06:42:23,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=881144.0, ans=0.0 2024-09-19 06:42:41,261 INFO [train.py:1198] (1/2) Epoch 49, batch 4250, loss[loss=0.1958, ctc_loss=0.1293, cr_loss=0.3325, over 21003.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1419, cr_loss=0.369, over 4110598.28 frames. ], batch size: 63, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 06:42:58,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=881229.0, ans=0.1 2024-09-19 06:43:16,512 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.018e+02 2.285e+02 2.411e+02 2.605e+02 4.016e+02, threshold=4.822e+02, percent-clipped=0.0 2024-09-19 06:43:39,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=881285.6666666666, ans=0.07 2024-09-19 06:43:56,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=881342.3333333334, ans=0.125 2024-09-19 06:43:57,213 INFO [train.py:1198] (1/2) Epoch 49, batch 4300, loss[loss=0.236, ctc_loss=0.1585, cr_loss=0.3875, over 20087.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1423, cr_loss=0.3698, over 4115493.06 frames. 
], batch size: 80, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 06:44:23,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=881370.6666666666, ans=0.125 2024-09-19 06:44:39,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=881399.0, ans=0.0 2024-09-19 06:44:44,699 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.06 vs. limit=15.0 2024-09-19 06:45:04,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=881455.6666666666, ans=0.125 2024-09-19 06:45:12,897 INFO [train.py:1198] (1/2) Epoch 49, batch 4350, loss[loss=0.181, ctc_loss=0.1186, cr_loss=0.3119, over 19871.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.1415, cr_loss=0.3683, over 4116684.46 frames. ], batch size: 44, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 06:45:27,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=881512.3333333334, ans=0.125 2024-09-19 06:45:48,022 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.913e+02 2.276e+02 2.445e+02 2.604e+02 3.378e+02, threshold=4.890e+02, percent-clipped=0.0 2024-09-19 06:45:54,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=881540.6666666666, ans=0.0 2024-09-19 06:46:18,047 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.90 vs. limit=22.5 2024-09-19 06:46:32,163 INFO [train.py:1198] (1/2) Epoch 49, batch 4400, loss[loss=0.2446, ctc_loss=0.166, cr_loss=0.3929, over 14648.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1421, cr_loss=0.3691, over 4100213.38 frames. ], batch size: 149, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 06:47:07,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=881682.3333333334, ans=0.0 2024-09-19 06:47:32,255 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.75 vs. limit=6.0 2024-09-19 06:47:48,391 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=881739.0, ans=0.1 2024-09-19 06:47:51,087 INFO [train.py:1198] (1/2) Epoch 49, batch 4450, loss[loss=0.2258, ctc_loss=0.1495, cr_loss=0.3817, over 20915.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1422, cr_loss=0.3687, over 4086783.71 frames. ], batch size: 60, lr: 1.67e-03, grad_scale: 16.0 2024-09-19 06:48:12,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=881795.6666666666, ans=0.0 2024-09-19 06:48:16,306 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.06 vs. 
limit=15.0 2024-09-19 06:48:27,554 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.015e+02 2.320e+02 2.495e+02 2.668e+02 3.983e+02, threshold=4.991e+02, percent-clipped=0.0 2024-09-19 06:48:34,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=881824.0, ans=0.125 2024-09-19 06:49:07,096 INFO [train.py:1198] (1/2) Epoch 49, batch 4500, loss[loss=0.1812, ctc_loss=0.1171, cr_loss=0.3206, over 20272.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1422, cr_loss=0.3692, over 4085081.31 frames. ], batch size: 45, lr: 1.67e-03, grad_scale: 16.0 2024-09-19 06:49:10,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=881909.0, ans=0.0 2024-09-19 06:49:15,129 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 06:49:18,253 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 06:49:21,800 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.75 vs. limit=10.0 2024-09-19 06:49:31,828 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 06:50:12,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=882022.3333333334, ans=0.2 2024-09-19 06:50:19,052 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.51 vs. limit=12.0 2024-09-19 06:50:20,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=882022.3333333334, ans=0.125 2024-09-19 06:50:23,130 INFO [train.py:1198] (1/2) Epoch 49, batch 4550, loss[loss=0.2434, ctc_loss=0.162, cr_loss=0.4071, over 20361.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1425, cr_loss=0.3699, over 4083224.89 frames. ], batch size: 74, lr: 1.67e-03, grad_scale: 16.0 2024-09-19 06:50:59,451 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.915e+02 2.229e+02 2.333e+02 2.525e+02 5.159e+02, threshold=4.666e+02, percent-clipped=1.0 2024-09-19 06:51:38,779 INFO [train.py:1198] (1/2) Epoch 49, batch 4600, loss[loss=0.1741, ctc_loss=0.1129, cr_loss=0.3058, over 20955.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1419, cr_loss=0.3691, over 4088930.10 frames. ], batch size: 50, lr: 1.67e-03, grad_scale: 16.0 2024-09-19 06:51:57,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=882220.6666666666, ans=0.125 2024-09-19 06:52:57,232 INFO [train.py:1198] (1/2) Epoch 49, batch 4650, loss[loss=0.2192, ctc_loss=0.1442, cr_loss=0.3751, over 21007.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.1418, cr_loss=0.3686, over 4099958.04 frames. 
], batch size: 61, lr: 1.67e-03, grad_scale: 16.0 2024-09-19 06:53:32,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=882390.6666666666, ans=0.2 2024-09-19 06:53:32,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=882390.6666666666, ans=0.0 2024-09-19 06:53:33,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=882390.6666666666, ans=0.125 2024-09-19 06:53:36,308 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.939e+02 2.308e+02 2.420e+02 2.560e+02 3.188e+02, threshold=4.840e+02, percent-clipped=0.0 2024-09-19 06:53:41,208 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 06:53:56,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=882419.0, ans=0.125 2024-09-19 06:54:06,917 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.51 vs. limit=12.0 2024-09-19 06:54:15,450 INFO [train.py:1198] (1/2) Epoch 49, batch 4700, loss[loss=0.1864, ctc_loss=0.1211, cr_loss=0.3263, over 20999.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.1413, cr_loss=0.3679, over 4100923.90 frames. ], batch size: 52, lr: 1.67e-03, grad_scale: 16.0 2024-09-19 06:54:18,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=882475.6666666666, ans=0.125 2024-09-19 06:54:20,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=882475.6666666666, ans=0.125 2024-09-19 06:54:47,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=882532.3333333334, ans=0.125 2024-09-19 06:54:55,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=882532.3333333334, ans=0.125 2024-09-19 06:55:09,360 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.82 vs. limit=10.0 2024-09-19 06:55:15,471 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.06 vs. limit=6.0 2024-09-19 06:55:31,411 INFO [train.py:1198] (1/2) Epoch 49, batch 4750, loss[loss=0.1725, ctc_loss=0.1117, cr_loss=0.3037, over 20938.00 frames. ], tot_loss[loss=0.214, ctc_loss=0.1407, cr_loss=0.3667, over 4110745.72 frames. ], batch size: 51, lr: 1.67e-03, grad_scale: 16.0 2024-09-19 06:55:58,962 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.30 vs. limit=22.5 2024-09-19 06:56:06,546 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.72 vs. 
limit=15.0 2024-09-19 06:56:07,316 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.034e+02 2.318e+02 2.432e+02 2.570e+02 3.803e+02, threshold=4.863e+02, percent-clipped=0.0 2024-09-19 06:56:46,449 INFO [train.py:1198] (1/2) Epoch 49, batch 4800, loss[loss=0.2567, ctc_loss=0.1698, cr_loss=0.4348, over 20961.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.141, cr_loss=0.3677, over 4111431.55 frames. ], batch size: 64, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 06:56:49,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=882759.0, ans=0.125 2024-09-19 06:56:56,287 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.04 vs. limit=15.0 2024-09-19 06:57:15,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=882815.6666666666, ans=0.125 2024-09-19 06:57:36,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=882844.0, ans=0.0 2024-09-19 06:57:36,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=882844.0, ans=0.0 2024-09-19 06:57:44,745 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.78 vs. limit=15.0 2024-09-19 06:57:50,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=882872.3333333334, ans=0.2 2024-09-19 06:57:52,753 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.90 vs. limit=10.0 2024-09-19 06:58:00,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=882872.3333333334, ans=0.0 2024-09-19 06:58:02,860 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.98 vs. limit=15.0 2024-09-19 06:58:05,087 INFO [train.py:1198] (1/2) Epoch 49, batch 4850, loss[loss=0.2457, ctc_loss=0.1622, cr_loss=0.4178, over 19501.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1414, cr_loss=0.3685, over 4107396.52 frames. ], batch size: 90, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 06:58:40,940 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.994e+02 2.266e+02 2.402e+02 2.535e+02 8.779e+02, threshold=4.804e+02, percent-clipped=1.0 2024-09-19 06:58:44,802 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.25 vs. limit=12.0 2024-09-19 06:59:04,970 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.22 vs. limit=15.0 2024-09-19 06:59:12,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=883014.0, ans=0.1 2024-09-19 06:59:16,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=883014.0, ans=0.125 2024-09-19 06:59:23,969 INFO [train.py:1198] (1/2) Epoch 49, batch 4900, loss[loss=0.2354, ctc_loss=0.1562, cr_loss=0.3955, over 20939.00 frames. 
], tot_loss[loss=0.2163, ctc_loss=0.1423, cr_loss=0.3698, over 4101416.73 frames. ], batch size: 64, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 06:59:24,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=883042.3333333334, ans=0.125 2024-09-19 06:59:31,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=883042.3333333334, ans=0.125 2024-09-19 06:59:34,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=883042.3333333334, ans=0.0 2024-09-19 06:59:46,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=883070.6666666666, ans=0.125 2024-09-19 07:00:27,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=883155.6666666666, ans=0.1 2024-09-19 07:00:30,443 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.00 vs. limit=15.0 2024-09-19 07:00:39,005 INFO [train.py:1198] (1/2) Epoch 49, batch 4950, loss[loss=0.1824, ctc_loss=0.1189, cr_loss=0.3172, over 20953.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1419, cr_loss=0.3689, over 4095371.97 frames. ], batch size: 50, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:00:49,082 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.72 vs. limit=22.5 2024-09-19 07:00:52,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=883212.3333333334, ans=0.0 2024-09-19 07:01:14,320 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.060e+02 2.287e+02 2.498e+02 2.641e+02 4.233e+02, threshold=4.997e+02, percent-clipped=0.0 2024-09-19 07:01:17,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=883240.6666666666, ans=0.09899494936611666 2024-09-19 07:01:53,349 INFO [train.py:1198] (1/2) Epoch 49, batch 5000, loss[loss=0.2075, ctc_loss=0.1364, cr_loss=0.3555, over 20996.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1425, cr_loss=0.3697, over 4102330.23 frames. ], batch size: 55, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:02:38,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=883410.6666666666, ans=0.125 2024-09-19 07:03:08,168 INFO [train.py:1198] (1/2) Epoch 49, batch 5050, loss[loss=0.2126, ctc_loss=0.1377, cr_loss=0.3748, over 21068.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1425, cr_loss=0.3695, over 4096180.21 frames. ], batch size: 59, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:03:43,939 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.012e+02 2.281e+02 2.422e+02 2.560e+02 3.112e+02, threshold=4.845e+02, percent-clipped=0.0 2024-09-19 07:04:23,215 INFO [train.py:1198] (1/2) Epoch 49, batch 5100, loss[loss=0.2151, ctc_loss=0.1395, cr_loss=0.3781, over 20777.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1425, cr_loss=0.37, over 4107770.58 frames. 
], batch size: 56, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:04:47,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=883637.3333333334, ans=0.2 2024-09-19 07:04:59,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=883665.6666666666, ans=0.2 2024-09-19 07:05:35,801 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.05 vs. limit=15.0 2024-09-19 07:05:38,104 INFO [train.py:1198] (1/2) Epoch 49, batch 5150, loss[loss=0.2325, ctc_loss=0.1538, cr_loss=0.3934, over 21029.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1422, cr_loss=0.3695, over 4102262.97 frames. ], batch size: 62, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:06:04,293 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.94 vs. limit=12.0 2024-09-19 07:06:16,523 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.007e+02 2.260e+02 2.417e+02 2.553e+02 3.484e+02, threshold=4.835e+02, percent-clipped=0.0 2024-09-19 07:06:19,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=883807.3333333334, ans=0.0 2024-09-19 07:06:33,643 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.09 vs. limit=15.0 2024-09-19 07:06:55,547 INFO [train.py:1198] (1/2) Epoch 49, batch 5200, loss[loss=0.2001, ctc_loss=0.1313, cr_loss=0.3438, over 20883.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1414, cr_loss=0.3678, over 4116348.54 frames. ], batch size: 54, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:07:04,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=883892.3333333334, ans=0.0 2024-09-19 07:07:07,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=883892.3333333334, ans=0.04949747468305833 2024-09-19 07:07:15,272 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.88 vs. limit=12.0 2024-09-19 07:07:22,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=883920.6666666666, ans=0.0 2024-09-19 07:07:25,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=883949.0, ans=0.125 2024-09-19 07:07:42,018 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.83 vs. limit=15.0 2024-09-19 07:08:13,058 INFO [train.py:1198] (1/2) Epoch 49, batch 5250, loss[loss=0.2253, ctc_loss=0.1474, cr_loss=0.3893, over 20840.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1421, cr_loss=0.3683, over 4098401.03 frames. 
], batch size: 65, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:08:28,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=884062.3333333334, ans=0.125 2024-09-19 07:08:43,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=884090.6666666666, ans=0.1 2024-09-19 07:08:48,642 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.952e+02 2.316e+02 2.504e+02 2.645e+02 6.124e+02, threshold=5.008e+02, percent-clipped=1.0 2024-09-19 07:09:13,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=884147.3333333334, ans=0.025 2024-09-19 07:09:26,946 INFO [train.py:1198] (1/2) Epoch 49, batch 5300, loss[loss=0.1872, ctc_loss=0.1199, cr_loss=0.3366, over 20952.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1413, cr_loss=0.3666, over 4090209.31 frames. ], batch size: 49, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:09:36,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=884175.6666666666, ans=0.1 2024-09-19 07:10:00,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=884232.3333333334, ans=0.125 2024-09-19 07:10:01,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=884232.3333333334, ans=0.09899494936611666 2024-09-19 07:10:30,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=884289.0, ans=0.0 2024-09-19 07:10:41,392 INFO [train.py:1198] (1/2) Epoch 49, batch 5350, loss[loss=0.1988, ctc_loss=0.1307, cr_loss=0.3408, over 21006.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1422, cr_loss=0.3681, over 4086788.63 frames. ], batch size: 52, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:10:49,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=884317.3333333334, ans=0.125 2024-09-19 07:11:16,952 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.022e+02 2.327e+02 2.505e+02 2.655e+02 3.238e+02, threshold=5.009e+02, percent-clipped=0.0 2024-09-19 07:11:24,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=884402.3333333334, ans=0.1 2024-09-19 07:11:30,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=884402.3333333334, ans=0.0 2024-09-19 07:11:41,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=884430.6666666666, ans=0.0 2024-09-19 07:11:56,033 INFO [train.py:1198] (1/2) Epoch 49, batch 5400, loss[loss=0.2357, ctc_loss=0.1538, cr_loss=0.4094, over 20967.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1422, cr_loss=0.3688, over 4098379.62 frames. ], batch size: 64, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:12:07,941 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.82 vs. 
limit=6.0 2024-09-19 07:12:29,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=884515.6666666666, ans=0.025 2024-09-19 07:12:31,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=884515.6666666666, ans=0.1 2024-09-19 07:12:56,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=884572.3333333334, ans=0.125 2024-09-19 07:13:10,985 INFO [train.py:1198] (1/2) Epoch 49, batch 5450, loss[loss=0.2153, ctc_loss=0.1409, cr_loss=0.3718, over 21044.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1423, cr_loss=0.3691, over 4101403.35 frames. ], batch size: 62, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:13:23,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=884600.6666666666, ans=0.0 2024-09-19 07:13:24,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=884629.0, ans=0.1 2024-09-19 07:13:29,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=884629.0, ans=0.125 2024-09-19 07:13:46,452 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.029e+02 2.275e+02 2.412e+02 2.577e+02 4.328e+02, threshold=4.823e+02, percent-clipped=0.0 2024-09-19 07:13:48,956 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.42 vs. limit=15.0 2024-09-19 07:14:15,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=884714.0, ans=0.0 2024-09-19 07:14:25,267 INFO [train.py:1198] (1/2) Epoch 49, batch 5500, loss[loss=0.1888, ctc_loss=0.1237, cr_loss=0.3254, over 20935.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.1417, cr_loss=0.3677, over 4097704.56 frames. ], batch size: 50, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:15:01,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=884799.0, ans=0.1 2024-09-19 07:15:10,275 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.87 vs. limit=15.0 2024-09-19 07:15:11,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=884827.3333333334, ans=0.04949747468305833 2024-09-19 07:15:26,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=884855.6666666666, ans=0.0 2024-09-19 07:15:26,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=884855.6666666666, ans=0.125 2024-09-19 07:15:30,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=884855.6666666666, ans=0.2 2024-09-19 07:15:35,143 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.42 vs. limit=15.0 2024-09-19 07:15:39,815 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.05 vs. 
limit=12.0 2024-09-19 07:15:42,124 INFO [train.py:1198] (1/2) Epoch 49, batch 5550, loss[loss=0.1649, ctc_loss=0.107, cr_loss=0.2892, over 20953.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1422, cr_loss=0.368, over 4101153.93 frames. ], batch size: 48, lr: 1.67e-03, grad_scale: 16.0 2024-09-19 07:16:19,249 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.009e+02 2.310e+02 2.440e+02 2.629e+02 5.251e+02, threshold=4.879e+02, percent-clipped=1.0 2024-09-19 07:16:31,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=884969.0, ans=0.1 2024-09-19 07:16:32,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=884969.0, ans=0.0 2024-09-19 07:16:56,583 INFO [train.py:1198] (1/2) Epoch 49, batch 5600, loss[loss=0.2138, ctc_loss=0.1404, cr_loss=0.3669, over 21061.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1424, cr_loss=0.3682, over 4099538.15 frames. ], batch size: 59, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:17:01,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=885025.6666666666, ans=0.1 2024-09-19 07:17:04,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=885025.6666666666, ans=0.05 2024-09-19 07:17:10,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=885054.0, ans=0.125 2024-09-19 07:17:19,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=885054.0, ans=0.0 2024-09-19 07:17:27,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=885082.3333333334, ans=0.125 2024-09-19 07:17:31,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=885082.3333333334, ans=0.1 2024-09-19 07:17:54,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.whiten.whitening_limit, batch_count=885110.6666666666, ans=12.0 2024-09-19 07:18:13,441 INFO [train.py:1198] (1/2) Epoch 49, batch 5650, loss[loss=0.2179, ctc_loss=0.1422, cr_loss=0.3787, over 21046.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.1421, cr_loss=0.3671, over 4090261.29 frames. ], batch size: 56, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:18:21,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=885167.3333333334, ans=0.125 2024-09-19 07:18:31,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=885195.6666666666, ans=0.1 2024-09-19 07:18:35,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=885195.6666666666, ans=0.1 2024-09-19 07:18:50,372 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.923e+02 2.302e+02 2.418e+02 2.566e+02 3.557e+02, threshold=4.836e+02, percent-clipped=0.0 2024-09-19 07:18:52,376 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.59 vs. 
limit=15.0 2024-09-19 07:19:01,513 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.01 vs. limit=15.0 2024-09-19 07:19:27,588 INFO [train.py:1198] (1/2) Epoch 49, batch 5700, loss[loss=0.2259, ctc_loss=0.1496, cr_loss=0.3815, over 20130.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.1421, cr_loss=0.3674, over 4100118.16 frames. ], batch size: 80, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:19:35,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=885309.0, ans=0.125 2024-09-19 07:19:50,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=885337.3333333334, ans=0.2 2024-09-19 07:19:56,604 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.71 vs. limit=15.0 2024-09-19 07:20:08,333 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.25 vs. limit=10.0 2024-09-19 07:20:24,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=885394.0, ans=0.1 2024-09-19 07:20:38,125 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.15 vs. limit=8.0 2024-09-19 07:20:41,434 INFO [train.py:1198] (1/2) Epoch 49, batch 5750, loss[loss=0.173, ctc_loss=0.1117, cr_loss=0.3065, over 20997.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.1419, cr_loss=0.3676, over 4104055.38 frames. ], batch size: 51, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:21:02,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=885479.0, ans=0.0 2024-09-19 07:21:06,061 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.34 vs. limit=15.0 2024-09-19 07:21:08,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=885479.0, ans=0.125 2024-09-19 07:21:10,441 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.12 vs. limit=6.0 2024-09-19 07:21:18,603 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.910e+02 2.306e+02 2.426e+02 2.589e+02 3.226e+02, threshold=4.851e+02, percent-clipped=0.0 2024-09-19 07:21:26,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=885535.6666666666, ans=0.1 2024-09-19 07:21:37,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=885535.6666666666, ans=0.025 2024-09-19 07:21:44,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=885564.0, ans=0.04949747468305833 2024-09-19 07:21:50,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=885564.0, ans=0.125 2024-09-19 07:21:56,107 INFO [train.py:1198] (1/2) Epoch 49, batch 5800, loss[loss=0.216, ctc_loss=0.1415, cr_loss=0.3723, over 20959.00 frames. 
], tot_loss[loss=0.2152, ctc_loss=0.1417, cr_loss=0.3678, over 4104753.50 frames. ], batch size: 60, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:22:01,279 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.77 vs. limit=15.0 2024-09-19 07:22:08,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=885592.3333333334, ans=0.2 2024-09-19 07:22:09,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=885620.6666666666, ans=0.07 2024-09-19 07:22:16,980 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 07:22:36,653 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.68 vs. limit=12.0 2024-09-19 07:22:42,651 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=885677.3333333334, ans=0.125 2024-09-19 07:22:48,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=885677.3333333334, ans=0.125 2024-09-19 07:22:48,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=885677.3333333334, ans=0.0 2024-09-19 07:22:59,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=885705.6666666666, ans=0.2 2024-09-19 07:23:11,147 INFO [train.py:1198] (1/2) Epoch 49, batch 5850, loss[loss=0.262, ctc_loss=0.1771, cr_loss=0.4245, over 18386.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.1414, cr_loss=0.3675, over 4096424.77 frames. ], batch size: 108, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:23:17,651 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 07:23:25,638 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.00 vs. limit=22.5 2024-09-19 07:23:50,808 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.977e+02 2.257e+02 2.371e+02 2.513e+02 3.999e+02, threshold=4.743e+02, percent-clipped=0.0 2024-09-19 07:24:00,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=885819.0, ans=0.0 2024-09-19 07:24:00,366 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.11 vs. limit=15.0 2024-09-19 07:24:28,050 INFO [train.py:1198] (1/2) Epoch 49, batch 5900, loss[loss=0.1752, ctc_loss=0.1136, cr_loss=0.308, over 20972.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1425, cr_loss=0.3688, over 4079904.84 frames. 
], batch size: 51, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:24:56,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=885932.3333333334, ans=0.125 2024-09-19 07:25:02,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=885932.3333333334, ans=0.0 2024-09-19 07:25:04,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=885932.3333333334, ans=0.0 2024-09-19 07:25:42,627 INFO [train.py:1198] (1/2) Epoch 49, batch 5950, loss[loss=0.2065, ctc_loss=0.1361, cr_loss=0.3518, over 20938.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1425, cr_loss=0.3688, over 4084663.41 frames. ], batch size: 60, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:25:46,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=886017.3333333334, ans=0.1 2024-09-19 07:26:01,637 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=886045.6666666666, ans=0.125 2024-09-19 07:26:09,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=886045.6666666666, ans=0.1 2024-09-19 07:26:22,459 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.099e+02 2.336e+02 2.472e+02 2.589e+02 3.765e+02, threshold=4.944e+02, percent-clipped=0.0 2024-09-19 07:26:31,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=886102.3333333334, ans=0.2 2024-09-19 07:26:53,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=886130.6666666666, ans=0.1 2024-09-19 07:26:59,445 INFO [train.py:1198] (1/2) Epoch 49, batch 6000, loss[loss=0.1941, ctc_loss=0.127, cr_loss=0.3358, over 20994.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.1418, cr_loss=0.3678, over 4099586.82 frames. ], batch size: 63, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:26:59,445 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 07:27:15,747 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([6.6515, 6.4048, 6.2520, 5.7769], device='cuda:1') 2024-09-19 07:27:18,262 INFO [train.py:1230] (1/2) Epoch 49, validation: loss=0.03867, ctc_loss=0.03867, cr_loss=1.6e-14, over 944034.00 frames. 2024-09-19 07:27:18,262 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-19 07:27:20,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=886159.0, ans=0.0 2024-09-19 07:27:26,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=886159.0, ans=0.125 2024-09-19 07:27:29,626 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.84 vs. limit=15.0 2024-09-19 07:28:31,445 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. limit=6.0 2024-09-19 07:28:33,454 INFO [train.py:1198] (1/2) Epoch 49, batch 6050, loss[loss=0.224, ctc_loss=0.1447, cr_loss=0.3967, over 20971.00 frames. 
], tot_loss[loss=0.2146, ctc_loss=0.1413, cr_loss=0.3669, over 4101089.93 frames. ], batch size: 58, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:28:49,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=886329.0, ans=0.125 2024-09-19 07:28:57,984 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.46 vs. limit=10.0 2024-09-19 07:29:10,947 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.016e+02 2.265e+02 2.419e+02 2.591e+02 5.269e+02, threshold=4.838e+02, percent-clipped=1.0 2024-09-19 07:29:18,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=886385.6666666666, ans=0.125 2024-09-19 07:29:26,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=886385.6666666666, ans=0.1 2024-09-19 07:29:27,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=886385.6666666666, ans=0.125 2024-09-19 07:29:30,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=886385.6666666666, ans=0.125 2024-09-19 07:29:48,151 INFO [train.py:1198] (1/2) Epoch 49, batch 6100, loss[loss=0.2275, ctc_loss=0.1475, cr_loss=0.4003, over 20685.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1415, cr_loss=0.3671, over 4094589.06 frames. ], batch size: 66, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:30:00,281 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.98 vs. limit=15.0 2024-09-19 07:30:14,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=886470.6666666666, ans=0.0 2024-09-19 07:30:25,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=886499.0, ans=0.04949747468305833 2024-09-19 07:30:42,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=886527.3333333334, ans=0.025 2024-09-19 07:31:02,124 INFO [train.py:1198] (1/2) Epoch 49, batch 6150, loss[loss=0.2311, ctc_loss=0.155, cr_loss=0.3806, over 19995.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1421, cr_loss=0.3679, over 4091796.98 frames. 
], batch size: 80, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:31:05,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=886584.0, ans=0.0 2024-09-19 07:31:31,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=886640.6666666666, ans=0.125 2024-09-19 07:31:33,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=886640.6666666666, ans=0.1 2024-09-19 07:31:40,259 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.935e+02 2.294e+02 2.443e+02 2.619e+02 4.038e+02, threshold=4.886e+02, percent-clipped=0.0 2024-09-19 07:31:46,321 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 07:32:16,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=886725.6666666666, ans=0.0 2024-09-19 07:32:17,256 INFO [train.py:1198] (1/2) Epoch 49, batch 6200, loss[loss=0.2329, ctc_loss=0.154, cr_loss=0.3943, over 20966.00 frames. ], tot_loss[loss=0.2188, ctc_loss=0.1445, cr_loss=0.3716, over 4062630.04 frames. ], batch size: 58, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:32:37,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=886754.0, ans=0.1 2024-09-19 07:33:31,745 INFO [train.py:1198] (1/2) Epoch 49, batch 6250, loss[loss=0.2127, ctc_loss=0.1402, cr_loss=0.3625, over 20724.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.1466, cr_loss=0.3739, over 4009251.45 frames. ], batch size: 71, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:33:51,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=886895.6666666666, ans=0.0 2024-09-19 07:34:08,716 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.182e+02 2.374e+02 2.493e+02 2.740e+02 6.672e+02, threshold=4.985e+02, percent-clipped=2.0 2024-09-19 07:34:09,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=886924.0, ans=0.0 2024-09-19 07:34:22,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=886952.3333333334, ans=0.1 2024-09-19 07:34:44,933 INFO [train.py:1198] (1/2) Epoch 49, batch 6300, loss[loss=0.227, ctc_loss=0.1505, cr_loss=0.3829, over 18504.00 frames. ], tot_loss[loss=0.2223, ctc_loss=0.1474, cr_loss=0.3745, over 3967446.58 frames. ], batch size: 108, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:35:10,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=887037.3333333334, ans=0.0 2024-09-19 07:35:53,593 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.84 vs. limit=15.0 2024-09-19 07:35:56,836 INFO [train.py:1198] (1/2) Epoch 49, batch 6350, loss[loss=0.2754, ctc_loss=0.1955, cr_loss=0.3993, over 14159.00 frames. ], tot_loss[loss=0.2266, ctc_loss=0.1513, cr_loss=0.3768, over 3782539.19 frames. 
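The recurring optim.py WARNING lines condense the per-parameter gradient-norm distribution into five numbers (min, 25%, median, 75%, max) plus the clipping threshold and the percentage of parameters clipped. One way to produce such a summary with plain PyTorch; the optimizer's actual bookkeeping may differ:

import torch

def grad_norm_summary(model: torch.nn.Module, threshold: float):
    # Five-number summary of per-parameter gradient norms, plus the
    # share of parameters whose gradient exceeds the threshold.
    norms = torch.stack([p.grad.norm().float()
                         for p in model.parameters() if p.grad is not None])
    quartiles = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    percent_clipped = 100.0 * (norms > threshold).float().mean()
    return quartiles, percent_clipped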
], batch size: 149, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:36:12,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=887179.0, ans=0.2 2024-09-19 07:36:12,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=887179.0, ans=0.0 2024-09-19 07:36:14,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=887179.0, ans=0.0 2024-09-19 07:36:21,619 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.92 vs. limit=15.0 2024-09-19 07:36:31,818 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.976e+02 2.659e+02 2.858e+02 3.070e+02 6.716e+02, threshold=5.716e+02, percent-clipped=1.0 2024-09-19 07:37:43,472 INFO [train.py:1198] (1/2) Epoch 50, batch 0, loss[loss=0.2058, ctc_loss=0.1356, cr_loss=0.3509, over 21048.00 frames. ], tot_loss[loss=0.2058, ctc_loss=0.1356, cr_loss=0.3509, over 21048.00 frames. ], batch size: 62, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 07:37:43,473 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 07:37:58,312 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([1.7081, 2.2566, 2.2589, 2.5642, 2.3008, 2.4212, 1.8533, 1.8347], device='cuda:1') 2024-09-19 07:38:00,742 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.6396, 5.3681, 5.2330, 4.8051], device='cuda:1') 2024-09-19 07:38:04,301 INFO [train.py:1230] (1/2) Epoch 50, validation: loss=0.03842, ctc_loss=0.03842, cr_loss=1.713e-14, over 944034.00 frames. 2024-09-19 07:38:04,301 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-19 07:38:40,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=887323.5, ans=0.2 2024-09-19 07:38:42,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=887323.5, ans=0.125 2024-09-19 07:38:44,236 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.35 vs. limit=15.0 2024-09-19 07:38:48,556 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.35 vs. limit=15.0 2024-09-19 07:38:56,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=887351.8333333334, ans=0.125 2024-09-19 07:39:04,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=887380.1666666666, ans=0.125 2024-09-19 07:39:06,765 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.23 vs. limit=15.0 2024-09-19 07:39:19,534 INFO [train.py:1198] (1/2) Epoch 50, batch 50, loss[loss=0.2083, ctc_loss=0.1365, cr_loss=0.359, over 20899.00 frames. ], tot_loss[loss=0.2145, ctc_loss=0.1412, cr_loss=0.3665, over 927159.29 frames. 
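Every validation report is computed over the same fixed dev set (944034.00 frames each time), so the values are directly comparable across epochs; cr_loss collapses to ~1e-14 there, consistent with the consistency term vanishing when no random masking is applied. A sketch of such a periodic validation pass; model, loader and criterion are placeholders:

import torch

def compute_validation_loss(model, valid_loader, criterion):
    # One full pass over the dev set, frame-weighted so the result
    # does not depend on how the data is batched.
    model.eval()
    loss_sum, frame_sum = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_loader:
            loss, num_frames = criterion(model, batch)  # summed loss, frame count
            loss_sum += float(loss)
            frame_sum += float(num_frames)
    model.train()
    return loss_sum / max(frame_sum, 1.0)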
], batch size: 54, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 07:40:12,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=887493.5, ans=0.125 2024-09-19 07:40:13,658 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.974e+02 2.275e+02 2.418e+02 2.747e+02 4.190e+02, threshold=4.835e+02, percent-clipped=0.0 2024-09-19 07:40:37,938 INFO [train.py:1198] (1/2) Epoch 50, batch 100, loss[loss=0.2466, ctc_loss=0.1647, cr_loss=0.4095, over 14824.00 frames. ], tot_loss[loss=0.2139, ctc_loss=0.1408, cr_loss=0.3658, over 1635357.83 frames. ], batch size: 149, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 07:40:40,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=887550.1666666666, ans=0.125 2024-09-19 07:40:47,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=887550.1666666666, ans=0.125 2024-09-19 07:40:47,689 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.93 vs. limit=22.5 2024-09-19 07:41:00,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=887578.5, ans=0.1 2024-09-19 07:41:32,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=887635.1666666666, ans=0.125 2024-09-19 07:41:47,345 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=887663.5, ans=0.0 2024-09-19 07:41:48,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=887663.5, ans=0.125 2024-09-19 07:41:53,070 INFO [train.py:1198] (1/2) Epoch 50, batch 150, loss[loss=0.19, ctc_loss=0.1231, cr_loss=0.3345, over 20931.00 frames. ], tot_loss[loss=0.2134, ctc_loss=0.1405, cr_loss=0.3647, over 2178545.79 frames. ], batch size: 49, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 07:41:54,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=887691.8333333334, ans=0.125 2024-09-19 07:42:03,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=887691.8333333334, ans=0.025 2024-09-19 07:42:09,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=887720.1666666666, ans=0.125 2024-09-19 07:42:44,260 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.044e+02 2.231e+02 2.391e+02 2.516e+02 3.152e+02, threshold=4.782e+02, percent-clipped=0.0 2024-09-19 07:43:08,319 INFO [train.py:1198] (1/2) Epoch 50, batch 200, loss[loss=0.2182, ctc_loss=0.1435, cr_loss=0.3738, over 19985.00 frames. ], tot_loss[loss=0.214, ctc_loss=0.1407, cr_loss=0.3665, over 2611446.18 frames. ], batch size: 80, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 07:44:27,310 INFO [train.py:1198] (1/2) Epoch 50, batch 250, loss[loss=0.2147, ctc_loss=0.1396, cr_loss=0.3755, over 20705.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.1414, cr_loss=0.3673, over 2929062.57 frames. 
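Each progress line pairs the current batch's loss with a tot_loss aggregated "over N frames", where N grows as the epoch proceeds (927159 frames after 50 batches, 1635357 after 100, and so on). A frame-weighted running average behaves this way; a sketch, noting the recipe may additionally decay or window old batches:

class FrameWeightedAverage:
    # Accumulates loss * frames so batches are weighted by how much
    # audio they actually contain.
    def __init__(self):
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, loss: float, num_frames: float) -> float:
        self.loss_sum += loss * num_frames
        self.frames += num_frames
        return self.loss_sum / self.frames

avg = FrameWeightedAverage()
avg.update(0.2083, 20899.0)          # batch 50 figures from above
print(avg.update(0.2466, 14824.0))   # running average after batch 100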
], batch size: 71, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 07:44:49,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=888003.5, ans=0.0 2024-09-19 07:45:05,664 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.78 vs. limit=15.0 2024-09-19 07:45:17,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=888060.1666666666, ans=0.125 2024-09-19 07:45:17,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=888060.1666666666, ans=0.2 2024-09-19 07:45:18,455 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.970e+02 2.313e+02 2.452e+02 2.591e+02 3.246e+02, threshold=4.905e+02, percent-clipped=0.0 2024-09-19 07:45:18,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=888060.1666666666, ans=0.2 2024-09-19 07:45:29,761 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.88 vs. limit=15.0 2024-09-19 07:45:37,215 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=888088.5, ans=0.1 2024-09-19 07:45:45,779 INFO [train.py:1198] (1/2) Epoch 50, batch 300, loss[loss=0.2212, ctc_loss=0.1488, cr_loss=0.362, over 19554.00 frames. ], tot_loss[loss=0.2128, ctc_loss=0.1397, cr_loss=0.3652, over 3202072.56 frames. ], batch size: 90, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 07:45:55,714 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.07 vs. limit=15.0 2024-09-19 07:46:06,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=888145.1666666666, ans=0.0 2024-09-19 07:46:10,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=888145.1666666666, ans=0.1 2024-09-19 07:46:19,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=888173.5, ans=0.125 2024-09-19 07:46:30,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=888201.8333333334, ans=0.125 2024-09-19 07:46:30,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=888201.8333333334, ans=0.0 2024-09-19 07:47:01,475 INFO [train.py:1198] (1/2) Epoch 50, batch 350, loss[loss=0.2258, ctc_loss=0.1504, cr_loss=0.377, over 21014.00 frames. ], tot_loss[loss=0.2125, ctc_loss=0.1395, cr_loss=0.365, over 3411707.47 frames. ], batch size: 63, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 07:47:19,950 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.99 vs. limit=12.0 2024-09-19 07:47:32,018 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.73 vs. 
limit=22.5 2024-09-19 07:47:52,136 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.999e+02 2.282e+02 2.368e+02 2.527e+02 3.636e+02, threshold=4.735e+02, percent-clipped=0.0 2024-09-19 07:48:01,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=888371.8333333334, ans=0.125 2024-09-19 07:48:16,101 INFO [train.py:1198] (1/2) Epoch 50, batch 400, loss[loss=0.2531, ctc_loss=0.1697, cr_loss=0.417, over 18596.00 frames. ], tot_loss[loss=0.2141, ctc_loss=0.1407, cr_loss=0.3667, over 3552624.22 frames. ], batch size: 108, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 07:48:34,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=888428.5, ans=0.125 2024-09-19 07:48:43,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=888428.5, ans=0.125 2024-09-19 07:49:34,592 INFO [train.py:1198] (1/2) Epoch 50, batch 450, loss[loss=0.1972, ctc_loss=0.1271, cr_loss=0.3503, over 21078.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.1419, cr_loss=0.3681, over 3684274.70 frames. ], batch size: 53, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 07:49:57,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=888570.1666666666, ans=0.09899494936611666 2024-09-19 07:49:57,928 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.74 vs. limit=15.0 2024-09-19 07:50:21,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=888626.8333333334, ans=0.125 2024-09-19 07:50:23,888 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=5.54 vs. limit=22.5 2024-09-19 07:50:26,180 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.029e+02 2.265e+02 2.395e+02 2.492e+02 3.193e+02, threshold=4.790e+02, percent-clipped=0.0 2024-09-19 07:50:50,414 INFO [train.py:1198] (1/2) Epoch 50, batch 500, loss[loss=0.2233, ctc_loss=0.1469, cr_loss=0.3819, over 20360.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1421, cr_loss=0.3682, over 3773257.54 frames. ], batch size: 74, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 07:51:24,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=888740.1666666666, ans=0.0 2024-09-19 07:52:09,561 INFO [train.py:1198] (1/2) Epoch 50, batch 550, loss[loss=0.2428, ctc_loss=0.1616, cr_loss=0.4059, over 19998.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.1418, cr_loss=0.368, over 3841543.54 frames. ], batch size: 80, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 07:53:00,853 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.925e+02 2.266e+02 2.376e+02 2.535e+02 3.229e+02, threshold=4.752e+02, percent-clipped=0.0 2024-09-19 07:53:21,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=888938.5, ans=0.125 2024-09-19 07:53:24,380 INFO [train.py:1198] (1/2) Epoch 50, batch 600, loss[loss=0.2055, ctc_loss=0.1339, cr_loss=0.3581, over 21073.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.142, cr_loss=0.3677, over 3907097.26 frames. 
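The scaling.py:214 lines print ScheduledFloat values: module hyper-parameters such as skip rates, balancer probabilities and minimum bypass scales that are functions of batch_count rather than constants (e.g. ans=0.07 for a bypass skip_rate above). A piecewise-linear schedule is one simple way to obtain this behavior; the breakpoints below are illustrative, not the recipe's:

def piecewise_linear(batch_count: float, points) -> float:
    # points: [(batch_count, value), ...] sorted by batch_count;
    # linear interpolation in between, clamped at both ends.
    x0, y0 = points[0]
    if batch_count <= x0:
        return y0
    for x1, y1 in points[1:]:
        if batch_count <= x1:
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        x0, y0 = x1, y1
    return y0

# A skip rate that decays early in training and then stays flat:
print(piecewise_linear(885620.0, [(0.0, 0.5), (20000.0, 0.07)]))  # 0.07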
], batch size: 59, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 07:54:00,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=889023.5, ans=0.025 2024-09-19 07:54:19,701 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.76 vs. limit=15.0 2024-09-19 07:54:34,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=889080.1666666666, ans=0.125 2024-09-19 07:54:39,934 INFO [train.py:1198] (1/2) Epoch 50, batch 650, loss[loss=0.252, ctc_loss=0.169, cr_loss=0.4148, over 20650.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1418, cr_loss=0.3669, over 3942366.60 frames. ], batch size: 68, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 07:54:52,539 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.56 vs. limit=22.5 2024-09-19 07:55:10,959 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.07 vs. limit=15.0 2024-09-19 07:55:34,110 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.010e+02 2.301e+02 2.436e+02 2.631e+02 6.907e+02, threshold=4.873e+02, percent-clipped=1.0 2024-09-19 07:55:58,238 INFO [train.py:1198] (1/2) Epoch 50, batch 700, loss[loss=0.2096, ctc_loss=0.1375, cr_loss=0.3606, over 20771.00 frames. ], tot_loss[loss=0.2133, ctc_loss=0.1403, cr_loss=0.3649, over 3986467.19 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 07:56:04,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=889250.1666666666, ans=0.125 2024-09-19 07:56:27,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=889306.8333333334, ans=0.125 2024-09-19 07:56:48,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=889335.1666666666, ans=0.0 2024-09-19 07:57:14,717 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.00 vs. limit=6.0 2024-09-19 07:57:16,880 INFO [train.py:1198] (1/2) Epoch 50, batch 750, loss[loss=0.2332, ctc_loss=0.1504, cr_loss=0.4139, over 20929.00 frames. ], tot_loss[loss=0.2147, ctc_loss=0.1413, cr_loss=0.3667, over 3993955.88 frames. ], batch size: 60, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 07:57:18,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=889391.8333333334, ans=0.04949747468305833 2024-09-19 07:57:26,630 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.75 vs. limit=15.0 2024-09-19 07:57:36,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=889420.1666666666, ans=0.025 2024-09-19 07:58:08,068 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.043e+02 2.284e+02 2.389e+02 2.497e+02 3.132e+02, threshold=4.777e+02, percent-clipped=0.0 2024-09-19 07:58:32,354 INFO [train.py:1198] (1/2) Epoch 50, batch 800, loss[loss=0.1823, ctc_loss=0.1197, cr_loss=0.313, over 20973.00 frames. 
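The Whitening entries compare a measured statistic against a limit (metric=3.07 vs. limit=15.0 above), which reads like a measure of how far the covariance of a channel group is from isotropic, with 1.0 meaning perfectly "white". A guess at such a metric, not the module's actual code:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
    # x: (num_frames, num_channels). For each channel group, compare the
    # mean squared eigenvalue of the covariance with the squared mean
    # eigenvalue; equal eigenvalues give exactly 1.0.
    num_frames, num_channels = x.shape
    g = x.reshape(num_frames, num_groups, num_channels // num_groups)
    g = g - g.mean(dim=0, keepdim=True)
    cov = torch.einsum("tgi,tgj->gij", g, g) / num_frames
    eigs = torch.linalg.eigvalsh(cov)                     # (groups, c/groups)
    metric = (eigs ** 2).mean(dim=1) / eigs.mean(dim=1) ** 2
    return metric.mean()

print(whitening_metric(torch.randn(1000, 256), num_groups=1))  # close to 1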
], tot_loss[loss=0.2145, ctc_loss=0.1412, cr_loss=0.3668, over 4026930.47 frames. ], batch size: 48, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 07:58:35,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=889533.5, ans=0.125 2024-09-19 07:59:14,788 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.43 vs. limit=15.0 2024-09-19 07:59:23,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=889618.5, ans=0.1 2024-09-19 07:59:23,536 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.84 vs. limit=15.0 2024-09-19 07:59:47,204 INFO [train.py:1198] (1/2) Epoch 50, batch 850, loss[loss=0.2233, ctc_loss=0.1509, cr_loss=0.3623, over 20317.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.1415, cr_loss=0.3672, over 4042948.29 frames. ], batch size: 74, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 08:00:03,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=889703.5, ans=0.0 2024-09-19 08:00:21,586 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:00:34,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=889760.1666666666, ans=0.125 2024-09-19 08:00:36,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=889760.1666666666, ans=0.125 2024-09-19 08:00:37,481 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.023e+02 2.266e+02 2.406e+02 2.507e+02 3.196e+02, threshold=4.812e+02, percent-clipped=0.0 2024-09-19 08:00:50,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=889788.5, ans=0.125 2024-09-19 08:01:01,825 INFO [train.py:1198] (1/2) Epoch 50, batch 900, loss[loss=0.1991, ctc_loss=0.1289, cr_loss=0.3511, over 21072.00 frames. ], tot_loss[loss=0.2141, ctc_loss=0.1408, cr_loss=0.3666, over 4058656.72 frames. ], batch size: 53, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 08:01:12,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=889816.8333333334, ans=0.1 2024-09-19 08:01:36,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=889873.5, ans=0.0 2024-09-19 08:02:20,122 INFO [train.py:1198] (1/2) Epoch 50, batch 950, loss[loss=0.1781, ctc_loss=0.1141, cr_loss=0.32, over 20971.00 frames. ], tot_loss[loss=0.2139, ctc_loss=0.1406, cr_loss=0.3663, over 4064878.55 frames. 
], batch size: 51, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 08:02:56,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=890015.1666666666, ans=0.07 2024-09-19 08:03:05,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=890015.1666666666, ans=0.0 2024-09-19 08:03:14,468 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.950e+02 2.262e+02 2.398e+02 2.595e+02 3.730e+02, threshold=4.796e+02, percent-clipped=0.0 2024-09-19 08:03:14,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=890043.5, ans=0.125 2024-09-19 08:03:38,764 INFO [train.py:1198] (1/2) Epoch 50, batch 1000, loss[loss=0.2527, ctc_loss=0.1669, cr_loss=0.4291, over 19930.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1413, cr_loss=0.3673, over 4075661.29 frames. ], batch size: 80, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 08:04:00,609 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.67 vs. limit=22.5 2024-09-19 08:04:32,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=890185.1666666666, ans=0.125 2024-09-19 08:04:55,211 INFO [train.py:1198] (1/2) Epoch 50, batch 1050, loss[loss=0.2198, ctc_loss=0.1475, cr_loss=0.3616, over 20651.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1421, cr_loss=0.3688, over 4085563.22 frames. ], batch size: 66, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 08:05:13,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=890270.1666666666, ans=0.125 2024-09-19 08:05:37,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=890298.5, ans=0.125 2024-09-19 08:05:39,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=890326.8333333334, ans=0.0 2024-09-19 08:05:46,578 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.896e+02 2.296e+02 2.435e+02 2.565e+02 3.108e+02, threshold=4.870e+02, percent-clipped=0.0 2024-09-19 08:06:02,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=890355.1666666666, ans=0.2 2024-09-19 08:06:11,223 INFO [train.py:1198] (1/2) Epoch 50, batch 1100, loss[loss=0.2182, ctc_loss=0.1443, cr_loss=0.3694, over 20717.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1415, cr_loss=0.3675, over 4091593.04 frames. 
], batch size: 71, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 08:07:02,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=890468.5, ans=0.0 2024-09-19 08:07:02,957 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:07:07,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=890468.5, ans=0.0 2024-09-19 08:07:10,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=890468.5, ans=0.125 2024-09-19 08:07:22,766 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.06 vs. limit=15.0 2024-09-19 08:07:29,751 INFO [train.py:1198] (1/2) Epoch 50, batch 1150, loss[loss=0.1843, ctc_loss=0.1181, cr_loss=0.331, over 21054.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.1415, cr_loss=0.3673, over 4096566.59 frames. ], batch size: 53, lr: 1.65e-03, grad_scale: 64.0 2024-09-19 08:07:46,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=890553.5, ans=0.125 2024-09-19 08:07:55,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=890553.5, ans=0.125 2024-09-19 08:08:01,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=890581.8333333334, ans=0.0 2024-09-19 08:08:03,494 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.10 vs. limit=22.5 2024-09-19 08:08:08,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=890581.8333333334, ans=0.2 2024-09-19 08:08:20,633 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.931e+02 2.263e+02 2.381e+02 2.513e+02 8.725e+02, threshold=4.761e+02, percent-clipped=1.0 2024-09-19 08:08:31,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=890638.5, ans=0.0 2024-09-19 08:08:36,186 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.13 vs. limit=15.0 2024-09-19 08:08:41,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=890638.5, ans=0.1 2024-09-19 08:08:47,482 INFO [train.py:1198] (1/2) Epoch 50, batch 1200, loss[loss=0.2481, ctc_loss=0.1689, cr_loss=0.3958, over 18296.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1421, cr_loss=0.3686, over 4098102.59 frames. 
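grad_scale in the reports steps between powers of two (32.0 for most of the epoch, 64.0 from batch 1150 above, smaller values later), the characteristic behavior of dynamic loss scaling in mixed-precision training. A standard PyTorch AMP step showing where such a value comes from; the recipe's own scaling logic may differ:

import torch

scaler = torch.cuda.amp.GradScaler()

def amp_step(model, batch, optimizer, compute_loss):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = compute_loss(model, batch)
    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(optimizer)          # unscales; skips the step on overflow
    scaler.update()                 # doubles or halves the scale adaptively
    return loss.detach(), scaler.get_scale()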
], batch size: 108, lr: 1.65e-03, grad_scale: 64.0 2024-09-19 08:09:06,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=890695.1666666666, ans=0.125 2024-09-19 08:09:06,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=890695.1666666666, ans=0.2 2024-09-19 08:09:33,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=890751.8333333334, ans=0.07 2024-09-19 08:10:02,646 INFO [train.py:1198] (1/2) Epoch 50, batch 1250, loss[loss=0.2366, ctc_loss=0.1572, cr_loss=0.397, over 20843.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.143, cr_loss=0.3705, over 4092007.76 frames. ], batch size: 65, lr: 1.65e-03, grad_scale: 64.0 2024-09-19 08:10:03,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=890808.5, ans=0.125 2024-09-19 08:10:16,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=890836.8333333334, ans=0.125 2024-09-19 08:10:27,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=890836.8333333334, ans=0.025 2024-09-19 08:10:31,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=890865.1666666666, ans=0.125 2024-09-19 08:10:32,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=890865.1666666666, ans=0.125 2024-09-19 08:10:36,747 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.00 vs. limit=10.0 2024-09-19 08:10:53,681 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.985e+02 2.278e+02 2.388e+02 2.563e+02 3.317e+02, threshold=4.777e+02, percent-clipped=0.0 2024-09-19 08:11:16,587 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:11:17,673 INFO [train.py:1198] (1/2) Epoch 50, batch 1300, loss[loss=0.2239, ctc_loss=0.1497, cr_loss=0.3706, over 20658.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1429, cr_loss=0.3709, over 4099370.85 frames. ], batch size: 66, lr: 1.65e-03, grad_scale: 64.0 2024-09-19 08:11:20,862 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=890950.1666666666, ans=0.1 2024-09-19 08:11:40,651 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=890978.5, ans=0.2 2024-09-19 08:12:04,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=891035.1666666666, ans=0.125 2024-09-19 08:12:18,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=891063.5, ans=0.025 2024-09-19 08:12:29,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=891063.5, ans=15.0 2024-09-19 08:12:35,615 INFO [train.py:1198] (1/2) Epoch 50, batch 1350, loss[loss=0.2237, ctc_loss=0.1488, cr_loss=0.3743, over 20627.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1426, cr_loss=0.3701, over 4098026.29 frames. 
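The cr_loss term tracks a consistency-regularization objective: the same utterance is encoded under two different random time-maskings and the two posterior sequences are pulled toward each other. A sketch of one common formulation (symmetric KL with stop-gradient targets); the recipe's exact definition may differ:

import torch
import torch.nn.functional as F

def consistency_loss(log_probs_a: torch.Tensor,
                     log_probs_b: torch.Tensor) -> torch.Tensor:
    # log_probs_*: (N, T, vocab) log-posteriors from two augmented views.
    kl_ab = F.kl_div(log_probs_a, log_probs_b.detach(),
                     log_target=True, reduction="batchmean")
    kl_ba = F.kl_div(log_probs_b, log_probs_a.detach(),
                     log_target=True, reduction="batchmean")
    return 0.5 * (kl_ab + kl_ba)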
], batch size: 75, lr: 1.65e-03, grad_scale: 64.0 2024-09-19 08:12:37,647 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:12:56,969 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.17 vs. limit=15.0 2024-09-19 08:13:26,087 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.008e+02 2.350e+02 2.446e+02 2.605e+02 3.274e+02, threshold=4.892e+02, percent-clipped=0.0 2024-09-19 08:13:39,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=891205.1666666666, ans=0.125 2024-09-19 08:13:49,970 INFO [train.py:1198] (1/2) Epoch 50, batch 1400, loss[loss=0.2406, ctc_loss=0.1603, cr_loss=0.4016, over 20051.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.1427, cr_loss=0.3705, over 4091230.09 frames. ], batch size: 80, lr: 1.65e-03, grad_scale: 64.0 2024-09-19 08:13:51,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=891233.5, ans=0.0 2024-09-19 08:13:56,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=891233.5, ans=0.0 2024-09-19 08:14:58,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=891346.8333333334, ans=0.1 2024-09-19 08:15:09,338 INFO [train.py:1198] (1/2) Epoch 50, batch 1450, loss[loss=0.1784, ctc_loss=0.1165, cr_loss=0.3095, over 20971.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.1435, cr_loss=0.372, over 4093286.11 frames. ], batch size: 50, lr: 1.65e-03, grad_scale: 64.0 2024-09-19 08:15:32,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=891403.5, ans=0.2 2024-09-19 08:16:00,951 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.031e+02 2.268e+02 2.401e+02 2.572e+02 3.033e+02, threshold=4.802e+02, percent-clipped=0.0 2024-09-19 08:16:25,147 INFO [train.py:1198] (1/2) Epoch 50, batch 1500, loss[loss=0.2459, ctc_loss=0.1654, cr_loss=0.4027, over 20090.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1424, cr_loss=0.3699, over 4098789.50 frames. ], batch size: 80, lr: 1.65e-03, grad_scale: 64.0 2024-09-19 08:17:03,577 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.01 vs. limit=6.0 2024-09-19 08:17:06,697 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.89 vs. limit=15.0 2024-09-19 08:17:26,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=891630.1666666666, ans=0.025 2024-09-19 08:17:28,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=891630.1666666666, ans=0.125 2024-09-19 08:17:32,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=891630.1666666666, ans=0.125 2024-09-19 08:17:40,101 INFO [train.py:1198] (1/2) Epoch 50, batch 1550, loss[loss=0.1677, ctc_loss=0.1098, cr_loss=0.2897, over 20964.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.1417, cr_loss=0.3685, over 4102713.58 frames. 
], batch size: 48, lr: 1.65e-03, grad_scale: 64.0 2024-09-19 08:18:03,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=891686.8333333334, ans=0.125 2024-09-19 08:18:15,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=891715.1666666666, ans=0.1 2024-09-19 08:18:34,206 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.043e+02 2.272e+02 2.394e+02 2.514e+02 2.979e+02, threshold=4.788e+02, percent-clipped=0.0 2024-09-19 08:18:40,993 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.34 vs. limit=15.0 2024-09-19 08:18:43,877 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.69 vs. limit=6.0 2024-09-19 08:18:58,443 INFO [train.py:1198] (1/2) Epoch 50, batch 1600, loss[loss=0.2216, ctc_loss=0.1435, cr_loss=0.3906, over 20893.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.1413, cr_loss=0.368, over 4112717.72 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 64.0 2024-09-19 08:19:18,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=891828.5, ans=0.125 2024-09-19 08:19:30,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=891856.8333333334, ans=0.125 2024-09-19 08:20:09,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=891913.5, ans=10.0 2024-09-19 08:20:14,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=891913.5, ans=0.0 2024-09-19 08:20:17,019 INFO [train.py:1198] (1/2) Epoch 50, batch 1650, loss[loss=0.2453, ctc_loss=0.1635, cr_loss=0.4089, over 20091.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1423, cr_loss=0.3691, over 4091387.90 frames. ], batch size: 80, lr: 1.65e-03, grad_scale: 64.0 2024-09-19 08:20:29,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=891941.8333333334, ans=0.1 2024-09-19 08:20:53,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=891998.5, ans=0.125 2024-09-19 08:21:05,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=892026.8333333334, ans=0.125 2024-09-19 08:21:10,009 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.009e+02 2.317e+02 2.511e+02 2.608e+02 4.038e+02, threshold=5.022e+02, percent-clipped=0.0 2024-09-19 08:21:14,831 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:21:32,361 INFO [train.py:1198] (1/2) Epoch 50, batch 1700, loss[loss=0.1741, ctc_loss=0.1147, cr_loss=0.2973, over 19808.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1422, cr_loss=0.3689, over 4083596.19 frames. ], batch size: 44, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 08:21:39,022 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.67 vs. 
limit=10.0 2024-09-19 08:22:10,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=892140.1666666666, ans=0.0 2024-09-19 08:22:12,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=892140.1666666666, ans=0.125 2024-09-19 08:22:18,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=892168.5, ans=0.0 2024-09-19 08:22:24,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=892168.5, ans=0.125 2024-09-19 08:22:33,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=892196.8333333334, ans=0.2 2024-09-19 08:22:48,274 INFO [train.py:1198] (1/2) Epoch 50, batch 1750, loss[loss=0.1873, ctc_loss=0.1221, cr_loss=0.3258, over 21010.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.1413, cr_loss=0.368, over 4087611.73 frames. ], batch size: 52, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 08:23:02,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=892253.5, ans=0.125 2024-09-19 08:23:40,759 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.935e+02 2.279e+02 2.409e+02 2.572e+02 3.164e+02, threshold=4.818e+02, percent-clipped=0.0 2024-09-19 08:23:48,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=892338.5, ans=0.1 2024-09-19 08:23:57,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=892338.5, ans=0.0 2024-09-19 08:24:03,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=892338.5, ans=0.125 2024-09-19 08:24:05,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=892366.8333333334, ans=0.0 2024-09-19 08:24:06,246 INFO [train.py:1198] (1/2) Epoch 50, batch 1800, loss[loss=0.1893, ctc_loss=0.122, cr_loss=0.3361, over 20983.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.142, cr_loss=0.3691, over 4092042.30 frames. ], batch size: 48, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 08:25:13,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=892480.1666666666, ans=0.025 2024-09-19 08:25:25,175 INFO [train.py:1198] (1/2) Epoch 50, batch 1850, loss[loss=0.1877, ctc_loss=0.1218, cr_loss=0.3292, over 20980.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1415, cr_loss=0.3678, over 4092355.68 frames. ], batch size: 51, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 08:26:06,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=892565.1666666666, ans=0.1 2024-09-19 08:26:17,962 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.979e+02 2.277e+02 2.407e+02 2.527e+02 3.076e+02, threshold=4.815e+02, percent-clipped=0.0 2024-09-19 08:26:24,714 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.81 vs. limit=10.0 2024-09-19 08:26:40,728 INFO [train.py:1198] (1/2) Epoch 50, batch 1900, loss[loss=0.1898, ctc_loss=0.1229, cr_loss=0.3349, over 20975.00 frames. 
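Batch sizes in these reports swing from 44 to 149 cuts while the per-batch frame counts stay within a much narrower band, the signature of sampling by total duration rather than by a fixed number of utterances. A toy packing loop with that behavior; real samplers also bucket by length and shuffle:

def pack_by_frames(cuts, max_frames: int):
    # cuts: iterable of (utt_id, num_frames). Emit batches whose total
    # frame count stays under the budget, so batches made of short cuts
    # hold more utterances.
    batch, total = [], 0
    for utt_id, num_frames in cuts:
        if batch and total + num_frames > max_frames:
            yield batch
            batch, total = [], 0
        batch.append(utt_id)
        total += num_frames
    if batch:
        yield batch

print(list(pack_by_frames([("a", 900), ("b", 800), ("c", 300), ("d", 1600)], 1800)))
# [['a', 'b'], ['c'], ['d']]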
], tot_loss[loss=0.2138, ctc_loss=0.1406, cr_loss=0.3661, over 4089148.24 frames. ], batch size: 48, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 08:26:42,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=892650.1666666666, ans=0.125 2024-09-19 08:27:40,127 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.10 vs. limit=10.0 2024-09-19 08:27:55,919 INFO [train.py:1198] (1/2) Epoch 50, batch 1950, loss[loss=0.2233, ctc_loss=0.1498, cr_loss=0.3675, over 20676.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.1419, cr_loss=0.3681, over 4071202.07 frames. ], batch size: 66, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 08:28:20,632 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.09 vs. limit=22.5 2024-09-19 08:28:27,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=892848.5, ans=0.125 2024-09-19 08:28:38,374 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.08 vs. limit=15.0 2024-09-19 08:28:48,249 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.975e+02 2.320e+02 2.442e+02 2.576e+02 3.122e+02, threshold=4.884e+02, percent-clipped=0.0 2024-09-19 08:28:58,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=892905.1666666666, ans=0.035 2024-09-19 08:29:10,747 INFO [train.py:1198] (1/2) Epoch 50, batch 2000, loss[loss=0.2362, ctc_loss=0.1583, cr_loss=0.3896, over 20987.00 frames. ], tot_loss[loss=0.2143, ctc_loss=0.141, cr_loss=0.3665, over 4081335.24 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 08:29:23,341 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.13 vs. limit=6.0 2024-09-19 08:30:28,908 INFO [train.py:1198] (1/2) Epoch 50, batch 2050, loss[loss=0.2236, ctc_loss=0.1463, cr_loss=0.3867, over 21035.00 frames. ], tot_loss[loss=0.2137, ctc_loss=0.1406, cr_loss=0.3655, over 4081161.01 frames. 
], batch size: 63, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 08:30:45,518 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:30:56,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=893103.5, ans=0.125 2024-09-19 08:31:04,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=893131.8333333334, ans=0.125 2024-09-19 08:31:09,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=893131.8333333334, ans=0.125 2024-09-19 08:31:23,790 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.980e+02 2.256e+02 2.370e+02 2.621e+02 3.459e+02, threshold=4.739e+02, percent-clipped=0.0 2024-09-19 08:31:36,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=893188.5, ans=0.1 2024-09-19 08:31:37,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=893188.5, ans=0.125 2024-09-19 08:31:46,585 INFO [train.py:1198] (1/2) Epoch 50, batch 2100, loss[loss=0.1686, ctc_loss=0.1076, cr_loss=0.3049, over 19860.00 frames. ], tot_loss[loss=0.213, ctc_loss=0.1402, cr_loss=0.3641, over 4068424.75 frames. ], batch size: 44, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 08:32:08,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=893245.1666666666, ans=0.1 2024-09-19 08:32:22,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=893273.5, ans=0.0 2024-09-19 08:32:22,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=893273.5, ans=0.07 2024-09-19 08:32:55,635 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.23 vs. limit=15.0 2024-09-19 08:33:02,274 INFO [train.py:1198] (1/2) Epoch 50, batch 2150, loss[loss=0.2452, ctc_loss=0.1673, cr_loss=0.3895, over 18354.00 frames. ], tot_loss[loss=0.2129, ctc_loss=0.14, cr_loss=0.3644, over 4084056.76 frames. ], batch size: 108, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 08:33:31,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=893415.1666666666, ans=0.0 2024-09-19 08:33:44,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=893415.1666666666, ans=0.2 2024-09-19 08:33:55,041 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.942e+02 2.258e+02 2.419e+02 2.581e+02 3.408e+02, threshold=4.838e+02, percent-clipped=0.0 2024-09-19 08:33:58,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=893443.5, ans=0.0 2024-09-19 08:34:17,334 INFO [train.py:1198] (1/2) Epoch 50, batch 2200, loss[loss=0.2409, ctc_loss=0.1599, cr_loss=0.4053, over 20699.00 frames. ], tot_loss[loss=0.2132, ctc_loss=0.1403, cr_loss=0.3646, over 4088143.34 frames. 
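Every progress line has a fixed shape, so loss curves can be recovered from this log directly. A small parser for the batch summaries and the clipping warnings, matching the formats above:

import re

BATCH_RE = re.compile(
    r"Epoch (\d+), batch (\d+), "
    r"loss\[loss=([\d.]+), ctc_loss=([\d.]+), cr_loss=([\d.e-]+), "
    r"over ([\d.]+) frames")

GRAD_RE = re.compile(
    r"grad-norm quartiles ((?:[\d.]+e[+-]\d+ ){4}[\d.]+e[+-]\d+), "
    r"threshold=([\d.]+e[+-]\d+), percent-clipped=([\d.]+)")

line = ("2024-09-19 08:34:17,334 INFO [train.py:1198] (1/2) Epoch 50, "
        "batch 2200, loss[loss=0.2409, ctc_loss=0.1599, cr_loss=0.4053, "
        "over 20699.00 frames.")
print(BATCH_RE.search(line).groups())
# ('50', '2200', '0.2409', '0.1599', '0.4053', '20699.00')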
], batch size: 66, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 08:34:19,569 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.91 vs. limit=10.0 2024-09-19 08:34:26,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=893500.1666666666, ans=0.0 2024-09-19 08:34:52,562 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.73 vs. limit=12.0 2024-09-19 08:35:04,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=893585.1666666666, ans=0.125 2024-09-19 08:35:10,426 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.49 vs. limit=15.0 2024-09-19 08:35:27,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=893613.5, ans=0.0 2024-09-19 08:35:35,077 INFO [train.py:1198] (1/2) Epoch 50, batch 2250, loss[loss=0.2144, ctc_loss=0.1435, cr_loss=0.3542, over 20919.00 frames. ], tot_loss[loss=0.2131, ctc_loss=0.1402, cr_loss=0.3643, over 4094889.73 frames. ], batch size: 60, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 08:35:38,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=893641.8333333334, ans=0.05 2024-09-19 08:35:49,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=893670.1666666666, ans=0.0 2024-09-19 08:36:28,214 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.963e+02 2.242e+02 2.392e+02 2.522e+02 4.473e+02, threshold=4.784e+02, percent-clipped=0.0 2024-09-19 08:36:33,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=893726.8333333334, ans=0.0 2024-09-19 08:36:54,049 INFO [train.py:1198] (1/2) Epoch 50, batch 2300, loss[loss=0.1619, ctc_loss=0.1025, cr_loss=0.2969, over 20982.00 frames. ], tot_loss[loss=0.2124, ctc_loss=0.1397, cr_loss=0.3635, over 4099267.72 frames. ], batch size: 52, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 08:37:21,956 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.28 vs. limit=15.0 2024-09-19 08:37:35,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=893840.1666666666, ans=0.07 2024-09-19 08:37:36,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=893840.1666666666, ans=0.0 2024-09-19 08:37:44,462 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.29 vs. limit=22.5 2024-09-19 08:38:09,207 INFO [train.py:1198] (1/2) Epoch 50, batch 2350, loss[loss=0.2262, ctc_loss=0.1492, cr_loss=0.3851, over 20829.00 frames. ], tot_loss[loss=0.213, ctc_loss=0.1402, cr_loss=0.3643, over 4092553.39 frames. 
], batch size: 59, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 08:38:21,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=893925.1666666666, ans=0.125 2024-09-19 08:38:25,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=893953.5, ans=0.2 2024-09-19 08:38:39,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=893981.8333333334, ans=0.1 2024-09-19 08:39:01,631 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.969e+02 2.305e+02 2.433e+02 2.556e+02 4.397e+02, threshold=4.867e+02, percent-clipped=0.0 2024-09-19 08:39:24,066 INFO [train.py:1198] (1/2) Epoch 50, batch 2400, loss[loss=0.2517, ctc_loss=0.1676, cr_loss=0.4203, over 18193.00 frames. ], tot_loss[loss=0.214, ctc_loss=0.1409, cr_loss=0.3658, over 4088047.83 frames. ], batch size: 108, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 08:39:28,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=894066.8333333334, ans=0.1 2024-09-19 08:39:41,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=894095.1666666666, ans=0.2 2024-09-19 08:39:42,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=894095.1666666666, ans=0.125 2024-09-19 08:39:51,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=894095.1666666666, ans=0.2 2024-09-19 08:39:59,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=894123.5, ans=0.0 2024-09-19 08:40:39,094 INFO [train.py:1198] (1/2) Epoch 50, batch 2450, loss[loss=0.1653, ctc_loss=0.1084, cr_loss=0.2842, over 19919.00 frames. ], tot_loss[loss=0.2133, ctc_loss=0.1403, cr_loss=0.3649, over 4095047.54 frames. ], batch size: 44, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 08:40:42,916 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.41 vs. limit=15.0 2024-09-19 08:40:47,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=894208.5, ans=0.125 2024-09-19 08:41:08,666 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=7.90 vs. limit=22.5 2024-09-19 08:41:21,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=894265.1666666666, ans=0.0 2024-09-19 08:41:21,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=894265.1666666666, ans=0.2 2024-09-19 08:41:32,474 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=3.87 vs. 
limit=15.0 2024-09-19 08:41:34,504 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.107e+02 2.270e+02 2.403e+02 2.563e+02 3.540e+02, threshold=4.805e+02, percent-clipped=0.0 2024-09-19 08:41:37,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=894293.5, ans=0.125 2024-09-19 08:41:46,073 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.94 vs. limit=15.0 2024-09-19 08:41:57,400 INFO [train.py:1198] (1/2) Epoch 50, batch 2500, loss[loss=0.2244, ctc_loss=0.1511, cr_loss=0.3668, over 20833.00 frames. ], tot_loss[loss=0.2134, ctc_loss=0.1404, cr_loss=0.365, over 4104922.38 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 08:42:39,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=894406.8333333334, ans=0.0 2024-09-19 08:42:48,908 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.30 vs. limit=22.5 2024-09-19 08:42:49,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=894435.1666666666, ans=0.025 2024-09-19 08:43:08,037 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=894463.5, ans=0.0 2024-09-19 08:43:15,314 INFO [train.py:1198] (1/2) Epoch 50, batch 2550, loss[loss=0.2505, ctc_loss=0.1678, cr_loss=0.4134, over 18184.00 frames. ], tot_loss[loss=0.2138, ctc_loss=0.1406, cr_loss=0.366, over 4106307.64 frames. ], batch size: 108, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 08:43:31,015 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:44:03,104 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.49 vs. limit=15.0 2024-09-19 08:44:08,113 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.025e+02 2.280e+02 2.426e+02 2.621e+02 3.118e+02, threshold=4.853e+02, percent-clipped=0.0 2024-09-19 08:44:15,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=894605.1666666666, ans=0.0 2024-09-19 08:44:30,799 INFO [train.py:1198] (1/2) Epoch 50, batch 2600, loss[loss=0.212, ctc_loss=0.1396, cr_loss=0.3622, over 20663.00 frames. ], tot_loss[loss=0.2136, ctc_loss=0.1403, cr_loss=0.3663, over 4115490.51 frames. ], batch size: 68, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 08:44:43,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=894633.5, ans=0.125 2024-09-19 08:44:49,718 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.54 vs. limit=12.0 2024-09-19 08:45:19,080 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.05 vs. 
limit=15.0 2024-09-19 08:45:40,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=894746.8333333334, ans=0.0 2024-09-19 08:45:47,240 INFO [train.py:1198] (1/2) Epoch 50, batch 2650, loss[loss=0.2315, ctc_loss=0.1558, cr_loss=0.3781, over 20849.00 frames. ], tot_loss[loss=0.2131, ctc_loss=0.14, cr_loss=0.3656, over 4107527.16 frames. ], batch size: 65, lr: 1.65e-03, grad_scale: 16.0 2024-09-19 08:46:27,064 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.79 vs. limit=15.0 2024-09-19 08:46:38,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=894860.1666666666, ans=0.2 2024-09-19 08:46:41,490 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.963e+02 2.287e+02 2.433e+02 2.592e+02 7.189e+02, threshold=4.866e+02, percent-clipped=1.0 2024-09-19 08:47:03,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=894888.5, ans=0.125 2024-09-19 08:47:06,029 INFO [train.py:1198] (1/2) Epoch 50, batch 2700, loss[loss=0.2498, ctc_loss=0.1671, cr_loss=0.4139, over 20018.00 frames. ], tot_loss[loss=0.2137, ctc_loss=0.1404, cr_loss=0.3663, over 4110012.89 frames. ], batch size: 80, lr: 1.65e-03, grad_scale: 16.0 2024-09-19 08:47:09,848 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.60 vs. limit=15.0 2024-09-19 08:47:15,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=894916.8333333334, ans=0.2 2024-09-19 08:47:27,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=894945.1666666666, ans=0.2 2024-09-19 08:47:32,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=894945.1666666666, ans=0.125 2024-09-19 08:47:49,869 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:48:24,588 INFO [train.py:1198] (1/2) Epoch 50, batch 2750, loss[loss=0.1776, ctc_loss=0.1123, cr_loss=0.3269, over 20996.00 frames. ], tot_loss[loss=0.2135, ctc_loss=0.1403, cr_loss=0.3663, over 4110962.84 frames. ], batch size: 48, lr: 1.64e-03, grad_scale: 8.0 2024-09-19 08:48:44,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=895086.8333333334, ans=0.1 2024-09-19 08:48:59,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=895115.1666666666, ans=0.1 2024-09-19 08:49:18,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=895143.5, ans=0.125 2024-09-19 08:49:20,864 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.018e+02 2.232e+02 2.400e+02 2.549e+02 4.117e+02, threshold=4.799e+02, percent-clipped=0.0
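
The recurring optim.py:487 warnings report quartiles (min/25%/50%/75%/max) of recent gradient norms, plus a clipping threshold and the fraction of batches clipped. The logged thresholds track Clipping_scale times the logged median, e.g. 2.0 * 2.400e+02 = 4.800e+02 against threshold=4.799e+02 just above. A sketch of that style of adaptive clipping; the class name and window size are illustrative, not taken from optim.py:

```python
# Sketch of median-based adaptive gradient clipping, assuming
# threshold = clipping_scale * median(recent grad norms), which matches the
# logged values (quartiles ... 2.400e+02 ..., threshold=4.799e+02 with
# Clipping_scale=2.0). Window size and names are illustrative.
from collections import deque

import torch


class AdaptiveClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 200):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)  # recent total grad norms

    def clip_(self, parameters) -> float:
        params = [p for p in parameters if p.grad is not None]
        total_norm = torch.norm(
            torch.stack([p.grad.norm() for p in params])
        ).item()
        self.norms.append(total_norm)
        median = sorted(self.norms)[len(self.norms) // 2]
        threshold = self.clipping_scale * median
        if total_norm > threshold:  # counted in "percent-clipped" above
            for p in params:
                p.grad.mul_(threshold / total_norm)
        return total_norm
```

2024-09-19 08:49:40,439 INFO [train.py:1198] (1/2) Epoch 50, batch 2800, loss[loss=0.2241, ctc_loss=0.1475, cr_loss=0.3827, over 20674.00 frames. ], tot_loss[loss=0.2124, ctc_loss=0.1395, cr_loss=0.3646, over 4109422.38 frames.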
], batch size: 71, lr: 1.64e-03, grad_scale: 16.0 2024-09-19 08:49:40,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=895200.1666666666, ans=0.125 2024-09-19 08:49:42,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=895200.1666666666, ans=0.125 2024-09-19 08:49:55,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=895228.5, ans=0.125 2024-09-19 08:50:06,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=895228.5, ans=0.1 2024-09-19 08:50:11,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=895256.8333333334, ans=0.0 2024-09-19 08:50:21,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=895256.8333333334, ans=0.125 2024-09-19 08:50:22,267 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.90 vs. limit=6.0 2024-09-19 08:50:32,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=895285.1666666666, ans=0.125 2024-09-19 08:50:33,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=895285.1666666666, ans=0.1 2024-09-19 08:50:52,872 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=895313.5, ans=0.125 2024-09-19 08:50:56,768 INFO [train.py:1198] (1/2) Epoch 50, batch 2850, loss[loss=0.1813, ctc_loss=0.1195, cr_loss=0.3092, over 20974.00 frames. ], tot_loss[loss=0.2116, ctc_loss=0.1389, cr_loss=0.3637, over 4114321.17 frames. ], batch size: 52, lr: 1.64e-03, grad_scale: 16.0 2024-09-19 08:51:24,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=895370.1666666666, ans=0.125 2024-09-19 08:51:28,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=895398.5, ans=0.125 2024-09-19 08:51:35,091 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0 2024-09-19 08:51:48,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=895426.8333333334, ans=0.125 2024-09-19 08:51:52,662 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.929e+02 2.294e+02 2.419e+02 2.605e+02 5.521e+02, threshold=4.838e+02, percent-clipped=1.0 2024-09-19 08:52:12,186 INFO [train.py:1198] (1/2) Epoch 50, batch 2900, loss[loss=0.229, ctc_loss=0.15, cr_loss=0.395, over 20955.00 frames. ], tot_loss[loss=0.2114, ctc_loss=0.1386, cr_loss=0.3639, over 4119489.07 frames. 
], batch size: 58, lr: 1.64e-03, grad_scale: 16.0 2024-09-19 08:52:12,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=895483.5, ans=0.025 2024-09-19 08:52:33,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=895511.8333333334, ans=0.125 2024-09-19 08:53:04,286 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.46 vs. limit=15.0 2024-09-19 08:53:31,083 INFO [train.py:1198] (1/2) Epoch 50, batch 2950, loss[loss=0.1832, ctc_loss=0.1197, cr_loss=0.3175, over 20955.00 frames. ], tot_loss[loss=0.2123, ctc_loss=0.1394, cr_loss=0.3643, over 4109438.14 frames. ], batch size: 52, lr: 1.64e-03, grad_scale: 16.0 2024-09-19 08:53:48,569 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.40 vs. limit=15.0 2024-09-19 08:54:02,167 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.30 vs. limit=15.0 2024-09-19 08:54:03,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=895681.8333333334, ans=0.125 2024-09-19 08:54:30,020 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.074e+02 2.317e+02 2.424e+02 2.670e+02 4.100e+02, threshold=4.848e+02, percent-clipped=0.0 2024-09-19 08:54:49,914 INFO [train.py:1198] (1/2) Epoch 50, batch 3000, loss[loss=0.2215, ctc_loss=0.1452, cr_loss=0.3813, over 21014.00 frames. ], tot_loss[loss=0.2119, ctc_loss=0.1392, cr_loss=0.3634, over 4097103.20 frames. ], batch size: 62, lr: 1.64e-03, grad_scale: 16.0 2024-09-19 08:54:49,914 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 08:55:08,581 INFO [train.py:1230] (1/2) Epoch 50, validation: loss=0.03862, ctc_loss=0.03862, cr_loss=1.608e-14, over 944034.00 frames. 2024-09-19 08:55:08,582 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB 2024-09-19 08:55:21,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=895766.8333333334, ans=0.0 2024-09-19 08:55:49,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=895823.5, ans=0.125
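
Note the validation record above: cr_loss=1.608e-14 is numerical noise, while the training records hover around cr_loss=0.36. That is the expected behaviour if the CR term compares the model's outputs on two differently augmented copies of each utterance, since with augmentation disabled at validation the two copies coincide. A toy illustration of such a consistency term, assuming a symmetric KL formulation (the exact CR-CTC loss may differ):

```python
# Toy consistency-regularization term between two views of the same batch,
# assuming a symmetric KL-style divergence between per-frame posteriors.
# This illustrates why cr_loss ~= 0 at validation (identical views); it is
# not the exact CR-CTC loss used by train.py.
import torch
import torch.nn.functional as F


def cr_loss(logits_a: torch.Tensor, logits_b: torch.Tensor) -> torch.Tensor:
    """logits_*: (batch, time, vocab) outputs for two augmented views."""
    log_p = F.log_softmax(logits_a, dim=-1)
    log_q = F.log_softmax(logits_b, dim=-1)
    kl_pq = F.kl_div(log_q, log_p, log_target=True, reduction="batchmean")
    kl_qp = F.kl_div(log_p, log_q, log_target=True, reduction="batchmean")
    return 0.5 * (kl_pq + kl_qp)


x = torch.randn(2, 10, 500)
print(cr_loss(x, x))  # identical views -> 0, as in the validation record
print(cr_loss(x, x + 0.1 * torch.randn_like(x)))  # differing views -> > 0
```

2024-09-19 08:56:24,679 INFO [train.py:1198] (1/2) Epoch 50, batch 3050, loss[loss=0.2215, ctc_loss=0.1449, cr_loss=0.3827, over 20638.00 frames. ], tot_loss[loss=0.2121, ctc_loss=0.1393, cr_loss=0.364, over 4112517.55 frames.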
], batch size: 66, lr: 1.64e-03, grad_scale: 16.0 2024-09-19 08:56:41,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=895936.8333333334, ans=0.025 2024-09-19 08:56:57,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=895965.1666666666, ans=0.125 2024-09-19 08:57:12,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=895993.5, ans=0.09899494936611666 2024-09-19 08:57:19,765 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.951e+02 2.269e+02 2.440e+02 2.595e+02 3.509e+02, threshold=4.880e+02, percent-clipped=0.0 2024-09-19 08:57:39,264 INFO [train.py:1198] (1/2) Epoch 50, batch 3100, loss[loss=0.222, ctc_loss=0.1458, cr_loss=0.381, over 20950.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1412, cr_loss=0.3669, over 4083529.62 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 16.0 2024-09-19 08:57:51,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=896050.1666666666, ans=0.2 2024-09-19 08:58:00,837 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:58:28,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=896135.1666666666, ans=0.0 2024-09-19 08:58:44,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=896163.5, ans=0.0 2024-09-19 08:58:51,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=896163.5, ans=0.125 2024-09-19 08:58:57,707 INFO [train.py:1198] (1/2) Epoch 50, batch 3150, loss[loss=0.1829, ctc_loss=0.1174, cr_loss=0.3274, over 20940.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.1417, cr_loss=0.3683, over 4097374.05 frames. ], batch size: 49, lr: 1.64e-03, grad_scale: 16.0 2024-09-19 08:59:24,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=896220.1666666666, ans=10.0 2024-09-19 08:59:54,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=896276.8333333334, ans=0.125 2024-09-19 08:59:55,564 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.966e+02 2.256e+02 2.364e+02 2.511e+02 3.291e+02, threshold=4.728e+02, percent-clipped=0.0 2024-09-19 09:00:15,599 INFO [train.py:1198] (1/2) Epoch 50, batch 3200, loss[loss=0.2295, ctc_loss=0.154, cr_loss=0.3773, over 21035.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1414, cr_loss=0.3679, over 4098622.60 frames. ], batch size: 62, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:00:19,604 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.01 vs. 
limit=12.0 2024-09-19 09:00:49,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=896390.1666666666, ans=0.04949747468305833 2024-09-19 09:00:55,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=896390.1666666666, ans=0.025 2024-09-19 09:01:19,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=896446.8333333334, ans=0.2 2024-09-19 09:01:19,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=896446.8333333334, ans=0.2 2024-09-19 09:01:26,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=896446.8333333334, ans=0.125 2024-09-19 09:01:30,966 INFO [train.py:1198] (1/2) Epoch 50, batch 3250, loss[loss=0.2067, ctc_loss=0.1352, cr_loss=0.3575, over 20826.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1411, cr_loss=0.3672, over 4098227.80 frames. ], batch size: 59, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:01:41,181 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.81 vs. limit=8.0 2024-09-19 09:01:51,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=896503.5, ans=0.125 2024-09-19 09:01:54,482 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.00 vs. limit=15.0 2024-09-19 09:02:08,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=896531.8333333334, ans=0.125 2024-09-19 09:02:17,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=896560.1666666666, ans=0.125 2024-09-19 09:02:21,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=896560.1666666666, ans=0.0 2024-09-19 09:02:27,283 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.902e+02 2.265e+02 2.376e+02 2.604e+02 4.520e+02, threshold=4.751e+02, percent-clipped=0.0 2024-09-19 09:02:45,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=896616.8333333334, ans=0.125 2024-09-19 09:02:45,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=896616.8333333334, ans=0.125 2024-09-19 09:02:46,657 INFO [train.py:1198] (1/2) Epoch 50, batch 3300, loss[loss=0.2293, ctc_loss=0.1528, cr_loss=0.3828, over 20870.00 frames. ], tot_loss[loss=0.2147, ctc_loss=0.1412, cr_loss=0.3675, over 4106313.44 frames. 
], batch size: 65, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:02:51,529 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=896616.8333333334, ans=0.035 2024-09-19 09:03:03,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=896645.1666666666, ans=0.2 2024-09-19 09:03:06,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=896645.1666666666, ans=0.125 2024-09-19 09:03:15,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=896673.5, ans=0.125 2024-09-19 09:03:17,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=896673.5, ans=0.0 2024-09-19 09:03:41,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=896701.8333333334, ans=0.0 2024-09-19 09:03:54,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=896730.1666666666, ans=0.125 2024-09-19 09:04:04,587 INFO [train.py:1198] (1/2) Epoch 50, batch 3350, loss[loss=0.191, ctc_loss=0.1236, cr_loss=0.337, over 20944.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.1416, cr_loss=0.3683, over 4106816.25 frames. ], batch size: 50, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:04:21,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=896786.8333333334, ans=0.125 2024-09-19 09:04:37,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=896815.1666666666, ans=0.5 2024-09-19 09:04:49,227 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.16 vs. limit=10.0 2024-09-19 09:04:50,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=896843.5, ans=0.2 2024-09-19 09:05:00,329 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.950e+02 2.285e+02 2.411e+02 2.587e+02 3.722e+02, threshold=4.823e+02, percent-clipped=0.0 2024-09-19 09:05:05,573 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0 2024-09-19 09:05:17,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=896871.8333333334, ans=0.125 2024-09-19 09:05:19,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=896871.8333333334, ans=0.125
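
Most of the scaling.py:214 lines sample a named ScheduledFloat, a scalar hyperparameter (skip rate, dropout, balancer probability) whose value ans is a function of the global batch_count. A minimal sketch of a piecewise-linear schedule of that kind; the breakpoints below are invented for illustration, and the constant values logged here (ans=0.0, ans=0.125, ...) suggest the real schedules reached their final segment long before batch ~896k:

```python
# Minimal sketch of a ScheduledFloat-style hyperparameter: piecewise-linear
# in the global batch count. The breakpoints are invented for illustration.
class PiecewiseLinear:
    def __init__(self, *points):  # points: (batch_count, value), sorted by x
        self.points = points

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:  # linear interpolation inside the segment
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        return pts[-1][1]  # past the last breakpoint: hold the final value


attention_skip_rate = PiecewiseLinear((0.0, 0.2), (4000.0, 0.05), (16000.0, 0.0))
print(attention_skip_rate(896701.8333333334))  # -> 0.0, as in the record above
```

2024-09-19 09:05:22,671 INFO [train.py:1198] (1/2) Epoch 50, batch 3400, loss[loss=0.2073, ctc_loss=0.1397, cr_loss=0.3378, over 21012.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.1419, cr_loss=0.3686, over 4102818.38 frames.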
], batch size: 63, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:05:57,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=896956.8333333334, ans=0.125 2024-09-19 09:05:57,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=896956.8333333334, ans=0.125 2024-09-19 09:06:38,009 INFO [train.py:1198] (1/2) Epoch 50, batch 3450, loss[loss=0.2294, ctc_loss=0.1522, cr_loss=0.3857, over 21036.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.142, cr_loss=0.3689, over 4105593.40 frames. ], batch size: 62, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:06:38,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=897041.8333333334, ans=0.95 2024-09-19 09:06:57,533 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.22 vs. limit=8.0 2024-09-19 09:07:04,301 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 09:07:10,461 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.60 vs. limit=15.0 2024-09-19 09:07:31,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=897126.8333333334, ans=15.0 2024-09-19 09:07:33,938 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.912e+02 2.318e+02 2.426e+02 2.580e+02 3.269e+02, threshold=4.853e+02, percent-clipped=0.0 2024-09-19 09:07:53,741 INFO [train.py:1198] (1/2) Epoch 50, batch 3500, loss[loss=0.1883, ctc_loss=0.123, cr_loss=0.3265, over 20972.00 frames. ], tot_loss[loss=0.2138, ctc_loss=0.1405, cr_loss=0.3665, over 4118579.47 frames. ], batch size: 51, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:07:57,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=897183.5, ans=0.125 2024-09-19 09:08:17,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=897211.8333333334, ans=0.125 2024-09-19 09:09:09,432 INFO [train.py:1198] (1/2) Epoch 50, batch 3550, loss[loss=0.2049, ctc_loss=0.1375, cr_loss=0.3368, over 20371.00 frames. ], tot_loss[loss=0.2142, ctc_loss=0.1408, cr_loss=0.3667, over 4110286.07 frames. ], batch size: 74, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:09:17,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=897325.1666666666, ans=0.1 2024-09-19 09:09:17,901 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.16 vs. limit=10.0 2024-09-19 09:09:20,810 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.10 vs. 
limit=15.0 2024-09-19 09:10:03,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=897410.1666666666, ans=0.2 2024-09-19 09:10:08,214 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.083e+02 2.358e+02 2.484e+02 2.630e+02 2.984e+02, threshold=4.968e+02, percent-clipped=0.0 2024-09-19 09:10:16,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=897438.5, ans=0.2 2024-09-19 09:10:28,085 INFO [train.py:1198] (1/2) Epoch 50, batch 3600, loss[loss=0.2288, ctc_loss=0.1511, cr_loss=0.3886, over 21075.00 frames. ], tot_loss[loss=0.2128, ctc_loss=0.1399, cr_loss=0.3648, over 4116074.56 frames. ], batch size: 59, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:11:04,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=897523.5, ans=0.125 2024-09-19 09:11:06,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=897523.5, ans=0.0 2024-09-19 09:11:28,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=897551.8333333334, ans=0.1 2024-09-19 09:11:33,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=897580.1666666666, ans=0.04949747468305833 2024-09-19 09:11:46,719 INFO [train.py:1198] (1/2) Epoch 50, batch 3650, loss[loss=0.2001, ctc_loss=0.1283, cr_loss=0.3588, over 21054.00 frames. ], tot_loss[loss=0.2121, ctc_loss=0.1393, cr_loss=0.3637, over 4108440.76 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:11:47,626 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.88 vs. limit=22.5 2024-09-19 09:12:40,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=897693.5, ans=0.025 2024-09-19 09:12:42,046 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.031e+02 2.342e+02 2.472e+02 2.648e+02 3.415e+02, threshold=4.943e+02, percent-clipped=0.0 2024-09-19 09:12:45,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=897721.8333333334, ans=0.125 2024-09-19 09:13:01,476 INFO [train.py:1198] (1/2) Epoch 50, batch 3700, loss[loss=0.2023, ctc_loss=0.1311, cr_loss=0.356, over 21061.00 frames. ], tot_loss[loss=0.2133, ctc_loss=0.1402, cr_loss=0.365, over 4105631.07 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:14:04,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=897863.5, ans=0.125 2024-09-19 09:14:07,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=897863.5, ans=0.2 2024-09-19 09:14:16,276 INFO [train.py:1198] (1/2) Epoch 50, batch 3750, loss[loss=0.2178, ctc_loss=0.1444, cr_loss=0.3672, over 20750.00 frames. ], tot_loss[loss=0.2138, ctc_loss=0.1406, cr_loss=0.3661, over 4117545.73 frames. 
], batch size: 71, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:14:31,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=897920.1666666666, ans=0.09899494936611666 2024-09-19 09:14:42,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=897920.1666666666, ans=0.0 2024-09-19 09:14:47,174 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.09 vs. limit=15.0 2024-09-19 09:14:48,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=897948.5, ans=0.025 2024-09-19 09:15:04,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=897976.8333333334, ans=0.125 2024-09-19 09:15:06,824 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.93 vs. limit=15.0 2024-09-19 09:15:12,085 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.000e+02 2.252e+02 2.391e+02 2.576e+02 2.941e+02, threshold=4.783e+02, percent-clipped=0.0 2024-09-19 09:15:12,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=897976.8333333334, ans=0.125 2024-09-19 09:15:22,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=898005.1666666666, ans=0.2 2024-09-19 09:15:34,806 INFO [train.py:1198] (1/2) Epoch 50, batch 3800, loss[loss=0.1859, ctc_loss=0.1197, cr_loss=0.331, over 20908.00 frames. ], tot_loss[loss=0.2132, ctc_loss=0.1402, cr_loss=0.3652, over 4109359.16 frames. ], batch size: 49, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:15:56,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=898061.8333333334, ans=0.125 2024-09-19 09:16:08,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=898090.1666666666, ans=0.1 2024-09-19 09:16:28,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=898118.5, ans=0.125 2024-09-19 09:16:52,983 INFO [train.py:1198] (1/2) Epoch 50, batch 3850, loss[loss=0.2373, ctc_loss=0.1565, cr_loss=0.404, over 20981.00 frames. ], tot_loss[loss=0.2137, ctc_loss=0.1406, cr_loss=0.3656, over 4098249.14 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 32.0
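
The scaling.py:1024 lines compare a Whiten module's whitening metric for some activation against its limit; intuitively the metric is 1.0 when the feature covariance is a multiple of the identity and grows as the eigenvalue spectrum becomes lopsided. A rough proxy for such a metric, not the exact scaling.py formula:

```python
# Rough proxy for a whitening metric: ratio of the mean squared eigenvalue of
# the feature covariance to the squared mean eigenvalue. Equals 1.0 when the
# covariance is a multiple of the identity ("white") and grows with imbalance.
# A plausible stand-in only; scaling.py's actual metric may differ in detail.
import torch


def whitening_metric(x: torch.Tensor) -> float:
    """x: (num_frames, num_channels) activations."""
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)  # real eigenvalues, ascending
    return ((eigs ** 2).mean() / eigs.mean() ** 2).item()


white = torch.randn(10000, 256)
print(whitening_metric(white))                    # ~1.0 for white features
print(whitening_metric(white * torch.rand(256)))  # noticeably > 1.0
```

2024-09-19 09:17:21,022 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.60 vs.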
limit=15.0 2024-09-19 09:17:23,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=898231.8333333334, ans=0.0 2024-09-19 09:17:44,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=898260.1666666666, ans=0.125 2024-09-19 09:17:49,937 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.998e+02 2.321e+02 2.455e+02 2.629e+02 5.773e+02, threshold=4.910e+02, percent-clipped=2.0 2024-09-19 09:18:07,846 INFO [train.py:1198] (1/2) Epoch 50, batch 3900, loss[loss=0.1691, ctc_loss=0.11, cr_loss=0.2956, over 19850.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1414, cr_loss=0.367, over 4083037.52 frames. ], batch size: 44, lr: 1.64e-03, grad_scale: 16.0 2024-09-19 09:18:18,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=898316.8333333334, ans=0.125 2024-09-19 09:18:24,792 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=898345.1666666666, ans=0.125 2024-09-19 09:18:56,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=898401.8333333334, ans=0.1 2024-09-19 09:19:02,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=898401.8333333334, ans=0.125 2024-09-19 09:19:23,353 INFO [train.py:1198] (1/2) Epoch 50, batch 3950, loss[loss=0.1798, ctc_loss=0.1156, cr_loss=0.3214, over 21068.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.1419, cr_loss=0.3673, over 4068529.31 frames. ], batch size: 53, lr: 1.64e-03, grad_scale: 16.0 2024-09-19 09:19:35,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=898458.5, ans=0.0 2024-09-19 09:20:20,963 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.038e+02 2.303e+02 2.461e+02 2.591e+02 8.269e+02, threshold=4.922e+02, percent-clipped=1.0 2024-09-19 09:20:22,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=898571.8333333334, ans=0.2 2024-09-19 09:20:34,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=898571.8333333334, ans=0.0 2024-09-19 09:20:38,990 INFO [train.py:1198] (1/2) Epoch 50, batch 4000, loss[loss=0.199, ctc_loss=0.1305, cr_loss=0.3426, over 20995.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1413, cr_loss=0.3662, over 4068910.37 frames. ], batch size: 52, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:20:42,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=898600.1666666666, ans=0.1 2024-09-19 09:20:44,197 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.13 vs. 
limit=15.0 2024-09-19 09:20:55,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=898628.5, ans=0.1 2024-09-19 09:21:20,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=898656.8333333334, ans=0.1 2024-09-19 09:21:38,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=898685.1666666666, ans=0.0 2024-09-19 09:21:56,592 INFO [train.py:1198] (1/2) Epoch 50, batch 4050, loss[loss=0.1887, ctc_loss=0.1197, cr_loss=0.3447, over 20956.00 frames. ], tot_loss[loss=0.2138, ctc_loss=0.1407, cr_loss=0.3655, over 4078035.16 frames. ], batch size: 49, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:22:28,909 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.44 vs. limit=15.0 2024-09-19 09:22:32,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=898798.5, ans=0.0 2024-09-19 09:22:56,449 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.972e+02 2.304e+02 2.442e+02 2.641e+02 3.605e+02, threshold=4.884e+02, percent-clipped=0.0 2024-09-19 09:23:08,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=898855.1666666666, ans=0.125 2024-09-19 09:23:14,300 INFO [train.py:1198] (1/2) Epoch 50, batch 4100, loss[loss=0.2199, ctc_loss=0.1466, cr_loss=0.3667, over 20781.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1417, cr_loss=0.3666, over 4059340.18 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:23:41,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=898911.8333333334, ans=0.2 2024-09-19 09:23:43,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=898940.1666666666, ans=0.0 2024-09-19 09:23:49,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=898940.1666666666, ans=0.2 2024-09-19 09:24:05,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=898968.5, ans=0.0 2024-09-19 09:24:16,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=898996.8333333334, ans=0.0 2024-09-19 09:24:29,389 INFO [train.py:1198] (1/2) Epoch 50, batch 4150, loss[loss=0.2021, ctc_loss=0.1329, cr_loss=0.3459, over 19515.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1413, cr_loss=0.3663, over 4061756.54 frames. 
], batch size: 43, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:24:35,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=899025.1666666666, ans=0.125 2024-09-19 09:24:42,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=899053.5, ans=0.0 2024-09-19 09:24:52,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=899053.5, ans=0.125 2024-09-19 09:24:53,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=899053.5, ans=0.0 2024-09-19 09:25:20,292 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.81 vs. limit=15.0 2024-09-19 09:25:26,979 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.117e+02 2.319e+02 2.478e+02 2.657e+02 3.033e+02, threshold=4.956e+02, percent-clipped=0.0 2024-09-19 09:25:45,408 INFO [train.py:1198] (1/2) Epoch 50, batch 4200, loss[loss=0.1974, ctc_loss=0.1309, cr_loss=0.3325, over 21065.00 frames. ], tot_loss[loss=0.2141, ctc_loss=0.141, cr_loss=0.3658, over 4080021.10 frames. ], batch size: 53, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:25:56,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=899166.8333333334, ans=0.125 2024-09-19 09:26:26,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=899223.5, ans=0.125 2024-09-19 09:27:03,579 INFO [train.py:1198] (1/2) Epoch 50, batch 4250, loss[loss=0.1876, ctc_loss=0.1213, cr_loss=0.3317, over 20955.00 frames. ], tot_loss[loss=0.2132, ctc_loss=0.1402, cr_loss=0.3648, over 4080694.35 frames. ], batch size: 50, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:27:05,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=899308.5, ans=0.0 2024-09-19 09:27:20,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=899336.8333333334, ans=0.125 2024-09-19 09:27:23,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=899336.8333333334, ans=0.015 2024-09-19 09:27:24,015 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.36 vs. limit=15.0 2024-09-19 09:27:46,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=899365.1666666666, ans=0.125 2024-09-19 09:28:03,568 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.982e+02 2.275e+02 2.415e+02 2.601e+02 5.675e+02, threshold=4.830e+02, percent-clipped=1.0 2024-09-19 09:28:21,411 INFO [train.py:1198] (1/2) Epoch 50, batch 4300, loss[loss=0.1926, ctc_loss=0.1248, cr_loss=0.3386, over 21082.00 frames. ], tot_loss[loss=0.2134, ctc_loss=0.1403, cr_loss=0.3653, over 4087207.42 frames. 
], batch size: 53, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:28:35,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=899478.5, ans=0.125 2024-09-19 09:28:56,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=899506.8333333334, ans=0.0 2024-09-19 09:29:11,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=899535.1666666666, ans=0.0 2024-09-19 09:29:35,046 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.75 vs. limit=15.0 2024-09-19 09:29:37,061 INFO [train.py:1198] (1/2) Epoch 50, batch 4350, loss[loss=0.2183, ctc_loss=0.1441, cr_loss=0.3709, over 20959.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1413, cr_loss=0.3668, over 4065304.65 frames. ], batch size: 64, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:29:52,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=899620.1666666666, ans=0.125 2024-09-19 09:29:55,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=899620.1666666666, ans=0.0 2024-09-19 09:30:34,255 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.968e+02 2.298e+02 2.428e+02 2.673e+02 1.217e+03, threshold=4.857e+02, percent-clipped=1.0 2024-09-19 09:30:52,304 INFO [train.py:1198] (1/2) Epoch 50, batch 4400, loss[loss=0.1966, ctc_loss=0.1277, cr_loss=0.3447, over 20962.00 frames. ], tot_loss[loss=0.2141, ctc_loss=0.1408, cr_loss=0.3662, over 4081784.26 frames. ], batch size: 50, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:31:00,759 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.97 vs. limit=15.0 2024-09-19 09:31:40,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=899818.5, ans=0.0 2024-09-19 09:32:10,726 INFO [train.py:1198] (1/2) Epoch 50, batch 4450, loss[loss=0.1958, ctc_loss=0.1258, cr_loss=0.3501, over 20983.00 frames. ], tot_loss[loss=0.2141, ctc_loss=0.1408, cr_loss=0.3665, over 4091296.48 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:32:33,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=899903.5, ans=0.125 2024-09-19 09:32:36,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=899903.5, ans=0.0 2024-09-19 09:32:40,005 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 09:32:43,251 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.02 vs. 
limit=22.5 2024-09-19 09:33:05,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=899960.1666666666, ans=0.125 2024-09-19 09:33:08,024 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.990e+02 2.325e+02 2.445e+02 2.628e+02 3.331e+02, threshold=4.890e+02, percent-clipped=0.0 2024-09-19 09:33:16,239 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.11 vs. limit=22.5 2024-09-19 09:33:26,035 INFO [train.py:1198] (1/2) Epoch 50, batch 4500, loss[loss=0.2306, ctc_loss=0.153, cr_loss=0.3881, over 21054.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1413, cr_loss=0.3674, over 4100113.04 frames. ], batch size: 59, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:33:30,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=900016.8333333334, ans=0.125 2024-09-19 09:33:55,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=900045.1666666666, ans=0.125 2024-09-19 09:34:23,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=900101.8333333334, ans=0.125 2024-09-19 09:34:43,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=900158.5, ans=0.1 2024-09-19 09:34:44,518 INFO [train.py:1198] (1/2) Epoch 50, batch 4550, loss[loss=0.2003, ctc_loss=0.1323, cr_loss=0.34, over 21013.00 frames. ], tot_loss[loss=0.2137, ctc_loss=0.1407, cr_loss=0.3653, over 4093789.76 frames. ], batch size: 61, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:35:19,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=900215.1666666666, ans=0.125 2024-09-19 09:35:22,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=900215.1666666666, ans=0.1 2024-09-19 09:35:24,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=900215.1666666666, ans=0.0 2024-09-19 09:35:27,503 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.13 vs. limit=22.5 2024-09-19 09:35:41,988 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.023e+02 2.282e+02 2.424e+02 2.573e+02 3.172e+02, threshold=4.848e+02, percent-clipped=0.0
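
The grad_scale field in the batch records is the fp16 loss scale; earlier in this log it halves (32.0 -> 16.0 -> 8.0 around batches 2650-2750) and later recovers to 32.0, which is the signature of dynamic loss scaling. A generic sketch of that mechanism using PyTorch's stock GradScaler (the actual train.py loop has more moving parts):

```python
# Generic fp16 training step with dynamic loss scaling, the mechanism behind
# the grad_scale values in these records: the scale is halved on overflow and
# periodically raised after enough well-behaved steps. A sketch, not the
# actual train.py loop.
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=2.0)


def train_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(batch["inputs"]), batch["targets"])
    scaler.scale(loss).backward()  # backprop the scaled loss
    scaler.step(optimizer)         # unscales; skips the step if grads overflowed
    scaler.update()                # adjust the scale (the logged grad_scale)
    return loss.detach()
```

2024-09-19 09:36:00,028 INFO [train.py:1198] (1/2) Epoch 50, batch 4600, loss[loss=0.1948, ctc_loss=0.1261, cr_loss=0.3434, over 20918.00 frames. ], tot_loss[loss=0.2135, ctc_loss=0.1404, cr_loss=0.3655, over 4094708.91 frames.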
], batch size: 54, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:36:10,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=900300.1666666666, ans=0.2 2024-09-19 09:36:12,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=900300.1666666666, ans=0.125 2024-09-19 09:36:19,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=900328.5, ans=0.2 2024-09-19 09:36:48,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=900385.1666666666, ans=0.125 2024-09-19 09:37:15,052 INFO [train.py:1198] (1/2) Epoch 50, batch 4650, loss[loss=0.2169, ctc_loss=0.1414, cr_loss=0.3771, over 21048.00 frames. ], tot_loss[loss=0.2135, ctc_loss=0.1406, cr_loss=0.3647, over 4082465.38 frames. ], batch size: 62, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:37:22,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=900441.8333333334, ans=0.125 2024-09-19 09:37:30,682 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.77 vs. limit=15.0 2024-09-19 09:37:56,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=900498.5, ans=0.2 2024-09-19 09:38:15,115 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.087e+02 2.298e+02 2.468e+02 2.596e+02 3.526e+02, threshold=4.936e+02, percent-clipped=0.0 2024-09-19 09:38:20,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=900555.1666666666, ans=0.0 2024-09-19 09:38:26,336 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.25 vs. limit=15.0 2024-09-19 09:38:33,153 INFO [train.py:1198] (1/2) Epoch 50, batch 4700, loss[loss=0.2326, ctc_loss=0.1514, cr_loss=0.4059, over 20876.00 frames. ], tot_loss[loss=0.214, ctc_loss=0.1409, cr_loss=0.3656, over 4080402.18 frames. ], batch size: 65, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:39:08,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=900640.1666666666, ans=0.125 2024-09-19 09:39:36,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=900696.8333333334, ans=0.125 2024-09-19 09:39:51,355 INFO [train.py:1198] (1/2) Epoch 50, batch 4750, loss[loss=0.2173, ctc_loss=0.141, cr_loss=0.3813, over 20977.00 frames. ], tot_loss[loss=0.2127, ctc_loss=0.1398, cr_loss=0.3644, over 4090612.04 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:39:53,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=900725.1666666666, ans=0.1 2024-09-19 09:39:55,387 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.90 vs. limit=10.0 2024-09-19 09:40:47,346 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.74 vs. 
limit=22.5 2024-09-19 09:40:48,233 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.984e+02 2.273e+02 2.438e+02 2.557e+02 4.070e+02, threshold=4.876e+02, percent-clipped=0.0 2024-09-19 09:41:02,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=900838.5, ans=0.125 2024-09-19 09:41:06,166 INFO [train.py:1198] (1/2) Epoch 50, batch 4800, loss[loss=0.2313, ctc_loss=0.1528, cr_loss=0.3928, over 20869.00 frames. ], tot_loss[loss=0.2126, ctc_loss=0.1397, cr_loss=0.3645, over 4098024.95 frames. ], batch size: 65, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:41:12,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=900866.8333333334, ans=0.125 2024-09-19 09:42:07,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=900980.1666666666, ans=0.125 2024-09-19 09:42:20,835 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.57 vs. limit=22.5 2024-09-19 09:42:21,500 INFO [train.py:1198] (1/2) Epoch 50, batch 4850, loss[loss=0.2438, ctc_loss=0.1672, cr_loss=0.3828, over 20005.00 frames. ], tot_loss[loss=0.2123, ctc_loss=0.1396, cr_loss=0.3637, over 4090952.81 frames. ], batch size: 80, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:43:18,232 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.001e+02 2.316e+02 2.418e+02 2.607e+02 6.487e+02, threshold=4.836e+02, percent-clipped=1.0 2024-09-19 09:43:21,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=901121.8333333334, ans=0.125 2024-09-19 09:43:35,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=901121.8333333334, ans=0.0 2024-09-19 09:43:39,331 INFO [train.py:1198] (1/2) Epoch 50, batch 4900, loss[loss=0.2055, ctc_loss=0.1364, cr_loss=0.3453, over 19488.00 frames. ], tot_loss[loss=0.2133, ctc_loss=0.1403, cr_loss=0.3651, over 4091898.57 frames. ], batch size: 90, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:43:49,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=901150.1666666666, ans=0.125 2024-09-19 09:44:09,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=901206.8333333334, ans=0.125 2024-09-19 09:44:15,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=901206.8333333334, ans=0.125 2024-09-19 09:44:21,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=901206.8333333334, ans=0.1 2024-09-19 09:44:29,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=901235.1666666666, ans=0.125 2024-09-19 09:44:34,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=901235.1666666666, ans=0.0 2024-09-19 09:44:53,777 INFO [train.py:1198] (1/2) Epoch 50, batch 4950, loss[loss=0.2015, ctc_loss=0.1286, cr_loss=0.3647, over 19493.00 frames. ], tot_loss[loss=0.2135, ctc_loss=0.1404, cr_loss=0.3653, over 4090386.17 frames. 
], batch size: 43, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:45:29,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=901348.5, ans=0.0 2024-09-19 09:45:35,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=901348.5, ans=0.125 2024-09-19 09:45:49,932 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.048e+02 2.307e+02 2.452e+02 2.660e+02 3.351e+02, threshold=4.903e+02, percent-clipped=0.0 2024-09-19 09:46:10,557 INFO [train.py:1198] (1/2) Epoch 50, batch 5000, loss[loss=0.2131, ctc_loss=0.1398, cr_loss=0.3667, over 20883.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.1415, cr_loss=0.3671, over 4094308.23 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:46:12,802 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.46 vs. limit=15.0 2024-09-19 09:46:14,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=901433.5, ans=0.125 2024-09-19 09:46:17,452 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.71 vs. limit=15.0 2024-09-19 09:46:55,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=901518.5, ans=0.0 2024-09-19 09:47:24,875 INFO [train.py:1198] (1/2) Epoch 50, batch 5050, loss[loss=0.2077, ctc_loss=0.135, cr_loss=0.3637, over 20943.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.1415, cr_loss=0.3668, over 4106328.41 frames. ], batch size: 50, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:48:20,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=901660.1666666666, ans=0.0 2024-09-19 09:48:21,181 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.009e+02 2.229e+02 2.381e+02 2.526e+02 8.900e+02, threshold=4.762e+02, percent-clipped=1.0 2024-09-19 09:48:23,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=901688.5, ans=0.1 2024-09-19 09:48:38,935 INFO [train.py:1198] (1/2) Epoch 50, batch 5100, loss[loss=0.1925, ctc_loss=0.1217, cr_loss=0.3543, over 21002.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1414, cr_loss=0.3662, over 4109035.62 frames. ], batch size: 48, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:48:40,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=901716.8333333334, ans=0.025 2024-09-19 09:49:24,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=901801.8333333334, ans=0.95 2024-09-19 09:49:31,965 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.35 vs. limit=15.0 2024-09-19 09:49:44,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=901830.1666666666, ans=10.0 2024-09-19 09:49:53,448 INFO [train.py:1198] (1/2) Epoch 50, batch 5150, loss[loss=0.1959, ctc_loss=0.1264, cr_loss=0.3472, over 21059.00 frames. 
], tot_loss[loss=0.215, ctc_loss=0.1416, cr_loss=0.367, over 4092417.08 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:50:32,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=901915.1666666666, ans=0.0 2024-09-19 09:50:49,766 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.013e+02 2.300e+02 2.425e+02 2.557e+02 3.658e+02, threshold=4.850e+02, percent-clipped=0.0 2024-09-19 09:50:51,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=901971.8333333334, ans=0.125 2024-09-19 09:51:07,808 INFO [train.py:1198] (1/2) Epoch 50, batch 5200, loss[loss=0.1961, ctc_loss=0.1291, cr_loss=0.335, over 20961.00 frames. ], tot_loss[loss=0.2142, ctc_loss=0.141, cr_loss=0.3659, over 4089054.88 frames. ], batch size: 49, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:51:08,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=902000.1666666666, ans=0.0 2024-09-19 09:51:41,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=902056.8333333334, ans=0.1 2024-09-19 09:52:04,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=902085.1666666666, ans=0.07 2024-09-19 09:52:21,875 INFO [train.py:1198] (1/2) Epoch 50, batch 5250, loss[loss=0.1882, ctc_loss=0.1214, cr_loss=0.3337, over 20879.00 frames. ], tot_loss[loss=0.214, ctc_loss=0.1408, cr_loss=0.3659, over 4094745.40 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:52:31,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=902141.8333333334, ans=0.125 2024-09-19 09:52:35,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=902170.1666666666, ans=0.125 2024-09-19 09:52:45,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=902170.1666666666, ans=0.0 2024-09-19 09:53:11,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=902226.8333333334, ans=0.2 2024-09-19 09:53:18,639 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.968e+02 2.309e+02 2.443e+02 2.645e+02 7.412e+02, threshold=4.887e+02, percent-clipped=1.0 2024-09-19 09:53:38,807 INFO [train.py:1198] (1/2) Epoch 50, batch 5300, loss[loss=0.1824, ctc_loss=0.1148, cr_loss=0.3377, over 20993.00 frames. ], tot_loss[loss=0.2134, ctc_loss=0.1403, cr_loss=0.3654, over 4097854.85 frames. 
2024-09-19 09:54:01,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=902311.8333333334, ans=0.125
2024-09-19 09:54:12,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=902340.1666666666, ans=0.125
2024-09-19 09:54:20,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=902340.1666666666, ans=0.2
2024-09-19 09:54:30,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=902368.5, ans=0.2
2024-09-19 09:54:33,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=902368.5, ans=0.1
2024-09-19 09:54:33,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=902368.5, ans=0.0
2024-09-19 09:54:45,796 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 09:54:46,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=902396.8333333334, ans=0.125
2024-09-19 09:54:52,531 INFO [train.py:1198] (1/2) Epoch 50, batch 5350, loss[loss=0.2229, ctc_loss=0.1488, cr_loss=0.3703, over 18501.00 frames. ], tot_loss[loss=0.2134, ctc_loss=0.1404, cr_loss=0.3649, over 4100387.99 frames. ], batch size: 108, lr: 1.64e-03, grad_scale: 16.0
2024-09-19 09:55:13,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=902453.5, ans=0.0
2024-09-19 09:55:16,947 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.30 vs. limit=15.0
2024-09-19 09:55:40,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=902510.1666666666, ans=0.0
2024-09-19 09:55:53,023 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.051e+02 2.306e+02 2.484e+02 2.674e+02 7.892e+02, threshold=4.969e+02, percent-clipped=1.0
2024-09-19 09:56:09,490 INFO [train.py:1198] (1/2) Epoch 50, batch 5400, loss[loss=0.1862, ctc_loss=0.1199, cr_loss=0.3313, over 20979.00 frames. ], tot_loss[loss=0.214, ctc_loss=0.1408, cr_loss=0.366, over 4105266.14 frames. ], batch size: 50, lr: 1.64e-03, grad_scale: 16.0
2024-09-19 09:56:12,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=902566.8333333334, ans=0.125
2024-09-19 09:56:20,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=902566.8333333334, ans=0.02
2024-09-19 09:56:25,015 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.25 vs. limit=15.0
2024-09-19 09:56:38,665 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.93 vs. limit=6.0
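grad_scale drops from 32.0 to 16.0 at batch 5350 and recovers to 32.0 by batch 5600, the signature of dynamic loss scaling under float16 AMP (Use AMP=True, use_fp16=True in the startup config): the scale is halved when scaled gradients overflow and regrown after a run of clean steps. A generic sketch of such a training step using PyTorch's GradScaler; the init_scale and the surrounding interfaces here are assumptions, not the icefall code:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=2.0)

    def amp_step(model, optimizer, batch, compute_loss):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():      # float16 is the CUDA default
            loss = compute_loss(model, batch)
        scaler.scale(loss).backward()
        scaler.step(optimizer)               # skipped internally on overflow
        scaler.update()                      # halves or regrows the logged grad_scale
        return loss.detach()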
2024-09-19 09:57:13,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=902680.1666666666, ans=0.0
2024-09-19 09:57:23,867 INFO [train.py:1198] (1/2) Epoch 50, batch 5450, loss[loss=0.1998, ctc_loss=0.1296, cr_loss=0.3509, over 20963.00 frames. ], tot_loss[loss=0.214, ctc_loss=0.1407, cr_loss=0.366, over 4109980.13 frames. ], batch size: 49, lr: 1.64e-03, grad_scale: 16.0
2024-09-19 09:57:52,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=902765.1666666666, ans=0.0
2024-09-19 09:58:05,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=902765.1666666666, ans=0.125
2024-09-19 09:58:14,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=902793.5, ans=0.125
2024-09-19 09:58:21,420 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.980e+02 2.299e+02 2.453e+02 2.614e+02 4.135e+02, threshold=4.905e+02, percent-clipped=0.0
2024-09-19 09:58:27,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=902821.8333333334, ans=0.2
2024-09-19 09:58:28,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=902821.8333333334, ans=0.1
2024-09-19 09:58:37,525 INFO [train.py:1198] (1/2) Epoch 50, batch 5500, loss[loss=0.2037, ctc_loss=0.1335, cr_loss=0.351, over 20808.00 frames. ], tot_loss[loss=0.2141, ctc_loss=0.1409, cr_loss=0.3659, over 4109099.02 frames. ], batch size: 53, lr: 1.64e-03, grad_scale: 16.0
2024-09-19 09:58:53,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=902878.5, ans=0.125
2024-09-19 09:58:55,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=902878.5, ans=0.0
2024-09-19 09:59:47,052 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.40 vs. limit=22.5
2024-09-19 09:59:49,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=902991.8333333334, ans=0.0
2024-09-19 09:59:50,908 INFO [train.py:1198] (1/2) Epoch 50, batch 5550, loss[loss=0.2272, ctc_loss=0.1562, cr_loss=0.3549, over 13963.00 frames. ], tot_loss[loss=0.214, ctc_loss=0.1408, cr_loss=0.3658, over 4095592.93 frames. ], batch size: 152, lr: 1.64e-03, grad_scale: 16.0
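The Whitening lines compare a per-module metric against a limit (15.0, 22.5, or 6.0 above); the metric is 1.0 when the module's output features are fully "white" (covariance proportional to the identity within each channel group) and grows as the covariance becomes anisotropic. A sketch of such a metric, patterned after the Whiten module in icefall's scaling.py; the exact formula there may differ:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
        # x: (num_frames, num_channels). Returns a value >= 1 that equals 1.0
        # exactly when each group's centered covariance is isotropic.
        num_frames, num_channels = x.shape
        cpg = num_channels // num_groups                 # channels per group
        x = x.reshape(num_frames, num_groups, cpg).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)              # center each group
        cov = x.transpose(1, 2) @ x                      # (num_groups, cpg, cpg)
        mean_diag = cov.diagonal(dim1=1, dim2=2).mean()  # mean eigenvalue proxy
        mean_sq = (cov ** 2).sum() / (num_groups * cpg)  # mean squared eigenvalue proxy
        return mean_sq / (mean_diag ** 2 + 1e-20)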
2024-09-19 10:00:00,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=902991.8333333334, ans=0.0
2024-09-19 10:00:15,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=903020.1666666666, ans=0.2
2024-09-19 10:00:20,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=903048.5, ans=0.125
2024-09-19 10:00:48,993 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.038e+02 2.247e+02 2.367e+02 2.567e+02 4.623e+02, threshold=4.734e+02, percent-clipped=0.0
2024-09-19 10:01:05,413 INFO [train.py:1198] (1/2) Epoch 50, batch 5600, loss[loss=0.2251, ctc_loss=0.1496, cr_loss=0.3777, over 20840.00 frames. ], tot_loss[loss=0.2135, ctc_loss=0.1404, cr_loss=0.3654, over 4100116.91 frames. ], batch size: 59, lr: 1.64e-03, grad_scale: 32.0
2024-09-19 10:01:11,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=903133.5, ans=0.0
2024-09-19 10:01:41,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=903190.1666666666, ans=0.125
2024-09-19 10:01:52,167 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.88 vs. limit=15.0
2024-09-19 10:02:02,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=903218.5, ans=0.125
2024-09-19 10:02:05,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=903246.8333333334, ans=0.0
2024-09-19 10:02:21,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=903275.1666666666, ans=0.2
2024-09-19 10:02:22,293 INFO [train.py:1198] (1/2) Epoch 50, batch 5650, loss[loss=0.2165, ctc_loss=0.1406, cr_loss=0.3797, over 20945.00 frames. ], tot_loss[loss=0.2142, ctc_loss=0.1409, cr_loss=0.3663, over 4093910.48 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 32.0
2024-09-19 10:02:40,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=903303.5, ans=0.025
2024-09-19 10:03:12,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=903360.1666666666, ans=0.5
2024-09-19 10:03:15,582 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 10:03:19,585 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.075e+02 2.290e+02 2.430e+02 2.621e+02 3.528e+02, threshold=4.860e+02, percent-clipped=0.0
2024-09-19 10:03:35,893 INFO [train.py:1198] (1/2) Epoch 50, batch 5700, loss[loss=0.2316, ctc_loss=0.1519, cr_loss=0.3988, over 21005.00 frames. ], tot_loss[loss=0.2142, ctc_loss=0.1409, cr_loss=0.3663, over 4096341.72 frames. ], batch size: 61, lr: 1.64e-03, grad_scale: 32.0
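Each ScheduledFloat line prints a training hyperparameter (a dropout rate, skip rate, balancer bound, and so on) as a function of batch_count; by batch_count around 9e5 most have settled at their final values (skip rates at 0.0, dropout at 0.1). A sketch of such a schedule as piecewise-linear interpolation between (batch_count, value) breakpoints; icefall's ScheduledFloat in scaling.py follows this idea with extra machinery, and the breakpoints below are invented for illustration:

    class ScheduledFloatSketch:
        """Piecewise-linear schedule over batch_count, clamped at both ends."""

        def __init__(self, *points):         # points: (batch_count, value) pairs
            self.points = sorted(points)
            self.batch_count = 0.0

        def value(self) -> float:
            pts = self.points
            if self.batch_count <= pts[0][0]:
                return pts[0][1]
            if self.batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= self.batch_count <= x1:
                    t = (self.batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)

    schedule = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))  # hypothetical breakpoints
    schedule.batch_count = 901348.5       # far past the last breakpoint
    assert schedule.value() == 0.1        # the schedule has settled at its final value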
2024-09-19 10:03:37,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=903416.8333333334, ans=0.07
2024-09-19 10:04:08,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=903473.5, ans=0.125
2024-09-19 10:04:26,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=903501.8333333334, ans=0.0
2024-09-19 10:04:49,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=903530.1666666666, ans=0.2
2024-09-19 10:04:49,946 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.29 vs. limit=15.0
2024-09-19 10:04:52,307 INFO [train.py:1198] (1/2) Epoch 50, batch 5750, loss[loss=0.2121, ctc_loss=0.1391, cr_loss=0.3652, over 20965.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1413, cr_loss=0.3676, over 4099612.79 frames. ], batch size: 51, lr: 1.64e-03, grad_scale: 32.0
2024-09-19 10:05:01,782 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.32 vs. limit=15.0
2024-09-19 10:05:50,005 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.029e+02 2.344e+02 2.489e+02 2.688e+02 3.298e+02, threshold=4.978e+02, percent-clipped=0.0
2024-09-19 10:05:57,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=903671.8333333334, ans=0.2
2024-09-19 10:06:06,434 INFO [train.py:1198] (1/2) Epoch 50, batch 5800, loss[loss=0.1901, ctc_loss=0.1219, cr_loss=0.3412, over 20776.00 frames. ], tot_loss[loss=0.2139, ctc_loss=0.1406, cr_loss=0.3667, over 4102733.35 frames. ], batch size: 53, lr: 1.64e-03, grad_scale: 32.0
2024-09-19 10:06:12,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=903700.1666666666, ans=0.09899494936611666
2024-09-19 10:06:17,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=903700.1666666666, ans=0.0
2024-09-19 10:06:24,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=903728.5, ans=0.07
2024-09-19 10:06:29,538 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.65 vs. limit=6.0
2024-09-19 10:06:30,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=903728.5, ans=0.1
2024-09-19 10:06:35,368 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 10:06:49,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=903785.1666666666, ans=0.2
2024-09-19 10:07:03,245 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 10:07:21,040 INFO [train.py:1198] (1/2) Epoch 50, batch 5850, loss[loss=0.2523, ctc_loss=0.1698, cr_loss=0.4124, over 21014.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1411, cr_loss=0.3672, over 4107049.89 frames. ], batch size: 61, lr: 1.64e-03, grad_scale: 32.0
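The bypass.scale_min and bypass.skip_rate entries above describe Zipformer's bypass connections: a module's output is mixed with its input through a learned scale kept at or above scale_min, and during training the module can occasionally be skipped outright with probability skip_rate. A rough sketch of that idea; the real BypassModule in icefall's zipformer.py is more elaborate, so treat this as illustrative only:

    import torch

    class BypassSketch(torch.nn.Module):
        def __init__(self, module, num_channels, scale_min=0.2, skip_rate=0.0):
            super().__init__()
            self.module = module
            self.scale_min, self.skip_rate = scale_min, skip_rate
            self.scale = torch.nn.Parameter(torch.full((num_channels,), 0.5))

        def forward(self, x):
            if self.training and torch.rand(()) < self.skip_rate:
                return x                                  # module skipped entirely
            s = self.scale.clamp(min=self.scale_min, max=1.0)
            return x + s * (self.module(x) - x)           # s=1 gives the pure module output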
2024-09-19 10:07:24,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=903841.8333333334, ans=0.125
2024-09-19 10:07:30,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=903841.8333333334, ans=0.125
2024-09-19 10:07:34,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=903870.1666666666, ans=0.2
2024-09-19 10:07:47,363 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.76 vs. limit=22.5
2024-09-19 10:07:48,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=903870.1666666666, ans=0.04949747468305833
2024-09-19 10:07:56,056 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.07 vs. limit=22.5
2024-09-19 10:07:57,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=903898.5, ans=0.0
2024-09-19 10:08:14,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=903926.8333333334, ans=0.04949747468305833
2024-09-19 10:08:19,204 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.964e+02 2.289e+02 2.410e+02 2.545e+02 7.787e+02, threshold=4.821e+02, percent-clipped=1.0
2024-09-19 10:08:22,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=903955.1666666666, ans=0.125
2024-09-19 10:08:35,776 INFO [train.py:1198] (1/2) Epoch 50, batch 5900, loss[loss=0.2276, ctc_loss=0.1491, cr_loss=0.3922, over 20855.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.142, cr_loss=0.3683, over 4104000.88 frames. ], batch size: 65, lr: 1.64e-03, grad_scale: 32.0
2024-09-19 10:09:02,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=904011.8333333334, ans=0.0
2024-09-19 10:09:13,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=904040.1666666666, ans=0.0
2024-09-19 10:09:24,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=904068.5, ans=0.1
2024-09-19 10:09:39,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=904096.8333333334, ans=0.125
2024-09-19 10:09:49,633 INFO [train.py:1198] (1/2) Epoch 50, batch 5950, loss[loss=0.1984, ctc_loss=0.1295, cr_loss=0.3445, over 20965.00 frames. ], tot_loss[loss=0.2147, ctc_loss=0.1412, cr_loss=0.3676, over 4106434.34 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 32.0
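The balancer entries (balancer.prob, min_abs, max_positive, and the like) come from Balancer modules that keep per-channel activation statistics inside prescribed ranges; prob is the probability that the constraint is applied on a given step. A sketch of the statistics being constrained; icefall enforces the bounds by modifying gradients rather than activations, which is omitted here, and the default bounds below are illustrative:

    import torch

    def balancer_stats(x: torch.Tensor, min_positive=0.05, max_positive=0.95,
                       min_abs=0.2, max_abs=10.0):
        # x: (num_frames, num_channels). The bounds mirror the kinds of values
        # logged above (e.g. max_positive=0.95, max_abs=10.0), not any one
        # module's actual settings.
        frac_positive = (x > 0).float().mean(dim=0)   # per-channel sign statistics
        mean_abs = x.abs().mean(dim=0)                # per-channel magnitude
        violations = {
            "too_rarely_positive": frac_positive < min_positive,
            "too_often_positive": frac_positive > max_positive,
            "too_small": mean_abs < min_abs,
            "too_large": mean_abs > max_abs,
        }
        return {k: v.nonzero().flatten() for k, v in violations.items()}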
2024-09-19 10:10:23,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=904181.8333333334, ans=0.0
2024-09-19 10:10:50,239 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.058e+02 2.257e+02 2.429e+02 2.560e+02 4.338e+02, threshold=4.859e+02, percent-clipped=0.0
2024-09-19 10:10:57,994 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 10:11:06,520 INFO [train.py:1198] (1/2) Epoch 50, batch 6000, loss[loss=0.2509, ctc_loss=0.1693, cr_loss=0.4081, over 20092.00 frames. ], tot_loss[loss=0.2133, ctc_loss=0.1401, cr_loss=0.3662, over 4108569.30 frames. ], batch size: 80, lr: 1.64e-03, grad_scale: 32.0
2024-09-19 10:11:06,521 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-19 10:11:24,441 INFO [train.py:1230] (1/2) Epoch 50, validation: loss=0.03896, ctc_loss=0.03896, cr_loss=1.621e-14, over 944034.00 frames.
2024-09-19 10:11:24,441 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 20869MB
2024-09-19 10:11:49,274 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.17 vs. limit=15.0
2024-09-19 10:11:50,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=904295.1666666666, ans=0.0
2024-09-19 10:11:50,341 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.95 vs. limit=15.0
2024-09-19 10:12:38,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=904380.1666666666, ans=0.0
2024-09-19 10:12:41,363 INFO [train.py:1198] (1/2) Epoch 50, batch 6050, loss[loss=0.2074, ctc_loss=0.1372, cr_loss=0.351, over 20882.00 frames. ], tot_loss[loss=0.2129, ctc_loss=0.1398, cr_loss=0.3656, over 4108555.02 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 32.0
2024-09-19 10:13:39,934 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.035e+02 2.302e+02 2.437e+02 2.622e+02 3.791e+02, threshold=4.874e+02, percent-clipped=0.0
2024-09-19 10:13:55,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=904550.1666666666, ans=0.1
2024-09-19 10:13:56,358 INFO [train.py:1198] (1/2) Epoch 50, batch 6100, loss[loss=0.2465, ctc_loss=0.1726, cr_loss=0.3694, over 13919.00 frames. ], tot_loss[loss=0.2132, ctc_loss=0.1401, cr_loss=0.3657, over 4098147.89 frames. ], batch size: 150, lr: 1.64e-03, grad_scale: 32.0
2024-09-19 10:14:04,240 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=8.67 vs. limit=12.0
2024-09-19 10:14:27,917 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.83 vs. limit=6.0
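The validation pass above reports a single loss averaged over all 944034 validation frames; cr_loss collapses to roughly 1e-14 there, presumably because the consistency term needs two differently augmented views and is effectively inactive at validation time. A generic sketch of frame-weighted validation; a `compute_loss` returning a (loss_sum, num_frames) pair is an assumption about the interface, not icefall's actual signature:

    import torch

    def compute_validation_loss(model, valid_loader, compute_loss):
        model.eval()
        loss_sum, frame_sum = 0.0, 0.0
        with torch.no_grad():
            for batch in valid_loader:
                batch_loss_sum, num_frames = compute_loss(model, batch)
                loss_sum += float(batch_loss_sum)
                frame_sum += float(num_frames)
        model.train()
        return loss_sum / frame_sum    # average over frames, not over batches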
2024-09-19 10:14:49,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=904635.1666666666, ans=0.2
2024-09-19 10:15:02,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=904663.5, ans=0.0
2024-09-19 10:15:10,190 INFO [train.py:1198] (1/2) Epoch 50, batch 6150, loss[loss=0.1936, ctc_loss=0.1262, cr_loss=0.3372, over 20946.00 frames. ], tot_loss[loss=0.2135, ctc_loss=0.1402, cr_loss=0.3663, over 4109224.48 frames. ], batch size: 50, lr: 1.64e-03, grad_scale: 32.0
2024-09-19 10:15:10,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=904691.8333333334, ans=0.2
2024-09-19 10:15:34,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=904720.1666666666, ans=0.1
2024-09-19 10:15:36,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=904720.1666666666, ans=0.2
2024-09-19 10:15:39,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=904748.5, ans=0.1
2024-09-19 10:16:07,514 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.050e+02 2.304e+02 2.446e+02 2.627e+02 3.437e+02, threshold=4.892e+02, percent-clipped=0.0
2024-09-19 10:16:10,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=904805.1666666666, ans=0.05
2024-09-19 10:16:19,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=904805.1666666666, ans=0.125
2024-09-19 10:16:23,988 INFO [train.py:1198] (1/2) Epoch 50, batch 6200, loss[loss=0.2119, ctc_loss=0.1417, cr_loss=0.3509, over 21022.00 frames. ], tot_loss[loss=0.2144, ctc_loss=0.141, cr_loss=0.3669, over 4102986.20 frames. ], batch size: 62, lr: 1.64e-03, grad_scale: 32.0
2024-09-19 10:16:43,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=904861.8333333334, ans=0.1
2024-09-19 10:16:44,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=904861.8333333334, ans=0.125
2024-09-19 10:16:45,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=904861.8333333334, ans=0.0
2024-09-19 10:16:58,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=904890.1666666666, ans=0.0
2024-09-19 10:17:36,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=904975.1666666666, ans=0.0
2024-09-19 10:17:37,744 INFO [train.py:1198] (1/2) Epoch 50, batch 6250, loss[loss=0.2562, ctc_loss=0.1769, cr_loss=0.3965, over 14540.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1422, cr_loss=0.3685, over 4090763.23 frames. ], batch size: 149, lr: 1.64e-03, grad_scale: 32.0
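The tot_loss[...] frame counts hover near 4.1e6 while individual batches carry about 2.1e4 frames, which is what a decayed running sum produces: multiplying the accumulator by (1 - 1/reset_interval) each batch and adding the new batch gives a steady state of roughly batch_frames * reset_interval = 21000 * 200 = 4.2e6 frames with reset_interval=200 from the startup config. A sketch of such a tracker, assuming that decay rule; icefall's train.py keeps a similar per-component tracker:

    class RunningLossSketch:
        """Decayed, frame-weighted running average, as printed in tot_loss[...]."""

        def __init__(self, reset_interval: int = 200):
            self.decay = 1.0 - 1.0 / reset_interval
            self.loss_sum = 0.0     # decayed sum of loss * frames
            self.frames = 0.0       # decayed sum of frames

        def update(self, batch_loss: float, batch_frames: float):
            self.loss_sum = self.loss_sum * self.decay + batch_loss * batch_frames
            self.frames = self.frames * self.decay + batch_frames

        @property
        def avg(self) -> float:
            return self.loss_sum / max(self.frames, 1.0)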
2024-09-19 10:17:55,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=905003.5, ans=0.2
2024-09-19 10:18:03,595 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.94 vs. limit=15.0
2024-09-19 10:18:10,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=905031.8333333334, ans=0.125
2024-09-19 10:18:14,080 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.12 vs. limit=15.0
2024-09-19 10:18:34,991 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.147e+02 2.339e+02 2.541e+02 2.732e+02 3.809e+02, threshold=5.081e+02, percent-clipped=0.0
2024-09-19 10:18:50,844 INFO [train.py:1198] (1/2) Epoch 50, batch 6300, loss[loss=0.2318, ctc_loss=0.1534, cr_loss=0.3919, over 20727.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.1437, cr_loss=0.3702, over 4044112.38 frames. ], batch size: 71, lr: 1.64e-03, grad_scale: 32.0
2024-09-19 10:18:51,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=905116.8333333334, ans=0.125
2024-09-19 10:18:51,313 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 10:19:38,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=905201.8333333334, ans=0.125
2024-09-19 10:19:51,889 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.72 vs. limit=10.0
2024-09-19 10:19:58,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=905230.1666666666, ans=0.1
2024-09-19 10:20:02,514 INFO [train.py:1198] (1/2) Epoch 50, batch 6350, loss[loss=0.2656, ctc_loss=0.1846, cr_loss=0.4048, over 13823.00 frames. ], tot_loss[loss=0.2221, ctc_loss=0.1476, cr_loss=0.3726, over 3850512.28 frames. ], batch size: 150, lr: 1.64e-03, grad_scale: 32.0
2024-09-19 10:20:02,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=905258.5, ans=0.125
2024-09-19 10:20:38,491 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.70 vs. limit=12.0
2024-09-19 10:20:48,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=905343.5, ans=0.0
2024-09-19 10:20:58,677 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.054e+02 2.707e+02 2.928e+02 3.162e+02 4.182e+02, threshold=5.857e+02, percent-clipped=0.0
2024-09-19 10:21:00,125 INFO [train.py:1496] (1/2) Done!