diff --git "a/nohup.out" "b/nohup.out" --- "a/nohup.out" +++ "b/nohup.out" @@ -70601,3 +70601,1068 @@ If your task is similar to the task the model of the checkpoint was trained on, [INFO|modeling_utils.py:1680] 2022-12-16 18:13:10,901 >> Model weights saved in ./checkpoint-4000/pytorch_model.bin [INFO|feature_extraction_utils.py:368] 2022-12-16 18:13:10,920 >> Feature extractor saved in ./checkpoint-4000/preprocessor_config.json [INFO|feature_extraction_utils.py:368] 2022-12-16 18:13:15,300 >> Feature extractor saved in ./preprocessor_config.json +[WARNING|modeling_whisper.py:902] 2022-12-16 18:14:40,242 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|██████████████████████▊ | 4001/10000 [8:52:11<230:03:34, 138.06s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:14:49,074 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▏ | 4002/10000 [8:52:20<165:24:24, 99.28s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:14:59,467 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▏ | 4003/10000 [8:52:30<120:59:13, 72.63s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:15:06,435 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▌ | 4004/10000 [8:52:37<88:07:32, 52.91s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:15:14,111 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▋ | 4005/10000 [8:52:45<65:31:15, 39.35s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:15:21,059 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 40%|███████████████████████▋ | 4006/10000 [8:52:52<49:19:39, 29.63s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:15:29,203 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▋ | 4007/10000 [8:53:00<38:34:27, 23.17s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:15:35,362 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▋ | 4008/10000 [8:53:06<30:04:52, 18.07s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:15:41,844 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▋ | 4009/10000 [8:53:12<24:17:02, 14.59s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:15:47,984 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▋ | 4010/10000 [8:53:19<20:05:13, 12.07s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:15:54,161 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▋ | 4011/10000 [8:53:25<17:08:44, 10.31s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:16:00,381 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▋ | 4012/10000 [8:53:31<15:02:03, 9.04s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:16:06,345 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▋ | 4013/10000 [8:53:37<13:28:40, 8.10s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:16:12,288 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 40%|███████████████████████▋ | 4014/10000 [8:53:43<12:26:40, 7.48s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:16:20,040 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▋ | 4015/10000 [8:53:51<12:34:36, 7.56s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:16:26,688 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▋ | 4016/10000 [8:53:57<12:05:36, 7.28s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:16:33,533 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▋ | 4017/10000 [8:54:04<11:54:07, 7.16s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:16:39,551 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▋ | 4018/10000 [8:54:10<11:20:01, 6.82s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:16:46,475 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▋ | 4019/10000 [8:54:17<11:24:42, 6.87s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:17:04,077 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▋ | 4020/10000 [8:54:35<16:45:18, 10.09s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:17:10,804 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▋ | 4021/10000 [8:54:41<15:04:57, 9.08s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:17:18,175 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 40%|███████████████████████▋ | 4022/10000 [8:54:49<14:10:33, 8.54s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:17:29,614 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▋ | 4023/10000 [8:55:00<15:41:08, 9.45s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:17:37,352 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▋ | 4024/10000 [8:55:08<14:47:03, 8.91s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:17:43,976 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▋ | 4025/10000 [8:55:15<13:41:16, 8.25s/it] 40%|███████████████████████▋ | 4025/10000 [8:55:15<13:41:16, 8.25s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:17:50,823 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▊ | 4026/10000 [8:55:21<12:57:44, 7.81s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:17:57,623 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▊ | 4027/10000 [8:55:28<12:26:29, 7.50s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:18:04,592 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▊ | 4028/10000 [8:55:35<12:12:52, 7.36s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:18:10,808 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▊ | 4029/10000 [8:55:41<11:34:39, 6.98s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:18:17,345 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 40%|███████████████████████▊ | 4030/10000 [8:55:48<11:25:32, 6.89s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:18:24,651 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▊ | 4031/10000 [8:55:55<11:34:00, 6.98s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:18:31,199 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▊ | 4032/10000 [8:56:02<11:24:33, 6.88s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:18:41,208 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▊ | 4033/10000 [8:56:12<12:55:50, 7.80s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:18:47,939 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▊ | 4034/10000 [8:56:19<12:23:55, 7.48s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:18:55,237 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▊ | 4035/10000 [8:56:26<12:14:50, 7.39s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:19:04,159 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▊ | 4036/10000 [8:56:35<13:05:03, 7.90s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:19:10,850 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▊ | 4037/10000 [8:56:42<12:29:53, 7.55s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:19:18,246 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 40%|███████████████████████▊ | 4038/10000 [8:56:49<12:24:13, 7.49s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:19:24,900 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▊ | 4039/10000 [8:56:56<12:00:01, 7.25s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:19:37,735 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▊ | 4040/10000 [8:57:08<14:46:00, 8.92s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:19:44,633 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▊ | 4041/10000 [8:57:15<13:43:31, 8.29s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:19:51,260 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▊ | 4042/10000 [8:57:22<12:54:38, 7.80s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:19:59,253 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▊ | 4043/10000 [8:57:30<13:01:26, 7.87s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:20:10,081 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▊ | 4044/10000 [8:57:41<14:29:42, 8.76s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:20:17,055 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▊ | 4045/10000 [8:57:48<13:35:54, 8.22s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:20:24,427 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 40%|███████████████████████▊ | 4046/10000 [8:57:55<13:09:31, 7.96s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:20:30,483 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▉ | 4047/10000 [8:58:01<12:12:26, 7.38s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:20:38,154 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▉ | 4048/10000 [8:58:09<12:21:53, 7.48s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:20:45,400 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▉ | 4049/10000 [8:58:16<12:14:47, 7.41s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:20:53,217 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▉ | 4050/10000 [8:58:24<12:25:20, 7.52s/it] 40%|███████████████████████▉ | 4050/10000 [8:58:24<12:25:20, 7.52s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:21:00,275 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|███████████████████████▉ | 4051/10000 [8:58:31<12:15:16, 7.42s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:21:07,823 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|███████████████████████▉ | 4052/10000 [8:58:38<12:17:49, 7.44s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:21:16,236 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|███████████████████████▉ | 4053/10000 [8:58:47<12:46:33, 7.73s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:21:24,178 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 41%|███████████████████████▉ | 4054/10000 [8:58:55<12:51:46, 7.79s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:21:31,328 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|███████████████████████▉ | 4055/10000 [8:59:02<12:35:28, 7.62s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:21:38,679 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|███████████████████████▉ | 4056/10000 [8:59:09<12:23:11, 7.50s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:21:52,456 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|███████████████████████▉ | 4057/10000 [8:59:23<15:32:16, 9.41s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:22:00,908 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|███████████████████████▉ | 4058/10000 [8:59:32<15:04:21, 9.13s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:22:10,504 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|███████████████████████▉ | 4059/10000 [8:59:41<15:14:00, 9.23s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:22:17,819 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|███████████████████████▉ | 4060/10000 [8:59:48<14:19:00, 8.68s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:22:25,707 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|███████████████████████▉ | 4061/10000 [8:59:56<13:56:31, 8.45s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:22:35,243 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 41%|███████████████████████▉ | 4062/10000 [9:00:06<14:27:51, 8.77s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:22:42,944 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|███████████████████████▉ | 4063/10000 [9:00:14<13:56:11, 8.45s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:22:50,004 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|███████████████████████▉ | 4064/10000 [9:00:21<13:12:28, 8.01s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:22:58,175 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|███████████████████████▉ | 4065/10000 [9:00:29<13:17:08, 8.06s/it]{'eval_loss': 0.2849899232387543, 'eval_wer': 21.80550320392009, 'eval_runtime': 350.0983, 'eval_samples_per_second': 4.867, 'eval_steps_per_second': 0.154, 'epoch': 0.4} +{'loss': 0.1593, 'learning_rate': 1.8874736842105264e-06, 'epoch': 0.4} +{'loss': 0.1416, 'learning_rate': 1.8795789473684212e-06, 'epoch': 0.41} + + Reading metadata...: 0it [00:00, ?it/s] + Reading metadata...: 1it [00:00, 8.55it/s] Reading metadata...: 2165it [00:00, 15770.03it/s] +[WARNING|modeling_whisper.py:902] 2022-12-16 18:23:06,244 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|███████████████████████▉ | 4066/10000 [9:00:37<13:19:03, 8.08s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:23:13,260 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|███████████████████████▉ | 4067/10000 [9:00:44<12:45:17, 7.74s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:23:20,309 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 41%|████████████████████████ | 4068/10000 [9:00:51<12:23:32, 7.52s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:23:27,319 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████ | 4069/10000 [9:00:58<12:12:37, 7.41s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:23:34,453 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████ | 4070/10000 [9:01:05<12:02:04, 7.31s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:23:41,924 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████ | 4071/10000 [9:01:12<12:04:14, 7.33s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:23:48,786 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████ | 4072/10000 [9:01:19<11:54:15, 7.23s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:23:56,190 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████ | 4073/10000 [9:01:27<11:58:49, 7.28s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:24:03,763 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████ | 4074/10000 [9:01:34<12:06:39, 7.36s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:24:10,825 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████ | 4075/10000 [9:01:42<11:59:39, 7.29s/it] 41%|████████████████████████ | 4075/10000 [9:01:42<11:59:39, 7.29s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:24:17,696 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 41%|████████████████████████ | 4076/10000 [9:01:48<11:46:38, 7.16s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:24:25,122 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████ | 4077/10000 [9:01:56<11:52:17, 7.22s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:24:32,202 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████ | 4078/10000 [9:02:03<11:50:25, 7.20s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:24:39,276 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████ | 4079/10000 [9:02:10<11:47:18, 7.17s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:24:46,485 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████ | 4080/10000 [9:02:17<11:48:01, 7.18s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:24:53,902 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████ | 4081/10000 [9:02:24<11:51:11, 7.21s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:25:03,486 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████ | 4082/10000 [9:02:34<12:59:42, 7.91s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:25:10,797 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████ | 4083/10000 [9:02:41<12:45:01, 7.76s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:25:19,902 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 41%|████████████████████████ | 4084/10000 [9:02:51<13:27:26, 8.19s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:25:27,476 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████ | 4085/10000 [9:02:58<13:08:39, 8.00s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:25:34,750 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████ | 4086/10000 [9:03:05<12:47:19, 7.78s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:25:41,527 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████ | 4087/10000 [9:03:12<12:13:11, 7.44s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:25:49,097 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████ | 4088/10000 [9:03:20<12:17:05, 7.48s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:25:56,514 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▏ | 4089/10000 [9:03:27<12:18:30, 7.50s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:26:04,000 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▏ | 4090/10000 [9:03:35<12:14:06, 7.45s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:26:11,500 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▏ | 4091/10000 [9:03:42<12:16:14, 7.48s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:26:18,583 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 41%|████████████████████████▏ | 4092/10000 [9:03:49<12:07:40, 7.39s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:26:26,362 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▏ | 4093/10000 [9:03:57<12:15:50, 7.47s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:26:33,999 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▏ | 4094/10000 [9:04:05<12:20:58, 7.53s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:26:45,542 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▏ | 4095/10000 [9:04:16<14:20:08, 8.74s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:26:53,043 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▏ | 4096/10000 [9:04:24<13:43:37, 8.37s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:27:00,168 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▏ | 4097/10000 [9:04:31<13:05:27, 7.98s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:27:06,960 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▏ | 4098/10000 [9:04:38<12:33:11, 7.66s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:27:14,933 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▏ | 4099/10000 [9:04:46<12:42:04, 7.75s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:27:23,733 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 41%|████████████████████████▏ | 4100/10000 [9:04:54<13:11:25, 8.05s/it] 41%|████████████████████████▏ | 4100/10000 [9:04:54<13:11:25, 8.05s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:27:31,149 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▏ | 4101/10000 [9:05:02<12:51:28, 7.85s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:27:39,831 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▏ | 4102/10000 [9:05:10<13:17:45, 8.12s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:27:47,421 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▏ | 4103/10000 [9:05:18<13:03:35, 7.97s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:27:54,467 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▏ | 4104/10000 [9:05:25<12:34:59, 7.68s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:28:02,229 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▏ | 4105/10000 [9:05:33<12:38:54, 7.72s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:28:10,975 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▏ | 4106/10000 [9:05:42<13:08:52, 8.03s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:28:18,187 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▏ | 4107/10000 [9:05:49<12:40:40, 7.74s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:28:31,470 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 41%|████████████████████████▏ | 4108/10000 [9:06:02<15:26:53, 9.44s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:28:38,378 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▏ | 4109/10000 [9:06:09<14:11:17, 8.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:28:45,247 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▏ | 4110/10000 [9:06:16<13:18:55, 8.14s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:28:52,866 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▎ | 4111/10000 [9:06:23<13:01:52, 7.97s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:29:00,137 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▎ | 4112/10000 [9:06:31<12:43:35, 7.78s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:29:07,758 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▎ | 4113/10000 [9:06:38<12:37:00, 7.72s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:29:14,679 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▎ | 4114/10000 [9:06:45<12:13:30, 7.48s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:29:21,752 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▎ | 4115/10000 [9:06:52<12:02:32, 7.37s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:29:28,931 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 41%|████████████████████████▎ | 4116/10000 [9:07:00<11:56:55, 7.31s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:29:36,214 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▎ | 4117/10000 [9:07:07<11:56:32, 7.31s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:29:43,840 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▎ | 4118/10000 [9:07:15<12:06:11, 7.41s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:29:51,061 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▎ | 4119/10000 [9:07:22<11:56:44, 7.31s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:29:58,458 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▎ | 4120/10000 [9:07:29<11:59:39, 7.34s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:30:07,156 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▎ | 4121/10000 [9:07:38<12:40:56, 7.77s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:30:14,443 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▎ | 4122/10000 [9:07:45<12:27:25, 7.63s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:30:21,517 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▎ | 4123/10000 [9:07:52<12:11:25, 7.47s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:30:29,916 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 41%|████████████████████████▎ | 4124/10000 [9:08:01<12:38:57, 7.75s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:30:37,987 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▎ | 4125/10000 [9:08:09<12:48:23, 7.85s/it] 41%|████████████████████████▎ | 4125/10000 [9:08:09<12:48:23, 7.85s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:30:44,959 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▎ | 4126/10000 [9:08:16<12:19:29, 7.55s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:30:52,249 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▎ | 4127/10000 [9:08:23<12:14:45, 7.51s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:30:59,921 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▎ | 4128/10000 [9:08:31<12:20:18, 7.56s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:31:07,708 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▎ | 4129/10000 [9:08:38<12:25:34, 7.62s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:31:15,478 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▎ | 4130/10000 [9:08:46<12:31:13, 7.68s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:31:23,184 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▎ | 4131/10000 [9:08:54<12:30:26, 7.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:31:31,053 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 41%|████████████████████████▍ | 4132/10000 [9:09:02<12:35:29, 7.72s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:31:38,456 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▍ | 4133/10000 [9:09:09<12:24:26, 7.61s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:31:47,134 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▍ | 4134/10000 [9:09:18<12:59:00, 7.97s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:31:55,632 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▍ | 4135/10000 [9:09:26<13:11:34, 8.10s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:32:02,618 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▍ | 4136/10000 [9:09:33<12:39:33, 7.77s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:32:11,659 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▍ | 4137/10000 [9:09:42<13:16:19, 8.15s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:32:18,655 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▍ | 4138/10000 [9:09:49<12:43:27, 7.81s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:32:25,893 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▍ | 4139/10000 [9:09:57<12:25:59, 7.64s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:32:33,405 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 41%|████████████████████████▍ | 4140/10000 [9:10:04<12:22:53, 7.61s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:32:43,648 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▍ | 4141/10000 [9:10:14<13:40:05, 8.40s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:32:52,059 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▍ | 4142/10000 [9:10:23<13:37:24, 8.37s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:32:59,886 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▍ | 4143/10000 [9:10:30<13:20:21, 8.20s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:33:06,270 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▍ | 4144/10000 [9:10:37<12:28:00, 7.66s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:33:12,663 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▍ | 4145/10000 [9:10:43<11:50:41, 7.28s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:33:19,818 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▍ | 4146/10000 [9:10:50<11:47:46, 7.25s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:33:26,086 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▍ | 4147/10000 [9:10:57<11:16:55, 6.94s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:33:32,276 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 41%|████████████████████████▍ | 4148/10000 [9:11:03<10:55:58, 6.73s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:33:38,276 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 41%|████████████████████████▍ | 4149/10000 [9:11:09<10:35:18, 6.51s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:33:44,742 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▍ | 4150/10000 [9:11:15<10:35:17, 6.52s/it] 42%|████████████████████████▍ | 4150/10000 [9:11:15<10:35:17, 6.52s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:33:51,270 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▍ | 4151/10000 [9:11:22<10:33:49, 6.50s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:33:57,345 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▍ | 4152/10000 [9:11:28<10:19:28, 6.36s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:34:04,423 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▌ | 4153/10000 [9:11:35<10:43:37, 6.60s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:34:10,607 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▌ | 4154/10000 [9:11:41<10:31:52, 6.49s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:34:17,021 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▌ | 4155/10000 [9:11:48<10:29:03, 6.46s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:34:23,066 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 42%|████████████████████████▌ | 4156/10000 [9:11:54<10:16:38, 6.33s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:34:29,617 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▌ | 4157/10000 [9:12:00<10:19:55, 6.37s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:34:35,717 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▌ | 4158/10000 [9:12:06<10:14:49, 6.31s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:34:42,037 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▌ | 4159/10000 [9:12:13<10:16:44, 6.34s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:34:49,073 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▌ | 4160/10000 [9:12:20<10:36:15, 6.54s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:34:58,964 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▌ | 4161/10000 [9:12:29<12:09:39, 7.50s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:35:05,765 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▌ | 4162/10000 [9:12:36<11:53:24, 7.33s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:35:12,866 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▌ | 4163/10000 [9:12:43<11:44:09, 7.24s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:35:20,489 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 42%|████████████████████████▌ | 4164/10000 [9:12:51<11:57:24, 7.38s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:35:27,278 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▌ | 4165/10000 [9:12:58<11:36:23, 7.16s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:35:33,193 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▌ | 4166/10000 [9:13:04<11:02:27, 6.81s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:35:39,644 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▌ | 4167/10000 [9:13:10<10:50:02, 6.69s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:35:46,337 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▌ | 4168/10000 [9:13:17<10:49:53, 6.69s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:35:57,245 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▌ | 4169/10000 [9:13:28<12:56:03, 7.99s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:36:04,911 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▌ | 4170/10000 [9:13:36<12:46:46, 7.89s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:36:11,769 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▌ | 4171/10000 [9:13:42<12:14:04, 7.56s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:36:18,262 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 42%|████████████████████████▌ | 4172/10000 [9:13:49<11:43:09, 7.24s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:36:24,739 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▌ | 4173/10000 [9:13:55<11:20:25, 7.01s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:36:31,272 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▋ | 4174/10000 [9:14:02<11:06:15, 6.86s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:36:38,421 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▋ | 4175/10000 [9:14:09<11:15:16, 6.96s/it] 42%|████████████████████████▋ | 4175/10000 [9:14:09<11:15:16, 6.96s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:36:45,064 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▋ | 4176/10000 [9:14:16<11:05:44, 6.86s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:36:52,208 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▋ | 4177/10000 [9:14:23<11:14:45, 6.95s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:37:00,535 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▋ | 4178/10000 [9:14:31<11:54:32, 7.36s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:37:09,105 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▋ | 4179/10000 [9:14:40<12:31:43, 7.75s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:37:18,399 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 42%|████████████████████████▋ | 4180/10000 [9:14:49<13:16:22, 8.21s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:37:27,804 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▋ | 4181/10000 [9:14:58<13:46:23, 8.52s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:37:34,376 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▋ | 4182/10000 [9:15:05<12:51:41, 7.96s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:37:40,765 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▋ | 4183/10000 [9:15:11<12:05:51, 7.49s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:37:51,069 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▋ | 4184/10000 [9:15:22<13:30:28, 8.36s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:37:57,287 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▋ | 4185/10000 [9:15:28<12:24:05, 7.68s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:38:03,224 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▋ | 4186/10000 [9:15:34<11:34:43, 7.17s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:38:09,384 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▋ | 4187/10000 [9:15:40<11:06:24, 6.88s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:38:15,995 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 42%|████████████████████████▋ | 4188/10000 [9:15:47<10:58:58, 6.80s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:38:22,460 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▋ | 4189/10000 [9:15:53<10:49:32, 6.71s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:38:29,213 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▋ | 4190/10000 [9:16:00<10:50:36, 6.72s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:38:35,924 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▋ | 4191/10000 [9:16:07<10:50:08, 6.72s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:38:42,545 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▋ | 4192/10000 [9:16:13<10:46:56, 6.68s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:38:49,261 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▋ | 4193/10000 [9:16:20<10:46:30, 6.68s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:38:55,987 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▋ | 4194/10000 [9:16:27<10:49:35, 6.71s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:39:02,138 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▊ | 4195/10000 [9:16:33<10:31:23, 6.53s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:39:08,205 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 42%|████████████████████████▊ | 4196/10000 [9:16:39<10:20:51, 6.42s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:39:14,952 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▊ | 4197/10000 [9:16:46<10:28:17, 6.50s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:39:21,473 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▊ | 4198/10000 [9:16:52<10:26:00, 6.47s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:39:28,279 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▊ | 4199/10000 [9:16:59<10:37:34, 6.59s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:39:53,438 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▊ | 4200/10000 [9:17:24<19:37:55, 12.19s/it] 42%|████████████████████████▊ | 4200/10000 [9:17:24<19:37:55, 12.19s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:40:00,562 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▊ | 4201/10000 [9:17:31<17:11:20, 10.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:40:07,778 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▊ | 4202/10000 [9:17:38<15:30:10, 9.63s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:40:14,525 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▊ | 4203/10000 [9:17:45<14:06:45, 8.76s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:40:21,282 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 42%|████████████████████████▊ | 4204/10000 [9:17:52<13:08:33, 8.16s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:40:29,468 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▊ | 4205/10000 [9:18:00<13:09:39, 8.18s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:40:37,655 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▊ | 4206/10000 [9:18:08<13:09:10, 8.17s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:40:46,003 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▊ | 4207/10000 [9:18:17<13:12:46, 8.21s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:40:52,906 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▊ | 4208/10000 [9:18:24<12:34:48, 7.82s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:40:59,792 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▊ | 4209/10000 [9:18:30<12:07:00, 7.53s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:41:06,553 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▊ | 4210/10000 [9:18:37<11:48:26, 7.34s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:41:13,590 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▊ | 4211/10000 [9:18:44<11:39:01, 7.25s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:41:20,573 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 42%|████████████████████████▊ | 4212/10000 [9:18:51<11:27:32, 7.13s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:41:27,632 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▊ | 4213/10000 [9:18:58<11:27:45, 7.13s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:41:34,678 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▊ | 4214/10000 [9:19:05<11:25:20, 7.11s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:41:41,773 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▊ | 4215/10000 [9:19:12<11:25:13, 7.11s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:41:48,803 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▊ | 4216/10000 [9:19:19<11:19:10, 7.05s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:41:56,411 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▉ | 4217/10000 [9:19:27<11:36:43, 7.23s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:42:04,250 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▉ | 4218/10000 [9:19:35<11:56:19, 7.43s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:42:11,882 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▉ | 4219/10000 [9:19:43<12:01:49, 7.49s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:42:23,611 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 42%|████████████████████████▉ | 4220/10000 [9:19:54<14:01:48, 8.74s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:42:31,204 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▉ | 4221/10000 [9:20:02<13:31:06, 8.42s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:42:38,631 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▉ | 4222/10000 [9:20:09<13:01:59, 8.12s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:42:46,061 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▉ | 4223/10000 [9:20:17<12:41:11, 7.91s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:42:53,365 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▉ | 4224/10000 [9:20:24<12:25:04, 7.74s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:43:01,019 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▉ | 4225/10000 [9:20:32<12:19:38, 7.68s/it] 42%|████████████████████████▉ | 4225/10000 [9:20:32<12:19:38, 7.68s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:43:08,538 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▉ | 4226/10000 [9:20:39<12:15:57, 7.65s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:43:18,887 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▉ | 4227/10000 [9:20:49<13:32:48, 8.45s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:43:26,153 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 42%|████████████████████████▉ | 4228/10000 [9:20:57<12:59:37, 8.10s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:43:33,593 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▉ | 4229/10000 [9:21:04<12:40:52, 7.91s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:43:41,145 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▉ | 4230/10000 [9:21:12<12:26:47, 7.77s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:43:48,484 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▉ | 4231/10000 [9:21:19<12:14:39, 7.64s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:43:55,841 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▉ | 4232/10000 [9:21:26<12:06:19, 7.56s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:44:04,350 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▉ | 4233/10000 [9:21:35<12:34:19, 7.85s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:44:11,743 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▉ | 4234/10000 [9:21:42<12:22:45, 7.73s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:44:19,254 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▉ | 4235/10000 [9:21:50<12:17:52, 7.68s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:44:27,499 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 42%|████████████████████████▉ | 4236/10000 [9:21:58<12:32:04, 7.83s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:44:35,268 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|████████████████████████▉ | 4237/10000 [9:22:06<12:31:18, 7.82s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:44:43,546 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|█████████████████████████ | 4238/10000 [9:22:14<12:45:34, 7.97s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:44:51,996 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|█████████████████████████ | 4239/10000 [9:22:23<12:58:33, 8.11s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:45:00,516 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|█████████████████████████ | 4240/10000 [9:22:31<13:05:48, 8.19s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:45:07,786 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|█████████████████████████ | 4241/10000 [9:22:38<12:41:33, 7.93s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:45:15,080 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|█████████████████████████ | 4242/10000 [9:22:46<12:24:37, 7.76s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:45:22,295 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|█████████████████████████ | 4243/10000 [9:22:53<12:06:07, 7.57s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:45:32,008 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 42%|█████████████████████████ | 4244/10000 [9:23:03<13:08:40, 8.22s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:45:39,169 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|█████████████████████████ | 4245/10000 [9:23:10<12:38:21, 7.91s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:45:46,625 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|█████████████████████████ | 4246/10000 [9:23:17<12:28:00, 7.80s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:45:54,029 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|█████████████████████████ | 4247/10000 [9:23:25<12:15:20, 7.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:46:01,529 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|█████████████████████████ | 4248/10000 [9:23:32<12:10:06, 7.62s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:46:09,484 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|█████████████████████████ | 4249/10000 [9:23:40<12:16:05, 7.68s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:46:16,326 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 42%|█████████████████████████ | 4250/10000 [9:23:47<11:51:58, 7.43s/it] 42%|█████████████████████████ | 4250/10000 [9:23:47<11:51:58, 7.43s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:46:23,853 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████ | 4251/10000 [9:23:54<11:56:49, 7.48s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:46:31,790 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 43%|█████████████████████████ | 4252/10000 [9:24:02<12:08:32, 7.60s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:46:47,089 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████ | 4253/10000 [9:24:18<15:52:45, 9.95s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:46:54,443 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████ | 4254/10000 [9:24:25<14:38:04, 9.17s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:47:02,510 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████ | 4255/10000 [9:24:33<14:06:05, 8.84s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:47:10,672 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████ | 4256/10000 [9:24:41<13:45:25, 8.62s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:47:19,029 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████ | 4257/10000 [9:24:50<13:39:29, 8.56s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:47:27,060 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████ | 4258/10000 [9:24:58<13:24:09, 8.40s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:47:36,590 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▏ | 4259/10000 [9:25:07<13:53:12, 8.71s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:47:52,338 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 43%|█████████████████████████▏ | 4260/10000 [9:25:23<17:18:06, 10.85s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:47:59,948 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▏ | 4261/10000 [9:25:31<15:42:25, 9.85s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:48:08,856 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▏ | 4262/10000 [9:25:39<15:15:45, 9.58s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:48:16,374 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▏ | 4263/10000 [9:25:47<14:16:09, 8.95s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:48:23,426 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▏ | 4264/10000 [9:25:54<13:19:56, 8.37s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:48:30,603 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▏ | 4265/10000 [9:26:01<12:46:33, 8.02s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:48:36,683 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▏ | 4266/10000 [9:26:07<11:53:18, 7.46s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:48:42,755 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▏ | 4267/10000 [9:26:13<11:11:09, 7.02s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:48:49,559 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 43%|█████████████████████████▏ | 4268/10000 [9:26:20<11:06:42, 6.98s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:48:56,275 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▏ | 4269/10000 [9:26:27<10:59:18, 6.90s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:49:02,842 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▏ | 4270/10000 [9:26:33<10:48:53, 6.79s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:49:09,530 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▏ | 4271/10000 [9:26:40<10:45:39, 6.76s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:49:16,107 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▏ | 4272/10000 [9:26:47<10:40:59, 6.71s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:49:29,282 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▏ | 4273/10000 [9:27:00<13:43:49, 8.63s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:49:35,539 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▏ | 4274/10000 [9:27:06<12:37:36, 7.94s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:49:42,283 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▏ | 4275/10000 [9:27:13<12:03:49, 7.59s/it] 43%|█████████████████████████▏ | 4275/10000 [9:27:13<12:03:49, 7.59s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:49:49,709 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 43%|█████████████████████████▏ | 4276/10000 [9:27:20<11:59:47, 7.54s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:49:56,667 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▏ | 4277/10000 [9:27:27<11:40:30, 7.34s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:50:03,247 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▏ | 4278/10000 [9:27:34<11:20:14, 7.13s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:50:12,184 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▏ | 4279/10000 [9:27:43<12:11:34, 7.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:50:18,956 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▎ | 4280/10000 [9:27:50<11:45:09, 7.40s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:50:25,918 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▎ | 4281/10000 [9:27:57<11:30:35, 7.25s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:50:32,386 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▎ | 4282/10000 [9:28:03<11:10:32, 7.04s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:50:39,335 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▎ | 4283/10000 [9:28:10<11:08:24, 7.01s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:50:45,904 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 43%|█████████████████████████▎ | 4284/10000 [9:28:17<10:56:37, 6.89s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:50:53,728 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▎ | 4285/10000 [9:28:24<11:20:00, 7.14s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:51:00,252 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▎ | 4286/10000 [9:28:31<11:01:35, 6.95s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:51:06,818 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▎ | 4287/10000 [9:28:37<10:49:22, 6.82s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:51:13,415 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▎ | 4288/10000 [9:28:44<10:43:11, 6.76s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:51:19,567 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▎ | 4289/10000 [9:28:50<10:30:42, 6.63s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:51:25,848 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▎ | 4290/10000 [9:28:56<10:17:03, 6.48s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:51:32,440 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▎ | 4291/10000 [9:29:03<10:19:36, 6.51s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:51:38,977 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 43%|█████████████████████████▎ | 4292/10000 [9:29:10<10:22:23, 6.54s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:51:45,314 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▎ | 4293/10000 [9:29:16<10:14:14, 6.46s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:51:51,708 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▎ | 4294/10000 [9:29:22<10:15:00, 6.47s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:51:58,212 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▎ | 4295/10000 [9:29:29<10:14:02, 6.46s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:52:05,033 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▎ | 4296/10000 [9:29:36<10:25:20, 6.58s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:52:11,697 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▎ | 4297/10000 [9:29:42<10:28:46, 6.62s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:52:18,635 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▎ | 4298/10000 [9:29:49<10:37:34, 6.71s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:52:25,979 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▎ | 4299/10000 [9:29:57<10:53:18, 6.88s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:52:32,484 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 43%|█████████████████████████▎ | 4300/10000 [9:30:03<10:44:41, 6.79s/it] 43%|█████████████████████████▎ | 4300/10000 [9:30:03<10:44:41, 6.79s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:52:38,737 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▍ | 4301/10000 [9:30:09<10:25:50, 6.59s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:52:46,660 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▍ | 4302/10000 [9:30:17<11:05:49, 7.01s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:52:53,459 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▍ | 4303/10000 [9:30:24<11:01:30, 6.97s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:52:59,899 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▍ | 4304/10000 [9:30:31<10:44:31, 6.79s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:53:06,417 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▍ | 4305/10000 [9:30:37<10:37:14, 6.71s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:53:12,550 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▍ | 4306/10000 [9:30:43<10:17:13, 6.50s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:53:18,571 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▍ | 4307/10000 [9:30:49<10:06:16, 6.39s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:53:26,902 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 43%|█████████████████████████▍ | 4308/10000 [9:30:57<10:59:34, 6.95s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:53:34,552 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▍ | 4309/10000 [9:31:05<11:22:33, 7.20s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:53:41,671 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▍ | 4310/10000 [9:31:12<11:20:20, 7.17s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:53:49,077 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▍ | 4311/10000 [9:31:20<11:22:35, 7.20s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:53:57,790 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▍ | 4312/10000 [9:31:28<12:05:35, 7.65s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:54:05,107 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▍ | 4313/10000 [9:31:36<11:57:03, 7.57s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:54:12,292 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▍ | 4314/10000 [9:31:43<11:45:53, 7.45s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:54:19,882 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▍ | 4315/10000 [9:31:51<11:53:32, 7.53s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:54:27,299 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 43%|█████████████████████████▍ | 4316/10000 [9:31:58<11:49:08, 7.49s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:54:34,137 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▍ | 4317/10000 [9:32:05<11:30:10, 7.29s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:54:41,456 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▍ | 4318/10000 [9:32:12<11:28:36, 7.27s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:54:52,361 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▍ | 4319/10000 [9:32:23<13:14:16, 8.39s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:54:59,722 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▍ | 4320/10000 [9:32:30<12:45:59, 8.09s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:55:07,476 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▍ | 4321/10000 [9:32:38<12:35:29, 7.98s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:55:15,518 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▍ | 4322/10000 [9:32:46<12:37:52, 8.01s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:55:23,525 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▌ | 4323/10000 [9:32:54<12:37:20, 8.00s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:55:35,253 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 43%|█████████████████████████▌ | 4324/10000 [9:33:06<14:22:43, 9.12s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:55:42,673 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▌ | 4325/10000 [9:33:13<13:30:30, 8.57s/it] 43%|█████████████████████████▌ | 4325/10000 [9:33:13<13:30:30, 8.57s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:55:51,529 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▌ | 4326/10000 [9:33:22<13:40:00, 8.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:55:59,212 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▌ | 4327/10000 [9:33:30<13:12:13, 8.38s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:56:06,823 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▌ | 4328/10000 [9:33:37<12:49:58, 8.14s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:56:12,935 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▌ | 4329/10000 [9:33:44<11:54:39, 7.56s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:56:19,289 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▌ | 4330/10000 [9:33:50<11:18:06, 7.18s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:56:25,806 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▌ | 4331/10000 [9:33:56<11:00:59, 7.00s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:56:32,884 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 43%|█████████████████████████▌ | 4332/10000 [9:34:03<11:01:53, 7.01s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:56:39,403 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▌ | 4333/10000 [9:34:10<10:50:01, 6.88s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:56:45,873 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▌ | 4334/10000 [9:34:17<10:37:26, 6.75s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:56:52,220 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▌ | 4335/10000 [9:34:23<10:25:48, 6.63s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:56:58,249 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▌ | 4336/10000 [9:34:29<10:09:25, 6.46s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:57:04,300 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|██████████████████████████ | 4337/10000 [9:34:35<9:57:40, 6.33s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:57:11,445 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▌ | 4338/10000 [9:34:42<10:18:53, 6.56s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:57:19,562 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▌ | 4339/10000 [9:34:50<11:05:01, 7.05s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:57:26,781 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 43%|█████████████████████████▌ | 4340/10000 [9:34:57<11:07:07, 7.07s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:57:33,528 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▌ | 4341/10000 [9:35:04<10:59:59, 7.00s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:57:40,060 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▌ | 4342/10000 [9:35:11<10:46:22, 6.85s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:57:47,130 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▌ | 4343/10000 [9:35:18<10:52:36, 6.92s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:57:54,053 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▋ | 4344/10000 [9:35:25<10:52:04, 6.92s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:58:01,087 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▋ | 4345/10000 [9:35:32<10:51:32, 6.91s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:58:07,717 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▋ | 4346/10000 [9:35:38<10:47:01, 6.87s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:58:14,465 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▋ | 4347/10000 [9:35:45<10:43:28, 6.83s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:58:21,690 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 43%|█████████████████████████▋ | 4348/10000 [9:35:52<10:53:29, 6.94s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:58:28,436 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 43%|█████████████████████████▋ | 4349/10000 [9:35:59<10:49:43, 6.90s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:58:37,050 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|█████████████████████████▋ | 4350/10000 [9:36:08<11:37:20, 7.41s/it] 44%|█████████████████████████▋ | 4350/10000 [9:36:08<11:37:20, 7.41s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:58:43,910 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|█████████████████████████▋ | 4351/10000 [9:36:15<11:22:21, 7.25s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:58:50,765 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|█████████████████████████▋ | 4352/10000 [9:36:21<11:11:19, 7.13s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:58:57,525 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|█████████████████████████▋ | 4353/10000 [9:36:28<11:01:10, 7.03s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:59:05,903 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|█████████████████████████▋ | 4354/10000 [9:36:36<11:36:32, 7.40s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:59:12,361 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|█████████████████████████▋ | 4355/10000 [9:36:43<11:13:15, 7.16s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:59:19,152 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 44%|█████████████████████████▋ | 4356/10000 [9:36:50<11:01:59, 7.04s/it]{'loss': 0.1325, 'learning_rate': 1.871684210526316e-06, 'epoch': 0.41} +{'loss': 0.1365, 'learning_rate': 1.8637894736842106e-06, 'epoch': 0.41} +{'loss': 0.1677, 'learning_rate': 1.8558947368421054e-06, 'epoch': 0.41} +{'loss': 0.2245, 'learning_rate': 1.848e-06, 'epoch': 0.41} +{'loss': 0.1685, 'learning_rate': 1.8401052631578947e-06, 'epoch': 0.42} +{'loss': 0.1616, 'learning_rate': 1.8322105263157894e-06, 'epoch': 0.42} +{'loss': 0.1527, 'learning_rate': 1.8243157894736842e-06, 'epoch': 0.42} +{'loss': 0.1616, 'learning_rate': 1.816421052631579e-06, 'epoch': 0.42} +{'loss': 0.138, 'learning_rate': 1.8085263157894737e-06, 'epoch': 0.43} +{'loss': 0.281, 'learning_rate': 1.8006315789473686e-06, 'epoch': 0.43} +{'loss': 0.3141, 'learning_rate': 1.7927368421052634e-06, 'epoch': 0.43} +{'loss': 0.139, 'learning_rate': 1.784842105263158e-06, 'epoch': 0.43} + + Reading metadata...: 0it [00:00, ?it/s] + Reading metadata...: 1it [00:00, 8.57it/s] Reading metadata...: 2165it [00:00, 16068.11it/s] +[WARNING|modeling_whisper.py:902] 2022-12-16 18:59:26,420 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|█████████████████████████▋ | 4357/10000 [9:36:57<11:08:11, 7.10s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:59:34,399 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|█████████████████████████▋ | 4358/10000 [9:37:05<11:32:43, 7.37s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:59:40,786 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|█████████████████████████▋ | 4359/10000 [9:37:11<11:02:25, 7.05s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:59:46,917 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 44%|█████████████████████████▋ | 4360/10000 [9:37:18<10:37:27, 6.78s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:59:53,076 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|█████████████████████████▋ | 4361/10000 [9:37:24<10:21:41, 6.61s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:59:59,313 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|█████████████████████████▋ | 4362/10000 [9:37:30<10:09:32, 6.49s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:00:05,376 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|██████████████████████████▏ | 4363/10000 [9:37:36<9:58:07, 6.37s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:00:11,494 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|██████████████████████████▏ | 4364/10000 [9:37:42<9:52:56, 6.31s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:00:17,831 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|██████████████████████████▏ | 4365/10000 [9:37:48<9:48:12, 6.26s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:00:24,310 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|██████████████████████████▏ | 4366/10000 [9:37:55<9:55:40, 6.34s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:00:31,158 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|█████████████████████████▊ | 4367/10000 [9:38:02<10:09:55, 6.50s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:00:37,769 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 44%|█████████████████████████▊ | 4368/10000 [9:38:08<10:13:39, 6.54s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:00:47,231 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|█████████████████████████▊ | 4369/10000 [9:38:18<11:36:22, 7.42s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:00:53,508 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|█████████████████████████▊ | 4370/10000 [9:38:24<11:04:12, 7.08s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:01:00,188 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|█████████████████████████▊ | 4371/10000 [9:38:31<10:51:59, 6.95s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:01:07,890 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|█████████████████████████▊ | 4372/10000 [9:38:39<11:14:29, 7.19s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:01:14,447 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|█████████████████████████▊ | 4373/10000 [9:38:45<10:57:07, 7.01s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:01:21,087 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|█████████████████████████▊ | 4374/10000 [9:38:52<10:44:49, 6.88s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:01:28,161 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|█████████████████████████▊ | 4375/10000 [9:38:59<10:51:58, 6.95s/it] 44%|█████████████████████████▊ | 4375/10000 [9:38:59<10:51:58, 6.95s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:01:34,848 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 44%|█████████████████████████▊ | 4376/10000 [9:39:06<10:44:35, 6.88s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:01:41,085 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|█████████████████████████▊ | 4377/10000 [9:39:12<10:23:36, 6.65s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:01:48,657 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|█████████████████████████▊ | 4378/10000 [9:39:19<10:50:48, 6.95s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:01:54,586 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|█████████████████████████▊ | 4379/10000 [9:39:25<10:22:59, 6.65s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:02:01,246 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|█████████████████████████▊ | 4380/10000 [9:39:32<10:21:48, 6.64s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:02:07,327 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|█████████████████████████▊ | 4381/10000 [9:39:38<10:07:14, 6.48s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:02:13,411 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|██████████████████████████▎ | 4382/10000 [9:39:44<9:54:27, 6.35s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:02:19,936 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|██████████████████████████▎ | 4383/10000 [9:39:51<9:59:35, 6.40s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:02:26,317 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 44%|██████████████████████████▎ | 4384/10000 [9:39:57<9:57:54, 6.39s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:02:32,838 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|█████████████████████████▊ | 4385/10000 [9:40:03<10:00:42, 6.42s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:02:39,220 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|█████████████████████████▉ | 4386/10000 [9:40:10<10:00:31, 6.42s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:02:56,985 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|█████████████████████████▉ | 4387/10000 [9:40:28<15:20:55, 9.84s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:03:03,120 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|█████████████████████████▉ | 4388/10000 [9:40:34<13:36:27, 8.73s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:03:09,661 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|█████████████████████████▉ | 4389/10000 [9:40:40<12:33:55, 8.06s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:03:17,241 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|█████████████████████████▉ | 4390/10000 [9:40:48<12:17:55, 7.89s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:03:23,851 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|█████████████████████████▉ | 4391/10000 [9:40:54<11:43:27, 7.52s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:03:30,896 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 44%|█████████████████████████▉ | 4392/10000 [9:41:01<11:28:05, 7.36s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:03:40,776 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|█████████████████████████▉ | 4393/10000 [9:41:11<12:38:28, 8.12s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:03:47,211 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|█████████████████████████▉ | 4394/10000 [9:41:18<11:53:02, 7.63s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:03:54,881 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|█████████████████████████▉ | 4395/10000 [9:41:26<11:55:26, 7.66s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:04:02,605 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|█████████████████████████▉ | 4396/10000 [9:41:33<11:54:12, 7.65s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:04:09,485 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|█████████████████████████▉ | 4397/10000 [9:41:40<11:32:57, 7.42s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:04:21,454 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|█████████████████████████▉ | 4398/10000 [9:41:52<13:42:49, 8.81s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:04:28,770 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|█████████████████████████▉ | 4399/10000 [9:41:59<13:00:17, 8.36s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:04:35,814 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 44%|█████████████████████████▉ | 4400/10000 [9:42:06<12:22:28, 7.96s/it] 44%|█████████████████████████▉ | 4400/10000 [9:42:06<12:22:28, 7.96s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:04:42,826 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|█████████████████████████▉ | 4401/10000 [9:42:14<11:58:10, 7.70s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:04:50,591 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|█████████████████████████▉ | 4402/10000 [9:42:21<11:55:34, 7.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:04:57,602 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|█████████████████████████▉ | 4403/10000 [9:42:28<11:41:16, 7.52s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:05:05,430 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|█████████████████████████▉ | 4404/10000 [9:42:36<11:46:46, 7.58s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:05:12,993 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|█████████████████████████▉ | 4405/10000 [9:42:44<11:45:46, 7.57s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:05:20,490 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|█████████████████████████▉ | 4406/10000 [9:42:51<11:45:13, 7.56s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:05:27,819 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|██████████████████████████ | 4407/10000 [9:42:58<11:37:14, 7.48s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:05:35,631 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 44%|██████████████████████████ | 4408/10000 [9:43:06<11:50:11, 7.62s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:05:43,510 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|██████████████████████████ | 4409/10000 [9:43:14<11:53:14, 7.65s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:05:51,008 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|██████████████████████████ | 4410/10000 [9:43:22<11:52:05, 7.64s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:05:58,279 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|██████████████████████████ | 4411/10000 [9:43:29<11:40:16, 7.52s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:06:05,924 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|██████████████████████████ | 4412/10000 [9:43:36<11:41:38, 7.53s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:06:13,383 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|██████████████████████████ | 4413/10000 [9:43:44<11:42:08, 7.54s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:06:20,507 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|██████████████████████████ | 4414/10000 [9:43:51<11:28:51, 7.40s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:06:27,316 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|██████████████████████████ | 4415/10000 [9:43:58<11:14:12, 7.24s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:06:35,630 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 44%|██████████████████████████ | 4416/10000 [9:44:06<11:43:55, 7.56s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:06:42,947 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|██████████████████████████ | 4417/10000 [9:44:14<11:37:17, 7.49s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:06:52,123 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|██████████████████████████ | 4418/10000 [9:44:23<12:24:12, 8.00s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:06:59,087 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|██████████████████████████ | 4419/10000 [9:44:30<11:54:36, 7.68s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:07:07,119 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|██████████████████████████ | 4420/10000 [9:44:38<12:05:01, 7.80s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:07:14,805 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|██████████████████████████ | 4421/10000 [9:44:45<12:00:51, 7.75s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:07:21,915 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|██████████████████████████ | 4422/10000 [9:44:53<11:41:44, 7.55s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:07:29,755 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|██████████████████████████ | 4423/10000 [9:45:00<11:51:02, 7.65s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:07:37,414 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 44%|██████████████████████████ | 4424/10000 [9:45:08<11:53:24, 7.68s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:07:46,676 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|██████████████████████████ | 4425/10000 [9:45:17<12:36:50, 8.15s/it] 44%|██████████████████████████ | 4425/10000 [9:45:17<12:36:50, 8.15s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:07:55,452 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|██████████████████████████ | 4426/10000 [9:45:26<12:53:38, 8.33s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:08:03,276 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|██████████████████████████ | 4427/10000 [9:45:34<12:39:02, 8.17s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:08:10,830 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|██████████████████████████▏ | 4428/10000 [9:45:41<12:20:56, 7.98s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:08:19,044 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|██████████████████████████▏ | 4429/10000 [9:45:50<12:25:56, 8.03s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:08:27,060 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|██████████████████████████▏ | 4430/10000 [9:45:58<12:27:23, 8.05s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:08:34,714 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|██████████████████████████▏ | 4431/10000 [9:46:05<12:16:33, 7.94s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:08:41,967 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 44%|██████████████████████████▏ | 4432/10000 [9:46:13<11:56:44, 7.72s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:08:50,001 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|██████████████████████████▏ | 4433/10000 [9:46:21<12:05:01, 7.81s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:08:57,763 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|██████████████████████████▏ | 4434/10000 [9:46:28<12:03:49, 7.80s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:09:05,390 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|██████████████████████████▏ | 4435/10000 [9:46:36<11:57:25, 7.74s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:09:12,316 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|██████████████████████████▏ | 4436/10000 [9:46:43<11:36:42, 7.51s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:09:35,043 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|██████████████████████████▏ | 4437/10000 [9:47:06<18:39:14, 12.07s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:09:42,104 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|██████████████████████████▏ | 4438/10000 [9:47:13<16:18:01, 10.55s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:09:49,314 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|██████████████████████████▏ | 4439/10000 [9:47:20<14:44:05, 9.54s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:09:56,042 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 44%|██████████████████████████▏ | 4440/10000 [9:47:27<13:26:22, 8.70s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:10:03,653 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|██████████████████████████▏ | 4441/10000 [9:47:34<12:55:34, 8.37s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:10:11,120 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|██████████████████████████▏ | 4442/10000 [9:47:42<12:33:16, 8.13s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:10:18,706 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|██████████████████████████▏ | 4443/10000 [9:47:49<12:18:43, 7.98s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:10:26,181 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|██████████████████████████▏ | 4444/10000 [9:47:57<12:02:56, 7.81s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:10:33,108 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|██████████████████████████▏ | 4445/10000 [9:48:04<11:36:27, 7.52s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:10:40,474 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|██████████████████████████▏ | 4446/10000 [9:48:11<11:34:54, 7.51s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:10:47,707 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|██████████████████████████▏ | 4447/10000 [9:48:18<11:26:27, 7.42s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:10:55,894 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 44%|██████████████████████████▏ | 4448/10000 [9:48:26<11:45:36, 7.63s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:11:03,204 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|██████████████████████████▏ | 4449/10000 [9:48:34<11:35:55, 7.52s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:11:10,736 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 44%|██████████████████████████▎ | 4450/10000 [9:48:41<11:39:36, 7.56s/it] 44%|██████████████████████████▎ | 4450/10000 [9:48:41<11:39:36, 7.56s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:11:18,227 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▎ | 4451/10000 [9:48:49<11:32:43, 7.49s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:11:25,735 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▎ | 4452/10000 [9:48:56<11:33:25, 7.50s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:11:33,084 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▎ | 4453/10000 [9:49:04<11:33:12, 7.50s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:11:39,886 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▎ | 4454/10000 [9:49:11<11:13:48, 7.29s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:11:46,968 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▎ | 4455/10000 [9:49:18<11:06:59, 7.22s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:11:55,657 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 45%|██████████████████████████▎ | 4456/10000 [9:49:26<11:48:46, 7.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:12:09,919 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▎ | 4457/10000 [9:49:41<14:48:39, 9.62s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:12:17,866 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▎ | 4458/10000 [9:49:48<14:00:59, 9.10s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:12:25,323 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▎ | 4459/10000 [9:49:56<13:17:54, 8.64s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:12:32,767 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▎ | 4460/10000 [9:50:03<12:44:42, 8.28s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:12:40,085 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▎ | 4461/10000 [9:50:11<12:16:22, 7.98s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:12:47,024 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▎ | 4462/10000 [9:50:18<11:47:23, 7.66s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:13:02,918 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▎ | 4463/10000 [9:50:34<15:35:51, 10.14s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:13:15,790 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 45%|██████████████████████████▎ | 4464/10000 [9:50:46<16:52:45, 10.98s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:13:23,369 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▎ | 4465/10000 [9:50:54<15:19:14, 9.96s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:13:36,761 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▎ | 4466/10000 [9:51:07<16:50:50, 10.96s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:13:44,444 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▎ | 4467/10000 [9:51:15<15:17:24, 9.95s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:13:53,704 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▎ | 4468/10000 [9:51:24<15:03:06, 9.80s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:14:01,316 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▎ | 4469/10000 [9:51:32<14:01:29, 9.13s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:14:08,351 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▎ | 4470/10000 [9:51:39<13:01:42, 8.48s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:14:15,701 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▍ | 4471/10000 [9:51:46<12:30:49, 8.15s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:14:22,830 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 45%|██████████████████████████▍ | 4472/10000 [9:51:53<12:02:53, 7.85s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:14:31,280 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▍ | 4473/10000 [9:52:02<12:20:45, 8.04s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:14:38,759 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▍ | 4474/10000 [9:52:09<12:06:01, 7.88s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:14:46,755 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▍ | 4475/10000 [9:52:17<12:09:08, 7.92s/it] 45%|██████████████████████████▍ | 4475/10000 [9:52:17<12:09:08, 7.92s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:14:53,846 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▍ | 4476/10000 [9:52:24<11:44:23, 7.65s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:15:02,898 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▍ | 4477/10000 [9:52:34<12:23:59, 8.08s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:15:09,873 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▍ | 4478/10000 [9:52:41<11:52:45, 7.74s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:15:16,923 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▍ | 4479/10000 [9:52:48<11:32:58, 7.53s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:15:24,877 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 45%|██████████████████████████▍ | 4480/10000 [9:52:56<11:46:26, 7.68s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:15:33,972 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▍ | 4481/10000 [9:53:05<12:22:53, 8.08s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:15:41,239 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▍ | 4482/10000 [9:53:12<12:00:02, 7.83s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:15:48,098 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▍ | 4483/10000 [9:53:19<11:33:15, 7.54s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:15:55,059 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▍ | 4484/10000 [9:53:26<11:18:49, 7.38s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:16:01,946 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▍ | 4485/10000 [9:53:32<11:01:41, 7.20s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:16:08,637 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▍ | 4486/10000 [9:53:39<10:48:13, 7.05s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:16:15,575 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▍ | 4487/10000 [9:53:46<10:48:46, 7.06s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:16:22,906 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 45%|██████████████████████████▍ | 4488/10000 [9:53:54<10:55:05, 7.13s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:16:29,904 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▍ | 4489/10000 [9:54:01<10:50:00, 7.08s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:16:37,586 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▍ | 4490/10000 [9:54:08<11:06:13, 7.25s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:16:46,061 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▍ | 4491/10000 [9:54:17<11:41:59, 7.65s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:16:53,226 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▌ | 4492/10000 [9:54:24<11:27:48, 7.49s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:17:00,421 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▌ | 4493/10000 [9:54:31<11:21:10, 7.42s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:17:08,333 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▌ | 4494/10000 [9:54:39<11:31:02, 7.53s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:17:15,851 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▌ | 4495/10000 [9:54:47<11:33:25, 7.56s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:17:23,490 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 45%|██████████████████████████▌ | 4496/10000 [9:54:54<11:34:31, 7.57s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:17:32,310 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▌ | 4497/10000 [9:55:03<12:06:23, 7.92s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:17:39,742 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▌ | 4498/10000 [9:55:10<11:56:19, 7.81s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:17:48,849 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▌ | 4499/10000 [9:55:20<12:31:11, 8.19s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:17:56,895 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▌ | 4500/10000 [9:55:28<12:27:44, 8.16s/it] 45%|██████████████████████████▌ | 4500/10000 [9:55:28<12:27:44, 8.16s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:18:04,609 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▌ | 4501/10000 [9:55:35<12:14:44, 8.02s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:18:11,799 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▌ | 4502/10000 [9:55:42<11:51:30, 7.76s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:18:19,399 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▌ | 4503/10000 [9:55:50<11:47:37, 7.72s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:18:25,783 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 45%|██████████████████████████▌ | 4504/10000 [9:55:57<11:11:48, 7.33s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:18:31,932 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▌ | 4505/10000 [9:56:03<10:37:25, 6.96s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:18:38,509 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▌ | 4506/10000 [9:56:09<10:24:28, 6.82s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:18:44,744 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▌ | 4507/10000 [9:56:15<10:10:51, 6.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:18:50,968 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|███████████████████████████ | 4508/10000 [9:56:22<9:57:01, 6.52s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:18:57,529 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|███████████████████████████ | 4509/10000 [9:56:28<9:56:12, 6.51s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:19:03,915 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|███████████████████████████ | 4510/10000 [9:56:35<9:53:46, 6.49s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:19:10,775 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▌ | 4511/10000 [9:56:41<10:05:41, 6.62s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:19:17,344 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 45%|██████████████████████████▌ | 4512/10000 [9:56:48<10:04:28, 6.61s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:19:23,816 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▋ | 4513/10000 [9:56:55<10:01:27, 6.58s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:19:30,104 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|███████████████████████████ | 4514/10000 [9:57:01<9:48:49, 6.44s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:19:36,598 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|███████████████████████████ | 4515/10000 [9:57:07<9:53:22, 6.49s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:19:45,539 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▋ | 4516/10000 [9:57:16<11:01:10, 7.23s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:19:53,044 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▋ | 4517/10000 [9:57:24<11:05:53, 7.29s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:19:59,909 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▋ | 4518/10000 [9:57:30<10:53:58, 7.16s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:20:07,311 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▋ | 4519/10000 [9:57:38<11:02:22, 7.25s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:20:13,950 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 45%|██████████████████████████▋ | 4520/10000 [9:57:45<10:46:00, 7.07s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:20:20,263 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▋ | 4521/10000 [9:57:51<10:21:34, 6.81s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:20:26,413 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▋ | 4522/10000 [9:57:57<10:04:54, 6.63s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:20:33,167 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▋ | 4523/10000 [9:58:04<10:08:03, 6.66s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:20:39,732 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▋ | 4524/10000 [9:58:10<10:07:30, 6.66s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:20:45,849 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|███████████████████████████▏ | 4525/10000 [9:58:16<9:50:32, 6.47s/it] 45%|███████████████████████████▏ | 4525/10000 [9:58:16<9:50:32, 6.47s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:20:52,023 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|███████████████████████████▏ | 4526/10000 [9:58:23<9:43:35, 6.40s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:20:58,102 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|███████████████████████████▏ | 4527/10000 [9:58:29<9:32:50, 6.28s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:21:06,213 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 45%|██████████████████████████▋ | 4528/10000 [9:58:37<10:22:19, 6.82s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:21:12,623 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▋ | 4529/10000 [9:58:43<10:10:57, 6.70s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:21:19,402 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▋ | 4530/10000 [9:58:50<10:15:47, 6.75s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:21:26,459 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▋ | 4531/10000 [9:58:57<10:24:09, 6.85s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:21:33,074 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▋ | 4532/10000 [9:59:04<10:18:09, 6.78s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:21:39,858 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▋ | 4533/10000 [9:59:10<10:15:37, 6.76s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:21:46,847 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▊ | 4534/10000 [9:59:18<10:24:17, 6.85s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:21:53,490 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▊ | 4535/10000 [9:59:24<10:18:40, 6.79s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:22:01,156 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 45%|██████████████████████████▊ | 4536/10000 [9:59:32<10:38:19, 7.01s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:22:10,310 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▊ | 4537/10000 [9:59:41<11:37:31, 7.66s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:22:16,288 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▊ | 4538/10000 [9:59:47<10:52:25, 7.17s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:22:22,456 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▊ | 4539/10000 [9:59:53<10:26:47, 6.89s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:22:29,745 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▎ | 4540/10000 [10:00:00<10:33:50, 6.97s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:22:36,194 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▎ | 4541/10000 [10:00:07<10:22:47, 6.85s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:22:44,658 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▎ | 4542/10000 [10:00:15<11:05:34, 7.32s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:22:50,940 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▎ | 4543/10000 [10:00:21<10:36:15, 7.00s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:22:57,496 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 45%|██████████████████████████▎ | 4544/10000 [10:00:28<10:27:28, 6.90s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:23:04,336 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▎ | 4545/10000 [10:00:35<10:23:15, 6.86s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:23:11,006 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▎ | 4546/10000 [10:00:42<10:16:46, 6.79s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:23:17,577 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▎ | 4547/10000 [10:00:48<10:12:51, 6.74s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:23:24,333 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▍ | 4548/10000 [10:00:55<10:14:48, 6.77s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:23:31,054 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 45%|██████████████████████████▍ | 4549/10000 [10:01:02<10:11:39, 6.73s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:23:37,732 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▍ | 4550/10000 [10:01:08<10:11:41, 6.73s/it] 46%|██████████████████████████▍ | 4550/10000 [10:01:08<10:11:41, 6.73s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:23:44,054 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▍ | 4551/10000 [10:01:15<10:00:44, 6.61s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:23:50,450 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 46%|██████████████████████████▊ | 4552/10000 [10:01:21<9:52:06, 6.52s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:23:56,747 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▊ | 4553/10000 [10:01:27<9:47:34, 6.47s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:24:06,176 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▍ | 4554/10000 [10:01:37<11:08:52, 7.37s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:24:14,321 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▍ | 4555/10000 [10:01:45<11:29:27, 7.60s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:24:23,052 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▍ | 4556/10000 [10:01:54<11:58:47, 7.92s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:24:29,624 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▍ | 4557/10000 [10:02:00<11:19:59, 7.50s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:24:36,154 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▍ | 4558/10000 [10:02:07<10:54:42, 7.22s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:24:42,225 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▍ | 4559/10000 [10:02:13<10:23:26, 6.87s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:24:48,213 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 46%|██████████████████████████▍ | 4560/10000 [10:02:19<10:01:28, 6.63s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:24:54,412 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▉ | 4561/10000 [10:02:25<9:47:06, 6.48s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:25:00,508 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▉ | 4562/10000 [10:02:31<9:38:40, 6.38s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:25:08,513 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▍ | 4563/10000 [10:02:39<10:22:21, 6.87s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:25:15,148 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▍ | 4564/10000 [10:02:46<10:16:21, 6.80s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:25:22,631 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▍ | 4565/10000 [10:02:53<10:32:41, 6.98s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:25:28,748 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▍ | 4566/10000 [10:02:59<10:10:58, 6.75s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:25:35,121 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▉ | 4567/10000 [10:03:06<9:59:11, 6.62s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:25:43,138 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 46%|██████████████████████████▍ | 4568/10000 [10:03:14<10:38:34, 7.05s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:25:49,861 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▌ | 4569/10000 [10:03:20<10:26:30, 6.92s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:25:56,376 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▌ | 4570/10000 [10:03:27<10:18:50, 6.84s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:26:03,147 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▌ | 4571/10000 [10:03:34<10:16:34, 6.81s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:26:09,729 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▌ | 4572/10000 [10:03:40<10:07:37, 6.72s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:26:16,257 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▌ | 4573/10000 [10:03:47<10:05:12, 6.69s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:26:22,673 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▉ | 4574/10000 [10:03:53<9:55:14, 6.58s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:26:28,962 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▉ | 4575/10000 [10:04:00<9:48:10, 6.51s/it] 46%|██████████████████████████▉ | 4575/10000 [10:04:00<9:48:10, 6.51s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:26:36,824 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 46%|██████████████████████████▌ | 4576/10000 [10:04:07<10:22:56, 6.89s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:26:43,469 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▌ | 4577/10000 [10:04:14<10:19:23, 6.85s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:26:50,516 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▌ | 4578/10000 [10:04:21<10:25:55, 6.93s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:26:57,305 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▌ | 4579/10000 [10:04:28<10:17:33, 6.84s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:27:04,061 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▌ | 4580/10000 [10:04:35<10:18:27, 6.85s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:27:10,599 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▌ | 4581/10000 [10:04:41<10:09:36, 6.75s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:27:17,998 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▌ | 4582/10000 [10:04:49<10:24:15, 6.91s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:27:24,328 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▌ | 4583/10000 [10:04:55<10:10:41, 6.76s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:27:30,883 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 46%|██████████████████████████▌ | 4584/10000 [10:05:02<10:06:34, 6.72s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:27:37,627 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▌ | 4585/10000 [10:05:08<10:06:05, 6.72s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:27:44,427 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▌ | 4586/10000 [10:05:15<10:07:30, 6.73s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:27:51,861 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▌ | 4587/10000 [10:05:23<10:27:29, 6.96s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:27:58,547 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▌ | 4588/10000 [10:05:29<10:17:35, 6.85s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:28:05,837 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▌ | 4589/10000 [10:05:36<10:30:15, 6.99s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:28:11,823 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▌ | 4590/10000 [10:05:42<10:03:52, 6.70s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:28:17,895 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|███████████████████████████ | 4591/10000 [10:05:49<9:47:31, 6.52s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:28:24,298 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 46%|███████████████████████████ | 4592/10000 [10:05:55<9:43:53, 6.48s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:28:30,348 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|███████████████████████████ | 4593/10000 [10:06:01<9:32:29, 6.35s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:28:36,508 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|███████████████████████████ | 4594/10000 [10:06:07<9:25:55, 6.28s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:28:45,201 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▋ | 4595/10000 [10:06:16<10:33:23, 7.03s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:28:51,335 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▋ | 4596/10000 [10:06:22<10:08:03, 6.75s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:28:57,655 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|███████████████████████████ | 4597/10000 [10:06:28<9:55:57, 6.62s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:29:03,745 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|███████████████████████████▏ | 4598/10000 [10:06:34<9:38:14, 6.42s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:29:09,653 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|███████████████████████████▏ | 4599/10000 [10:06:40<9:25:33, 6.28s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:29:15,684 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 46%|███████████████████████████▏ | 4600/10000 [10:06:46<9:21:14, 6.24s/it] 46%|███████████████████████████▏ | 4600/10000 [10:06:46<9:21:14, 6.24s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:29:22,555 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|███████████████████████████▏ | 4601/10000 [10:06:53<9:37:38, 6.42s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:29:46,036 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▋ | 4602/10000 [10:07:17<17:15:14, 11.51s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:29:58,830 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▋ | 4603/10000 [10:07:30<17:53:05, 11.93s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:30:05,381 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▋ | 4604/10000 [10:07:36<15:27:24, 10.31s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:30:12,183 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▋ | 4605/10000 [10:07:43<13:48:16, 9.21s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:30:18,628 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▋ | 4606/10000 [10:07:49<12:33:41, 8.38s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:30:25,070 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▋ | 4607/10000 [10:07:56<11:43:17, 7.82s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:30:31,865 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 46%|██████████████████████████▋ | 4608/10000 [10:08:03<11:17:47, 7.54s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:30:38,151 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▋ | 4609/10000 [10:08:09<10:42:16, 7.15s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:30:45,200 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▋ | 4610/10000 [10:08:16<10:37:27, 7.10s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:30:51,290 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▋ | 4611/10000 [10:08:22<10:11:42, 6.81s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:30:58,260 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▋ | 4612/10000 [10:08:29<10:14:28, 6.84s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:31:05,089 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▊ | 4613/10000 [10:08:36<10:16:18, 6.86s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:31:11,327 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|███████████████████████████▏ | 4614/10000 [10:08:42<9:59:41, 6.68s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:31:19,378 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▊ | 4615/10000 [10:08:50<10:36:04, 7.09s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:31:26,020 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 46%|██████████████████████████▊ | 4616/10000 [10:08:57<10:24:50, 6.96s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:31:34,722 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▊ | 4617/10000 [10:09:05<11:11:16, 7.48s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:31:41,300 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▊ | 4618/10000 [10:09:12<10:43:32, 7.17s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:31:47,940 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▊ | 4619/10000 [10:09:19<10:30:20, 7.03s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:31:54,698 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▊ | 4620/10000 [10:09:25<10:25:20, 6.97s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:32:01,256 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▊ | 4621/10000 [10:09:32<10:08:30, 6.79s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:32:07,721 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▊ | 4622/10000 [10:09:38<10:03:27, 6.73s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:32:14,252 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▊ | 4623/10000 [10:09:45<10:00:00, 6.70s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:32:21,035 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 46%|██████████████████████████▊ | 4624/10000 [10:09:52<10:00:31, 6.70s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:32:28,122 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▊ | 4625/10000 [10:09:59<10:08:14, 6.79s/it] 46%|██████████████████████████▊ | 4625/10000 [10:09:59<10:08:14, 6.79s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:32:34,760 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▊ | 4626/10000 [10:10:05<10:05:40, 6.76s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:32:41,382 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▊ | 4627/10000 [10:10:12<10:01:51, 6.72s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:32:48,163 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▊ | 4628/10000 [10:10:19<10:03:45, 6.74s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:32:54,692 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|███████████████████████████▎ | 4629/10000 [10:10:25<9:59:31, 6.70s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:33:00,836 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|███████████████████████████▎ | 4630/10000 [10:10:32<9:44:37, 6.53s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:33:07,098 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|███████████████████████████▎ | 4631/10000 [10:10:38<9:33:52, 6.41s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:33:13,918 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 46%|███████████████████████████▎ | 4632/10000 [10:10:45<9:48:12, 6.57s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:33:21,865 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▊ | 4633/10000 [10:10:53<10:23:54, 6.97s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:33:28,738 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▉ | 4634/10000 [10:10:59<10:19:50, 6.93s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:33:37,537 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▉ | 4635/10000 [10:11:08<11:12:23, 7.52s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:33:45,200 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▉ | 4636/10000 [10:11:16<11:12:56, 7.53s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:33:52,674 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▉ | 4637/10000 [10:11:23<11:12:15, 7.52s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:34:00,088 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▉ | 4638/10000 [10:11:31<11:09:58, 7.50s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:34:07,309 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▉ | 4639/10000 [10:11:38<11:03:21, 7.42s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:34:14,942 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 46%|██████████████████████████▉ | 4640/10000 [10:11:45<11:05:42, 7.45s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:34:22,779 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▉ | 4641/10000 [10:11:53<11:18:50, 7.60s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:34:30,207 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▉ | 4642/10000 [10:12:01<11:13:47, 7.55s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:34:37,589 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▉ | 4643/10000 [10:12:08<11:11:13, 7.52s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:34:44,702 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▉ | 4644/10000 [10:12:15<10:59:13, 7.38s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:34:52,428 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▉ | 4645/10000 [10:12:23<11:08:16, 7.49s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:34:59,431 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▉ | 4646/10000 [10:12:30<10:52:53, 7.32s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:35:06,343 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 46%|██████████████████████████▉ | 4647/10000 [10:12:37<10:44:13, 7.22s/it]{'loss': 0.1279, 'learning_rate': 1.7769473684210528e-06, 'epoch': 0.44} +{'loss': 0.1394, 'learning_rate': 1.7690526315789474e-06, 'epoch': 0.44} +{'loss': 0.1267, 'learning_rate': 1.7611578947368421e-06, 'epoch': 0.44} +{'loss': 0.1167, 'learning_rate': 1.7532631578947369e-06, 'epoch': 0.45} +{'loss': 0.1406, 'learning_rate': 1.7453684210526316e-06, 'epoch': 0.45} +{'loss': 0.1481, 'learning_rate': 1.7374736842105264e-06, 'epoch': 0.45} +{'loss': 0.1508, 'learning_rate': 1.7295789473684211e-06, 'epoch': 0.45} +{'loss': 0.1624, 'learning_rate': 1.7216842105263159e-06, 'epoch': 0.46} +{'loss': 0.1449, 'learning_rate': 1.7137894736842104e-06, 'epoch': 0.46} +{'loss': 0.1454, 'learning_rate': 1.7058947368421051e-06, 'epoch': 0.46} +{'loss': 0.1285, 'learning_rate': 1.6979999999999999e-06, 'epoch': 0.46} + + Reading metadata...: 0it [00:00, ?it/s] + Reading metadata...: 1it [00:00, 8.21it/s] Reading metadata...: 2165it [00:00, 15079.18it/s] +[WARNING|modeling_whisper.py:902] 2022-12-16 19:35:14,941 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▉ | 4648/10000 [10:12:46<11:18:26, 7.61s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:35:22,361 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▉ | 4649/10000 [10:12:53<11:15:56, 7.58s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:35:29,980 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 46%|██████████████████████████▉ | 4650/10000 [10:13:01<11:17:09, 7.59s/it] 46%|██████████████████████████▉ | 4650/10000 [10:13:01<11:17:09, 7.59s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:35:38,135 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 47%|██████████████████████████▉ | 4651/10000 [10:13:09<11:31:26, 7.76s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:35:45,803 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|██████████████████████████▉ | 4652/10000 [10:13:17<11:30:27, 7.75s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:35:55,274 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|██████████████████████████▉ | 4653/10000 [10:13:26<12:15:38, 8.25s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:36:03,309 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|██████████████████████████▉ | 4654/10000 [10:13:34<12:10:01, 8.19s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:36:13,333 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|██████████████████████████▉ | 4655/10000 [10:13:44<12:56:26, 8.72s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:36:21,697 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████ | 4656/10000 [10:13:52<12:46:43, 8.61s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:36:29,869 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████ | 4657/10000 [10:14:01<12:36:29, 8.50s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:36:39,532 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████ | 4658/10000 [10:14:10<13:06:11, 8.83s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:36:47,016 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 47%|███████████████████████████ | 4659/10000 [10:14:18<12:28:58, 8.41s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:36:54,551 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████ | 4660/10000 [10:14:25<12:05:27, 8.15s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:37:02,644 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████ | 4661/10000 [10:14:33<12:06:15, 8.16s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:37:10,328 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████ | 4662/10000 [10:14:41<11:51:02, 7.99s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:37:17,717 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████ | 4663/10000 [10:14:48<11:36:16, 7.83s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:37:25,136 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████ | 4664/10000 [10:14:56<11:24:31, 7.70s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:37:32,896 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████ | 4665/10000 [10:15:03<11:25:12, 7.71s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:37:41,625 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████ | 4666/10000 [10:15:12<11:52:28, 8.01s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:37:54,453 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 47%|███████████████████████████ | 4667/10000 [10:15:25<14:02:40, 9.48s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:38:01,779 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████ | 4668/10000 [10:15:32<13:05:36, 8.84s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:38:09,419 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████ | 4669/10000 [10:15:40<12:32:41, 8.47s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:38:18,897 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████ | 4670/10000 [10:15:50<12:59:42, 8.78s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:38:26,698 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████ | 4671/10000 [10:15:57<12:32:58, 8.48s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:38:33,986 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████ | 4672/10000 [10:16:05<12:01:22, 8.12s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:38:42,439 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████ | 4673/10000 [10:16:13<12:10:11, 8.22s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:38:51,778 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████ | 4674/10000 [10:16:22<12:36:28, 8.52s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:38:58,948 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 47%|███████████████████████████ | 4675/10000 [10:16:30<12:04:25, 8.16s/it] 47%|███████████████████████████ | 4675/10000 [10:16:30<12:04:25, 8.16s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:39:06,666 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████ | 4676/10000 [10:16:37<11:49:40, 8.00s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:39:14,152 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▏ | 4677/10000 [10:16:45<11:38:19, 7.87s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:39:26,212 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▏ | 4678/10000 [10:16:57<13:27:59, 9.11s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:39:33,814 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▏ | 4679/10000 [10:17:04<12:46:05, 8.64s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:39:40,956 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▏ | 4680/10000 [10:17:11<12:05:45, 8.19s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:39:48,074 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▏ | 4681/10000 [10:17:19<11:40:35, 7.90s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:39:55,975 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▏ | 4682/10000 [10:17:27<11:37:32, 7.87s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:40:03,816 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 47%|███████████████████████████▏ | 4683/10000 [10:17:34<11:37:49, 7.87s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:40:11,071 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▏ | 4684/10000 [10:17:42<11:19:57, 7.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:40:18,689 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▏ | 4685/10000 [10:17:49<11:19:10, 7.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:40:26,159 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▏ | 4686/10000 [10:17:57<11:16:01, 7.63s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:40:33,836 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▏ | 4687/10000 [10:18:05<11:17:02, 7.65s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:40:42,479 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▏ | 4688/10000 [10:18:13<11:39:58, 7.91s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:40:49,925 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▏ | 4689/10000 [10:18:21<11:31:12, 7.81s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:40:58,517 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▏ | 4690/10000 [10:18:29<11:51:53, 8.04s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:41:07,541 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 47%|███████████████████████████▏ | 4691/10000 [10:18:38<12:15:41, 8.31s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:41:18,984 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▏ | 4692/10000 [10:18:50<13:39:49, 9.27s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:41:26,810 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▏ | 4693/10000 [10:18:57<12:58:51, 8.81s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:41:35,955 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▏ | 4694/10000 [10:19:07<13:11:19, 8.95s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:41:43,346 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▏ | 4695/10000 [10:19:14<12:27:48, 8.46s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:41:51,582 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▏ | 4696/10000 [10:19:22<12:21:13, 8.38s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:41:58,528 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▏ | 4697/10000 [10:19:29<11:43:37, 7.96s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:42:05,468 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▏ | 4698/10000 [10:19:36<11:16:38, 7.66s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:42:12,603 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 47%|███████████████████████████▎ | 4699/10000 [10:19:43<11:00:49, 7.48s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:42:19,492 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▎ | 4700/10000 [10:19:50<10:45:15, 7.30s/it] 47%|███████████████████████████▎ | 4700/10000 [10:19:50<10:45:15, 7.30s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:42:25,564 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▎ | 4701/10000 [10:19:56<10:15:54, 6.97s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:42:31,946 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▋ | 4702/10000 [10:20:03<9:59:34, 6.79s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:42:38,727 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▋ | 4703/10000 [10:20:09<9:58:45, 6.78s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:42:45,065 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▊ | 4704/10000 [10:20:16<9:46:09, 6.64s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:42:51,199 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▊ | 4705/10000 [10:20:22<9:32:15, 6.48s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:43:00,774 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▎ | 4706/10000 [10:20:31<10:52:13, 7.39s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:43:07,792 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 47%|███████████████████████████▎ | 4707/10000 [10:20:38<10:45:19, 7.32s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:43:13,979 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▎ | 4708/10000 [10:20:45<10:15:52, 6.98s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:43:21,825 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▎ | 4709/10000 [10:20:52<10:36:05, 7.21s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:43:28,537 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▎ | 4710/10000 [10:20:59<10:24:37, 7.08s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:43:36,307 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▎ | 4711/10000 [10:21:07<10:41:59, 7.28s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:43:42,667 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▎ | 4712/10000 [10:21:13<10:18:33, 7.02s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:43:48,973 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▊ | 4713/10000 [10:21:20<9:58:48, 6.80s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:43:57,089 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▎ | 4714/10000 [10:21:28<10:32:10, 7.18s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:44:04,499 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 47%|███████████████████████████▎ | 4715/10000 [10:21:35<10:37:23, 7.24s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:44:11,360 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▎ | 4716/10000 [10:21:42<10:29:23, 7.15s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:44:18,059 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▎ | 4717/10000 [10:21:49<10:16:32, 7.00s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:44:24,624 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▎ | 4718/10000 [10:21:55<10:06:00, 6.88s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:44:31,255 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▎ | 4719/10000 [10:22:02<10:00:04, 6.82s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:44:37,958 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▊ | 4720/10000 [10:22:09<9:56:44, 6.78s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:44:44,571 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▊ | 4721/10000 [10:22:15<9:51:34, 6.72s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:44:51,451 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▊ | 4722/10000 [10:22:22<9:54:04, 6.75s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:44:58,051 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 47%|███████████████████████████▊ | 4723/10000 [10:22:29<9:53:29, 6.75s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:45:04,746 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▊ | 4724/10000 [10:22:35<9:50:26, 6.71s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:45:10,966 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▉ | 4725/10000 [10:22:41<9:33:24, 6.52s/it] 47%|███████████████████████████▉ | 4725/10000 [10:22:41<9:33:24, 6.52s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:45:17,544 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▉ | 4726/10000 [10:22:48<9:37:56, 6.57s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:45:24,721 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▉ | 4727/10000 [10:22:55<9:54:52, 6.77s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:45:31,264 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▉ | 4728/10000 [10:23:02<9:49:19, 6.71s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:45:38,717 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▍ | 4729/10000 [10:23:09<10:07:56, 6.92s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:45:45,676 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▍ | 4730/10000 [10:23:16<10:09:07, 6.93s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:45:52,525 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 47%|███████████████████████████▍ | 4731/10000 [10:23:23<10:06:26, 6.91s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:45:59,868 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▍ | 4732/10000 [10:23:31<10:18:00, 7.04s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:46:08,136 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▍ | 4733/10000 [10:23:39<10:50:29, 7.41s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:46:17,185 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▍ | 4734/10000 [10:23:48<11:33:17, 7.90s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:46:23,833 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▍ | 4735/10000 [10:23:54<10:56:33, 7.48s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:46:31,796 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▍ | 4736/10000 [10:24:02<11:10:27, 7.64s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:46:38,220 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▍ | 4737/10000 [10:24:09<10:40:12, 7.30s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:46:44,995 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▍ | 4738/10000 [10:24:16<10:24:31, 7.12s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:46:51,113 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 47%|███████████████████████████▉ | 4739/10000 [10:24:22<9:59:06, 6.83s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:46:57,731 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▉ | 4740/10000 [10:24:28<9:50:37, 6.74s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:47:04,300 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▉ | 4741/10000 [10:24:35<9:48:26, 6.71s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:47:12,224 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▌ | 4742/10000 [10:24:43<10:20:06, 7.08s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:47:23,253 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▌ | 4743/10000 [10:24:54<12:05:26, 8.28s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:47:31,032 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▌ | 4744/10000 [10:25:02<11:51:37, 8.12s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:47:37,595 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▌ | 4745/10000 [10:25:08<11:11:51, 7.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:47:43,940 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▌ | 4746/10000 [10:25:15<10:33:42, 7.24s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:47:51,406 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 47%|███████████████████████████▌ | 4747/10000 [10:25:22<10:41:26, 7.33s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:47:58,553 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▌ | 4748/10000 [10:25:29<10:37:10, 7.28s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:48:06,657 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 47%|███████████████████████████▌ | 4749/10000 [10:25:37<10:58:44, 7.53s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:48:14,034 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▌ | 4750/10000 [10:25:45<10:54:45, 7.48s/it] 48%|███████████████████████████▌ | 4750/10000 [10:25:45<10:54:45, 7.48s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:48:22,165 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▌ | 4751/10000 [10:25:53<11:10:22, 7.66s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:48:30,376 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▌ | 4752/10000 [10:26:01<11:22:44, 7.81s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:48:39,354 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▌ | 4753/10000 [10:26:10<11:52:51, 8.15s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:48:46,288 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▌ | 4754/10000 [10:26:17<11:21:17, 7.79s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:48:53,238 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 48%|███████████████████████████▌ | 4755/10000 [10:26:24<10:59:55, 7.55s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:49:02,238 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▌ | 4756/10000 [10:26:33<11:37:50, 7.98s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:49:09,585 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▌ | 4757/10000 [10:26:40<11:20:00, 7.78s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:49:16,356 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▌ | 4758/10000 [10:26:47<10:54:09, 7.49s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:49:23,269 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▌ | 4759/10000 [10:26:54<10:37:57, 7.30s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:49:29,922 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▌ | 4760/10000 [10:27:01<10:23:00, 7.13s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:49:36,963 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▌ | 4761/10000 [10:27:08<10:22:02, 7.12s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:49:44,532 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▌ | 4762/10000 [10:27:15<10:29:53, 7.22s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:49:51,638 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 48%|███████████████████████████▋ | 4763/10000 [10:27:22<10:29:19, 7.21s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:49:59,041 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▋ | 4764/10000 [10:27:30<10:35:07, 7.28s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:50:07,166 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▋ | 4765/10000 [10:27:38<10:54:46, 7.50s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:50:17,449 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▋ | 4766/10000 [10:27:48<12:09:32, 8.36s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:50:32,508 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▋ | 4767/10000 [10:28:03<15:04:30, 10.37s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:50:40,230 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▋ | 4768/10000 [10:28:11<13:55:17, 9.58s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:50:47,421 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▋ | 4769/10000 [10:28:18<12:49:34, 8.83s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:50:55,605 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▋ | 4770/10000 [10:28:26<12:34:06, 8.65s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:51:04,809 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 48%|███████████████████████████▋ | 4771/10000 [10:28:35<12:48:40, 8.82s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:51:12,334 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▋ | 4772/10000 [10:28:43<12:13:07, 8.41s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:51:23,413 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▋ | 4773/10000 [10:28:54<13:22:14, 9.21s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:51:31,979 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▋ | 4774/10000 [10:29:03<13:06:01, 9.02s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:51:40,020 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▋ | 4775/10000 [10:29:11<12:41:54, 8.75s/it] 48%|███████████████████████████▋ | 4775/10000 [10:29:11<12:41:54, 8.75s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:51:47,368 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▋ | 4776/10000 [10:29:18<12:06:01, 8.34s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:51:54,743 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▋ | 4777/10000 [10:29:25<11:40:32, 8.05s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:52:02,281 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▋ | 4778/10000 [10:29:33<11:25:35, 7.88s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:52:09,979 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 48%|███████████████████████████▋ | 4779/10000 [10:29:41<11:20:22, 7.82s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:52:17,450 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▋ | 4780/10000 [10:29:48<11:11:15, 7.72s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:52:24,936 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▋ | 4781/10000 [10:29:56<11:04:29, 7.64s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:52:32,252 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▋ | 4782/10000 [10:30:03<10:57:22, 7.56s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:52:39,870 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▋ | 4783/10000 [10:30:11<10:59:20, 7.58s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:52:47,567 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▋ | 4784/10000 [10:30:18<11:02:32, 7.62s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:52:55,284 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▊ | 4785/10000 [10:30:26<11:04:55, 7.65s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:53:02,744 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▊ | 4786/10000 [10:30:33<10:59:59, 7.59s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:53:23,313 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 48%|███████████████████████████▊ | 4787/10000 [10:30:54<16:38:27, 11.49s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:53:30,986 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▊ | 4788/10000 [10:31:02<14:56:54, 10.33s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:53:37,869 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▊ | 4789/10000 [10:31:09<13:28:42, 9.31s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:53:45,114 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▊ | 4790/10000 [10:31:16<12:33:56, 8.68s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:53:52,311 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▊ | 4791/10000 [10:31:23<11:55:45, 8.24s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:54:00,749 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▊ | 4792/10000 [10:31:31<12:01:36, 8.31s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:54:09,885 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▊ | 4793/10000 [10:31:41<12:22:08, 8.55s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:54:17,607 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▊ | 4794/10000 [10:31:48<11:58:57, 8.29s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:54:24,501 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 48%|███████████████████████████▊ | 4795/10000 [10:31:55<11:22:52, 7.87s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:54:31,477 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▊ | 4796/10000 [10:32:02<11:00:12, 7.61s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:54:41,071 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▊ | 4797/10000 [10:32:12<11:51:37, 8.21s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:54:49,331 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▊ | 4798/10000 [10:32:20<11:52:29, 8.22s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:54:57,724 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▊ | 4799/10000 [10:32:28<11:54:37, 8.24s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:55:04,713 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▊ | 4800/10000 [10:32:35<11:23:59, 7.89s/it] 48%|███████████████████████████▊ | 4800/10000 [10:32:35<11:23:59, 7.89s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:55:11,535 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▊ | 4801/10000 [10:32:42<10:57:34, 7.59s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:55:18,791 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▊ | 4802/10000 [10:32:49<10:48:01, 7.48s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:55:25,604 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 48%|███████████████████████████▊ | 4803/10000 [10:32:56<10:30:30, 7.28s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:55:32,788 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▊ | 4804/10000 [10:33:03<10:25:42, 7.23s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:55:39,723 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▊ | 4805/10000 [10:33:10<10:19:31, 7.16s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:55:47,041 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▊ | 4806/10000 [10:33:18<10:25:53, 7.23s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:55:55,486 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▉ | 4807/10000 [10:33:26<10:55:52, 7.58s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:56:03,028 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▉ | 4808/10000 [10:33:34<10:54:44, 7.57s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:56:10,588 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▉ | 4809/10000 [10:33:41<10:53:36, 7.55s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:56:19,078 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▉ | 4810/10000 [10:33:50<11:16:05, 7.82s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:56:26,504 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 48%|███████████████████████████▉ | 4811/10000 [10:33:57<11:07:49, 7.72s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:56:36,098 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▉ | 4812/10000 [10:34:07<11:55:12, 8.27s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:56:43,663 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▉ | 4813/10000 [10:34:14<11:38:31, 8.08s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:56:51,422 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▉ | 4814/10000 [10:34:22<11:29:54, 7.98s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:56:58,458 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▉ | 4815/10000 [10:34:29<11:03:13, 7.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:57:05,719 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▉ | 4816/10000 [10:34:36<10:51:54, 7.55s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:57:12,810 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▉ | 4817/10000 [10:34:43<10:40:51, 7.42s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:57:20,011 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▉ | 4818/10000 [10:34:51<10:35:46, 7.36s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:57:27,426 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 48%|███████████████████████████▉ | 4819/10000 [10:34:58<10:38:22, 7.39s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:57:36,103 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▉ | 4820/10000 [10:35:07<11:07:16, 7.73s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:57:43,345 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▉ | 4821/10000 [10:35:14<10:55:21, 7.59s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:57:50,816 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▉ | 4822/10000 [10:35:21<10:52:58, 7.57s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:57:59,012 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▉ | 4823/10000 [10:35:30<11:07:56, 7.74s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:58:05,985 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▉ | 4824/10000 [10:35:37<10:51:00, 7.55s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:58:13,091 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▉ | 4825/10000 [10:35:44<10:39:32, 7.41s/it] 48%|███████████████████████████▉ | 4825/10000 [10:35:44<10:39:32, 7.41s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:58:20,039 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|███████████████████████████▉ | 4826/10000 [10:35:51<10:24:28, 7.24s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:58:27,064 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 48%|███████████████████████████▉ | 4827/10000 [10:35:58<10:17:42, 7.16s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:58:36,094 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|████████████████████████████ | 4828/10000 [10:36:07<11:09:53, 7.77s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:58:44,620 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|████████████████████████████ | 4829/10000 [10:36:15<11:27:13, 7.97s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:58:52,081 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|████████████████████████████ | 4830/10000 [10:36:23<11:12:55, 7.81s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:59:00,066 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|████████████████████████████ | 4831/10000 [10:36:31<11:19:13, 7.88s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:59:13,292 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|████████████████████████████ | 4832/10000 [10:36:44<13:37:56, 9.50s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:59:20,455 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|████████████████████████████ | 4833/10000 [10:36:51<12:37:06, 8.79s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:59:29,797 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|████████████████████████████ | 4834/10000 [10:37:00<12:51:42, 8.96s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:59:36,920 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 48%|████████████████████████████ | 4835/10000 [10:37:08<12:04:24, 8.42s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:59:44,104 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|████████████████████████████ | 4836/10000 [10:37:15<11:30:10, 8.02s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:59:51,872 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|████████████████████████████ | 4837/10000 [10:37:22<11:24:03, 7.95s/it][WARNING|modeling_whisper.py:902] 2022-12-16 19:59:59,674 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|████████████████████████████ | 4838/10000 [10:37:30<11:17:36, 7.88s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:00:07,089 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|████████████████████████████ | 4839/10000 [10:37:38<11:05:14, 7.73s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:00:15,516 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|████████████████████████████ | 4840/10000 [10:37:46<11:27:11, 7.99s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:00:36,914 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|████████████████████████████ | 4841/10000 [10:38:07<17:09:33, 11.97s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:00:43,628 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|████████████████████████████ | 4842/10000 [10:38:14<14:54:14, 10.40s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:00:51,199 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 48%|████████████████████████████ | 4843/10000 [10:38:22<13:42:07, 9.57s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:00:59,071 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|████████████████████████████ | 4844/10000 [10:38:30<12:56:04, 9.03s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:01:06,328 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|████████████████████████████ | 4845/10000 [10:38:37<12:11:44, 8.52s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:01:14,538 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|████████████████████████████ | 4846/10000 [10:38:45<12:02:58, 8.42s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:01:22,359 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|████████████████████████████ | 4847/10000 [10:38:53<11:50:22, 8.27s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:01:30,235 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|████████████████████████████ | 4848/10000 [10:39:01<11:37:25, 8.12s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:01:37,776 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|████████████████████████████ | 4849/10000 [10:39:08<11:25:29, 7.98s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:01:46,359 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 48%|████████████████████████████▏ | 4850/10000 [10:39:17<11:39:32, 8.15s/it] 48%|████████████████████████████▏ | 4850/10000 [10:39:17<11:39:32, 8.15s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:01:54,154 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 49%|████████████████████████████▏ | 4851/10000 [10:39:25<11:31:28, 8.06s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:02:03,635 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▏ | 4852/10000 [10:39:34<12:04:08, 8.44s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:02:11,197 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▏ | 4853/10000 [10:39:42<11:45:32, 8.22s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:02:19,940 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▏ | 4854/10000 [10:39:51<11:56:28, 8.35s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:02:27,772 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▏ | 4855/10000 [10:39:58<11:43:01, 8.20s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:02:34,543 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▏ | 4856/10000 [10:40:05<11:06:21, 7.77s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:02:41,456 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▏ | 4857/10000 [10:40:12<10:45:16, 7.53s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:02:47,798 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▏ | 4858/10000 [10:40:18<10:11:55, 7.14s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:02:53,896 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 49%|████████████████████████████▋ | 4859/10000 [10:40:25<9:46:45, 6.85s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:03:00,017 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▋ | 4860/10000 [10:40:31<9:27:30, 6.62s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:03:06,089 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▋ | 4861/10000 [10:40:37<9:14:50, 6.48s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:03:14,477 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▏ | 4862/10000 [10:40:45<10:02:31, 7.04s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:03:21,120 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▋ | 4863/10000 [10:40:52<9:53:23, 6.93s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:03:27,968 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▋ | 4864/10000 [10:40:59<9:47:48, 6.87s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:03:34,782 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▋ | 4865/10000 [10:41:05<9:49:51, 6.89s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:03:41,193 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▋ | 4866/10000 [10:41:12<9:36:50, 6.74s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:03:48,445 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 49%|████████████████████████████▋ | 4867/10000 [10:41:19<9:49:36, 6.89s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:03:55,408 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▋ | 4868/10000 [10:41:26<9:50:31, 6.90s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:04:02,406 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▋ | 4869/10000 [10:41:33<9:53:49, 6.94s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:04:21,677 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▏ | 4870/10000 [10:41:52<15:10:40, 10.65s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:04:28,514 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▎ | 4871/10000 [10:41:59<13:32:09, 9.50s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:04:41,054 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▎ | 4872/10000 [10:42:12<14:50:35, 10.42s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:04:48,031 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▎ | 4873/10000 [10:42:19<13:20:05, 9.36s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:05:01,158 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▎ | 4874/10000 [10:42:32<14:56:25, 10.49s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:05:08,127 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 49%|████████████████████████████▎ | 4875/10000 [10:42:39<13:29:07, 9.47s/it] 49%|████████████████████████████▎ | 4875/10000 [10:42:39<13:29:07, 9.47s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:05:16,528 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▎ | 4876/10000 [10:42:47<12:59:11, 9.12s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:05:23,076 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▎ | 4877/10000 [10:42:54<11:54:05, 8.36s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:05:29,709 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▎ | 4878/10000 [10:43:00<11:09:26, 7.84s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:05:37,090 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▎ | 4879/10000 [10:43:08<10:58:04, 7.71s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:05:44,660 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▎ | 4880/10000 [10:43:15<10:53:35, 7.66s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:05:53,835 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▎ | 4881/10000 [10:43:25<11:32:41, 8.12s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:06:05,473 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▎ | 4882/10000 [10:43:36<13:02:28, 9.17s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:06:13,056 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 49%|████████████████████████████▎ | 4883/10000 [10:43:44<12:19:33, 8.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:06:20,832 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▎ | 4884/10000 [10:43:52<11:58:24, 8.43s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:06:28,690 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▎ | 4885/10000 [10:43:59<11:42:35, 8.24s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:06:36,446 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▎ | 4886/10000 [10:44:07<11:28:50, 8.08s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:06:44,321 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▎ | 4887/10000 [10:44:15<11:25:36, 8.05s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:06:52,052 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▎ | 4888/10000 [10:44:23<11:17:48, 7.96s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:06:59,133 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▎ | 4889/10000 [10:44:30<10:55:42, 7.70s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:07:06,793 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▎ | 4890/10000 [10:44:37<10:51:19, 7.65s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:07:29,878 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 49%|████████████████████████████▎ | 4891/10000 [10:45:01<17:28:19, 12.31s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:07:37,696 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▎ | 4892/10000 [10:45:08<15:34:14, 10.97s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:07:44,529 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▍ | 4893/10000 [10:45:15<13:44:19, 9.68s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:07:51,388 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▍ | 4894/10000 [10:45:22<12:33:11, 8.85s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:07:58,247 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▍ | 4895/10000 [10:45:29<11:43:06, 8.26s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:08:05,074 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▍ | 4896/10000 [10:45:36<11:07:16, 7.84s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:08:12,114 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▍ | 4897/10000 [10:45:43<10:47:19, 7.61s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:08:19,304 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▍ | 4898/10000 [10:45:50<10:32:20, 7.44s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:08:26,140 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 49%|████████████████████████████▍ | 4899/10000 [10:45:57<10:21:03, 7.31s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:08:33,609 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▍ | 4900/10000 [10:46:04<10:24:04, 7.34s/it] 49%|████████████████████████████▍ | 4900/10000 [10:46:04<10:24:04, 7.34s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:08:41,143 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▍ | 4901/10000 [10:46:12<10:26:07, 7.37s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:08:48,976 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▍ | 4902/10000 [10:46:20<10:39:20, 7.52s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:08:56,603 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▍ | 4903/10000 [10:46:27<10:44:06, 7.58s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:09:04,438 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▍ | 4904/10000 [10:46:35<10:47:11, 7.62s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:09:12,306 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▍ | 4905/10000 [10:46:43<10:52:12, 7.68s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:09:19,714 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▍ | 4906/10000 [10:46:50<10:47:06, 7.62s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:09:26,757 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 49%|████████████████████████████▍ | 4907/10000 [10:46:57<10:34:00, 7.47s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:09:35,373 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▍ | 4908/10000 [10:47:06<11:02:28, 7.81s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:09:42,336 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▍ | 4909/10000 [10:47:13<10:41:41, 7.56s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:09:50,586 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▍ | 4910/10000 [10:47:21<10:58:49, 7.77s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:09:59,835 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▍ | 4911/10000 [10:47:30<11:34:13, 8.19s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:10:06,811 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▍ | 4912/10000 [10:47:37<11:02:02, 7.81s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:10:13,505 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▍ | 4913/10000 [10:47:44<10:35:46, 7.50s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:10:20,373 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▌ | 4914/10000 [10:47:51<10:20:57, 7.33s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:10:29,736 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 49%|████████████████████████████▌ | 4915/10000 [10:48:00<11:12:54, 7.94s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:10:36,767 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▌ | 4916/10000 [10:48:07<10:46:49, 7.63s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:10:43,521 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▌ | 4917/10000 [10:48:14<10:23:57, 7.37s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:10:50,704 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▌ | 4918/10000 [10:48:21<10:22:33, 7.35s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:10:58,419 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▌ | 4919/10000 [10:48:29<10:29:21, 7.43s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:11:06,417 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▌ | 4920/10000 [10:48:37<10:43:34, 7.60s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:11:13,744 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▌ | 4921/10000 [10:48:44<10:36:21, 7.52s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:11:21,726 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▌ | 4922/10000 [10:48:52<10:48:48, 7.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:11:29,562 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 49%|████████████████████████████▌ | 4923/10000 [10:49:00<10:51:47, 7.70s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:11:37,018 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▌ | 4924/10000 [10:49:08<10:46:32, 7.64s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:11:44,111 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▌ | 4925/10000 [10:49:15<10:32:03, 7.47s/it] 49%|████████████████████████████▌ | 4925/10000 [10:49:15<10:32:03, 7.47s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:11:51,920 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▌ | 4926/10000 [10:49:23<10:40:56, 7.58s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:12:00,002 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▌ | 4927/10000 [10:49:31<10:55:10, 7.75s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:12:07,626 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▌ | 4928/10000 [10:49:38<10:51:20, 7.71s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:12:15,942 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▌ | 4929/10000 [10:49:47<11:05:21, 7.87s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:12:22,797 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▌ | 4930/10000 [10:49:53<10:41:26, 7.59s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:12:52,627 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 49%|████████████████████████████▌ | 4931/10000 [10:50:23<20:04:22, 14.26s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:13:00,890 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▌ | 4932/10000 [10:50:32<17:30:59, 12.44s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:13:08,835 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▌ | 4933/10000 [10:50:40<15:38:24, 11.11s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:13:16,324 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▌ | 4934/10000 [10:50:47<14:06:56, 10.03s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:13:24,127 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▌ | 4935/10000 [10:50:55<13:09:52, 9.36s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:13:31,706 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▋ | 4936/10000 [10:51:02<12:24:32, 8.82s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:13:39,989 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▋ | 4937/10000 [10:51:11<12:10:53, 8.66s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:13:47,592 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 49%|████████████████████████████▋ | 4938/10000 [10:51:18<11:40:17, 8.30s/it]{'loss': 0.1178, 'learning_rate': 1.6901052631578948e-06, 'epoch': 0.47} +{'loss': 0.1291, 'learning_rate': 1.6822105263157896e-06, 'epoch': 0.47} +{'loss': 0.1715, 'learning_rate': 1.6743157894736843e-06, 'epoch': 0.47} +{'loss': 0.2607, 'learning_rate': 1.666421052631579e-06, 'epoch': 0.47} +{'loss': 0.1372, 'learning_rate': 1.6585263157894738e-06, 'epoch': 0.47} +{'loss': 0.2149, 'learning_rate': 1.6506315789473686e-06, 'epoch': 0.48} +{'loss': 0.1565, 'learning_rate': 1.6427368421052633e-06, 'epoch': 0.48} +{'loss': 0.1308, 'learning_rate': 1.6348421052631578e-06, 'epoch': 0.48} +{'loss': 0.138, 'learning_rate': 1.6269473684210526e-06, 'epoch': 0.48} +{'loss': 0.1417, 'learning_rate': 1.6190526315789473e-06, 'epoch': 0.49} +{'loss': 0.1523, 'learning_rate': 1.611157894736842e-06, 'epoch': 0.49} +{'loss': 0.1371, 'learning_rate': 1.6032631578947368e-06, 'epoch': 0.49} + + Reading metadata...: 0it [00:00, ?it/s] + Reading metadata...: 1it [00:00, 8.67it/s] Reading metadata...: 2165it [00:00, 15493.46it/s] +[WARNING|modeling_whisper.py:902] 2022-12-16 20:13:55,935 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▋ | 4939/10000 [10:51:27<11:44:13, 8.35s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:14:03,957 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▋ | 4940/10000 [10:51:35<11:36:03, 8.25s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:14:11,125 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▋ | 4941/10000 [10:51:42<11:05:49, 7.90s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:14:18,411 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 49%|████████████████████████████▋ | 4942/10000 [10:51:49<10:53:20, 7.75s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:14:25,944 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▋ | 4943/10000 [10:51:57<10:44:46, 7.65s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:14:32,924 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▋ | 4944/10000 [10:52:04<10:29:24, 7.47s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:14:40,038 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▋ | 4945/10000 [10:52:11<10:21:55, 7.38s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:14:47,420 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▋ | 4946/10000 [10:52:18<10:20:48, 7.37s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:14:54,633 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▋ | 4947/10000 [10:52:25<10:18:22, 7.34s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:15:01,846 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▋ | 4948/10000 [10:52:33<10:15:31, 7.31s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:15:09,013 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 49%|████████████████████████████▋ | 4949/10000 [10:52:40<10:10:27, 7.25s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:15:16,190 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 50%|████████████████████████████▋ | 4950/10000 [10:52:47<10:09:21, 7.24s/it] 50%|████████████████████████████▋ | 4950/10000 [10:52:47<10:09:21, 7.24s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:15:25,026 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|████████████████████████████▋ | 4951/10000 [10:52:56<10:48:14, 7.70s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:15:32,261 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|████████████████████████████▋ | 4952/10000 [10:53:03<10:36:31, 7.57s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:15:39,238 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|████████████████████████████▋ | 4953/10000 [10:53:10<10:19:35, 7.37s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:15:46,187 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|████████████████████████████▋ | 4954/10000 [10:53:17<10:11:11, 7.27s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:15:53,043 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████▏ | 4955/10000 [10:53:24<9:56:12, 7.09s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:16:00,015 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████▏ | 4956/10000 [10:53:31<9:56:44, 7.10s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:16:08,110 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|████████████████████████████▊ | 4957/10000 [10:53:39<10:21:30, 7.39s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:16:15,359 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 50%|████████████████████████████▊ | 4958/10000 [10:53:46<10:15:16, 7.32s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:16:23,383 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|████████████████████████████▊ | 4959/10000 [10:53:54<10:34:40, 7.55s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:16:31,289 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|████████████████████████████▊ | 4960/10000 [10:54:02<10:43:42, 7.66s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:16:39,181 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|████████████████████████████▊ | 4961/10000 [10:54:10<10:50:47, 7.75s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:16:46,363 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|████████████████████████████▊ | 4962/10000 [10:54:17<10:35:35, 7.57s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:16:53,816 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|████████████████████████████▊ | 4963/10000 [10:54:24<10:30:21, 7.51s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:17:00,858 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|████████████████████████████▊ | 4964/10000 [10:54:32<10:21:26, 7.40s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:17:07,781 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|████████████████████████████▊ | 4965/10000 [10:54:38<10:08:26, 7.25s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:17:14,580 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 50%|█████████████████████████████▎ | 4966/10000 [10:54:45<9:53:31, 7.07s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:17:21,485 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████▎ | 4967/10000 [10:54:52<9:49:40, 7.03s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:17:28,260 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████▎ | 4968/10000 [10:54:59<9:46:41, 7.00s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:17:35,234 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████▎ | 4969/10000 [10:55:06<9:45:35, 6.98s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:17:42,560 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████▎ | 4970/10000 [10:55:13<9:52:51, 7.07s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:17:51,758 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|████████████████████████████▊ | 4971/10000 [10:55:22<10:45:47, 7.70s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:17:58,957 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|████████████████████████████▊ | 4972/10000 [10:55:30<10:35:17, 7.58s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:18:06,079 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|████████████████████████████▊ | 4973/10000 [10:55:37<10:23:03, 7.44s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:18:12,988 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 50%|████████████████████████████▊ | 4974/10000 [10:55:44<10:06:02, 7.23s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:18:19,678 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 50%|█████████████████████████████▎ | 4975/10000 [10:55:50<9:54:00, 7.09s/it] 50%|█████████████████████████████▎ | 4975/10000 [10:55:50<9:54:00, 7.09s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:18:27,607 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 50%|████████████████████████████▊ | 4976/10000 [10:55:58<10:16:03, 7.36s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:18:35,776 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 50%|████████████████████████████▊ | 4977/10000 [10:56:06<10:34:11, 7.58s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:18:42,979 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 50%|████████████████████████████▊ | 4978/10000 [10:56:14<10:24:25, 7.46s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:18:51,048 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 50%|████████████████████████████▉ | 4979/10000 [10:56:22<10:40:44, 7.66s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:18:58,253 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 50%|████████████████████████████▉ | 4980/10000 [10:56:29<10:31:09, 7.54s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:19:05,682 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 50%|████████████████████████████▉ | 4981/10000 [10:56:36<10:28:01, 7.51s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:19:12,762 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 50%|████████████████████████████▉ | 4982/10000 [10:56:43<10:17:09, 7.38s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:19:20,545 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 50%|████████████████████████████▉ | 4983/10000 [10:56:51<10:25:18, 7.48s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:19:27,999 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 50%|████████████████████████████▉ | 4984/10000 [10:56:59<10:23:25, 7.46s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:19:34,693 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 50%|████████████████████████████▉ | 4985/10000 [10:57:05<10:06:45, 7.26s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:19:41,195 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 50%|█████████████████████████████▍ | 4986/10000 [10:57:12<9:48:06, 7.04s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:19:47,793 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 50%|█████████████████████████████▍ | 4987/10000 [10:57:18<9:34:48, 6.88s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:19:54,397 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 50%|█████████████████████████████▍ | 4988/10000 [10:57:25<9:29:08, 6.81s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:20:00,885 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 50%|█████████████████████████████▍ | 4989/10000 [10:57:31<9:18:29, 6.69s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:20:24,415 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 50%|████████████████████████████▉ | 4990/10000 [10:57:55<16:23:36, 11.78s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:20:30,495 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 50%|████████████████████████████▉ | 4991/10000 [10:58:01<13:58:33, 10.04s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:20:37,033 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 50%|████████████████████████████▉ | 4992/10000 [10:58:08<12:32:51, 9.02s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:20:43,753 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 50%|████████████████████████████▉ | 4993/10000 [10:58:14<11:34:39, 8.32s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:20:50,600 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 50%|████████████████████████████▉ | 4994/10000 [10:58:21<10:56:12, 7.87s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:20:57,039 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 50%|████████████████████████████▉ | 4995/10000 [10:58:28<10:19:58, 7.43s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:21:03,985 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 50%|████████████████████████████▉ | 4996/10000 [10:58:35<10:06:52, 7.28s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:21:11,480 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 50%|████████████████████████████▉ | 4997/10000 [10:58:42<10:15:22, 7.38s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:21:18,772 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 50%|████████████████████████████▉ | 4998/10000 [10:58:49<10:10:51, 7.33s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:21:25,840 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 50%|████████████████████████████▉ | 4999/10000 [10:58:56<10:04:13, 7.25s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:21:32,497 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 50%|█████████████████████████████▌ | 5000/10000 [10:59:03<9:48:20, 7.06s/it] 50%|█████████████████████████████▌ | 5000/10000 [10:59:03<9:48:20, 7.06s/it][INFO|trainer.py:2955] 2022-12-16 20:21:34,512 >> ***** Running Evaluation *****
+[INFO|trainer.py:2959] 2022-12-16 20:21:34,512 >> Num examples: Unknown
+[INFO|trainer.py:2960] 2022-12-16 20:21:34,512 >> Batch size = 32
+{'loss': 0.1398, 'learning_rate': 1.5953684210526316e-06, 'epoch': 0.49}
+{'loss': 0.14, 'learning_rate': 1.5874736842105265e-06, 'epoch': 0.5}
+{'loss': 0.1429, 'learning_rate': 1.5795789473684213e-06, 'epoch': 0.5}
+
+ Reading metadata...: 0it [00:00, ?it/s]
+ Reading metadata...: 1it [00:00, 9.04it/s] Reading metadata...: 1704it [00:00, 13340.45it/s]
+[INFO|trainer_utils.py:689] 2022-12-16 20:21:38,519 >> The following columns in the evaluation set don't have a corresponding argument in `WhisperForConditionalGeneration.forward` and have been ignored: up_votes, segment, age, client_id, down_votes, input_length, locale, accent, path, gender. If up_votes, segment, age, client_id, down_votes, input_length, locale, accent, path, gender are not expected by `WhisperForConditionalGeneration.forward`, you can safely ignore this message.
+ 50%|█████████████████████████████▌ | 5000/10000 [11:03:51<9:48:20, 7.06s/it][INFO|trainer.py:2700] 2022-12-16 20:26:22,164 >> Saving model checkpoint to ./checkpoint-5000
+[INFO|configuration_utils.py:447] 2022-12-16 20:26:22,165 >> Configuration saved in ./checkpoint-5000/config.json
+[INFO|modeling_utils.py:1680] 2022-12-16 20:26:23,283 >> Model weights saved in ./checkpoint-5000/pytorch_model.bin
+[INFO|feature_extraction_utils.py:368] 2022-12-16 20:26:23,298 >> Feature extractor saved in ./checkpoint-5000/preprocessor_config.json
+[INFO|feature_extraction_utils.py:368] 2022-12-16 20:26:27,319 >> Feature extractor saved in ./preprocessor_config.json