diff --git "a/nohup.out" "b/nohup.out" --- "a/nohup.out" +++ "b/nohup.out" @@ -71666,3 +71666,1065 @@ If your task is similar to the task the model of the checkpoint was trained on, [INFO|modeling_utils.py:1680] 2022-12-16 20:26:23,283 >> Model weights saved in ./checkpoint-5000/pytorch_model.bin [INFO|feature_extraction_utils.py:368] 2022-12-16 20:26:23,298 >> Feature extractor saved in ./checkpoint-5000/preprocessor_config.json [INFO|feature_extraction_utils.py:368] 2022-12-16 20:26:27,319 >> Feature extractor saved in ./preprocessor_config.json +[WARNING|modeling_whisper.py:902] 2022-12-16 20:28:00,042 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|████████████████████████████ | 5001/10000 [11:05:31<168:19:52, 121.22s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:28:07,930 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|████████████████████████████▌ | 5002/10000 [11:05:39<121:07:12, 87.24s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:28:15,837 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████ | 5003/10000 [11:05:46<88:00:39, 63.41s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:28:23,079 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████ | 5004/10000 [11:05:54<64:37:37, 46.57s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:28:30,954 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████ | 5005/10000 [11:06:02<48:31:52, 34.98s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:28:38,765 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 50%|█████████████████████████████ | 5006/10000 [11:06:09<37:12:13, 26.82s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:28:46,428 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████ | 5007/10000 [11:06:17<29:11:30, 21.05s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:28:54,158 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████ | 5008/10000 [11:06:25<23:41:53, 17.09s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:29:01,694 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████ | 5009/10000 [11:06:32<19:42:18, 14.21s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:29:10,844 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████ | 5010/10000 [11:06:42<17:36:16, 12.70s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:29:18,357 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████ | 5011/10000 [11:06:49<15:26:33, 11.14s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:29:27,022 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████ | 5012/10000 [11:06:58<14:25:27, 10.41s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:29:35,261 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████ | 5013/10000 [11:07:06<13:30:12, 9.75s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:29:42,475 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 50%|█████████████████████████████ | 5014/10000 [11:07:13<12:27:26, 8.99s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:29:49,435 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████ | 5015/10000 [11:07:20<11:33:22, 8.35s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:29:56,209 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████ | 5016/10000 [11:07:27<10:56:31, 7.90s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:30:03,235 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████ | 5017/10000 [11:07:34<10:33:19, 7.63s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:30:11,093 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████ | 5018/10000 [11:07:42<10:40:46, 7.72s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:30:18,053 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████ | 5019/10000 [11:07:49<10:22:11, 7.49s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:30:25,401 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████ | 5020/10000 [11:07:56<10:19:08, 7.46s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:30:32,704 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████ | 5021/10000 [11:08:03<10:12:48, 7.38s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:30:47,982 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 50%|█████████████████████████████▏ | 5022/10000 [11:08:19<13:30:52, 9.77s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:30:55,448 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████▏ | 5023/10000 [11:08:26<12:32:55, 9.08s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:31:03,172 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████▏ | 5024/10000 [11:08:34<11:56:13, 8.64s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:31:10,102 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████▏ | 5025/10000 [11:08:41<11:16:18, 8.16s/it] 50%|█████████████████████████████▏ | 5025/10000 [11:08:41<11:16:18, 8.16s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:31:17,318 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████▏ | 5026/10000 [11:08:48<10:53:18, 7.88s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:31:25,656 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████▏ | 5027/10000 [11:08:56<11:05:41, 8.03s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:31:34,570 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████▏ | 5028/10000 [11:09:05<11:26:20, 8.28s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:31:42,688 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████▏ | 5029/10000 [11:09:13<11:21:21, 8.22s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:31:52,691 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 50%|█████████████████████████████▏ | 5030/10000 [11:09:23<12:06:02, 8.77s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:32:00,022 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████▏ | 5031/10000 [11:09:31<11:29:20, 8.32s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:32:07,364 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████▏ | 5032/10000 [11:09:38<11:06:43, 8.05s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:32:15,257 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████▏ | 5033/10000 [11:09:46<11:00:16, 7.98s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:32:23,274 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████▏ | 5034/10000 [11:09:54<10:58:44, 7.96s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:32:30,737 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████▏ | 5035/10000 [11:10:01<10:47:54, 7.83s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:32:37,943 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████▏ | 5036/10000 [11:10:09<10:34:46, 7.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:32:44,795 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████▏ | 5037/10000 [11:10:15<10:13:40, 7.42s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:32:51,876 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 50%|█████████████████████████████▏ | 5038/10000 [11:10:23<10:05:00, 7.32s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:32:59,021 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████▏ | 5039/10000 [11:10:30<10:00:14, 7.26s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:33:06,485 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████▏ | 5040/10000 [11:10:37<10:05:09, 7.32s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:33:14,345 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████▏ | 5041/10000 [11:10:45<10:16:54, 7.46s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:33:26,915 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████▏ | 5042/10000 [11:10:58<12:24:26, 9.01s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:33:34,976 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████▏ | 5043/10000 [11:11:06<12:02:00, 8.74s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:33:42,484 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████▎ | 5044/10000 [11:11:13<11:30:40, 8.36s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:33:50,627 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████▎ | 5045/10000 [11:11:21<11:22:34, 8.27s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:33:58,287 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 50%|█████████████████████████████▎ | 5046/10000 [11:11:29<11:10:38, 8.12s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:34:06,396 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████▎ | 5047/10000 [11:11:37<11:07:16, 8.08s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:34:13,373 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████▎ | 5048/10000 [11:11:44<10:42:55, 7.79s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:34:20,299 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████▎ | 5049/10000 [11:11:51<10:18:52, 7.50s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:34:28,880 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 50%|█████████████████████████████▎ | 5050/10000 [11:11:59<10:45:37, 7.83s/it] 50%|█████████████████████████████▎ | 5050/10000 [11:11:59<10:45:37, 7.83s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:34:35,736 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▎ | 5051/10000 [11:12:06<10:20:15, 7.52s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:34:42,760 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▎ | 5052/10000 [11:12:13<10:11:01, 7.41s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:34:49,605 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▊ | 5053/10000 [11:12:20<9:56:05, 7.23s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:34:56,459 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 51%|█████████████████████████████▊ | 5054/10000 [11:12:27<9:45:53, 7.11s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:35:03,160 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▊ | 5055/10000 [11:12:34<9:37:26, 7.01s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:35:10,126 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▊ | 5056/10000 [11:12:41<9:36:29, 7.00s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:35:17,088 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▊ | 5057/10000 [11:12:48<9:31:30, 6.94s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:35:23,754 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▊ | 5058/10000 [11:12:54<9:24:26, 6.85s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:35:30,421 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▊ | 5059/10000 [11:13:01<9:22:46, 6.83s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:35:37,326 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▊ | 5060/10000 [11:13:08<9:25:39, 6.87s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:35:44,246 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▊ | 5061/10000 [11:13:15<9:26:30, 6.88s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:35:51,716 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 51%|█████████████████████████████▊ | 5062/10000 [11:13:22<9:39:13, 7.04s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:35:59,397 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▊ | 5063/10000 [11:13:30<9:54:44, 7.23s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:36:08,895 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▎ | 5064/10000 [11:13:39<10:50:02, 7.90s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:36:16,274 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▍ | 5065/10000 [11:13:47<10:40:09, 7.78s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:36:24,040 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▍ | 5066/10000 [11:13:55<10:36:17, 7.74s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:36:31,772 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▍ | 5067/10000 [11:14:02<10:38:59, 7.77s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:36:39,447 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▍ | 5068/10000 [11:14:10<10:35:38, 7.73s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:36:48,383 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▍ | 5069/10000 [11:14:19<11:04:06, 8.08s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:36:56,164 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 51%|█████████████████████████████▍ | 5070/10000 [11:14:27<10:57:16, 8.00s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:37:03,714 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▍ | 5071/10000 [11:14:34<10:47:17, 7.88s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:37:11,769 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▍ | 5072/10000 [11:14:42<10:51:37, 7.93s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:37:27,143 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▍ | 5073/10000 [11:14:58<13:53:47, 10.15s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:37:34,069 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▍ | 5074/10000 [11:15:05<12:35:00, 9.20s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:37:42,647 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▍ | 5075/10000 [11:15:13<12:18:07, 8.99s/it] 51%|█████████████████████████████▍ | 5075/10000 [11:15:13<12:18:07, 8.99s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:37:50,848 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▍ | 5076/10000 [11:15:21<11:58:19, 8.75s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:37:57,714 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▍ | 5077/10000 [11:15:28<11:13:37, 8.21s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:38:04,532 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 51%|█████████████████████████████▍ | 5078/10000 [11:15:35<10:37:41, 7.77s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:38:11,522 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▍ | 5079/10000 [11:15:42<10:17:23, 7.53s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:38:18,455 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▍ | 5080/10000 [11:15:49<10:04:45, 7.38s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:38:25,584 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▉ | 5081/10000 [11:15:56<9:57:19, 7.29s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:38:32,723 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▉ | 5082/10000 [11:16:03<9:55:10, 7.26s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:38:39,854 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▉ | 5083/10000 [11:16:11<9:51:06, 7.21s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:38:47,811 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▍ | 5084/10000 [11:16:18<10:07:34, 7.42s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:38:54,960 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▍ | 5085/10000 [11:16:26<10:02:49, 7.36s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:39:01,962 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 51%|██████████████████████████████ | 5086/10000 [11:16:33<9:53:26, 7.25s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:39:08,678 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|██████████████████████████████ | 5087/10000 [11:16:39<9:39:15, 7.07s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:39:16,511 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|██████████████████████████████ | 5088/10000 [11:16:47<9:59:04, 7.32s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:39:25,588 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▌ | 5089/10000 [11:16:56<10:42:41, 7.85s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:39:33,106 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▌ | 5090/10000 [11:17:04<10:34:53, 7.76s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:39:40,098 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▌ | 5091/10000 [11:17:11<10:15:00, 7.52s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:39:47,118 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▌ | 5092/10000 [11:17:18<10:02:44, 7.37s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:39:54,297 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|██████████████████████████████ | 5093/10000 [11:17:25<9:55:25, 7.28s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:40:01,729 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 51%|█████████████████████████████▌ | 5094/10000 [11:17:32<10:01:52, 7.36s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:40:08,896 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|██████████████████████████████ | 5095/10000 [11:17:40<9:57:01, 7.30s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:40:22,849 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▌ | 5096/10000 [11:17:53<12:36:57, 9.26s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:40:31,309 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▌ | 5097/10000 [11:18:02<12:20:04, 9.06s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:40:39,035 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▌ | 5098/10000 [11:18:10<11:45:35, 8.64s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:40:47,194 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▌ | 5099/10000 [11:18:18<11:36:39, 8.53s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:40:54,059 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▌ | 5100/10000 [11:18:25<10:54:21, 8.01s/it] 51%|█████████████████████████████▌ | 5100/10000 [11:18:25<10:54:21, 8.01s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:41:01,038 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▌ | 5101/10000 [11:18:32<10:29:07, 7.71s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:41:07,565 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 51%|██████████████████████████████ | 5102/10000 [11:18:38<9:59:45, 7.35s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:41:14,045 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|██████████████████████████████ | 5103/10000 [11:18:45<9:37:46, 7.08s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:41:28,401 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▌ | 5104/10000 [11:18:59<12:36:05, 9.27s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:41:35,297 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▌ | 5105/10000 [11:19:06<11:38:44, 8.56s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:41:41,771 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▌ | 5106/10000 [11:19:12<10:43:43, 7.89s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:41:48,195 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▌ | 5107/10000 [11:19:19<10:11:27, 7.50s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:41:54,913 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|██████████████████████████████▏ | 5108/10000 [11:19:26<9:52:01, 7.26s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:42:02,786 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▋ | 5109/10000 [11:19:33<10:05:59, 7.43s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:42:09,296 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 51%|██████████████████████████████▏ | 5110/10000 [11:19:40<9:44:07, 7.17s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:42:16,575 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|██████████████████████████████▏ | 5111/10000 [11:19:47<9:43:42, 7.16s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:42:22,909 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|██████████████████████████████▏ | 5112/10000 [11:19:54<9:26:02, 6.95s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:42:29,418 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|██████████████████████████████▏ | 5113/10000 [11:20:00<9:16:22, 6.83s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:42:36,242 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|██████████████████████████████▏ | 5114/10000 [11:20:07<9:13:58, 6.80s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:42:43,129 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|██████████████████████████████▏ | 5115/10000 [11:20:14<9:16:27, 6.83s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:42:51,099 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|██████████████████████████████▏ | 5116/10000 [11:20:22<9:41:35, 7.14s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:42:57,671 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|██████████████████████████████▏ | 5117/10000 [11:20:28<9:30:44, 7.01s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:43:04,692 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 51%|██████████████████████████████▏ | 5118/10000 [11:20:35<9:31:08, 7.02s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:43:11,565 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|██████████████████████████████▏ | 5119/10000 [11:20:42<9:26:37, 6.97s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:43:18,154 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|██████████████████████████████▏ | 5120/10000 [11:20:49<9:14:12, 6.81s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:43:24,635 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|██████████████████████████████▏ | 5121/10000 [11:20:55<9:04:50, 6.70s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:43:33,649 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▋ | 5122/10000 [11:21:04<10:05:31, 7.45s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:43:40,435 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|██████████████████████████████▏ | 5123/10000 [11:21:11<9:49:26, 7.25s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:43:46,818 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|██████████████████████████████▏ | 5124/10000 [11:21:18<9:28:27, 7.00s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:43:52,944 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|██████████████████████████████▏ | 5125/10000 [11:21:24<9:06:58, 6.73s/it] 51%|██████████████████████████████▏ | 5125/10000 [11:21:24<9:06:58, 6.73s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:43:59,026 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 51%|██████████████████████████████▏ | 5126/10000 [11:21:30<8:50:05, 6.53s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:44:05,100 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|██████████████████████████████▏ | 5127/10000 [11:21:36<8:37:55, 6.38s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:44:11,168 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|██████████████████████████████▎ | 5128/10000 [11:21:42<8:28:47, 6.27s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:44:19,072 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|██████████████████████████████▎ | 5129/10000 [11:21:50<9:10:10, 6.78s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:44:26,351 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|██████████████████████████████▎ | 5130/10000 [11:21:57<9:22:02, 6.92s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:44:33,405 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|██████████████████████████████▎ | 5131/10000 [11:22:04<9:27:11, 6.99s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:44:39,705 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|██████████████████████████████▎ | 5132/10000 [11:22:10<9:09:43, 6.78s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:44:46,940 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|██████████████████████████████▎ | 5133/10000 [11:22:18<9:21:12, 6.92s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:44:53,635 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 51%|██████████████████████████████▎ | 5134/10000 [11:22:24<9:15:30, 6.85s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:45:00,251 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|██████████████████████████████▎ | 5135/10000 [11:22:31<9:06:56, 6.75s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:45:07,024 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|██████████████████████████████▎ | 5136/10000 [11:22:38<9:06:53, 6.75s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:45:13,515 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|██████████████████████████████▎ | 5137/10000 [11:22:44<9:00:23, 6.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:45:19,497 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|██████████████████████████████▎ | 5138/10000 [11:22:50<8:44:00, 6.47s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:45:27,637 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|██████████████████████████████▎ | 5139/10000 [11:22:58<9:24:52, 6.97s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:45:33,713 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|██████████████████████████████▎ | 5140/10000 [11:23:04<9:06:02, 6.74s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:45:40,330 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|██████████████████████████████▎ | 5141/10000 [11:23:11<9:03:30, 6.71s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:45:47,740 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 51%|██████████████████████████████▎ | 5142/10000 [11:23:18<9:19:18, 6.91s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:45:54,005 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|██████████████████████████████▎ | 5143/10000 [11:23:25<9:02:47, 6.71s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:46:00,071 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|██████████████████████████████▎ | 5144/10000 [11:23:31<8:47:36, 6.52s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:46:07,435 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|██████████████████████████████▎ | 5145/10000 [11:23:38<9:08:52, 6.78s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:46:16,176 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|██████████████████████████████▎ | 5146/10000 [11:23:47<9:54:43, 7.35s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:46:23,209 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|██████████████████████████████▎ | 5147/10000 [11:23:54<9:48:59, 7.28s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:46:34,001 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▊ | 5148/10000 [11:24:05<11:12:24, 8.32s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:46:40,843 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 51%|█████████████████████████████▊ | 5149/10000 [11:24:12<10:37:46, 7.89s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:46:47,684 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 52%|█████████████████████████████▊ | 5150/10000 [11:24:18<10:11:30, 7.57s/it] 52%|█████████████████████████████▊ | 5150/10000 [11:24:18<10:11:30, 7.57s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:46:54,607 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▍ | 5151/10000 [11:24:25<9:56:50, 7.39s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:47:01,598 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▍ | 5152/10000 [11:24:32<9:44:20, 7.23s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:47:09,168 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▍ | 5153/10000 [11:24:40<9:50:35, 7.31s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:47:16,075 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▍ | 5154/10000 [11:24:47<9:40:24, 7.19s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:47:22,833 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▍ | 5155/10000 [11:24:53<9:33:42, 7.10s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:47:29,692 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▍ | 5156/10000 [11:25:00<9:28:17, 7.04s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:47:38,138 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|█████████████████████████████▉ | 5157/10000 [11:25:09<10:01:02, 7.45s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:47:46,558 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 52%|█████████████████████████████▉ | 5158/10000 [11:25:17<10:25:22, 7.75s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:47:53,697 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|█████████████████████████████▉ | 5159/10000 [11:25:24<10:07:49, 7.53s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:48:01,047 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|█████████████████████████████▉ | 5160/10000 [11:25:32<10:04:17, 7.49s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:48:08,191 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▍ | 5161/10000 [11:25:39<9:54:53, 7.38s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:48:21,091 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|█████████████████████████████▉ | 5162/10000 [11:25:52<12:11:38, 9.07s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:48:28,122 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|█████████████████████████████▉ | 5163/10000 [11:25:59<11:21:48, 8.46s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:48:35,400 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|█████████████████████████████▉ | 5164/10000 [11:26:06<10:52:28, 8.10s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:48:42,284 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|█████████████████████████████▉ | 5165/10000 [11:26:13<10:22:57, 7.73s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:48:49,725 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 52%|█████████████████████████████▉ | 5166/10000 [11:26:20<10:16:52, 7.66s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:48:57,392 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|█████████████████████████████▉ | 5167/10000 [11:26:28<10:17:37, 7.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:49:06,041 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|█████████████████████████████▉ | 5168/10000 [11:26:37<10:40:14, 7.95s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:49:13,310 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|█████████████████████████████▉ | 5169/10000 [11:26:44<10:23:12, 7.74s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:49:21,235 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|█████████████████████████████▉ | 5170/10000 [11:26:52<10:27:39, 7.80s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:49:29,646 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|█████████████████████████████▉ | 5171/10000 [11:27:00<10:43:53, 8.00s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:49:37,595 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|█████████████████████████████▉ | 5172/10000 [11:27:08<10:41:44, 7.98s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:49:44,911 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████ | 5173/10000 [11:27:16<10:25:45, 7.78s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:49:52,982 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 52%|██████████████████████████████ | 5174/10000 [11:27:24<10:32:11, 7.86s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:50:00,200 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████ | 5175/10000 [11:27:31<10:16:31, 7.67s/it] 52%|██████████████████████████████ | 5175/10000 [11:27:31<10:16:31, 7.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:50:08,550 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████ | 5176/10000 [11:27:39<10:29:54, 7.83s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:50:16,269 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████ | 5177/10000 [11:27:47<10:28:44, 7.82s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:50:23,915 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████ | 5178/10000 [11:27:55<10:26:44, 7.80s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:50:30,919 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████ | 5179/10000 [11:28:01<10:03:43, 7.51s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:50:37,782 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▌ | 5180/10000 [11:28:08<9:49:05, 7.33s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:50:44,810 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▌ | 5181/10000 [11:28:15<9:42:58, 7.26s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:50:52,138 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 52%|██████████████████████████████▌ | 5182/10000 [11:28:23<9:42:13, 7.25s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:50:58,892 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▌ | 5183/10000 [11:28:29<9:29:43, 7.10s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:51:05,769 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▌ | 5184/10000 [11:28:36<9:25:20, 7.04s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:51:12,664 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▌ | 5185/10000 [11:28:43<9:19:51, 6.98s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:51:42,745 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████ | 5186/10000 [11:29:13<18:39:54, 13.96s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:51:50,407 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████ | 5187/10000 [11:29:21<16:05:53, 12.04s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:51:59,131 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████ | 5188/10000 [11:29:30<14:47:48, 11.07s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:52:06,788 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████ | 5189/10000 [11:29:37<13:23:13, 10.02s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:52:13,919 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 52%|██████████████████████████████ | 5190/10000 [11:29:45<12:16:06, 9.18s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:52:20,824 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████ | 5191/10000 [11:29:51<11:19:49, 8.48s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:52:27,685 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████ | 5192/10000 [11:29:58<10:42:40, 8.02s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:52:35,355 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████ | 5193/10000 [11:30:06<10:31:24, 7.88s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:52:42,019 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▏ | 5194/10000 [11:30:13<10:01:50, 7.51s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:52:48,960 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▋ | 5195/10000 [11:30:20<9:49:15, 7.36s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:52:56,415 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▋ | 5196/10000 [11:30:27<9:49:09, 7.36s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:53:03,238 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▋ | 5197/10000 [11:30:34<9:36:12, 7.20s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:53:10,011 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 52%|██████████████████████████████▋ | 5198/10000 [11:30:41<9:27:25, 7.09s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:53:18,409 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▋ | 5199/10000 [11:30:49<9:56:46, 7.46s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:53:26,478 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▏ | 5200/10000 [11:30:57<10:13:32, 7.67s/it] 52%|██████████████████████████████▏ | 5200/10000 [11:30:57<10:13:32, 7.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:53:34,095 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▏ | 5201/10000 [11:31:05<10:13:45, 7.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:53:41,505 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▏ | 5202/10000 [11:31:12<10:06:41, 7.59s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:53:49,038 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▏ | 5203/10000 [11:31:20<10:03:29, 7.55s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:53:56,825 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▏ | 5204/10000 [11:31:27<10:10:55, 7.64s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:54:03,842 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▋ | 5205/10000 [11:31:34<9:55:22, 7.45s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:54:10,889 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 52%|██████████████████████████████▋ | 5206/10000 [11:31:42<9:46:01, 7.33s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:54:19,203 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▏ | 5207/10000 [11:31:50<10:09:27, 7.63s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:54:28,123 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▏ | 5208/10000 [11:31:59<10:39:36, 8.01s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:54:35,428 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▏ | 5209/10000 [11:32:06<10:21:27, 7.78s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:54:46,733 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▏ | 5210/10000 [11:32:17<11:48:01, 8.87s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:54:53,543 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▏ | 5211/10000 [11:32:24<10:55:13, 8.21s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:55:00,260 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▏ | 5212/10000 [11:32:31<10:21:07, 7.78s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:55:09,684 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▏ | 5213/10000 [11:32:40<10:59:23, 8.26s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:55:16,623 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 52%|██████████████████████████████▏ | 5214/10000 [11:32:47<10:27:54, 7.87s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:55:23,312 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▊ | 5215/10000 [11:32:54<9:58:47, 7.51s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:55:30,109 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▊ | 5216/10000 [11:33:01<9:39:50, 7.27s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:55:37,611 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▊ | 5217/10000 [11:33:08<9:49:07, 7.39s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:55:44,518 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▊ | 5218/10000 [11:33:15<9:34:52, 7.21s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:55:51,549 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▊ | 5219/10000 [11:33:22<9:31:54, 7.18s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:55:58,459 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▊ | 5220/10000 [11:33:29<9:24:32, 7.09s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:56:05,411 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▊ | 5221/10000 [11:33:36<9:22:42, 7.06s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:56:12,357 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 52%|██████████████████████████████▊ | 5222/10000 [11:33:43<9:20:45, 7.04s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:56:19,397 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▊ | 5223/10000 [11:33:50<9:17:02, 7.00s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:56:27,235 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▊ | 5224/10000 [11:33:58<9:40:21, 7.29s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:56:34,324 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▊ | 5225/10000 [11:34:05<9:34:51, 7.22s/it] 52%|██████████████████████████████▊ | 5225/10000 [11:34:05<9:34:51, 7.22s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:56:41,305 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▊ | 5226/10000 [11:34:12<9:27:23, 7.13s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:56:49,991 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▎ | 5227/10000 [11:34:21<10:04:52, 7.60s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:56:57,972 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▎ | 5228/10000 [11:34:29<10:14:04, 7.72s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:57:05,227 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 52%|██████████████████████████████▎ | 5229/10000 [11:34:36<10:04:23, 7.60s/it]{'eval_loss': 0.2875998318195343, 'eval_wer': 21.720693554466642, 'eval_runtime': 287.6501, 'eval_samples_per_second': 5.924, 'eval_steps_per_second': 0.188, 'epoch': 0.5} +{'loss': 0.1436, 'learning_rate': 1.571684210526316e-06, 'epoch': 0.5} +{'loss': 0.1448, 'learning_rate': 1.5637894736842106e-06, 'epoch': 0.51} +{'loss': 0.1459, 'learning_rate': 1.5558947368421053e-06, 'epoch': 0.51} +{'loss': 0.1404, 'learning_rate': 1.548e-06, 'epoch': 0.51} +{'loss': 0.1306, 'learning_rate': 1.5401052631578948e-06, 'epoch': 0.51} +{'loss': 0.1462, 'learning_rate': 1.5322105263157895e-06, 'epoch': 0.52} +{'loss': 0.1263, 'learning_rate': 1.5243157894736843e-06, 'epoch': 0.52} +{'loss': 0.1336, 'learning_rate': 1.516421052631579e-06, 'epoch': 0.52} +{'loss': 0.1251, 'learning_rate': 1.5085263157894736e-06, 'epoch': 0.52} + + Reading metadata...: 0it [00:00, ?it/s] + Reading metadata...: 1it [00:00, 8.30it/s] Reading metadata...: 2165it [00:00, 15278.57it/s] +[WARNING|modeling_whisper.py:902] 2022-12-16 20:57:14,668 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▎ | 5230/10000 [11:34:45<10:47:40, 8.15s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:57:22,492 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▎ | 5231/10000 [11:34:53<10:36:55, 8.01s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:57:30,799 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▎ | 5232/10000 [11:35:01<10:45:15, 8.12s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:57:38,549 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 52%|██████████████████████████████▎ | 5233/10000 [11:35:09<10:33:57, 7.98s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:57:47,280 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▎ | 5234/10000 [11:35:18<10:53:31, 8.23s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:57:54,843 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▎ | 5235/10000 [11:35:25<10:37:03, 8.02s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:58:02,229 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▎ | 5236/10000 [11:35:33<10:23:08, 7.85s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:58:09,729 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▎ | 5237/10000 [11:35:40<10:15:21, 7.75s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:58:18,143 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▍ | 5238/10000 [11:35:49<10:31:38, 7.96s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:58:25,921 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▍ | 5239/10000 [11:35:56<10:24:14, 7.87s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:58:33,365 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▍ | 5240/10000 [11:36:04<10:16:07, 7.77s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:58:40,793 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 52%|██████████████████████████████▍ | 5241/10000 [11:36:11<10:09:05, 7.68s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:58:48,499 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▍ | 5242/10000 [11:36:19<10:09:18, 7.68s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:58:56,965 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▍ | 5243/10000 [11:36:28<10:27:34, 7.92s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:59:04,676 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▍ | 5244/10000 [11:36:35<10:21:15, 7.84s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:59:12,573 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▍ | 5245/10000 [11:36:43<10:23:57, 7.87s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:59:19,843 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▍ | 5246/10000 [11:36:50<10:08:52, 7.68s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:59:31,351 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▍ | 5247/10000 [11:37:02<11:39:15, 8.83s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:59:39,475 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▍ | 5248/10000 [11:37:10<11:21:29, 8.60s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:59:47,225 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 52%|██████████████████████████████▍ | 5249/10000 [11:37:18<11:03:24, 8.38s/it][WARNING|modeling_whisper.py:902] 2022-12-16 20:59:54,785 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 52%|██████████████████████████████▍ | 5250/10000 [11:37:26<10:44:21, 8.14s/it] 52%|██████████████████████████████▍ | 5250/10000 [11:37:26<10:44:21, 8.14s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:00:02,474 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|██████████████████████████████▍ | 5251/10000 [11:37:33<10:32:51, 8.00s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:00:09,225 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|██████████████████████████████▉ | 5252/10000 [11:37:40<9:59:44, 7.58s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:00:16,210 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|██████████████████████████████▉ | 5253/10000 [11:37:47<9:47:04, 7.42s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:00:23,358 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|██████████████████████████████▉ | 5254/10000 [11:37:54<9:40:15, 7.34s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:00:30,656 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████ | 5255/10000 [11:38:01<9:38:28, 7.31s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:00:38,178 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████ | 5256/10000 [11:38:09<9:46:43, 7.42s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:00:46,633 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 53%|██████████████████████████████▍ | 5257/10000 [11:38:17<10:10:16, 7.72s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:00:52,918 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████ | 5258/10000 [11:38:24<9:36:10, 7.29s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:00:59,436 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████ | 5259/10000 [11:38:30<9:17:33, 7.06s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:01:11,835 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|██████████████████████████████▌ | 5260/10000 [11:38:42<11:23:50, 8.66s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:01:17,948 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|██████████████████████████████▌ | 5261/10000 [11:38:49<10:23:58, 7.90s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:01:24,064 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████ | 5262/10000 [11:38:55<9:40:27, 7.35s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:01:30,090 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████ | 5263/10000 [11:39:01<9:11:06, 6.98s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:01:36,403 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████ | 5264/10000 [11:39:07<8:53:23, 6.76s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:01:42,585 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 53%|███████████████████████████████ | 5265/10000 [11:39:13<8:37:14, 6.55s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:01:48,591 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████ | 5266/10000 [11:39:19<8:25:07, 6.40s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:01:56,120 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████ | 5267/10000 [11:39:27<8:50:06, 6.72s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:02:02,862 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████ | 5268/10000 [11:39:33<8:52:02, 6.75s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:02:09,160 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████ | 5269/10000 [11:39:40<8:43:05, 6.63s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:02:15,857 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████ | 5270/10000 [11:39:46<8:42:52, 6.63s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:02:22,854 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████ | 5271/10000 [11:39:54<8:53:29, 6.77s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:02:29,958 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████ | 5272/10000 [11:40:01<9:00:42, 6.86s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:02:36,379 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 53%|███████████████████████████████ | 5273/10000 [11:40:07<8:48:50, 6.71s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:02:43,067 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████ | 5274/10000 [11:40:14<8:47:57, 6.70s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:02:49,848 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████ | 5275/10000 [11:40:21<8:51:39, 6.75s/it] 53%|███████████████████████████████ | 5275/10000 [11:40:21<8:51:39, 6.75s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:03:01,779 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|██████████████████████████████▌ | 5276/10000 [11:40:32<10:51:25, 8.27s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:03:07,730 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████▏ | 5277/10000 [11:40:38<9:56:54, 7.58s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:03:13,945 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████▏ | 5278/10000 [11:40:44<9:23:12, 7.16s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:03:20,019 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████▏ | 5279/10000 [11:40:51<8:58:52, 6.85s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:03:26,248 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████▏ | 5280/10000 [11:40:57<8:44:37, 6.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:03:32,839 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 53%|███████████████████████████████▏ | 5281/10000 [11:41:04<8:43:39, 6.66s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:03:39,231 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████▏ | 5282/10000 [11:41:10<8:37:31, 6.58s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:03:46,012 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████▏ | 5283/10000 [11:41:17<8:42:23, 6.64s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:03:53,399 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████▏ | 5284/10000 [11:41:24<8:55:48, 6.82s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:03:59,756 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████▏ | 5285/10000 [11:41:30<8:45:29, 6.69s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:04:05,812 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████▏ | 5286/10000 [11:41:36<8:33:40, 6.54s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:04:14,958 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████▏ | 5287/10000 [11:41:46<9:31:58, 7.28s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:04:24,505 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|██████████████████████████████▋ | 5288/10000 [11:41:55<10:26:56, 7.98s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:04:31,593 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 53%|██████████████████████████████▋ | 5289/10000 [11:42:02<10:06:08, 7.72s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:04:37,947 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████▏ | 5290/10000 [11:42:09<9:34:26, 7.32s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:04:44,408 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████▏ | 5291/10000 [11:42:15<9:11:23, 7.03s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:04:50,423 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████▏ | 5292/10000 [11:42:21<8:50:32, 6.76s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:04:56,685 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████▏ | 5293/10000 [11:42:27<8:37:23, 6.60s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:05:02,777 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████▏ | 5294/10000 [11:42:33<8:25:07, 6.44s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:05:09,212 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████▏ | 5295/10000 [11:42:40<8:26:42, 6.46s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:05:15,403 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████▏ | 5296/10000 [11:42:46<8:17:06, 6.34s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:05:21,412 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 53%|███████████████████████████████▎ | 5297/10000 [11:42:52<8:08:57, 6.24s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:05:27,397 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████▎ | 5298/10000 [11:42:58<8:02:40, 6.16s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:05:33,997 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████▎ | 5299/10000 [11:43:05<8:14:10, 6.31s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:05:40,649 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████▎ | 5300/10000 [11:43:11<8:24:35, 6.44s/it] 53%|███████████████████████████████▎ | 5300/10000 [11:43:11<8:24:35, 6.44s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:05:48,885 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████▎ | 5301/10000 [11:43:19<9:04:23, 6.95s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:06:00,252 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|██████████████████████████████▊ | 5302/10000 [11:43:31<10:48:40, 8.28s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:06:08,033 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|██████████████████████████████▊ | 5303/10000 [11:43:39<10:35:20, 8.12s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:06:15,452 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|██████████████████████████████▊ | 5304/10000 [11:43:46<10:17:48, 7.89s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:06:22,873 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 53%|██████████████████████████████▊ | 5305/10000 [11:43:54<10:09:51, 7.79s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:06:32,088 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|██████████████████████████████▊ | 5306/10000 [11:44:03<10:43:28, 8.22s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:06:39,719 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|██████████████████████████████▊ | 5307/10000 [11:44:10<10:28:17, 8.03s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:06:47,160 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|██████████████████████████████▊ | 5308/10000 [11:44:18<10:15:52, 7.88s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:06:54,554 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|██████████████████████████████▊ | 5309/10000 [11:44:25<10:03:04, 7.71s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:07:02,308 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|██████████████████████████████▊ | 5310/10000 [11:44:33<10:05:07, 7.74s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:07:10,032 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|██████████████████████████████▊ | 5311/10000 [11:44:41<10:03:42, 7.72s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:07:17,960 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|██████████████████████████████▊ | 5312/10000 [11:44:49<10:09:28, 7.80s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:07:25,721 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 53%|██████████████████████████████▊ | 5313/10000 [11:44:56<10:07:33, 7.78s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:07:33,738 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|██████████████████████████████▊ | 5314/10000 [11:45:04<10:12:22, 7.84s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:07:41,454 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|██████████████████████████████▊ | 5315/10000 [11:45:12<10:07:16, 7.78s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:07:48,410 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████▎ | 5316/10000 [11:45:19<9:50:58, 7.57s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:08:00,842 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|██████████████████████████████▊ | 5317/10000 [11:45:31<11:42:55, 9.01s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:08:08,099 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|██████████████████████████████▊ | 5318/10000 [11:45:39<10:59:11, 8.45s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:08:14,976 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|██████████████████████████████▊ | 5319/10000 [11:45:46<10:27:03, 8.04s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:08:22,805 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|██████████████████████████████▊ | 5320/10000 [11:45:53<10:21:29, 7.97s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:08:30,313 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 53%|██████████████████████████████▊ | 5321/10000 [11:46:01<10:10:27, 7.83s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:08:38,380 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|██████████████████████████████▊ | 5322/10000 [11:46:09<10:15:48, 7.90s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:08:46,589 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|██████████████████████████████▊ | 5323/10000 [11:46:17<10:23:24, 8.00s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:08:54,478 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|██████████████████████████████▉ | 5324/10000 [11:46:25<10:22:08, 7.98s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:09:01,680 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|██████████████████████████████▉ | 5325/10000 [11:46:32<10:02:28, 7.73s/it] 53%|██████████████████████████████▉ | 5325/10000 [11:46:32<10:02:28, 7.73s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:09:11,861 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|██████████████████████████████▉ | 5326/10000 [11:46:43<10:59:16, 8.46s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:09:19,401 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|██████████████████████████████▉ | 5327/10000 [11:46:50<10:36:48, 8.18s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:09:26,637 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|██████████████████████████████▉ | 5328/10000 [11:46:57<10:15:49, 7.91s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:09:33,829 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 53%|███████████████████████████████▍ | 5329/10000 [11:47:05<9:58:59, 7.69s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:09:41,454 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████▍ | 5330/10000 [11:47:12<9:55:01, 7.64s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:09:48,792 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████▍ | 5331/10000 [11:47:19<9:49:02, 7.57s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:09:56,131 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████▍ | 5332/10000 [11:47:27<9:42:58, 7.49s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:10:03,609 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████▍ | 5333/10000 [11:47:34<9:44:39, 7.52s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:10:10,999 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████▍ | 5334/10000 [11:47:42<9:37:37, 7.43s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:10:17,767 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████▍ | 5335/10000 [11:47:48<9:22:25, 7.23s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:10:25,175 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████▍ | 5336/10000 [11:47:56<9:28:17, 7.31s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:10:33,903 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 53%|███████████████████████████████▍ | 5337/10000 [11:48:04<9:57:23, 7.69s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:10:41,937 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|██████████████████████████████▉ | 5338/10000 [11:48:13<10:08:21, 7.83s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:10:48,938 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████▌ | 5339/10000 [11:48:20<9:49:05, 7.58s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:10:55,880 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████▌ | 5340/10000 [11:48:26<9:32:42, 7.37s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:11:02,724 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████▌ | 5341/10000 [11:48:33<9:22:56, 7.25s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:11:10,980 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████▌ | 5342/10000 [11:48:42<9:46:25, 7.55s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:11:17,935 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████▌ | 5343/10000 [11:48:49<9:32:17, 7.37s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:11:25,152 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████▌ | 5344/10000 [11:48:56<9:25:16, 7.28s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:11:32,509 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 53%|███████████████████████████████▌ | 5345/10000 [11:49:03<9:29:50, 7.34s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:11:41,154 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████ | 5346/10000 [11:49:12<10:00:36, 7.74s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:11:48,786 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████▌ | 5347/10000 [11:49:19<9:56:07, 7.69s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:11:56,156 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████▌ | 5348/10000 [11:49:27<9:49:45, 7.61s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:12:03,417 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 53%|███████████████████████████████▌ | 5349/10000 [11:49:34<9:40:02, 7.48s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:12:11,543 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▌ | 5350/10000 [11:49:42<9:55:23, 7.68s/it] 54%|███████████████████████████████▌ | 5350/10000 [11:49:42<9:55:23, 7.68s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:12:18,851 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▌ | 5351/10000 [11:49:49<9:44:26, 7.54s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:12:25,669 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▌ | 5352/10000 [11:49:56<9:29:30, 7.35s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:12:33,120 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 54%|███████████████████████████████▌ | 5353/10000 [11:50:04<9:31:12, 7.38s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:12:40,910 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▌ | 5354/10000 [11:50:11<9:38:41, 7.47s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:12:47,576 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▌ | 5355/10000 [11:50:18<9:23:39, 7.28s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:12:55,188 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▌ | 5356/10000 [11:50:26<9:27:33, 7.33s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:13:03,102 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▌ | 5357/10000 [11:50:34<9:44:36, 7.55s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:13:10,651 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▌ | 5358/10000 [11:50:41<9:41:43, 7.52s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:13:18,363 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▌ | 5359/10000 [11:50:49<9:45:59, 7.58s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:13:26,030 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▌ | 5360/10000 [11:50:57<9:50:16, 7.63s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:13:34,487 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 54%|███████████████████████████████ | 5361/10000 [11:51:05<10:08:14, 7.87s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:13:42,755 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████ | 5362/10000 [11:51:13<10:16:09, 7.97s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:13:52,144 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████ | 5363/10000 [11:51:23<10:49:29, 8.40s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:14:00,310 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████ | 5364/10000 [11:51:31<10:45:11, 8.35s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:14:08,016 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████ | 5365/10000 [11:51:39<10:30:52, 8.17s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:14:17,039 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████ | 5366/10000 [11:51:48<10:48:36, 8.40s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:14:24,404 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▏ | 5367/10000 [11:51:55<10:24:34, 8.09s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:14:31,265 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▋ | 5368/10000 [11:52:02<9:58:00, 7.75s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:14:38,028 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 54%|███████████████████████████████▋ | 5369/10000 [11:52:09<9:32:03, 7.41s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:14:45,240 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▋ | 5370/10000 [11:52:16<9:28:04, 7.36s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:14:52,711 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▋ | 5371/10000 [11:52:23<9:29:47, 7.39s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:14:59,733 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▋ | 5372/10000 [11:52:30<9:23:00, 7.30s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:15:06,726 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▋ | 5373/10000 [11:52:37<9:16:46, 7.22s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:15:13,651 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▋ | 5374/10000 [11:52:44<9:10:22, 7.14s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:15:20,853 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▋ | 5375/10000 [11:52:51<9:10:20, 7.14s/it] 54%|███████████████████████████████▋ | 5375/10000 [11:52:51<9:10:20, 7.14s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:15:28,589 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▋ | 5376/10000 [11:52:59<9:22:09, 7.29s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:15:35,616 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 54%|███████████████████████████████▋ | 5377/10000 [11:53:06<9:18:38, 7.25s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:15:43,348 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▋ | 5378/10000 [11:53:14<9:29:30, 7.39s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:15:55,204 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▏ | 5379/10000 [11:53:26<11:10:39, 8.71s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:16:02,872 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▏ | 5380/10000 [11:53:34<10:47:54, 8.41s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:16:09,383 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▏ | 5381/10000 [11:53:40<10:03:47, 7.84s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:16:15,933 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▊ | 5382/10000 [11:53:46<9:31:16, 7.42s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:16:22,473 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▊ | 5383/10000 [11:53:53<9:13:44, 7.20s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:16:29,115 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▊ | 5384/10000 [11:54:00<9:01:12, 7.03s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:16:35,384 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 54%|███████████████████████████████▊ | 5385/10000 [11:54:06<8:42:19, 6.79s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:16:43,862 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▊ | 5386/10000 [11:54:15<9:21:19, 7.30s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:16:50,059 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▊ | 5387/10000 [11:54:21<8:56:07, 6.97s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:16:56,138 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▊ | 5388/10000 [11:54:27<8:33:46, 6.68s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:17:02,707 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▊ | 5389/10000 [11:54:33<8:33:06, 6.68s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:17:09,094 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▊ | 5390/10000 [11:54:40<8:25:34, 6.58s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:17:16,081 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▊ | 5391/10000 [11:54:47<8:35:26, 6.71s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:17:23,056 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▊ | 5392/10000 [11:54:54<8:40:32, 6.78s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:17:29,618 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 54%|███████████████████████████████▊ | 5393/10000 [11:55:00<8:36:12, 6.72s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:17:36,617 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▊ | 5394/10000 [11:55:07<8:42:53, 6.81s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:17:43,526 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▊ | 5395/10000 [11:55:14<8:45:26, 6.85s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:17:50,314 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▊ | 5396/10000 [11:55:21<8:42:37, 6.81s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:17:57,136 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▊ | 5397/10000 [11:55:28<8:41:26, 6.80s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:18:04,127 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▊ | 5398/10000 [11:55:35<8:44:06, 6.83s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:18:10,801 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▊ | 5399/10000 [11:55:41<8:42:07, 6.81s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:18:18,169 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▊ | 5400/10000 [11:55:49<8:56:45, 7.00s/it] 54%|███████████████████████████████▊ | 5400/10000 [11:55:49<8:56:45, 7.00s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:18:25,275 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 54%|███████████████████████████████▊ | 5401/10000 [11:55:56<8:57:40, 7.01s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:18:32,177 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▊ | 5402/10000 [11:56:03<8:57:06, 7.01s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:18:38,999 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▉ | 5403/10000 [11:56:10<8:51:54, 6.94s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:18:45,920 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▉ | 5404/10000 [11:56:17<8:49:16, 6.91s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:18:54,522 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▉ | 5405/10000 [11:56:25<9:30:15, 7.45s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:19:01,497 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▉ | 5406/10000 [11:56:32<9:19:13, 7.30s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:19:08,502 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▉ | 5407/10000 [11:56:39<9:10:32, 7.19s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:19:15,031 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▉ | 5408/10000 [11:56:46<8:54:38, 6.99s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:19:21,950 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 54%|███████████████████████████████▉ | 5409/10000 [11:56:53<8:54:49, 6.99s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:19:30,666 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▉ | 5410/10000 [11:57:01<9:33:45, 7.50s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:19:37,250 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▉ | 5411/10000 [11:57:08<9:11:38, 7.21s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:19:43,659 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▉ | 5412/10000 [11:57:14<8:53:39, 6.98s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:19:50,097 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▉ | 5413/10000 [11:57:21<8:40:29, 6.81s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:19:57,371 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▉ | 5414/10000 [11:57:28<8:53:04, 6.97s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:20:04,485 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▉ | 5415/10000 [11:57:35<8:54:11, 6.99s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:20:10,818 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▉ | 5416/10000 [11:57:41<8:40:30, 6.81s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:20:17,062 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 54%|███████████████████████████████▉ | 5417/10000 [11:57:48<8:26:27, 6.63s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:20:24,764 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▉ | 5418/10000 [11:57:55<8:50:25, 6.95s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:20:32,421 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▉ | 5419/10000 [11:58:03<9:09:32, 7.20s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:20:40,461 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▉ | 5420/10000 [11:58:11<9:24:27, 7.39s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:20:47,670 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▉ | 5421/10000 [11:58:18<9:23:14, 7.38s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:20:55,542 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▉ | 5422/10000 [11:58:26<9:34:33, 7.53s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:21:02,934 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▉ | 5423/10000 [11:58:33<9:28:22, 7.45s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:21:10,407 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|████████████████████████████████ | 5424/10000 [11:58:41<9:29:13, 7.46s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:21:25,647 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 54%|███████████████████████████████▍ | 5425/10000 [11:58:56<12:29:32, 9.83s/it] 54%|███████████████████████████████▍ | 5425/10000 [11:58:56<12:29:32, 9.83s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:21:33,590 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▍ | 5426/10000 [11:59:04<11:42:57, 9.22s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:21:40,449 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▍ | 5427/10000 [11:59:11<10:48:18, 8.51s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:21:47,371 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▍ | 5428/10000 [11:59:18<10:15:43, 8.08s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:21:54,351 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|████████████████████████████████ | 5429/10000 [11:59:25<9:46:55, 7.70s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:22:01,112 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|████████████████████████████████ | 5430/10000 [11:59:32<9:26:13, 7.43s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:22:08,039 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|████████████████████████████████ | 5431/10000 [11:59:39<9:14:42, 7.28s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:22:15,067 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|████████████████████████████████ | 5432/10000 [11:59:46<9:11:08, 7.24s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:22:21,949 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 54%|████████████████████████████████ | 5433/10000 [11:59:53<9:02:28, 7.13s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:22:29,666 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|████████████████████████████████ | 5434/10000 [12:00:00<9:12:23, 7.26s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:22:36,415 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|████████████████████████████████ | 5435/10000 [12:00:07<9:01:19, 7.11s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:22:43,259 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|████████████████████████████████ | 5436/10000 [12:00:14<8:54:59, 7.03s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:22:56,178 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▌ | 5437/10000 [12:00:27<11:08:54, 8.80s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:23:03,596 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▌ | 5438/10000 [12:00:34<10:40:42, 8.43s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:23:10,710 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▌ | 5439/10000 [12:00:41<10:09:53, 8.02s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:23:17,621 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|████████████████████████████████ | 5440/10000 [12:00:48<9:42:09, 7.66s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:23:40,314 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 54%|███████████████████████████████▌ | 5441/10000 [12:01:11<15:27:07, 12.20s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:23:47,202 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▌ | 5442/10000 [12:01:18<13:25:46, 10.61s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:23:54,637 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▌ | 5443/10000 [12:01:25<12:11:52, 9.64s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:24:01,686 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▌ | 5444/10000 [12:01:32<11:13:46, 8.87s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:24:10,952 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▌ | 5445/10000 [12:01:42<11:22:41, 8.99s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:24:17,960 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▌ | 5446/10000 [12:01:49<10:36:38, 8.39s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:24:24,970 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|███████████████████████████████▌ | 5447/10000 [12:01:56<10:04:27, 7.97s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:24:31,889 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 54%|████████████████████████████████▏ | 5448/10000 [12:02:02<9:40:51, 7.66s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:24:38,965 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 54%|████████████████████████████████▏ | 5449/10000 [12:02:10<9:27:47, 7.49s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:24:46,607 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▏ | 5450/10000 [12:02:17<9:32:02, 7.54s/it] 55%|████████████████████████████████▏ | 5450/10000 [12:02:17<9:32:02, 7.54s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:24:53,868 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▏ | 5451/10000 [12:02:25<9:25:18, 7.46s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:25:01,466 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▏ | 5452/10000 [12:02:32<9:28:08, 7.50s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:25:09,251 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▏ | 5453/10000 [12:02:40<9:32:43, 7.56s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:25:16,633 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▏ | 5454/10000 [12:02:47<9:30:16, 7.53s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:25:24,409 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▏ | 5455/10000 [12:02:55<9:37:21, 7.62s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:25:31,933 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▏ | 5456/10000 [12:03:03<9:34:31, 7.59s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:25:39,236 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 55%|████████████████████████████████▏ | 5457/10000 [12:03:10<9:28:04, 7.50s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:25:46,834 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▏ | 5458/10000 [12:03:18<9:30:27, 7.54s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:25:54,467 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▏ | 5459/10000 [12:03:25<9:31:27, 7.55s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:26:03,222 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▏ | 5460/10000 [12:03:34<9:57:23, 7.90s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:26:11,083 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▏ | 5461/10000 [12:03:42<9:58:33, 7.91s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:26:18,585 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▏ | 5462/10000 [12:03:49<9:47:16, 7.76s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:26:25,383 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▏ | 5463/10000 [12:03:56<9:25:32, 7.48s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:26:33,751 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▏ | 5464/10000 [12:04:04<9:47:11, 7.77s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:26:43,119 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 55%|███████████████████████████████▋ | 5465/10000 [12:04:14<10:21:55, 8.23s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:26:50,509 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|███████████████████████████████▋ | 5466/10000 [12:04:21<10:03:28, 7.99s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:26:58,177 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▎ | 5467/10000 [12:04:29<9:57:27, 7.91s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:27:05,700 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▎ | 5468/10000 [12:04:36<9:49:13, 7.80s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:27:12,725 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▎ | 5469/10000 [12:04:43<9:28:36, 7.53s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:27:21,426 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▎ | 5470/10000 [12:04:52<9:55:19, 7.89s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:27:28,363 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▎ | 5471/10000 [12:04:59<9:34:15, 7.61s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:27:36,144 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▎ | 5472/10000 [12:05:07<9:39:38, 7.68s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:27:43,508 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 55%|████████████████████████████████▎ | 5473/10000 [12:05:14<9:31:34, 7.58s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:27:50,495 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▎ | 5474/10000 [12:05:21<9:16:38, 7.38s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:27:57,537 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▎ | 5475/10000 [12:05:28<9:10:38, 7.30s/it] 55%|████████████████████████████████▎ | 5475/10000 [12:05:28<9:10:38, 7.30s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:28:05,347 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▎ | 5476/10000 [12:05:36<9:20:22, 7.43s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:28:12,964 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▎ | 5477/10000 [12:05:44<9:24:21, 7.49s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:28:19,984 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▎ | 5478/10000 [12:05:51<9:15:04, 7.36s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:28:33,595 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|███████████████████████████████▊ | 5479/10000 [12:06:04<11:34:45, 9.22s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:28:40,307 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|███████████████████████████████▊ | 5480/10000 [12:06:11<10:39:06, 8.48s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:28:47,121 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 55%|███████████████████████████████▊ | 5481/10000 [12:06:18<10:01:12, 7.98s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:28:53,904 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▎ | 5482/10000 [12:06:25<9:33:32, 7.62s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:29:00,067 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▎ | 5483/10000 [12:06:31<8:59:03, 7.16s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:29:07,781 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▎ | 5484/10000 [12:06:38<9:13:33, 7.35s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:29:13,945 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▎ | 5485/10000 [12:06:45<8:44:30, 6.97s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:29:19,982 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▎ | 5486/10000 [12:06:51<8:22:30, 6.68s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:29:26,016 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▎ | 5487/10000 [12:06:57<8:09:24, 6.51s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:29:32,288 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▍ | 5488/10000 [12:07:03<8:05:30, 6.46s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:29:39,071 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 55%|████████████████████████████████▍ | 5489/10000 [12:07:10<8:10:28, 6.52s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:29:45,650 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▍ | 5490/10000 [12:07:16<8:12:49, 6.56s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:29:52,334 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▍ | 5491/10000 [12:07:23<8:13:45, 6.57s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:29:58,592 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▍ | 5492/10000 [12:07:29<8:07:48, 6.49s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:30:05,054 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▍ | 5493/10000 [12:07:36<8:08:08, 6.50s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:30:11,890 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▍ | 5494/10000 [12:07:42<8:14:19, 6.58s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:30:19,124 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▍ | 5495/10000 [12:07:50<8:29:43, 6.79s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:30:25,847 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▍ | 5496/10000 [12:07:56<8:25:38, 6.74s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:30:32,282 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 55%|████████████████████████████████▍ | 5497/10000 [12:08:03<8:21:53, 6.69s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:30:38,877 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▍ | 5498/10000 [12:08:10<8:20:01, 6.66s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:30:46,186 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▍ | 5499/10000 [12:08:17<8:34:08, 6.85s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:30:53,001 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▍ | 5500/10000 [12:08:24<8:31:26, 6.82s/it] 55%|████████████████████████████████▍ | 5500/10000 [12:08:24<8:31:26, 6.82s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:30:59,604 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▍ | 5501/10000 [12:08:30<8:25:20, 6.74s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:31:06,149 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▍ | 5502/10000 [12:08:37<8:21:54, 6.70s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:31:12,540 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▍ | 5503/10000 [12:08:43<8:16:01, 6.62s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:31:20,833 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▍ | 5504/10000 [12:08:52<8:54:37, 7.13s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:31:27,389 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 55%|████████████████████████████████▍ | 5505/10000 [12:08:58<8:39:43, 6.94s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:31:35,274 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▍ | 5506/10000 [12:09:06<9:02:07, 7.24s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:31:41,941 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▍ | 5507/10000 [12:09:13<8:46:52, 7.04s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:31:48,886 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▍ | 5508/10000 [12:09:20<8:47:22, 7.04s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:31:55,019 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▌ | 5509/10000 [12:09:26<8:23:48, 6.73s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:32:01,533 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▌ | 5510/10000 [12:09:32<8:18:52, 6.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:32:08,105 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▌ | 5511/10000 [12:09:39<8:18:40, 6.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:32:17,413 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▌ | 5512/10000 [12:09:48<9:15:30, 7.43s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:32:24,561 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 55%|████████████████████████████████▌ | 5513/10000 [12:09:55<9:11:03, 7.37s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:32:31,117 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▌ | 5514/10000 [12:10:02<8:52:12, 7.12s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:32:37,609 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▌ | 5515/10000 [12:10:08<8:37:50, 6.93s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:32:43,868 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▌ | 5516/10000 [12:10:14<8:23:34, 6.74s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:32:52,610 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▌ | 5517/10000 [12:10:23<9:07:28, 7.33s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:32:58,919 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▌ | 5518/10000 [12:10:30<8:46:06, 7.04s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:33:05,487 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▌ | 5519/10000 [12:10:36<8:34:02, 6.88s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:33:12,041 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 55%|████████████████████████████████▌ | 5520/10000 [12:10:43<8:27:05, 6.79s/it]{'loss': 0.1166, 'learning_rate': 1.5006315789473683e-06, 'epoch': 0.53} +{'loss': 0.1494, 'learning_rate': 1.4927368421052633e-06, 'epoch': 0.53} +{'loss': 0.1336, 'learning_rate': 1.484842105263158e-06, 'epoch': 0.53} +{'loss': 0.1132, 'learning_rate': 1.4769473684210528e-06, 'epoch': 0.53} +{'loss': 0.1091, 'learning_rate': 1.4690526315789473e-06, 'epoch': 0.54} +{'loss': 0.1154, 'learning_rate': 1.461157894736842e-06, 'epoch': 0.54} +{'loss': 0.1305, 'learning_rate': 1.453263157894737e-06, 'epoch': 0.54} +{'loss': 0.2006, 'learning_rate': 1.4453684210526317e-06, 'epoch': 0.54} +{'loss': 0.1334, 'learning_rate': 1.4374736842105265e-06, 'epoch': 0.55} +{'loss': 0.1215, 'learning_rate': 1.429578947368421e-06, 'epoch': 0.55} +{'loss': 0.1304, 'learning_rate': 1.4216842105263158e-06, 'epoch': 0.55} + + Reading metadata...: 0it [00:00, ?it/s] + Reading metadata...: 1it [00:00, 8.45it/s] Reading metadata...: 2165it [00:00, 15214.42it/s] +[WARNING|modeling_whisper.py:902] 2022-12-16 21:33:20,021 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▌ | 5521/10000 [12:10:51<8:55:01, 7.17s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:33:26,076 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▌ | 5522/10000 [12:10:57<8:27:42, 6.80s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:33:33,121 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▌ | 5523/10000 [12:11:04<8:34:02, 6.89s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:33:42,627 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 55%|████████████████████████████████▌ | 5524/10000 [12:11:13<9:31:45, 7.66s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:33:49,415 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▌ | 5525/10000 [12:11:20<9:14:20, 7.43s/it] 55%|████████████████████████████████▌ | 5525/10000 [12:11:20<9:14:20, 7.43s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:33:56,432 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▌ | 5526/10000 [12:11:27<9:04:27, 7.30s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:34:03,485 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▌ | 5527/10000 [12:11:34<8:58:39, 7.23s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:34:10,749 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▌ | 5528/10000 [12:11:41<8:59:03, 7.23s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:34:17,701 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▌ | 5529/10000 [12:11:48<8:51:26, 7.13s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:34:29,764 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████ | 5530/10000 [12:12:00<10:43:01, 8.63s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:34:37,246 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████ | 5531/10000 [12:12:08<10:17:11, 8.29s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:34:44,635 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 55%|████████████████████████████████▋ | 5532/10000 [12:12:15<9:55:51, 8.00s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:34:52,328 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▋ | 5533/10000 [12:12:23<9:47:13, 7.89s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:34:59,530 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▋ | 5534/10000 [12:12:30<9:34:09, 7.71s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:35:07,438 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▋ | 5535/10000 [12:12:38<9:38:50, 7.78s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:35:15,841 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▋ | 5536/10000 [12:12:46<9:50:22, 7.94s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:35:22,969 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▋ | 5537/10000 [12:12:54<9:35:23, 7.74s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:35:30,295 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▋ | 5538/10000 [12:13:01<9:25:01, 7.60s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:35:39,230 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▋ | 5539/10000 [12:13:10<9:54:56, 8.00s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:35:46,820 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 55%|████████████████████████████████▋ | 5540/10000 [12:13:18<9:45:57, 7.88s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:35:54,114 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▋ | 5541/10000 [12:13:25<9:32:56, 7.71s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:36:02,106 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▋ | 5542/10000 [12:13:33<9:38:57, 7.79s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:36:14,837 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▏ | 5543/10000 [12:13:45<11:27:54, 9.26s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:36:22,237 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▏ | 5544/10000 [12:13:53<10:46:19, 8.70s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:36:30,204 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▏ | 5545/10000 [12:14:01<10:30:52, 8.50s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:36:37,771 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▏ | 5546/10000 [12:14:08<10:09:33, 8.21s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:36:49,403 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▏ | 5547/10000 [12:14:20<11:27:00, 9.26s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:36:56,945 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 55%|████████████████████████████████▏ | 5548/10000 [12:14:28<10:45:56, 8.71s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:37:04,498 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 55%|████████████████████████████████▏ | 5549/10000 [12:14:35<10:19:58, 8.36s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:37:12,565 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|████████████████████████████████▏ | 5550/10000 [12:14:43<10:14:35, 8.29s/it] 56%|████████████████████████████████▏ | 5550/10000 [12:14:43<10:14:35, 8.29s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:37:20,191 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|████████████████████████████████▊ | 5551/10000 [12:14:51<9:57:57, 8.06s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:37:27,683 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|████████████████████████████████▊ | 5552/10000 [12:14:58<9:47:04, 7.92s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:37:35,190 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|████████████████████████████████▊ | 5553/10000 [12:15:06<9:35:17, 7.76s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:37:42,355 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|████████████████████████████████▊ | 5554/10000 [12:15:13<9:24:54, 7.62s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:37:49,399 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|████████████████████████████████▊ | 5555/10000 [12:15:20<9:09:53, 7.42s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:37:57,045 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 56%|████████████████████████████████▊ | 5556/10000 [12:15:28<9:14:47, 7.49s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:38:04,699 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|████████████████████████████████▊ | 5557/10000 [12:15:35<9:17:58, 7.54s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:38:12,598 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|████████████████████████████████▊ | 5558/10000 [12:15:43<9:27:52, 7.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:38:20,870 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|████████████████████████████████▊ | 5559/10000 [12:15:52<9:40:45, 7.85s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:38:28,128 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|████████████████████████████████▊ | 5560/10000 [12:15:59<9:26:27, 7.65s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:38:35,078 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|████████████████████████████████▊ | 5561/10000 [12:16:06<9:10:47, 7.44s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:38:42,733 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|████████████████████████████████▊ | 5562/10000 [12:16:13<9:16:50, 7.53s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:38:51,533 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|████████████████████████████████▊ | 5563/10000 [12:16:22<9:43:28, 7.89s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:38:58,708 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 56%|████████████████████████████████▊ | 5564/10000 [12:16:29<9:28:57, 7.70s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:39:05,969 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|████████████████████████████████▊ | 5565/10000 [12:16:37<9:18:48, 7.56s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:39:13,039 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|████████████████████████████████▊ | 5566/10000 [12:16:44<9:06:18, 7.39s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:39:20,166 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|████████████████████████████████▊ | 5567/10000 [12:16:51<9:01:50, 7.33s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:39:27,395 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|████████████████████████████████▊ | 5568/10000 [12:16:58<8:57:40, 7.28s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:39:34,347 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|████████████████████████████████▊ | 5569/10000 [12:17:05<8:50:08, 7.18s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:39:41,339 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|████████████████████████████████▊ | 5570/10000 [12:17:12<8:48:20, 7.16s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:39:48,955 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|████████████████████████████████▊ | 5571/10000 [12:17:20<8:57:50, 7.29s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:39:56,992 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 56%|████████████████████████████████▊ | 5572/10000 [12:17:28<9:13:00, 7.49s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:40:04,526 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|████████████████████████████████▉ | 5573/10000 [12:17:35<9:13:47, 7.51s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:40:12,775 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|████████████████████████████████▉ | 5574/10000 [12:17:43<9:31:14, 7.74s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:40:20,279 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|████████████████████████████████▉ | 5575/10000 [12:17:51<9:22:46, 7.63s/it] 56%|████████████████████████████████▉ | 5575/10000 [12:17:51<9:22:46, 7.63s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:40:27,605 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|████████████████████████████████▉ | 5576/10000 [12:17:58<9:18:00, 7.57s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:40:34,779 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|████████████████████████████████▉ | 5577/10000 [12:18:05<9:08:43, 7.44s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:40:42,538 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|████████████████████████████████▉ | 5578/10000 [12:18:13<9:14:46, 7.53s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:40:49,658 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|████████████████████████████████▉ | 5579/10000 [12:18:20<9:07:28, 7.43s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:40:57,641 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 56%|████████████████████████████████▉ | 5580/10000 [12:18:28<9:20:06, 7.60s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:41:10,148 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|████████████████████████████████▎ | 5581/10000 [12:18:41<11:08:52, 9.08s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:41:17,232 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|████████████████████████████████▍ | 5582/10000 [12:18:48<10:24:12, 8.48s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:41:25,426 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|████████████████████████████████▍ | 5583/10000 [12:18:56<10:16:42, 8.38s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:41:32,590 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|████████████████████████████████▉ | 5584/10000 [12:19:03<9:48:24, 7.99s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:41:38,953 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|████████████████████████████████▉ | 5585/10000 [12:19:10<9:14:31, 7.54s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:42:13,956 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|████████████████████████████████▍ | 5586/10000 [12:19:45<19:20:51, 15.78s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:42:20,671 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|████████████████████████████████▍ | 5587/10000 [12:19:51<15:58:27, 13.03s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:42:27,000 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 56%|████████████████████████████████▍ | 5588/10000 [12:19:57<13:28:24, 10.99s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:42:33,962 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|████████████████████████████████▍ | 5589/10000 [12:20:05<12:00:14, 9.80s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:42:55,372 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|████████████████████████████████▍ | 5590/10000 [12:20:26<16:17:13, 13.30s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:43:01,494 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|████████████████████████████████▍ | 5591/10000 [12:20:32<13:38:00, 11.13s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:43:08,158 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|████████████████████████████████▍ | 5592/10000 [12:20:39<12:01:36, 9.82s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:43:14,641 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|████████████████████████████████▍ | 5593/10000 [12:20:45<10:48:17, 8.83s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:43:21,337 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████ | 5594/10000 [12:20:52<9:59:29, 8.16s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:43:28,073 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████ | 5595/10000 [12:20:59<9:27:07, 7.72s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:43:34,765 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 56%|█████████████████████████████████ | 5596/10000 [12:21:05<9:06:33, 7.45s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:43:41,380 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████ | 5597/10000 [12:21:12<8:47:39, 7.19s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:43:48,068 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████ | 5598/10000 [12:21:19<8:35:17, 7.02s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:43:55,372 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████ | 5599/10000 [12:21:26<8:41:31, 7.11s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:44:02,000 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████ | 5600/10000 [12:21:33<8:31:22, 6.97s/it] 56%|█████████████████████████████████ | 5600/10000 [12:21:33<8:31:22, 6.97s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:44:08,966 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████ | 5601/10000 [12:21:40<8:30:28, 6.96s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:44:15,486 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████ | 5602/10000 [12:21:46<8:22:22, 6.85s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:44:23,910 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████ | 5603/10000 [12:21:55<8:55:38, 7.31s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:44:31,110 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 56%|█████████████████████████████████ | 5604/10000 [12:22:02<8:53:33, 7.28s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:44:38,913 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████ | 5605/10000 [12:22:10<9:03:48, 7.42s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:44:45,369 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████ | 5606/10000 [12:22:16<8:44:00, 7.16s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:44:51,507 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████ | 5607/10000 [12:22:22<8:21:52, 6.85s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:44:57,662 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████ | 5608/10000 [12:22:28<8:05:47, 6.64s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:45:03,822 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████ | 5609/10000 [12:22:34<7:53:07, 6.46s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:45:10,019 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████ | 5610/10000 [12:22:41<7:48:36, 6.40s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:45:17,468 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████ | 5611/10000 [12:22:48<8:12:25, 6.73s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:45:23,565 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 56%|█████████████████████████████████ | 5612/10000 [12:22:54<7:59:21, 6.55s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:45:30,117 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████ | 5613/10000 [12:23:01<7:57:51, 6.54s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:45:37,393 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████ | 5614/10000 [12:23:08<8:13:50, 6.76s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:45:43,832 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████▏ | 5615/10000 [12:23:15<8:07:19, 6.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:45:50,105 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████▏ | 5616/10000 [12:23:21<7:57:51, 6.54s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:45:56,370 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████▏ | 5617/10000 [12:23:27<7:52:40, 6.47s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:46:02,799 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████▏ | 5618/10000 [12:23:33<7:51:24, 6.45s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:46:09,380 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████▏ | 5619/10000 [12:23:40<7:52:36, 6.47s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:46:16,237 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 56%|█████████████████████████████████▏ | 5620/10000 [12:23:47<8:02:39, 6.61s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:46:22,960 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████▏ | 5621/10000 [12:23:54<8:04:58, 6.65s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:46:32,380 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████▏ | 5622/10000 [12:24:03<9:04:21, 7.46s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:46:39,264 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████▏ | 5623/10000 [12:24:10<8:52:46, 7.30s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:46:46,044 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████▏ | 5624/10000 [12:24:17<8:39:05, 7.12s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:46:53,155 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████▏ | 5625/10000 [12:24:24<8:41:03, 7.15s/it] 56%|█████████████████████████████████▏ | 5625/10000 [12:24:24<8:41:03, 7.15s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:46:59,729 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████▏ | 5626/10000 [12:24:30<8:27:00, 6.95s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:47:05,922 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████▏ | 5627/10000 [12:24:37<8:09:28, 6.72s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:47:11,891 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 56%|█████████████████████████████████▏ | 5628/10000 [12:24:43<7:53:37, 6.50s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:47:18,295 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████▏ | 5629/10000 [12:24:49<7:52:30, 6.49s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:47:25,041 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████▏ | 5630/10000 [12:24:56<7:56:35, 6.54s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:47:31,509 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████▏ | 5631/10000 [12:25:02<7:54:57, 6.52s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:47:37,637 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████▏ | 5632/10000 [12:25:08<7:48:43, 6.44s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:47:43,822 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████▏ | 5633/10000 [12:25:14<7:39:42, 6.32s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:47:49,843 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████▏ | 5634/10000 [12:25:20<7:35:06, 6.25s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:47:55,921 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████▏ | 5635/10000 [12:25:27<7:30:32, 6.19s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:48:02,441 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 56%|█████████████████████████████████▎ | 5636/10000 [12:25:33<7:38:23, 6.30s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:48:09,199 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████▎ | 5637/10000 [12:25:40<7:48:32, 6.44s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:48:16,049 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████▎ | 5638/10000 [12:25:47<7:55:23, 6.54s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:48:22,887 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████▎ | 5639/10000 [12:25:54<8:04:12, 6.66s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:48:29,956 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████▎ | 5640/10000 [12:26:01<8:11:45, 6.77s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:48:36,110 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████▎ | 5641/10000 [12:26:07<7:57:53, 6.58s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:48:44,816 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████▎ | 5642/10000 [12:26:16<8:45:51, 7.24s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:48:53,276 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████▎ | 5643/10000 [12:26:24<9:12:17, 7.61s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:49:00,541 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 56%|█████████████████████████████████▎ | 5644/10000 [12:26:31<9:04:01, 7.49s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:49:07,408 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████▎ | 5645/10000 [12:26:38<8:48:13, 7.28s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:49:14,174 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████▎ | 5646/10000 [12:26:45<8:40:31, 7.17s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:49:21,103 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████▎ | 5647/10000 [12:26:52<8:33:33, 7.08s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:49:28,546 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████▎ | 5648/10000 [12:26:59<8:40:59, 7.18s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:49:36,591 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████▎ | 5649/10000 [12:27:07<8:58:21, 7.42s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:49:44,148 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 56%|█████████████████████████████████▎ | 5650/10000 [12:27:15<9:00:22, 7.45s/it] 56%|█████████████████████████████████▎ | 5650/10000 [12:27:15<9:00:22, 7.45s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:49:52,661 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▎ | 5651/10000 [12:27:23<9:24:15, 7.78s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:50:00,065 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 57%|█████████████████████████████████▎ | 5652/10000 [12:27:31<9:15:45, 7.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:50:07,421 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▎ | 5653/10000 [12:27:38<9:10:51, 7.60s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:50:15,081 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▎ | 5654/10000 [12:27:46<9:11:41, 7.62s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:50:22,941 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▎ | 5655/10000 [12:27:54<9:16:05, 7.68s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:50:30,927 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▎ | 5656/10000 [12:28:01<9:20:41, 7.74s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:50:39,226 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▍ | 5657/10000 [12:28:10<9:32:12, 7.91s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:50:47,701 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▍ | 5658/10000 [12:28:18<9:47:56, 8.12s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:50:55,402 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▍ | 5659/10000 [12:28:26<9:38:29, 8.00s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:51:03,596 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 57%|█████████████████████████████████▍ | 5660/10000 [12:28:34<9:41:10, 8.03s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:51:22,452 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|████████████████████████████████▊ | 5661/10000 [12:28:53<13:36:01, 11.28s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:51:35,890 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|████████████████████████████████▊ | 5662/10000 [12:29:06<14:20:51, 11.91s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:51:43,090 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|████████████████████████████████▊ | 5663/10000 [12:29:14<12:40:26, 10.52s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:51:50,652 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|████████████████████████████████▊ | 5664/10000 [12:29:21<11:34:30, 9.61s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:51:58,485 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|████████████████████████████████▊ | 5665/10000 [12:29:29<10:55:40, 9.08s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:52:05,690 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|████████████████████████████████▊ | 5666/10000 [12:29:36<10:17:34, 8.55s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:52:14,385 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|████████████████████████████████▊ | 5667/10000 [12:29:45<10:19:16, 8.58s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:52:22,121 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 57%|████████████████████████████████▊ | 5668/10000 [12:29:53<10:00:50, 8.32s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:52:29,370 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▍ | 5669/10000 [12:30:00<9:37:13, 8.00s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:52:36,941 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▍ | 5670/10000 [12:30:08<9:28:26, 7.88s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:52:44,314 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▍ | 5671/10000 [12:30:15<9:19:00, 7.75s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:52:51,791 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▍ | 5672/10000 [12:30:22<9:10:42, 7.63s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:52:59,387 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▍ | 5673/10000 [12:30:30<9:09:13, 7.62s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:53:06,864 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▍ | 5674/10000 [12:30:38<9:08:13, 7.60s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:53:14,742 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▍ | 5675/10000 [12:30:45<9:11:10, 7.65s/it] 57%|█████████████████████████████████▍ | 5675/10000 [12:30:45<9:11:10, 7.65s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:53:21,921 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 57%|█████████████████████████████████▍ | 5676/10000 [12:30:53<9:03:08, 7.54s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:53:29,539 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▍ | 5677/10000 [12:31:00<9:04:50, 7.56s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:53:36,921 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▌ | 5678/10000 [12:31:08<9:01:54, 7.52s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:53:43,832 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▌ | 5679/10000 [12:31:15<8:48:24, 7.34s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:53:51,657 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▌ | 5680/10000 [12:31:22<8:59:07, 7.49s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:53:58,973 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▌ | 5681/10000 [12:31:30<8:55:18, 7.44s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:54:08,822 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▌ | 5682/10000 [12:31:39<9:45:03, 8.13s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:54:20,714 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|████████████████████████████████▉ | 5683/10000 [12:31:51<11:05:46, 9.25s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:54:28,152 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 57%|████████████████████████████████▉ | 5684/10000 [12:31:59<10:28:35, 8.74s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:54:35,658 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|████████████████████████████████▉ | 5685/10000 [12:32:06<10:01:09, 8.36s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:54:43,091 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▌ | 5686/10000 [12:32:14<9:41:25, 8.09s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:54:51,950 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▌ | 5687/10000 [12:32:23<9:57:40, 8.31s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:54:59,566 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▌ | 5688/10000 [12:32:30<9:42:34, 8.11s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:55:07,168 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▌ | 5689/10000 [12:32:38<9:29:19, 7.92s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:55:14,305 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▌ | 5690/10000 [12:32:45<9:15:02, 7.73s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:55:21,748 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▌ | 5691/10000 [12:32:52<9:08:13, 7.63s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:55:29,065 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 57%|█████████████████████████████████▌ | 5692/10000 [12:33:00<9:01:53, 7.55s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:55:36,622 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▌ | 5693/10000 [12:33:07<9:03:12, 7.57s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:55:44,068 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▌ | 5694/10000 [12:33:15<8:59:28, 7.52s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:55:51,056 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▌ | 5695/10000 [12:33:22<8:47:30, 7.35s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:55:58,081 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▌ | 5696/10000 [12:33:29<8:40:32, 7.26s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:56:05,784 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▌ | 5697/10000 [12:33:36<8:49:36, 7.38s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:56:14,023 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▌ | 5698/10000 [12:33:45<9:05:55, 7.61s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:56:21,471 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▌ | 5699/10000 [12:33:52<9:05:21, 7.61s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:56:29,386 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 57%|█████████████████████████████████▋ | 5700/10000 [12:34:00<9:09:06, 7.66s/it] 57%|█████████████████████████████████▋ | 5700/10000 [12:34:00<9:09:06, 7.66s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:56:41,055 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████ | 5701/10000 [12:34:12<10:37:58, 8.90s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:56:49,701 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████ | 5702/10000 [12:34:20<10:31:30, 8.82s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:56:57,406 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████ | 5703/10000 [12:34:28<10:06:33, 8.47s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:57:04,402 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▋ | 5704/10000 [12:34:35<9:33:32, 8.01s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:57:12,187 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▋ | 5705/10000 [12:34:43<9:28:46, 7.95s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:57:20,604 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▋ | 5706/10000 [12:34:51<9:37:57, 8.08s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:57:28,137 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▋ | 5707/10000 [12:34:59<9:27:50, 7.94s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:57:34,974 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 57%|█████████████████████████████████▋ | 5708/10000 [12:35:06<9:05:06, 7.62s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:57:41,207 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▋ | 5709/10000 [12:35:12<8:33:17, 7.18s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:57:47,259 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▋ | 5710/10000 [12:35:18<8:11:03, 6.87s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:57:53,364 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▋ | 5711/10000 [12:35:24<7:52:34, 6.61s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:57:59,992 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▋ | 5712/10000 [12:35:31<7:54:30, 6.64s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:58:07,116 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▋ | 5713/10000 [12:35:38<8:03:10, 6.76s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:58:13,968 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▋ | 5714/10000 [12:35:45<8:06:03, 6.80s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:58:25,366 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▋ | 5715/10000 [12:35:56<9:43:37, 8.17s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:58:31,717 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 57%|█████████████████████████████████▋ | 5716/10000 [12:36:02<9:05:19, 7.64s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:58:38,036 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▋ | 5717/10000 [12:36:09<8:36:19, 7.23s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:58:44,204 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▋ | 5718/10000 [12:36:15<8:14:55, 6.93s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:58:50,396 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▋ | 5719/10000 [12:36:21<7:58:46, 6.71s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:58:57,854 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▋ | 5720/10000 [12:36:28<8:11:38, 6.89s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:59:12,001 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▏ | 5721/10000 [12:36:43<10:49:09, 9.10s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:59:19,862 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▏ | 5722/10000 [12:36:50<10:20:10, 8.70s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:59:27,297 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▊ | 5723/10000 [12:36:58<9:55:38, 8.36s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:59:34,102 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 57%|█████████████████████████████████▊ | 5724/10000 [12:37:05<9:22:30, 7.89s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:59:41,402 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▊ | 5725/10000 [12:37:12<9:09:33, 7.71s/it] 57%|█████████████████████████████████▊ | 5725/10000 [12:37:12<9:09:33, 7.71s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:59:48,096 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▊ | 5726/10000 [12:37:19<8:45:59, 7.38s/it][WARNING|modeling_whisper.py:902] 2022-12-16 21:59:54,718 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▊ | 5727/10000 [12:37:25<8:30:57, 7.17s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:00:01,682 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▊ | 5728/10000 [12:37:32<8:25:14, 7.10s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:00:08,180 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▊ | 5729/10000 [12:37:39<8:12:10, 6.91s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:00:14,891 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▊ | 5730/10000 [12:37:46<8:09:37, 6.88s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:00:21,234 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▊ | 5731/10000 [12:37:52<7:57:32, 6.71s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:00:27,433 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 57%|█████████████████████████████████▊ | 5732/10000 [12:37:58<7:44:50, 6.53s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:00:33,886 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▊ | 5733/10000 [12:38:04<7:42:56, 6.51s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:00:40,131 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▊ | 5734/10000 [12:38:11<7:38:54, 6.45s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:00:46,707 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▊ | 5735/10000 [12:38:17<7:38:16, 6.45s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:00:56,872 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▊ | 5736/10000 [12:38:27<8:58:35, 7.58s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:01:03,718 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▊ | 5737/10000 [12:38:34<8:44:47, 7.39s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:01:10,158 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▊ | 5738/10000 [12:38:41<8:21:42, 7.06s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:01:16,530 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▊ | 5739/10000 [12:38:47<8:10:00, 6.90s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:01:23,092 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 57%|█████████████████████████████████▊ | 5740/10000 [12:38:54<8:02:45, 6.80s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:01:31,354 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▊ | 5741/10000 [12:39:02<8:32:00, 7.21s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:01:37,690 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▉ | 5742/10000 [12:39:08<8:13:54, 6.96s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:01:44,430 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▉ | 5743/10000 [12:39:15<8:09:25, 6.90s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:01:56,864 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▎ | 5744/10000 [12:39:27<10:05:38, 8.54s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:02:03,082 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▉ | 5745/10000 [12:39:34<9:17:45, 7.87s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:02:09,201 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▉ | 5746/10000 [12:39:40<8:40:34, 7.34s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:02:15,345 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▉ | 5747/10000 [12:39:46<8:15:14, 6.99s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:02:21,452 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 57%|█████████████████████████████████▉ | 5748/10000 [12:39:52<7:54:37, 6.70s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:02:28,036 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▉ | 5749/10000 [12:39:59<7:53:46, 6.69s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:02:34,689 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 57%|█████████████████████████████████▉ | 5750/10000 [12:40:05<7:51:35, 6.66s/it] 57%|█████████████████████████████████▉ | 5750/10000 [12:40:05<7:51:35, 6.66s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:02:40,992 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|█████████████████████████████████▉ | 5751/10000 [12:40:12<7:42:37, 6.53s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:02:47,402 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|█████████████████████████████████▉ | 5752/10000 [12:40:18<7:41:38, 6.52s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:02:54,591 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|█████████████████████████████████▉ | 5753/10000 [12:40:25<7:56:26, 6.73s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:03:06,968 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|█████████████████████████████████▉ | 5754/10000 [12:40:38<9:55:56, 8.42s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:03:13,596 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|█████████████████████████████████▉ | 5755/10000 [12:40:44<9:18:27, 7.89s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:03:19,862 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 58%|█████████████████████████████████▉ | 5756/10000 [12:40:51<8:43:28, 7.40s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:03:26,936 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|█████████████████████████████████▉ | 5757/10000 [12:40:57<8:34:06, 7.27s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:03:33,867 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|█████████████████████████████████▉ | 5758/10000 [12:41:05<8:28:44, 7.20s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:03:40,635 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|█████████████████████████████████▉ | 5759/10000 [12:41:11<8:19:48, 7.07s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:03:47,354 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|█████████████████████████████████▉ | 5760/10000 [12:41:18<8:12:02, 6.96s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:03:55,403 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|█████████████████████████████████▉ | 5761/10000 [12:41:26<8:36:32, 7.31s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:04:02,070 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|█████████████████████████████████▉ | 5762/10000 [12:41:33<8:20:29, 7.09s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:04:08,513 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████ | 5763/10000 [12:41:39<8:07:41, 6.91s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:04:14,943 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 58%|██████████████████████████████████ | 5764/10000 [12:41:46<7:55:58, 6.74s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:04:21,575 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████ | 5765/10000 [12:41:52<7:55:30, 6.74s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:04:28,208 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████ | 5766/10000 [12:41:59<7:51:53, 6.69s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:04:36,315 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████ | 5767/10000 [12:42:07<8:22:48, 7.13s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:04:43,036 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████ | 5768/10000 [12:42:14<8:14:14, 7.01s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:04:49,883 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████ | 5769/10000 [12:42:21<8:10:52, 6.96s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:04:56,310 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████ | 5770/10000 [12:42:27<7:59:21, 6.80s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:05:02,435 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████ | 5771/10000 [12:42:33<7:42:59, 6.57s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:05:08,740 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 58%|██████████████████████████████████ | 5772/10000 [12:42:39<7:36:41, 6.48s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:05:14,864 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████ | 5773/10000 [12:42:46<7:31:41, 6.41s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:05:21,111 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████ | 5774/10000 [12:42:52<7:26:42, 6.34s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:05:28,289 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████ | 5775/10000 [12:42:59<7:45:45, 6.61s/it] 58%|██████████████████████████████████ | 5775/10000 [12:42:59<7:45:45, 6.61s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:05:34,587 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████ | 5776/10000 [12:43:05<7:38:26, 6.51s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:05:40,932 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████ | 5777/10000 [12:43:12<7:33:20, 6.44s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:05:47,258 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████ | 5778/10000 [12:43:18<7:33:43, 6.45s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:05:53,476 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████ | 5779/10000 [12:43:24<7:27:31, 6.36s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:06:03,100 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 58%|██████████████████████████████████ | 5780/10000 [12:43:34<8:36:21, 7.34s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:06:10,194 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████ | 5781/10000 [12:43:41<8:30:54, 7.27s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:06:17,026 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████ | 5782/10000 [12:43:48<8:21:26, 7.13s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:06:23,499 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████ | 5783/10000 [12:43:54<8:05:54, 6.91s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:06:30,034 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▏ | 5784/10000 [12:44:01<7:57:48, 6.80s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:06:36,571 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▏ | 5785/10000 [12:44:07<7:53:10, 6.74s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:06:42,794 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▏ | 5786/10000 [12:44:13<7:42:33, 6.59s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:06:48,955 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▏ | 5787/10000 [12:44:20<7:32:19, 6.44s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:06:55,853 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 58%|██████████████████████████████████▏ | 5788/10000 [12:44:26<7:40:52, 6.57s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:07:02,519 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▏ | 5789/10000 [12:44:33<7:46:19, 6.64s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:07:09,596 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▏ | 5790/10000 [12:44:40<7:54:21, 6.76s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:07:16,146 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▏ | 5791/10000 [12:44:47<7:48:17, 6.68s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:07:22,798 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▏ | 5792/10000 [12:44:53<7:46:25, 6.65s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:07:28,674 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▏ | 5793/10000 [12:44:59<7:33:10, 6.46s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:07:34,862 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▏ | 5794/10000 [12:45:06<7:26:42, 6.37s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:07:41,004 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▏ | 5795/10000 [12:45:12<7:21:03, 6.29s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:07:47,643 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 58%|██████████████████████████████████▏ | 5796/10000 [12:45:18<7:28:59, 6.41s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:07:54,004 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▏ | 5797/10000 [12:45:25<7:24:24, 6.34s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:07:59,865 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▏ | 5798/10000 [12:45:31<7:17:08, 6.24s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:08:06,725 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▏ | 5799/10000 [12:45:37<7:31:00, 6.44s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:08:15,686 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▏ | 5800/10000 [12:45:46<8:22:57, 7.19s/it] 58%|██████████████████████████████████▏ | 5800/10000 [12:45:46<8:22:57, 7.19s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:08:22,235 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▏ | 5801/10000 [12:45:53<8:09:19, 6.99s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:08:28,725 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▏ | 5802/10000 [12:45:59<7:57:42, 6.83s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:08:34,791 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▏ | 5803/10000 [12:46:05<7:40:44, 6.59s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:08:41,159 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 58%|██████████████████████████████████▏ | 5804/10000 [12:46:12<7:37:58, 6.55s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:08:47,786 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▏ | 5805/10000 [12:46:18<7:37:39, 6.55s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:08:54,235 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▎ | 5806/10000 [12:46:25<7:36:15, 6.53s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:09:01,524 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▎ | 5807/10000 [12:46:32<7:51:30, 6.75s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:09:08,222 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▎ | 5808/10000 [12:46:39<7:52:44, 6.77s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:09:14,706 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▎ | 5809/10000 [12:46:45<7:46:36, 6.68s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:09:27,236 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▎ | 5810/10000 [12:46:58<9:47:05, 8.41s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:09:33,722 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 58%|██████████████████████████████████▎ | 5811/10000 [12:47:04<9:08:29, 7.86s/it]{'loss': 0.1404, 'learning_rate': 1.4137894736842105e-06, 'epoch': 0.55} +{'loss': 0.1803, 'learning_rate': 1.4058947368421053e-06, 'epoch': 0.56} +{'loss': 0.1351, 'learning_rate': 1.3980000000000002e-06, 'epoch': 0.56} +{'loss': 0.1305, 'learning_rate': 1.3901052631578947e-06, 'epoch': 0.56} +{'loss': 0.1188, 'learning_rate': 1.3822105263157895e-06, 'epoch': 0.56} +{'loss': 0.108, 'learning_rate': 1.3743157894736842e-06, 'epoch': 0.56} +{'loss': 0.1127, 'learning_rate': 1.366421052631579e-06, 'epoch': 0.57} +{'loss': 0.108, 'learning_rate': 1.3585263157894737e-06, 'epoch': 0.57} +{'loss': 0.1199, 'learning_rate': 1.3506315789473685e-06, 'epoch': 0.57} +{'loss': 0.1085, 'learning_rate': 1.3427368421052632e-06, 'epoch': 0.57} +{'loss': 0.1154, 'learning_rate': 1.334842105263158e-06, 'epoch': 0.58} +{'loss': 0.1231, 'learning_rate': 1.3269473684210527e-06, 'epoch': 0.58} + + Reading metadata...: 0it [00:00, ?it/s] + Reading metadata...: 1it [00:00, 6.92it/s] Reading metadata...: 2165it [00:00, 13096.50it/s] +[WARNING|modeling_whisper.py:902] 2022-12-16 22:09:41,503 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▎ | 5812/10000 [12:47:12<9:04:23, 7.80s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:09:48,014 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▎ | 5813/10000 [12:47:19<8:37:32, 7.42s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:09:54,814 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▎ | 5814/10000 [12:47:25<8:26:27, 7.26s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:10:01,329 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 58%|██████████████████████████████████▎ | 5815/10000 [12:47:32<8:11:24, 7.05s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:10:07,996 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▎ | 5816/10000 [12:47:39<8:01:26, 6.90s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:10:14,556 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▎ | 5817/10000 [12:47:45<7:52:14, 6.77s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:10:20,702 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▎ | 5818/10000 [12:47:51<7:40:56, 6.61s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:10:26,728 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▎ | 5819/10000 [12:47:57<7:30:03, 6.46s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:10:36,723 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▎ | 5820/10000 [12:48:07<8:42:57, 7.51s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:10:42,814 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▎ | 5821/10000 [12:48:13<8:14:18, 7.10s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:10:49,045 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▎ | 5822/10000 [12:48:20<7:53:04, 6.79s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:10:55,246 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 58%|██████████████████████████████████▎ | 5823/10000 [12:48:26<7:43:43, 6.66s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:11:01,297 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▎ | 5824/10000 [12:48:32<7:29:07, 6.45s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:11:07,343 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▎ | 5825/10000 [12:48:38<7:19:33, 6.32s/it] 58%|██████████████████████████████████▎ | 5825/10000 [12:48:38<7:19:33, 6.32s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:11:13,305 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▎ | 5826/10000 [12:48:44<7:13:56, 6.24s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:11:19,676 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▍ | 5827/10000 [12:48:50<7:17:05, 6.28s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:11:25,741 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▍ | 5828/10000 [12:48:56<7:12:57, 6.23s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:11:31,902 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▍ | 5829/10000 [12:49:03<7:11:17, 6.20s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:11:39,989 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▍ | 5830/10000 [12:49:11<7:49:10, 6.75s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:11:46,874 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 58%|██████████████████████████████████▍ | 5831/10000 [12:49:17<7:50:13, 6.77s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:11:53,286 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▍ | 5832/10000 [12:49:24<7:43:05, 6.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:12:00,001 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▍ | 5833/10000 [12:49:31<7:44:43, 6.69s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:12:06,518 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▍ | 5834/10000 [12:49:37<7:42:51, 6.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:12:13,059 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▍ | 5835/10000 [12:49:44<7:39:41, 6.62s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:12:19,738 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▍ | 5836/10000 [12:49:50<7:41:06, 6.64s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:12:26,443 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▍ | 5837/10000 [12:49:57<7:42:19, 6.66s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:12:32,791 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▍ | 5838/10000 [12:50:03<7:35:04, 6.56s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:12:39,019 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 58%|██████████████████████████████████▍ | 5839/10000 [12:50:10<7:28:20, 6.46s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:12:46,467 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▍ | 5840/10000 [12:50:17<7:48:04, 6.75s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:12:52,754 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▍ | 5841/10000 [12:50:23<7:38:59, 6.62s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:12:58,906 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▍ | 5842/10000 [12:50:30<7:28:38, 6.47s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:13:04,992 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▍ | 5843/10000 [12:50:36<7:20:48, 6.36s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:13:11,149 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▍ | 5844/10000 [12:50:42<7:16:00, 6.29s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:13:17,242 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▍ | 5845/10000 [12:50:48<7:12:37, 6.25s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:13:23,477 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▍ | 5846/10000 [12:50:54<7:10:37, 6.22s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:13:29,568 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 58%|██████████████████████████████████▍ | 5847/10000 [12:51:00<7:08:46, 6.19s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:13:36,156 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▌ | 5848/10000 [12:51:07<7:16:41, 6.31s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:13:42,517 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▌ | 5849/10000 [12:51:13<7:19:20, 6.35s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:13:49,650 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 58%|██████████████████████████████████▌ | 5850/10000 [12:51:20<7:35:09, 6.58s/it] 58%|██████████████████████████████████▌ | 5850/10000 [12:51:20<7:35:09, 6.58s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:13:56,538 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▌ | 5851/10000 [12:51:27<7:40:33, 6.66s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:14:03,305 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▌ | 5852/10000 [12:51:34<7:42:29, 6.69s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:14:09,877 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▌ | 5853/10000 [12:51:41<7:40:17, 6.66s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:14:16,641 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▌ | 5854/10000 [12:51:47<7:42:54, 6.70s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:14:23,134 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 59%|██████████████████████████████████▌ | 5855/10000 [12:51:54<7:35:16, 6.59s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:14:29,527 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▌ | 5856/10000 [12:52:00<7:32:12, 6.55s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:14:36,128 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▌ | 5857/10000 [12:52:07<7:33:58, 6.57s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:14:42,849 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▌ | 5858/10000 [12:52:14<7:37:36, 6.63s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:14:51,105 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▌ | 5859/10000 [12:52:22<8:09:12, 7.09s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:15:02,729 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▌ | 5860/10000 [12:52:33<9:45:11, 8.48s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:15:09,544 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▌ | 5861/10000 [12:52:40<9:07:40, 7.94s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:15:16,058 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▌ | 5862/10000 [12:52:47<8:38:55, 7.52s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:15:22,451 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 59%|██████████████████████████████████▌ | 5863/10000 [12:52:53<8:14:03, 7.17s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:15:28,685 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▌ | 5864/10000 [12:52:59<7:55:16, 6.89s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:15:35,066 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▌ | 5865/10000 [12:53:06<7:47:13, 6.78s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:15:41,211 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▌ | 5866/10000 [12:53:12<7:33:35, 6.58s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:15:47,215 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▌ | 5867/10000 [12:53:18<7:19:12, 6.38s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:15:53,260 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▌ | 5868/10000 [12:53:24<7:12:51, 6.29s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:15:59,154 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▋ | 5869/10000 [12:53:30<7:05:29, 6.18s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:16:05,613 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▋ | 5870/10000 [12:53:36<7:10:27, 6.25s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:16:12,357 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 59%|██████████████████████████████████▋ | 5871/10000 [12:53:43<7:20:36, 6.40s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:16:18,413 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▋ | 5872/10000 [12:53:49<7:11:42, 6.27s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:16:25,951 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▋ | 5873/10000 [12:53:57<7:39:36, 6.68s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:16:32,117 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▋ | 5874/10000 [12:54:03<7:30:08, 6.55s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:16:38,614 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▋ | 5875/10000 [12:54:09<7:29:18, 6.54s/it] 59%|██████████████████████████████████▋ | 5875/10000 [12:54:09<7:29:18, 6.54s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:16:47,198 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▋ | 5876/10000 [12:54:18<8:10:59, 7.14s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:16:54,075 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▋ | 5877/10000 [12:54:25<8:05:11, 7.06s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:17:00,775 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▋ | 5878/10000 [12:54:31<7:56:57, 6.94s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:17:08,439 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 59%|██████████████████████████████████▋ | 5879/10000 [12:54:39<8:12:25, 7.17s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:17:15,193 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▋ | 5880/10000 [12:54:46<8:02:15, 7.02s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:17:21,553 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▋ | 5881/10000 [12:54:52<7:48:35, 6.83s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:17:27,966 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▋ | 5882/10000 [12:54:59<7:42:00, 6.73s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:17:34,064 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▋ | 5883/10000 [12:55:05<7:26:48, 6.51s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:17:43,915 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▋ | 5884/10000 [12:55:15<8:37:06, 7.54s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:17:51,993 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▋ | 5885/10000 [12:55:23<8:47:58, 7.70s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:18:04,764 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▏ | 5886/10000 [12:55:35<10:30:41, 9.20s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:18:10,963 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 59%|██████████████████████████████████▋ | 5887/10000 [12:55:41<9:27:20, 8.28s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:18:16,834 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▋ | 5888/10000 [12:55:47<8:39:18, 7.58s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:18:22,840 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▋ | 5889/10000 [12:55:53<8:05:51, 7.09s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:18:28,783 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▊ | 5890/10000 [12:55:59<7:44:44, 6.78s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:18:34,904 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▊ | 5891/10000 [12:56:05<7:27:48, 6.54s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:18:41,073 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▊ | 5892/10000 [12:56:12<7:21:02, 6.44s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:18:47,041 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▊ | 5893/10000 [12:56:18<7:11:40, 6.31s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:18:53,328 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▊ | 5894/10000 [12:56:24<7:10:35, 6.29s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:18:59,321 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 59%|██████████████████████████████████▊ | 5895/10000 [12:56:30<7:06:43, 6.24s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:19:06,195 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▊ | 5896/10000 [12:56:37<7:18:06, 6.41s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:19:12,806 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▊ | 5897/10000 [12:56:43<7:24:03, 6.49s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:19:19,135 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▊ | 5898/10000 [12:56:50<7:19:02, 6.42s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:19:28,328 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▊ | 5899/10000 [12:56:59<8:16:21, 7.26s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:19:35,089 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▊ | 5900/10000 [12:57:06<8:06:24, 7.12s/it] 59%|██████████████████████████████████▊ | 5900/10000 [12:57:06<8:06:24, 7.12s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:19:41,685 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▊ | 5901/10000 [12:57:12<7:54:16, 6.94s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:19:48,088 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▊ | 5902/10000 [12:57:19<7:44:46, 6.80s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:19:54,749 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 59%|██████████████████████████████████▊ | 5903/10000 [12:57:25<7:40:19, 6.74s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:20:01,572 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▊ | 5904/10000 [12:57:32<7:43:29, 6.79s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:20:08,437 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▊ | 5905/10000 [12:57:39<7:42:59, 6.78s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:20:14,966 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▊ | 5906/10000 [12:57:46<7:36:55, 6.70s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:20:20,976 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▊ | 5907/10000 [12:57:52<7:25:05, 6.52s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:20:27,581 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▊ | 5908/10000 [12:57:58<7:25:23, 6.53s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:20:34,321 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▊ | 5909/10000 [12:58:05<7:30:21, 6.61s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:20:41,123 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▊ | 5910/10000 [12:58:12<7:34:31, 6.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:20:47,911 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 59%|██████████████████████████████████▊ | 5911/10000 [12:58:19<7:36:53, 6.70s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:20:54,559 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▉ | 5912/10000 [12:58:25<7:34:01, 6.66s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:21:01,186 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▉ | 5913/10000 [12:58:32<7:35:07, 6.68s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:21:08,747 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▉ | 5914/10000 [12:58:39<7:53:31, 6.95s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:21:15,486 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▉ | 5915/10000 [12:58:46<7:46:43, 6.86s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:21:22,136 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▉ | 5916/10000 [12:58:53<7:44:13, 6.82s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:21:28,887 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▉ | 5917/10000 [12:59:00<7:42:27, 6.80s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:21:35,646 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▉ | 5918/10000 [12:59:06<7:41:31, 6.78s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:21:48,969 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 59%|██████████████████████████████████▉ | 5919/10000 [12:59:20<9:56:04, 8.76s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:21:56,135 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▉ | 5920/10000 [12:59:27<9:22:30, 8.27s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:22:02,784 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▉ | 5921/10000 [12:59:33<8:48:16, 7.77s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:22:09,395 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▉ | 5922/10000 [12:59:40<8:25:16, 7.43s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:22:16,017 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▉ | 5923/10000 [12:59:47<8:08:36, 7.19s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:22:22,669 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▉ | 5924/10000 [12:59:53<7:56:09, 7.01s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:22:29,561 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▉ | 5925/10000 [13:00:00<7:55:43, 7.00s/it] 59%|██████████████████████████████████▉ | 5925/10000 [13:00:00<7:55:43, 7.00s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:22:36,170 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▉ | 5926/10000 [13:00:07<7:45:10, 6.85s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:22:42,403 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 59%|██████████████████████████████████▉ | 5927/10000 [13:00:13<7:33:25, 6.68s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:22:48,441 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▉ | 5928/10000 [13:00:19<7:18:28, 6.46s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:22:54,342 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▉ | 5929/10000 [13:00:25<7:09:12, 6.33s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:23:00,385 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▉ | 5930/10000 [13:00:31<7:02:10, 6.22s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:23:06,541 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▉ | 5931/10000 [13:00:37<7:01:28, 6.21s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:23:13,128 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|██████████████████████████████████▉ | 5932/10000 [13:00:44<7:09:25, 6.33s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:23:19,192 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|███████████████████████████████████ | 5933/10000 [13:00:50<7:03:50, 6.25s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:23:25,242 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|███████████████████████████████████ | 5934/10000 [13:00:56<6:59:54, 6.20s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:23:31,237 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 59%|███████████████████████████████████ | 5935/10000 [13:01:02<6:53:02, 6.10s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:23:37,459 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|███████████████████████████████████ | 5936/10000 [13:01:08<6:57:40, 6.17s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:23:43,494 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|███████████████████████████████████ | 5937/10000 [13:01:14<6:54:43, 6.12s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:23:50,156 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|███████████████████████████████████ | 5938/10000 [13:01:21<7:04:16, 6.27s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:24:02,186 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|███████████████████████████████████ | 5939/10000 [13:01:33<8:59:54, 7.98s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:24:08,941 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|███████████████████████████████████ | 5940/10000 [13:01:40<8:38:12, 7.66s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:24:15,472 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|███████████████████████████████████ | 5941/10000 [13:01:46<8:14:13, 7.31s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:24:21,918 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|███████████████████████████████████ | 5942/10000 [13:01:53<7:56:43, 7.05s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:24:28,097 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 59%|███████████████████████████████████ | 5943/10000 [13:01:59<7:38:30, 6.78s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:24:34,158 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|███████████████████████████████████ | 5944/10000 [13:02:05<7:23:27, 6.56s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:24:40,045 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|███████████████████████████████████ | 5945/10000 [13:02:11<7:09:15, 6.35s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:24:46,586 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|███████████████████████████████████ | 5946/10000 [13:02:17<7:14:42, 6.43s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:24:52,728 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|███████████████████████████████████ | 5947/10000 [13:02:23<7:07:15, 6.33s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:24:58,789 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|███████████████████████████████████ | 5948/10000 [13:02:29<7:00:40, 6.23s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:25:04,908 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 59%|███████████████████████████████████ | 5949/10000 [13:02:35<6:59:10, 6.21s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:25:10,993 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|███████████████████████████████████ | 5950/10000 [13:02:42<6:55:59, 6.16s/it] 60%|███████████████████████████████████ | 5950/10000 [13:02:42<6:55:59, 6.16s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:25:17,136 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 60%|███████████████████████████████████ | 5951/10000 [13:02:48<6:56:16, 6.17s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:25:23,157 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|███████████████████████████████████ | 5952/10000 [13:02:54<6:55:03, 6.15s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:25:29,304 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|███████████████████████████████████ | 5953/10000 [13:03:00<6:51:05, 6.09s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:25:35,201 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|███████████████████████████████████▏ | 5954/10000 [13:03:06<6:50:30, 6.09s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:25:41,376 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|███████████████████████████████████▏ | 5955/10000 [13:03:12<6:49:53, 6.08s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:25:52,073 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|███████████████████████████████████▏ | 5956/10000 [13:03:23<8:25:44, 7.50s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:25:58,369 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|███████████████████████████████████▏ | 5957/10000 [13:03:29<8:00:59, 7.14s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:26:05,758 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|███████████████████████████████████▏ | 5958/10000 [13:03:36<8:04:49, 7.20s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:26:13,880 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 60%|███████████████████████████████████▏ | 5959/10000 [13:03:45<8:24:32, 7.49s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:26:19,956 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|███████████████████████████████████▏ | 5960/10000 [13:03:51<7:55:41, 7.06s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:26:26,369 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|███████████████████████████████████▏ | 5961/10000 [13:03:57<7:41:45, 6.86s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:26:32,324 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|███████████████████████████████████▏ | 5962/10000 [13:04:03<7:23:50, 6.60s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:26:38,377 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|███████████████████████████████████▏ | 5963/10000 [13:04:09<7:12:59, 6.44s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:26:44,490 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|███████████████████████████████████▏ | 5964/10000 [13:04:15<7:04:51, 6.32s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:26:50,523 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|███████████████████████████████████▏ | 5965/10000 [13:04:21<6:59:56, 6.24s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:26:56,868 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|███████████████████████████████████▏ | 5966/10000 [13:04:27<7:00:55, 6.26s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:27:03,026 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 60%|███████████████████████████████████▏ | 5967/10000 [13:04:34<6:59:04, 6.23s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:27:09,466 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|███████████████████████████████████▏ | 5968/10000 [13:04:40<7:02:51, 6.29s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:27:15,895 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|███████████████████████████████████▏ | 5969/10000 [13:04:47<7:06:59, 6.36s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:27:22,368 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|███████████████████████████████████▏ | 5970/10000 [13:04:53<7:07:58, 6.37s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:27:29,402 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|███████████████████████████████████▏ | 5971/10000 [13:05:00<7:21:15, 6.57s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:27:36,352 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|███████████████████████████████████▏ | 5972/10000 [13:05:07<7:27:33, 6.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:27:43,092 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|███████████████████████████████████▏ | 5973/10000 [13:05:14<7:30:45, 6.72s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:27:49,887 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|███████████████████████████████████▏ | 5974/10000 [13:05:21<7:33:09, 6.75s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:27:56,693 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 60%|███████████████████████████████████▎ | 5975/10000 [13:05:27<7:31:33, 6.73s/it] 60%|███████████████████████████████████▎ | 5975/10000 [13:05:27<7:31:33, 6.73s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:28:03,620 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|███████████████████████████████████▎ | 5976/10000 [13:05:34<7:37:29, 6.82s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:28:11,153 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|███████████████████████████████████▎ | 5977/10000 [13:05:42<7:51:37, 7.03s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:28:19,873 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|███████████████████████████████████▎ | 5978/10000 [13:05:50<8:23:26, 7.51s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:28:27,494 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|███████████████████████████████████▎ | 5979/10000 [13:05:58<8:27:55, 7.58s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:28:34,820 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|███████████████████████████████████▎ | 5980/10000 [13:06:05<8:21:32, 7.49s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:28:41,813 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|███████████████████████████████████▎ | 5981/10000 [13:06:12<8:12:00, 7.35s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:28:49,907 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|███████████████████████████████████▎ | 5982/10000 [13:06:20<8:24:37, 7.54s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:28:57,300 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|███████████████████████████████████▎ | 5983/10000 [13:06:28<8:22:48, 7.51s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:29:05,063 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|███████████████████████████████████▎ | 5984/10000 [13:06:36<8:28:02, 7.59s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:29:31,664 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|██████████████████████████████████▋ | 5985/10000 [13:07:02<14:50:29, 13.31s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:29:39,394 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|██████████████████████████████████▋ | 5986/10000 [13:07:10<12:58:02, 11.63s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:29:46,940 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|██████████████████████████████████▋ | 5987/10000 [13:07:18<11:36:42, 10.42s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:29:54,051 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|██████████████████████████████████▋ | 5988/10000 [13:07:25<10:30:27, 9.43s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:30:01,743 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|███████████████████████████████████▎ | 5989/10000 [13:07:32<9:54:45, 8.90s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:30:10,335 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|███████████████████████████████████▎ | 5990/10000 [13:07:41<9:49:46, 8.82s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:30:17,740 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 60%|███████████████████████████████████▎ | 5991/10000 [13:07:48<9:20:20, 8.39s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:30:25,400 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|███████████████████████████████████▎ | 5992/10000 [13:07:56<9:05:38, 8.17s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:30:33,084 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|███████████████████████████████████▎ | 5993/10000 [13:08:04<8:55:44, 8.02s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:30:41,110 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|███████████████████████████████████▎ | 5994/10000 [13:08:12<8:55:32, 8.02s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:30:48,870 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|███████████████████████████████████▎ | 5995/10000 [13:08:19<8:49:14, 7.93s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:30:56,899 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|███████████████████████████████████▍ | 5996/10000 [13:08:28<8:52:39, 7.98s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:31:04,545 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|███████████████████████████████████▍ | 5997/10000 [13:08:35<8:46:40, 7.89s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:31:16,703 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|██████████████████████████████████▊ | 5998/10000 [13:08:47<10:10:07, 9.15s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:31:24,477 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 60%|███████████████████████████████████▍ | 5999/10000 [13:08:55<9:42:53, 8.74s/it][WARNING|modeling_whisper.py:902] 2022-12-16 22:31:39,313 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 60%|██████████████████████████████████▊ | 6000/10000 [13:09:10<11:45:09, 10.58s/it] 60%|██████████████████████████████████▊ | 6000/10000 [13:09:10<11:45:09, 10.58s/it][INFO|trainer.py:2955] 2022-12-16 22:31:41,464 >> ***** Running Evaluation ***** +[INFO|trainer.py:2959] 2022-12-16 22:31:41,464 >> Num examples: Unknown +[INFO|trainer.py:2960] 2022-12-16 22:31:41,464 >> Batch size = 32 +{'loss': 0.1306, 'learning_rate': 1.3190526315789475e-06, 'epoch': 0.58} +{'loss': 0.1009, 'learning_rate': 1.3111578947368422e-06, 'epoch': 0.58} +{'loss': 0.1055, 'learning_rate': 1.3032631578947367e-06, 'epoch': 0.59} +{'loss': 0.1175, 'learning_rate': 1.2953684210526315e-06, 'epoch': 0.59} +{'loss': 0.1237, 'learning_rate': 1.2874736842105264e-06, 'epoch': 0.59} +{'loss': 0.1166, 'learning_rate': 1.2795789473684212e-06, 'epoch': 0.59} +{'loss': 0.1092, 'learning_rate': 1.271684210526316e-06, 'epoch': 0.6} +{'loss': 0.1113, 'learning_rate': 1.2637894736842105e-06, 'epoch': 0.6} + + Reading metadata...: 0it [00:00, ?it/s] + Reading metadata...: 1487it [00:00, 14868.81it/s] Reading metadata...: 1704it [00:00, 16652.13it/s] +[INFO|trainer_utils.py:689] 2022-12-16 22:31:44,356 >> The following columns in the evaluation set don't have a corresponding argument in `WhisperForConditionalGeneration.forward` and have been ignored: up_votes, segment, age, client_id, down_votes, input_length, locale, accent, path, gender. If up_votes, segment, age, client_id, down_votes, input_length, locale, accent, path, gender are not expected by `WhisperForConditionalGeneration.forward`, you can safely ignore this message. 
+ 60%|██████████████████████████████████▊ | 6000/10000 [13:15:04<11:45:09, 10.58s/it][INFO|trainer.py:2700] 2022-12-16 22:37:35,088 >> Saving model checkpoint to ./checkpoint-6000 +[INFO|configuration_utils.py:447] 2022-12-16 22:37:35,089 >> Configuration saved in ./checkpoint-6000/config.json +[INFO|modeling_utils.py:1680] 2022-12-16 22:37:35,965 >> Model weights saved in ./checkpoint-6000/pytorch_model.bin +[INFO|feature_extraction_utils.py:368] 2022-12-16 22:37:35,980 >> Feature extractor saved in ./checkpoint-6000/preprocessor_config.json +[INFO|feature_extraction_utils.py:368] 2022-12-16 22:37:39,711 >> Feature extractor saved in ./preprocessor_config.json