diff --git "a/nohup.out" "b/nohup.out" --- "a/nohup.out" +++ "b/nohup.out" @@ -69539,3 +69539,1065 @@ If your task is similar to the task the model of the checkpoint was trained on, [INFO|modeling_utils.py:1680] 2022-12-16 15:59:37,395 >> Model weights saved in ./checkpoint-3000/pytorch_model.bin [INFO|feature_extraction_utils.py:368] 2022-12-16 15:59:37,410 >> Feature extractor saved in ./checkpoint-3000/preprocessor_config.json [INFO|feature_extraction_utils.py:368] 2022-12-16 15:59:41,687 >> Feature extractor saved in ./preprocessor_config.json +[WARNING|modeling_whisper.py:902] 2022-12-16 16:00:59,456 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 30%|█████████████████ | 3001/10000 [6:38:30<242:27:38, 124.71s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:01:07,131 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 30%|█████████████████▍ | 3002/10000 [6:38:38<174:08:50, 89.59s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:01:15,463 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 30%|█████████████████▍ | 3003/10000 [6:38:46<126:47:00, 65.23s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:01:22,517 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 30%|█████████████████▋ | 3004/10000 [6:38:53<92:49:22, 47.76s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:01:30,275 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 30%|█████████████████▋ | 3005/10000 [6:39:01<69:30:31, 35.77s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:01:37,906 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 30%|█████████████████▋ | 3006/10000 [6:39:09<53:06:07, 27.33s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:01:45,574 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 30%|█████████████████▋ | 3007/10000 [6:39:16<41:35:35, 21.41s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:01:53,184 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 30%|█████████████████▋ | 3008/10000 [6:39:24<33:34:38, 17.29s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:02:00,802 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 30%|█████████████████▊ | 3009/10000 [6:39:31<27:54:20, 14.37s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:02:08,898 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 30%|█████████████████▊ | 3010/10000 [6:39:40<24:18:10, 12.52s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:02:16,688 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 30%|█████████████████▊ | 3011/10000 [6:39:47<21:31:51, 11.09s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:02:23,887 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 30%|█████████████████▊ | 3012/10000 [6:39:55<19:16:42, 9.93s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:02:30,922 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 30%|█████████████████▊ | 3013/10000 [6:40:02<17:33:58, 9.05s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:02:38,189 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 30%|█████████████████▊ | 3014/10000 [6:40:09<16:28:05, 8.49s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:02:51,055 >> `use_cache = True` is incompatible with gradient checkpointing. 
Setting `use_cache = False`... + 30%|█████████████████▊ | 3015/10000 [6:40:22<19:06:32, 9.85s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:02:58,579 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 30%|█████████████████▊ | 3016/10000 [6:40:29<17:43:43, 9.14s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:03:06,668 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 30%|█████████████████▊ | 3017/10000 [6:40:37<17:05:47, 8.81s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:03:14,272 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 30%|█████████████████▊ | 3018/10000 [6:40:45<16:20:51, 8.43s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:03:22,486 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 30%|█████████████████▊ | 3019/10000 [6:40:53<16:16:42, 8.39s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:03:29,944 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 30%|█████████████████▊ | 3020/10000 [6:41:01<15:43:17, 8.11s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:03:37,121 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 30%|█████████████████▊ | 3021/10000 [6:41:08<15:08:21, 7.81s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:03:48,027 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 30%|█████████████████▊ | 3022/10000 [6:41:19<16:54:52, 8.73s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:03:56,780 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 30%|█████████████████▊ | 3023/10000 [6:41:27<16:57:47, 8.75s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:04:03,424 >> `use_cache = True` is incompatible with gradient checkpointing. 
Setting `use_cache = False`... + 30%|█████████████████▊ | 3024/10000 [6:41:34<15:40:26, 8.09s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:04:09,998 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 30%|█████████████████▊ | 3025/10000 [6:41:41<14:51:59, 7.67s/it] 30%|█████████████████▊ | 3025/10000 [6:41:41<14:51:59, 7.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:04:17,241 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 30%|█████████████████▊ | 3026/10000 [6:41:48<14:36:47, 7.54s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:04:24,137 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 30%|█████████████████▊ | 3027/10000 [6:41:55<14:14:07, 7.35s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:04:31,603 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 30%|█████████████████▊ | 3028/10000 [6:42:02<14:19:27, 7.40s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:04:38,950 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 30%|█████████████████▊ | 3029/10000 [6:42:10<14:16:03, 7.37s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:04:47,022 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 30%|█████████████████▉ | 3030/10000 [6:42:18<14:41:32, 7.59s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:04:54,688 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 30%|█████████████████▉ | 3031/10000 [6:42:25<14:43:16, 7.60s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:05:02,180 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 30%|█████████████████▉ | 3032/10000 [6:42:33<14:37:15, 7.55s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:05:08,983 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 30%|█████████████████▉ | 3033/10000 [6:42:40<14:13:29, 7.35s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:05:16,216 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 30%|█████████████████▉ | 3034/10000 [6:42:47<14:10:08, 7.32s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:05:25,821 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 30%|█████████████████▉ | 3035/10000 [6:42:56<15:27:48, 7.99s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:05:33,168 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 30%|█████████████████▉ | 3036/10000 [6:43:04<15:05:05, 7.80s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:05:40,562 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 30%|█████████████████▉ | 3037/10000 [6:43:11<14:51:33, 7.68s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:05:47,452 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 30%|█████████████████▉ | 3038/10000 [6:43:18<14:22:46, 7.44s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:05:54,858 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 30%|█████████████████▉ | 3039/10000 [6:43:25<14:21:18, 7.42s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:06:02,363 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 30%|█████████████████▉ | 3040/10000 [6:43:33<14:22:55, 7.44s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:06:20,934 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 30%|█████████████████▉ | 3041/10000 [6:43:52<20:50:10, 10.78s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:06:28,182 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 30%|█████████████████▉ | 3042/10000 [6:43:59<18:49:55, 9.74s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:06:35,386 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 30%|█████████████████▉ | 3043/10000 [6:44:06<17:16:55, 8.94s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:06:42,613 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 30%|█████████████████▉ | 3044/10000 [6:44:13<16:17:07, 8.43s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:06:49,962 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 30%|█████████████████▉ | 3045/10000 [6:44:21<15:43:02, 8.14s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:06:57,263 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 30%|█████████████████▉ | 3046/10000 [6:44:28<15:10:12, 7.85s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:07:04,651 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 30%|█████████████████▉ | 3047/10000 [6:44:35<14:57:50, 7.75s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:07:12,042 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 30%|█████████████████▉ | 3048/10000 [6:44:43<14:43:05, 7.62s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:07:19,427 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 30%|█████████████████▉ | 3049/10000 [6:44:50<14:32:41, 7.53s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:07:26,639 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 30%|█████████████████▉ | 3050/10000 [6:44:57<14:26:08, 7.48s/it] 30%|█████████████████▉ | 3050/10000 [6:44:57<14:26:08, 7.48s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:07:36,062 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████ | 3051/10000 [6:45:07<15:28:48, 8.02s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:07:42,883 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████ | 3052/10000 [6:45:14<14:51:51, 7.70s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:07:49,883 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████ | 3053/10000 [6:45:21<14:27:06, 7.49s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:07:57,833 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████ | 3054/10000 [6:45:29<14:43:40, 7.63s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:08:07,389 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████ | 3055/10000 [6:45:38<15:49:09, 8.20s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:08:15,286 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████ | 3056/10000 [6:45:46<15:38:01, 8.11s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:08:23,628 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████ | 3057/10000 [6:45:54<15:45:37, 8.17s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:08:32,372 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 31%|██████████████████ | 3058/10000 [6:46:03<16:04:16, 8.33s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:08:39,873 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████ | 3059/10000 [6:46:11<15:38:45, 8.11s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:08:47,214 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████ | 3060/10000 [6:46:18<15:08:48, 7.86s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:08:54,689 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████ | 3061/10000 [6:46:25<14:56:56, 7.76s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:09:02,876 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████ | 3062/10000 [6:46:34<15:13:12, 7.90s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:09:10,396 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████ | 3063/10000 [6:46:41<14:56:49, 7.76s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:09:17,800 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████ | 3064/10000 [6:46:48<14:45:40, 7.66s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:09:25,388 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████ | 3065/10000 [6:46:56<14:41:51, 7.63s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:09:37,443 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████ | 3066/10000 [6:47:08<17:14:47, 8.95s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:09:45,689 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 31%|██████████████████ | 3067/10000 [6:47:16<16:50:43, 8.75s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:09:53,388 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████ | 3068/10000 [6:47:24<16:16:21, 8.45s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:10:00,400 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████ | 3069/10000 [6:47:31<15:21:33, 7.98s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:10:08,064 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████ | 3070/10000 [6:47:39<15:15:06, 7.92s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:10:15,896 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████ | 3071/10000 [6:47:47<15:12:02, 7.90s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:10:24,401 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████ | 3072/10000 [6:47:55<15:33:47, 8.09s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:10:33,061 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▏ | 3073/10000 [6:48:04<15:53:10, 8.26s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:10:41,824 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▏ | 3074/10000 [6:48:13<16:10:27, 8.41s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:10:48,828 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 31%|██████████████████▏ | 3075/10000 [6:48:19<15:20:54, 7.98s/it] 31%|██████████████████▏ | 3075/10000 [6:48:19<15:20:54, 7.98s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:10:57,001 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▏ | 3076/10000 [6:48:28<15:27:20, 8.04s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:11:04,835 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▏ | 3077/10000 [6:48:36<15:20:40, 7.98s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:11:13,084 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▏ | 3078/10000 [6:48:44<15:31:40, 8.08s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:11:20,711 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▏ | 3079/10000 [6:48:51<15:12:31, 7.91s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:11:27,853 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▏ | 3080/10000 [6:48:58<14:46:30, 7.69s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:11:34,638 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▏ | 3081/10000 [6:49:05<14:15:39, 7.42s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:11:41,094 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▏ | 3082/10000 [6:49:12<13:42:37, 7.13s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:11:47,342 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 31%|██████████████████▏ | 3083/10000 [6:49:18<13:07:48, 6.83s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:11:54,157 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▏ | 3084/10000 [6:49:25<13:11:44, 6.87s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:12:01,587 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▏ | 3085/10000 [6:49:32<13:30:35, 7.03s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:12:08,334 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▏ | 3086/10000 [6:49:39<13:20:52, 6.95s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:12:15,106 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▏ | 3087/10000 [6:49:46<13:14:02, 6.89s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:12:21,510 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▏ | 3088/10000 [6:49:52<12:59:19, 6.77s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:12:28,053 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▏ | 3089/10000 [6:49:59<12:48:33, 6.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:12:34,176 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▏ | 3090/10000 [6:50:05<12:30:40, 6.52s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:12:41,010 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▏ | 3091/10000 [6:50:12<12:38:36, 6.59s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:12:47,663 >> `use_cache = True` is incompatible with gradient checkpointing. 
Setting `use_cache = False`... + 31%|██████████████████▏ | 3092/10000 [6:50:18<12:40:07, 6.60s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:12:54,520 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▏ | 3093/10000 [6:50:25<12:53:31, 6.72s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:13:02,780 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▎ | 3094/10000 [6:50:33<13:45:14, 7.17s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:13:09,523 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▎ | 3095/10000 [6:50:40<13:30:32, 7.04s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:13:16,031 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▎ | 3096/10000 [6:50:47<13:12:15, 6.89s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:13:22,695 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▎ | 3097/10000 [6:50:53<13:01:14, 6.79s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:13:29,212 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▎ | 3098/10000 [6:51:00<12:50:35, 6.70s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:13:35,284 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▎ | 3099/10000 [6:51:06<12:29:53, 6.52s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:13:41,362 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 31%|██████████████████▎ | 3100/10000 [6:51:12<12:15:07, 6.39s/it] 31%|██████████████████▎ | 3100/10000 [6:51:12<12:15:07, 6.39s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:13:47,870 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▎ | 3101/10000 [6:51:18<12:18:02, 6.42s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:13:54,216 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▎ | 3102/10000 [6:51:25<12:17:35, 6.42s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:14:00,799 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▎ | 3103/10000 [6:51:31<12:22:25, 6.46s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:14:07,315 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▎ | 3104/10000 [6:51:38<12:26:04, 6.49s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:14:14,002 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▎ | 3105/10000 [6:51:45<12:31:02, 6.54s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:14:20,587 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▎ | 3106/10000 [6:51:51<12:34:04, 6.56s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:14:28,152 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▎ | 3107/10000 [6:51:59<13:04:55, 6.83s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:14:35,091 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 31%|██████████████████▎ | 3108/10000 [6:52:06<13:12:51, 6.90s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:14:41,743 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▎ | 3109/10000 [6:52:12<13:02:57, 6.82s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:14:48,781 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▎ | 3110/10000 [6:52:19<13:11:07, 6.89s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:14:55,734 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▎ | 3111/10000 [6:52:26<13:12:01, 6.90s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:15:02,440 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▎ | 3112/10000 [6:52:33<13:05:16, 6.84s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:15:08,999 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▎ | 3113/10000 [6:52:40<12:57:07, 6.77s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:15:20,222 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▎ | 3114/10000 [6:52:51<15:29:24, 8.10s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:15:27,103 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▍ | 3115/10000 [6:52:58<14:43:23, 7.70s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:15:33,761 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▍ | 3116/10000 [6:53:04<14:12:29, 7.43s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:15:40,823 >> `use_cache = True` is incompatible with gradient checkpointing. 
Setting `use_cache = False`... + 31%|██████████████████▍ | 3117/10000 [6:53:11<13:56:18, 7.29s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:15:47,293 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▍ | 3118/10000 [6:53:18<13:27:52, 7.04s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:15:53,493 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▍ | 3119/10000 [6:53:24<12:59:28, 6.80s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:15:59,524 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▍ | 3120/10000 [6:53:30<12:35:15, 6.59s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:16:12,203 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▍ | 3121/10000 [6:53:43<15:59:58, 8.37s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:16:18,557 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▍ | 3122/10000 [6:53:49<14:50:39, 7.77s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:16:24,404 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▍ | 3123/10000 [6:53:55<13:43:16, 7.18s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:16:30,328 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▍ | 3124/10000 [6:54:01<13:02:18, 6.83s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:16:45,486 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 31%|██████████████████▍ | 3125/10000 [6:54:16<17:48:58, 9.33s/it] 31%|██████████████████▍ | 3125/10000 [6:54:16<17:48:58, 9.33s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:16:51,564 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▍ | 3126/10000 [6:54:22<15:59:19, 8.37s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:16:57,880 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▍ | 3127/10000 [6:54:28<14:46:24, 7.74s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:17:05,361 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▍ | 3128/10000 [6:54:36<14:40:37, 7.69s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:17:12,609 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▍ | 3129/10000 [6:54:43<14:23:25, 7.54s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:17:20,638 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▍ | 3130/10000 [6:54:51<14:41:34, 7.70s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:17:28,159 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▍ | 3131/10000 [6:54:59<14:35:17, 7.65s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:17:35,946 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▍ | 3132/10000 [6:55:07<14:35:51, 7.65s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:17:42,662 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 31%|██████████████████▍ | 3133/10000 [6:55:13<14:08:24, 7.41s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:17:51,967 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▍ | 3134/10000 [6:55:23<15:09:20, 7.95s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:17:59,383 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▍ | 3135/10000 [6:55:30<14:51:30, 7.79s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:18:06,432 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▌ | 3136/10000 [6:55:37<14:25:58, 7.57s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:18:13,433 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▌ | 3137/10000 [6:55:44<14:06:56, 7.40s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:18:21,050 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▌ | 3138/10000 [6:55:52<14:13:48, 7.47s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:18:28,935 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▌ | 3139/10000 [6:56:00<14:28:29, 7.60s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:18:37,408 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▌ | 3140/10000 [6:56:08<14:56:35, 7.84s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:18:45,695 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▌ | 3141/10000 [6:56:16<15:13:30, 7.99s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:18:53,934 >> `use_cache = True` is incompatible with gradient checkpointing. 
Setting `use_cache = False`... + 31%|██████████████████▌ | 3142/10000 [6:56:25<15:23:41, 8.08s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:19:01,313 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▌ | 3143/10000 [6:56:32<14:59:28, 7.87s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:19:08,608 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▌ | 3144/10000 [6:56:39<14:38:07, 7.68s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:19:16,337 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▌ | 3145/10000 [6:56:47<14:39:13, 7.70s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:19:24,063 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▌ | 3146/10000 [6:56:55<14:41:41, 7.72s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:19:31,584 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▌ | 3147/10000 [6:57:02<14:34:52, 7.66s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:19:38,962 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▌ | 3148/10000 [6:57:10<14:25:11, 7.58s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:19:46,711 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 31%|██████████████████▌ | 3149/10000 [6:57:17<14:29:56, 7.62s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:19:54,868 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 32%|██████████████████▌ | 3150/10000 [6:57:25<14:46:52, 7.77s/it] 32%|██████████████████▌ | 3150/10000 [6:57:25<14:46:52, 7.77s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:20:03,143 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▌ | 3151/10000 [6:57:34<15:03:52, 7.92s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:20:10,310 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▌ | 3152/10000 [6:57:41<14:41:34, 7.72s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:20:19,207 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▌ | 3153/10000 [6:57:50<15:21:07, 8.07s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:20:26,853 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▌ | 3154/10000 [6:57:58<15:07:36, 7.95s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:20:34,426 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▌ | 3155/10000 [6:58:05<14:53:09, 7.83s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:20:42,654 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▌ | 3156/10000 [6:58:13<15:05:52, 7.94s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:20:51,107 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▋ | 3157/10000 [6:58:22<15:26:06, 8.12s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:20:58,280 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 32%|██████████████████▋ | 3158/10000 [6:58:29<14:50:47, 7.81s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:21:05,139 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▋ | 3159/10000 [6:58:36<14:17:38, 7.52s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:21:12,163 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▋ | 3160/10000 [6:58:43<14:01:58, 7.39s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:21:19,218 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▋ | 3161/10000 [6:58:50<13:48:04, 7.26s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:21:26,109 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▋ | 3162/10000 [6:58:57<13:37:25, 7.17s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:21:33,103 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▋ | 3163/10000 [6:59:04<13:30:54, 7.12s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:21:40,657 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▋ | 3164/10000 [6:59:11<13:43:00, 7.22s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:21:47,883 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▋ | 3165/10000 [6:59:19<13:45:33, 7.25s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:21:55,277 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▋ | 3166/10000 [6:59:26<13:50:14, 7.29s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:22:03,562 >> `use_cache = True` is incompatible with gradient checkpointing. 
Setting `use_cache = False`... + 32%|██████████████████▋ | 3167/10000 [6:59:34<14:26:06, 7.61s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:22:11,172 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▋ | 3168/10000 [6:59:42<14:24:43, 7.59s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:22:19,427 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▋ | 3169/10000 [6:59:50<14:46:05, 7.78s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:22:26,455 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▋ | 3170/10000 [6:59:57<14:21:18, 7.57s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:22:33,333 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▋ | 3171/10000 [7:00:04<13:57:01, 7.35s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:22:40,414 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▋ | 3172/10000 [7:00:11<13:47:57, 7.28s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:22:50,181 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▋ | 3173/10000 [7:00:21<15:13:37, 8.03s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:22:58,050 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▋ | 3174/10000 [7:00:29<15:04:41, 7.95s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:23:05,648 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 32%|██████████████████▋ | 3175/10000 [7:00:36<14:52:20, 7.84s/it] 32%|██████████████████▋ | 3175/10000 [7:00:36<14:52:20, 7.84s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:23:12,720 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▋ | 3176/10000 [7:00:43<14:24:41, 7.60s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:23:19,441 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▋ | 3177/10000 [7:00:50<13:58:42, 7.38s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:23:26,646 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▊ | 3178/10000 [7:00:57<13:51:26, 7.31s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:23:33,954 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▊ | 3179/10000 [7:01:05<13:50:24, 7.30s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:23:41,224 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▊ | 3180/10000 [7:01:12<13:45:16, 7.26s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:23:48,597 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▊ | 3181/10000 [7:01:19<13:52:34, 7.33s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:23:55,572 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▊ | 3182/10000 [7:01:26<13:37:37, 7.20s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:24:03,485 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 32%|██████████████████▊ | 3183/10000 [7:01:34<14:05:14, 7.44s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:24:11,464 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▊ | 3184/10000 [7:01:42<14:21:00, 7.58s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:24:19,420 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▊ | 3185/10000 [7:01:50<14:34:14, 7.70s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:24:26,822 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▊ | 3186/10000 [7:01:57<14:25:05, 7.62s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:24:34,082 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▊ | 3187/10000 [7:02:05<14:12:50, 7.51s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:24:46,105 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▊ | 3188/10000 [7:02:17<16:48:23, 8.88s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:25:12,479 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▊ | 3189/10000 [7:02:43<26:40:44, 14.10s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:25:19,864 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▊ | 3190/10000 [7:02:50<22:53:33, 12.10s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:25:30,127 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▊ | 3191/10000 [7:03:01<21:53:08, 11.57s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:25:37,280 >> `use_cache = True` is incompatible with gradient checkpointing. 
Setting `use_cache = False`... + 32%|██████████████████▊ | 3192/10000 [7:03:08<19:21:55, 10.24s/it]{'eval_loss': 0.28189417719841003, 'eval_wer': 21.946852619675838, 'eval_runtime': 314.6023, 'eval_samples_per_second': 5.416, 'eval_steps_per_second': 0.172, 'epoch': 0.3} +{'loss': 0.1298, 'learning_rate': 2.203263157894737e-06, 'epoch': 0.3} +{'loss': 0.1249, 'learning_rate': 2.1953684210526315e-06, 'epoch': 0.3} +{'loss': 0.1267, 'learning_rate': 2.1874736842105263e-06, 'epoch': 0.31} +{'loss': 0.1278, 'learning_rate': 2.179578947368421e-06, 'epoch': 0.31} +{'loss': 0.1399, 'learning_rate': 2.1716842105263158e-06, 'epoch': 0.31} +{'loss': 0.1953, 'learning_rate': 2.1637894736842105e-06, 'epoch': 0.32} +{'loss': 0.1883, 'learning_rate': 2.1558947368421053e-06, 'epoch': 0.32} + + Reading metadata...: 0it [00:00, ?it/s] + Reading metadata...: 1it [00:00, 3.94it/s] Reading metadata...: 2165it [00:00, 7909.81it/s] +[WARNING|modeling_whisper.py:902] 2022-12-16 16:25:46,478 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▊ | 3193/10000 [7:03:17<18:47:05, 9.93s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:25:53,563 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▊ | 3194/10000 [7:03:24<17:04:33, 9.03s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:26:00,555 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▊ | 3195/10000 [7:03:31<15:58:47, 8.45s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:26:07,573 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▊ | 3196/10000 [7:03:38<15:10:58, 8.03s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:26:15,110 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 32%|██████████████████▊ | 3197/10000 [7:03:46<14:52:49, 7.87s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:26:21,948 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▊ | 3198/10000 [7:03:52<14:12:20, 7.52s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:26:28,580 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▊ | 3199/10000 [7:03:59<13:48:26, 7.31s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:26:37,625 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▉ | 3200/10000 [7:04:08<14:45:39, 7.81s/it] 32%|██████████████████▉ | 3200/10000 [7:04:08<14:45:39, 7.81s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:26:45,239 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▉ | 3201/10000 [7:04:16<14:37:13, 7.74s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:26:52,572 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▉ | 3202/10000 [7:04:23<14:21:56, 7.61s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:27:00,015 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▉ | 3203/10000 [7:04:31<14:14:58, 7.55s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:27:06,824 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▉ | 3204/10000 [7:04:37<13:52:20, 7.35s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:27:13,767 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 32%|██████████████████▉ | 3205/10000 [7:04:44<13:38:12, 7.22s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:27:24,903 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▉ | 3206/10000 [7:04:56<15:52:03, 8.41s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:27:32,214 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▉ | 3207/10000 [7:05:03<15:14:19, 8.08s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:27:39,877 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▉ | 3208/10000 [7:05:11<15:02:12, 7.97s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:27:47,322 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▉ | 3209/10000 [7:05:18<14:43:10, 7.80s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:27:55,237 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▉ | 3210/10000 [7:05:26<14:47:59, 7.85s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:28:02,340 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▉ | 3211/10000 [7:05:33<14:22:55, 7.63s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:28:10,552 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▉ | 3212/10000 [7:05:41<14:37:59, 7.76s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:28:20,271 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▉ | 3213/10000 [7:05:51<15:47:31, 8.38s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:28:28,936 >> `use_cache = True` is incompatible with gradient checkpointing. 
Setting `use_cache = False`... + 32%|██████████████████▉ | 3214/10000 [7:06:00<15:59:05, 8.48s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:28:36,378 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▉ | 3215/10000 [7:06:07<15:23:32, 8.17s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:28:44,090 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▉ | 3216/10000 [7:06:15<15:06:47, 8.02s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:28:51,538 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▉ | 3217/10000 [7:06:22<14:47:16, 7.85s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:28:58,387 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▉ | 3218/10000 [7:06:29<14:13:32, 7.55s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:29:05,545 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▉ | 3219/10000 [7:06:36<13:57:55, 7.41s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:29:12,645 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|██████████████████▉ | 3220/10000 [7:06:43<13:47:47, 7.33s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:29:20,216 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|███████████████████ | 3221/10000 [7:06:51<13:56:17, 7.40s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:29:27,076 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 32%|███████████████████ | 3222/10000 [7:06:58<13:35:47, 7.22s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:29:34,028 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|███████████████████ | 3223/10000 [7:07:05<13:27:45, 7.15s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:29:41,110 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|███████████████████ | 3224/10000 [7:07:12<13:28:08, 7.16s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:29:48,430 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|███████████████████ | 3225/10000 [7:07:19<13:33:34, 7.21s/it] 32%|███████████████████ | 3225/10000 [7:07:19<13:33:34, 7.21s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:29:55,489 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|███████████████████ | 3226/10000 [7:07:26<13:28:20, 7.16s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:30:02,337 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|███████████████████ | 3227/10000 [7:07:33<13:16:36, 7.06s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:30:10,166 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|███████████████████ | 3228/10000 [7:07:41<13:43:37, 7.30s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:30:17,685 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|███████████████████ | 3229/10000 [7:07:48<13:50:50, 7.36s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:30:25,479 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 32%|███████████████████ | 3230/10000 [7:07:56<14:01:27, 7.46s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:30:34,332 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|███████████████████ | 3231/10000 [7:08:05<14:52:36, 7.91s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:30:42,566 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|███████████████████ | 3232/10000 [7:08:13<14:59:20, 7.97s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:30:50,991 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|███████████████████ | 3233/10000 [7:08:22<15:19:47, 8.16s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:30:58,022 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|███████████████████ | 3234/10000 [7:08:29<14:40:00, 7.80s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:31:05,197 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|███████████████████ | 3235/10000 [7:08:36<14:20:02, 7.63s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:31:12,631 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|███████████████████ | 3236/10000 [7:08:43<14:09:10, 7.53s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:31:20,508 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|███████████████████ | 3237/10000 [7:08:51<14:23:31, 7.66s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:31:29,048 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|███████████████████ | 3238/10000 [7:09:00<14:53:28, 7.93s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:31:35,572 >> `use_cache = True` is incompatible with gradient checkpointing. 
Setting `use_cache = False`... + 32%|███████████████████ | 3239/10000 [7:09:06<14:05:45, 7.51s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:31:42,408 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|███████████████████ | 3240/10000 [7:09:13<13:44:36, 7.32s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:31:49,063 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|███████████████████ | 3241/10000 [7:09:20<13:17:01, 7.08s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:31:55,479 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|███████████████████▏ | 3242/10000 [7:09:26<12:57:26, 6.90s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:32:02,064 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|███████████████████▏ | 3243/10000 [7:09:33<12:46:53, 6.81s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:32:12,570 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|███████████████████▏ | 3244/10000 [7:09:43<14:50:33, 7.91s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:32:19,101 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|███████████████████▏ | 3245/10000 [7:09:50<14:05:11, 7.51s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:32:25,297 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|███████████████████▏ | 3246/10000 [7:09:56<13:19:56, 7.11s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:32:31,506 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 32%|███████████████████▏ | 3247/10000 [7:10:02<12:45:31, 6.80s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:32:37,382 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|███████████████████▏ | 3248/10000 [7:10:08<12:17:41, 6.56s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:32:44,356 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|███████████████████▏ | 3249/10000 [7:10:15<12:28:51, 6.66s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:32:50,919 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 32%|███████████████████▏ | 3250/10000 [7:10:22<12:27:52, 6.65s/it] 32%|███████████████████▏ | 3250/10000 [7:10:22<12:27:52, 6.65s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:32:58,450 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▏ | 3251/10000 [7:10:29<12:58:38, 6.92s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:33:06,791 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▏ | 3252/10000 [7:10:37<13:42:15, 7.31s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:33:13,365 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▏ | 3253/10000 [7:10:44<13:19:43, 7.11s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:33:20,281 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▏ | 3254/10000 [7:10:51<13:13:34, 7.06s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:33:27,293 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 33%|███████████████████▏ | 3255/10000 [7:10:58<13:13:25, 7.06s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:33:33,948 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▏ | 3256/10000 [7:11:05<13:00:26, 6.94s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:33:40,450 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▏ | 3257/10000 [7:11:11<12:41:26, 6.78s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:33:48,455 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▏ | 3258/10000 [7:11:19<13:26:41, 7.18s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:33:55,841 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▏ | 3259/10000 [7:11:27<13:33:05, 7.24s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:34:02,733 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▏ | 3260/10000 [7:11:33<13:18:51, 7.11s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:34:09,311 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▏ | 3261/10000 [7:11:40<12:59:21, 6.94s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:34:16,339 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▏ | 3262/10000 [7:11:47<13:02:09, 6.96s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:34:22,988 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▎ | 3263/10000 [7:11:54<12:54:59, 6.90s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:34:29,595 >> `use_cache = True` is incompatible with gradient checkpointing. 
Setting `use_cache = False`... + 33%|███████████████████▎ | 3264/10000 [7:12:00<12:40:41, 6.78s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:34:36,975 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▎ | 3265/10000 [7:12:08<13:05:31, 7.00s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:34:43,589 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▎ | 3266/10000 [7:12:14<12:48:16, 6.85s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:34:49,983 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▎ | 3267/10000 [7:12:21<12:32:40, 6.71s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:34:56,073 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▎ | 3268/10000 [7:12:27<12:16:43, 6.57s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:35:02,242 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▎ | 3269/10000 [7:12:33<11:58:33, 6.41s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:35:08,200 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▎ | 3270/10000 [7:12:39<11:45:32, 6.29s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:35:14,355 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▎ | 3271/10000 [7:12:45<11:40:28, 6.25s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:35:24,116 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 33%|███████████████████▎ | 3272/10000 [7:12:55<13:41:41, 7.33s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:35:32,790 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▎ | 3273/10000 [7:13:03<14:21:00, 7.68s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:35:39,265 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▎ | 3274/10000 [7:13:10<13:43:16, 7.34s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:35:48,224 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▎ | 3275/10000 [7:13:19<14:37:05, 7.83s/it] 33%|███████████████████▎ | 3275/10000 [7:13:19<14:37:05, 7.83s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:35:54,840 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▎ | 3276/10000 [7:13:26<13:58:46, 7.48s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:36:00,890 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▎ | 3277/10000 [7:13:32<13:10:15, 7.05s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:36:07,656 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▎ | 3278/10000 [7:13:38<13:01:07, 6.97s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:36:14,341 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▎ | 3279/10000 [7:13:45<12:50:54, 6.88s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:36:20,752 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 33%|███████████████████▎ | 3280/10000 [7:13:51<12:30:20, 6.70s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:36:27,219 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▎ | 3281/10000 [7:13:58<12:25:42, 6.66s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:36:33,950 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▎ | 3282/10000 [7:14:05<12:29:35, 6.69s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:36:41,941 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▎ | 3283/10000 [7:14:13<13:11:10, 7.07s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:36:50,150 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▍ | 3284/10000 [7:14:21<13:47:55, 7.40s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:37:02,993 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▍ | 3285/10000 [7:14:34<16:53:58, 9.06s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:37:10,862 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▍ | 3286/10000 [7:14:41<16:09:54, 8.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:37:18,895 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▍ | 3287/10000 [7:14:50<15:52:43, 8.52s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:37:26,623 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▍ | 3288/10000 [7:14:57<15:23:56, 8.26s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:37:34,460 >> `use_cache = True` is incompatible with gradient checkpointing. 
Setting `use_cache = False`... + 33%|███████████████████▍ | 3289/10000 [7:15:05<15:09:15, 8.13s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:37:41,960 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▍ | 3290/10000 [7:15:13<14:49:57, 7.96s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:37:49,489 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▍ | 3291/10000 [7:15:20<14:30:57, 7.79s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:37:58,189 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▍ | 3292/10000 [7:15:29<15:03:20, 8.08s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:38:05,347 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▍ | 3293/10000 [7:15:36<14:34:50, 7.83s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:38:12,262 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▍ | 3294/10000 [7:15:43<14:01:55, 7.53s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:38:19,470 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▍ | 3295/10000 [7:15:50<13:53:32, 7.46s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:38:26,564 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▍ | 3296/10000 [7:15:57<13:41:37, 7.35s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:38:34,577 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 33%|███████████████████▍ | 3297/10000 [7:16:05<13:59:13, 7.51s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:38:42,158 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▍ | 3298/10000 [7:16:13<14:06:07, 7.58s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:38:49,472 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▍ | 3299/10000 [7:16:20<13:55:49, 7.48s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:38:57,035 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▍ | 3300/10000 [7:16:28<13:57:28, 7.50s/it] 33%|███████████████████▍ | 3300/10000 [7:16:28<13:57:28, 7.50s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:39:06,155 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▍ | 3301/10000 [7:16:37<14:51:29, 7.98s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:39:15,007 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▍ | 3302/10000 [7:16:46<15:22:17, 8.26s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:39:22,157 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▍ | 3303/10000 [7:16:53<14:44:09, 7.92s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:39:29,846 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▍ | 3304/10000 [7:17:01<14:36:52, 7.86s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:39:37,118 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 33%|███████████████████▍ | 3305/10000 [7:17:08<14:16:14, 7.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:39:44,950 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▌ | 3306/10000 [7:17:16<14:22:57, 7.73s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:39:52,133 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▌ | 3307/10000 [7:17:23<14:02:33, 7.55s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:40:01,418 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▌ | 3308/10000 [7:17:32<15:01:43, 8.08s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:40:10,310 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▌ | 3309/10000 [7:17:41<15:25:43, 8.30s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:40:17,825 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▌ | 3310/10000 [7:17:48<15:01:22, 8.08s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:40:29,943 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▌ | 3311/10000 [7:18:01<17:17:05, 9.30s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:40:37,979 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▌ | 3312/10000 [7:18:09<16:35:45, 8.93s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:40:45,546 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▌ | 3313/10000 [7:18:16<15:48:33, 8.51s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:40:52,970 >> `use_cache = True` is incompatible with gradient checkpointing. 
Setting `use_cache = False`... + 33%|███████████████████▌ | 3314/10000 [7:18:24<15:11:56, 8.18s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:40:59,954 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▌ | 3315/10000 [7:18:31<14:29:51, 7.81s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:41:06,806 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▌ | 3316/10000 [7:18:37<13:59:33, 7.54s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:41:14,660 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▌ | 3317/10000 [7:18:45<14:08:37, 7.62s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:41:22,727 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▌ | 3318/10000 [7:18:53<14:24:25, 7.76s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:41:31,566 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▌ | 3319/10000 [7:19:02<14:59:53, 8.08s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:41:39,285 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▌ | 3320/10000 [7:19:10<14:45:50, 7.96s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:41:47,932 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▌ | 3321/10000 [7:19:19<15:08:47, 8.16s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:41:56,174 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 33%|███████████████████▌ | 3322/10000 [7:19:27<15:10:57, 8.18s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:42:03,428 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▌ | 3323/10000 [7:19:34<14:43:04, 7.94s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:42:11,234 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▌ | 3324/10000 [7:19:42<14:33:05, 7.85s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:42:18,416 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▌ | 3325/10000 [7:19:49<14:16:28, 7.70s/it] 33%|███████████████████▌ | 3325/10000 [7:19:49<14:16:28, 7.70s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:42:26,818 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▌ | 3326/10000 [7:19:57<14:39:26, 7.91s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:42:34,771 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▋ | 3327/10000 [7:20:05<14:41:46, 7.93s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:42:42,167 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▋ | 3328/10000 [7:20:13<14:23:07, 7.76s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:42:49,368 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▋ | 3329/10000 [7:20:20<14:06:02, 7.61s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:42:56,296 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 33%|███████████████████▋ | 3330/10000 [7:20:27<13:40:18, 7.38s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:43:05,314 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▋ | 3331/10000 [7:20:36<14:33:04, 7.85s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:43:13,361 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▋ | 3332/10000 [7:20:44<14:42:34, 7.94s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:43:21,000 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▋ | 3333/10000 [7:20:52<14:29:09, 7.82s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:43:28,299 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▋ | 3334/10000 [7:20:59<14:09:06, 7.64s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:43:35,645 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▋ | 3335/10000 [7:21:06<13:57:46, 7.54s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:43:43,322 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▋ | 3336/10000 [7:21:14<14:05:40, 7.61s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:43:50,651 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▋ | 3337/10000 [7:21:21<13:56:31, 7.53s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:43:58,434 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▋ | 3338/10000 [7:21:29<14:02:36, 7.59s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:44:05,697 >> `use_cache = True` is incompatible with gradient checkpointing. 
Setting `use_cache = False`... + 33%|███████████████████▋ | 3339/10000 [7:21:36<13:56:05, 7.53s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:44:12,725 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▋ | 3340/10000 [7:21:43<13:35:20, 7.35s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:44:19,693 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▋ | 3341/10000 [7:21:50<13:25:35, 7.26s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:44:26,598 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▋ | 3342/10000 [7:21:57<13:14:44, 7.16s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:44:34,278 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▋ | 3343/10000 [7:22:05<13:29:16, 7.29s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:44:41,620 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▋ | 3344/10000 [7:22:12<13:32:44, 7.33s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:44:48,225 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▋ | 3345/10000 [7:22:19<13:09:05, 7.11s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:45:01,085 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▋ | 3346/10000 [7:22:32<16:20:24, 8.84s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:45:07,642 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 33%|███████████████████▋ | 3347/10000 [7:22:38<15:02:45, 8.14s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:45:14,133 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▊ | 3348/10000 [7:22:45<14:07:45, 7.65s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:45:22,781 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 33%|███████████████████▊ | 3349/10000 [7:22:53<14:42:10, 7.96s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:45:28,840 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|███████████████████▊ | 3350/10000 [7:23:00<13:39:11, 7.39s/it] 34%|███████████████████▊ | 3350/10000 [7:23:00<13:39:11, 7.39s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:45:36,686 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|███████████████████▊ | 3351/10000 [7:23:07<13:52:06, 7.51s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:45:43,253 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|███████████████████▊ | 3352/10000 [7:23:14<13:18:32, 7.21s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:45:50,008 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|███████████████████▊ | 3353/10000 [7:23:21<13:02:29, 7.06s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:45:56,426 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|███████████████████▊ | 3354/10000 [7:23:27<12:44:58, 6.91s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:46:02,654 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 34%|███████████████████▊ | 3355/10000 [7:23:33<12:18:16, 6.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:46:08,544 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|███████████████████▊ | 3356/10000 [7:23:39<11:54:14, 6.45s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:46:14,570 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|███████████████████▊ | 3357/10000 [7:23:45<11:39:11, 6.32s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:46:20,677 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|███████████████████▊ | 3358/10000 [7:23:51<11:35:52, 6.29s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:46:34,902 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|███████████████████▊ | 3359/10000 [7:24:06<15:58:22, 8.66s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:46:41,126 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|███████████████████▊ | 3360/10000 [7:24:12<14:38:20, 7.94s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:46:48,224 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|███████████████████▊ | 3361/10000 [7:24:19<14:09:34, 7.68s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:46:54,885 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|███████████████████▊ | 3362/10000 [7:24:25<13:33:48, 7.36s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:47:02,651 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|███████████████████▊ | 3363/10000 [7:24:33<13:44:51, 7.46s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:47:10,112 >> `use_cache = True` is incompatible with gradient checkpointing. 
Setting `use_cache = False`... + 34%|███████████████████▊ | 3364/10000 [7:24:41<13:49:40, 7.50s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:47:17,187 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|███████████████████▊ | 3365/10000 [7:24:48<13:36:07, 7.38s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:47:23,676 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|███████████████████▊ | 3366/10000 [7:24:54<13:05:24, 7.10s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:47:30,523 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|███████████████████▊ | 3367/10000 [7:25:01<12:57:38, 7.03s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:47:37,416 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|███████████████████▊ | 3368/10000 [7:25:08<12:52:20, 6.99s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:47:43,599 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|███████████████████▉ | 3369/10000 [7:25:14<12:24:05, 6.73s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:47:50,168 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|███████████████████▉ | 3370/10000 [7:25:21<12:18:34, 6.68s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:47:59,289 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|███████████████████▉ | 3371/10000 [7:25:30<13:39:14, 7.42s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:48:06,530 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 34%|███████████████████▉ | 3372/10000 [7:25:37<13:35:33, 7.38s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:48:13,648 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|███████████████████▉ | 3373/10000 [7:25:44<13:23:21, 7.27s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:48:21,290 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|███████████████████▉ | 3374/10000 [7:25:52<13:36:09, 7.39s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:48:29,074 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|███████████████████▉ | 3375/10000 [7:26:00<13:50:10, 7.52s/it] 34%|███████████████████▉ | 3375/10000 [7:26:00<13:50:10, 7.52s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:48:35,691 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|███████████████████▉ | 3376/10000 [7:26:06<13:19:01, 7.24s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:48:42,321 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|███████████████████▉ | 3377/10000 [7:26:13<13:00:07, 7.07s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:48:48,966 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|███████████████████▉ | 3378/10000 [7:26:20<12:46:33, 6.95s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:48:55,550 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|███████████████████▉ | 3379/10000 [7:26:26<12:32:31, 6.82s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:49:02,333 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 34%|███████████████████▉ | 3380/10000 [7:26:33<12:31:25, 6.81s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:49:08,944 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|███████████████████▉ | 3381/10000 [7:26:40<12:26:37, 6.77s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:49:15,255 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|███████████████████▉ | 3382/10000 [7:26:46<12:13:17, 6.65s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:49:21,590 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|███████████████████▉ | 3383/10000 [7:26:52<12:00:53, 6.54s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:49:28,227 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|███████████████████▉ | 3384/10000 [7:26:59<12:04:08, 6.57s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:49:35,089 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|███████████████████▉ | 3385/10000 [7:27:06<12:13:12, 6.65s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:49:41,684 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|███████████████████▉ | 3386/10000 [7:27:12<12:12:14, 6.64s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:49:48,420 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|███████████████████▉ | 3387/10000 [7:27:19<12:12:09, 6.64s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:49:55,213 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|███████████████████▉ | 3388/10000 [7:27:26<12:19:54, 6.71s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:50:02,149 >> `use_cache = True` is incompatible with gradient checkpointing. 
Setting `use_cache = False`... + 34%|███████████████████▉ | 3389/10000 [7:27:33<12:26:53, 6.78s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:50:14,537 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████ | 3390/10000 [7:27:45<15:32:22, 8.46s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:50:21,135 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████ | 3391/10000 [7:27:52<14:31:35, 7.91s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:50:27,733 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████ | 3392/10000 [7:27:58<13:45:09, 7.49s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:50:34,210 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████ | 3393/10000 [7:28:05<13:09:02, 7.17s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:50:40,574 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████ | 3394/10000 [7:28:11<12:41:32, 6.92s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:50:46,610 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████ | 3395/10000 [7:28:17<12:17:47, 6.70s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:50:52,775 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████ | 3396/10000 [7:28:23<11:59:23, 6.54s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:50:58,876 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 34%|████████████████████ | 3397/10000 [7:28:29<11:42:39, 6.38s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:51:06,253 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████ | 3398/10000 [7:28:37<12:16:57, 6.70s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:51:12,786 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████ | 3399/10000 [7:28:43<12:12:06, 6.65s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:51:19,383 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████ | 3400/10000 [7:28:50<12:10:43, 6.64s/it] 34%|████████████████████ | 3400/10000 [7:28:50<12:10:43, 6.64s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:51:31,255 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████ | 3401/10000 [7:29:02<14:58:54, 8.17s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:51:37,330 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████ | 3402/10000 [7:29:08<13:48:23, 7.53s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:51:43,496 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████ | 3403/10000 [7:29:14<13:08:17, 7.17s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:51:50,151 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████ | 3404/10000 [7:29:21<12:52:22, 7.03s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:51:58,019 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 34%|████████████████████ | 3405/10000 [7:29:29<13:17:30, 7.26s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:52:04,156 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████ | 3406/10000 [7:29:35<12:39:07, 6.91s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:52:10,138 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████ | 3407/10000 [7:29:41<12:10:42, 6.65s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:52:16,213 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████ | 3408/10000 [7:29:47<11:51:52, 6.48s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:52:22,452 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████ | 3409/10000 [7:29:53<11:40:04, 6.37s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:52:30,242 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████ | 3410/10000 [7:30:01<12:28:47, 6.82s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:52:36,327 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████ | 3411/10000 [7:30:07<12:03:34, 6.59s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:52:42,485 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████▏ | 3412/10000 [7:30:13<11:51:39, 6.48s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:52:48,604 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████▏ | 3413/10000 [7:30:19<11:35:44, 6.34s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:52:56,222 >> `use_cache = True` is incompatible with gradient checkpointing. 
Setting `use_cache = False`... + 34%|████████████████████▏ | 3414/10000 [7:30:27<12:22:25, 6.76s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:53:02,309 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████▏ | 3415/10000 [7:30:33<11:59:44, 6.56s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:53:08,487 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████▏ | 3416/10000 [7:30:39<11:45:13, 6.43s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:53:15,357 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████▏ | 3417/10000 [7:30:46<11:58:42, 6.55s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:53:22,276 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████▏ | 3418/10000 [7:30:53<12:13:39, 6.69s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:53:28,485 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████▏ | 3419/10000 [7:30:59<11:52:25, 6.50s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:53:34,283 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████▏ | 3420/10000 [7:31:05<11:32:42, 6.32s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:53:40,391 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████▏ | 3421/10000 [7:31:11<11:26:42, 6.26s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:53:46,620 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 34%|████████████████████▏ | 3422/10000 [7:31:17<11:23:19, 6.23s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:53:52,680 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████▏ | 3423/10000 [7:31:23<11:18:02, 6.19s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:54:01,097 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████▏ | 3424/10000 [7:31:32<12:28:31, 6.83s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:54:06,935 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████▏ | 3425/10000 [7:31:37<11:57:14, 6.55s/it] 34%|████████████████████▏ | 3425/10000 [7:31:37<11:57:14, 6.55s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:54:13,517 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████▏ | 3426/10000 [7:31:44<12:02:24, 6.59s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:54:20,061 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████▏ | 3427/10000 [7:31:51<11:56:50, 6.54s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:54:26,591 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████▏ | 3428/10000 [7:31:57<12:00:23, 6.58s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:54:33,111 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████▏ | 3429/10000 [7:32:04<11:57:18, 6.55s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:54:45,289 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 34%|████████████████████▏ | 3430/10000 [7:32:16<15:02:39, 8.24s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:54:53,023 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████▏ | 3431/10000 [7:32:24<14:46:13, 8.09s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:54:59,889 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████▏ | 3432/10000 [7:32:31<14:05:25, 7.72s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:55:07,415 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████▎ | 3433/10000 [7:32:38<13:54:35, 7.63s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:55:13,925 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████▎ | 3434/10000 [7:32:45<13:21:25, 7.32s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:55:20,940 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████▎ | 3435/10000 [7:32:51<13:07:21, 7.20s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:55:27,354 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████▎ | 3436/10000 [7:32:58<12:44:25, 6.99s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:55:34,070 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████▎ | 3437/10000 [7:33:05<12:36:40, 6.92s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:55:41,403 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████▎ | 3438/10000 [7:33:12<12:47:51, 7.02s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:55:47,807 >> `use_cache = True` is incompatible with gradient checkpointing. 
Setting `use_cache = False`... + 34%|████████████████████▎ | 3439/10000 [7:33:18<12:29:44, 6.86s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:55:54,193 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████▎ | 3440/10000 [7:33:25<12:14:31, 6.72s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:56:00,883 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████▎ | 3441/10000 [7:33:31<12:10:38, 6.68s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:56:07,201 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████▎ | 3442/10000 [7:33:38<11:59:40, 6.58s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:56:13,493 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████▎ | 3443/10000 [7:33:44<11:48:52, 6.49s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:56:19,734 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████▎ | 3444/10000 [7:33:50<11:40:53, 6.41s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:56:25,996 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████▎ | 3445/10000 [7:33:57<11:38:19, 6.39s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:56:32,106 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████▎ | 3446/10000 [7:34:03<11:28:28, 6.30s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:56:47,106 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 34%|████████████████████▎ | 3447/10000 [7:34:18<16:14:18, 8.92s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:56:54,175 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████▎ | 3448/10000 [7:34:25<15:13:36, 8.37s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:57:00,780 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████▎ | 3449/10000 [7:34:31<14:14:30, 7.83s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:57:25,884 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 34%|████████████████████▎ | 3450/10000 [7:34:56<23:38:23, 12.99s/it] 34%|████████████████████▎ | 3450/10000 [7:34:56<23:38:23, 12.99s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:57:32,383 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▎ | 3451/10000 [7:35:03<20:07:01, 11.06s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:57:38,910 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▎ | 3452/10000 [7:35:10<17:39:35, 9.71s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:57:45,597 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▎ | 3453/10000 [7:35:16<15:57:57, 8.78s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:57:52,176 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▍ | 3454/10000 [7:35:23<14:46:16, 8.12s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:57:58,413 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 35%|████████████████████▍ | 3455/10000 [7:35:29<13:42:45, 7.54s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:58:04,683 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▍ | 3456/10000 [7:35:35<13:05:53, 7.21s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:58:10,843 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▍ | 3457/10000 [7:35:41<12:26:35, 6.85s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:58:18,132 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▍ | 3458/10000 [7:35:49<12:45:04, 7.02s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:58:24,558 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▍ | 3459/10000 [7:35:55<12:25:02, 6.83s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:58:30,819 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▍ | 3460/10000 [7:36:01<12:03:33, 6.64s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:58:37,291 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▍ | 3461/10000 [7:36:08<12:00:39, 6.61s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:58:43,687 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▍ | 3462/10000 [7:36:14<11:53:43, 6.55s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:58:50,311 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▍ | 3463/10000 [7:36:21<11:57:54, 6.59s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:58:57,164 >> `use_cache = True` is incompatible with gradient checkpointing. 
Setting `use_cache = False`... + 35%|████████████████████▍ | 3464/10000 [7:36:28<12:04:59, 6.66s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:59:03,762 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▍ | 3465/10000 [7:36:34<12:01:30, 6.62s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:59:10,444 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▍ | 3466/10000 [7:36:41<12:03:24, 6.64s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:59:17,080 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▍ | 3467/10000 [7:36:48<12:04:29, 6.65s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:59:23,916 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▍ | 3468/10000 [7:36:54<12:05:25, 6.66s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:59:33,213 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▍ | 3469/10000 [7:37:04<13:33:33, 7.47s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:59:39,658 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▍ | 3470/10000 [7:37:10<12:57:29, 7.14s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:59:45,673 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▍ | 3471/10000 [7:37:16<12:22:18, 6.82s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:59:53,024 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 35%|████████████████████▍ | 3472/10000 [7:37:24<12:42:00, 7.00s/it][WARNING|modeling_whisper.py:902] 2022-12-16 16:59:59,978 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▍ | 3473/10000 [7:37:31<12:40:08, 6.99s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:00:08,172 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▍ | 3474/10000 [7:37:39<13:18:01, 7.34s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:00:15,124 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▌ | 3475/10000 [7:37:46<13:04:33, 7.21s/it] 35%|████████████████████▌ | 3475/10000 [7:37:46<13:04:33, 7.21s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:00:22,045 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▌ | 3476/10000 [7:37:53<12:56:12, 7.14s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:00:29,393 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▌ | 3477/10000 [7:38:00<13:01:46, 7.19s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:00:37,588 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▌ | 3478/10000 [7:38:08<13:31:00, 7.46s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:00:44,462 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▌ | 3479/10000 [7:38:15<13:17:18, 7.34s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:00:52,125 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 35%|████████████████████▌ | 3480/10000 [7:38:23<13:29:16, 7.45s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:00:59,744 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▌ | 3481/10000 [7:38:30<13:29:28, 7.45s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:01:07,005 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▌ | 3482/10000 [7:38:38<13:25:26, 7.41s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:01:15,148 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▌ | 3483/10000 [7:38:46<13:48:23, 7.63s/it]{'loss': 0.1726, 'learning_rate': 2.148e-06, 'epoch': 0.32} +{'loss': 0.197, 'learning_rate': 2.1401052631578948e-06, 'epoch': 0.32} +{'loss': 0.1587, 'learning_rate': 2.1322105263157895e-06, 'epoch': 0.33} +{'loss': 0.1815, 'learning_rate': 2.1243157894736843e-06, 'epoch': 0.33} +{'loss': 0.1887, 'learning_rate': 2.116421052631579e-06, 'epoch': 0.33} +{'loss': 0.1535, 'learning_rate': 2.1085263157894737e-06, 'epoch': 0.33} +{'loss': 0.2055, 'learning_rate': 2.1006315789473685e-06, 'epoch': 0.34} +{'loss': 0.1645, 'learning_rate': 2.0927368421052632e-06, 'epoch': 0.34} +{'loss': 0.1457, 'learning_rate': 2.084842105263158e-06, 'epoch': 0.34} +{'loss': 0.1494, 'learning_rate': 2.0769473684210527e-06, 'epoch': 0.34} +{'loss': 0.1395, 'learning_rate': 2.0690526315789475e-06, 'epoch': 0.34} +{'loss': 0.1433, 'learning_rate': 2.0611578947368422e-06, 'epoch': 0.35} + + Reading metadata...: 0it [00:00, ?it/s] + Reading metadata...: 1it [00:00, 8.32it/s] Reading metadata...: 2165it [00:00, 15590.25it/s] +[WARNING|modeling_whisper.py:902] 2022-12-16 17:01:24,102 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 35%|████████████████████▌ | 3484/10000 [7:38:55<14:31:12, 8.02s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:01:46,889 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▌ | 3485/10000 [7:39:18<22:34:09, 12.47s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:01:54,841 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▌ | 3486/10000 [7:39:26<20:07:46, 11.12s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:02:02,514 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▌ | 3487/10000 [7:39:33<18:11:16, 10.05s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:02:09,853 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▌ | 3488/10000 [7:39:40<16:44:32, 9.26s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:02:20,672 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▌ | 3489/10000 [7:39:51<17:37:12, 9.74s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:02:28,187 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▌ | 3490/10000 [7:39:59<16:24:47, 9.08s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:02:35,998 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▌ | 3491/10000 [7:40:07<15:43:37, 8.70s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:02:43,449 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▌ | 3492/10000 [7:40:14<15:01:26, 8.31s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:02:50,692 >> `use_cache = True` is incompatible with gradient checkpointing. 
Setting `use_cache = False`... + 35%|████████████████████▌ | 3493/10000 [7:40:21<14:27:17, 8.00s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:02:58,280 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▌ | 3494/10000 [7:40:29<14:08:16, 7.82s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:03:04,973 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▌ | 3495/10000 [7:40:36<13:34:33, 7.51s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:03:11,621 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▋ | 3496/10000 [7:40:42<13:05:00, 7.24s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:03:18,361 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▋ | 3497/10000 [7:40:49<12:47:43, 7.08s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:03:29,517 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▋ | 3498/10000 [7:41:00<15:05:18, 8.35s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:03:36,600 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▋ | 3499/10000 [7:41:07<14:22:50, 7.96s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:03:43,988 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▋ | 3500/10000 [7:41:15<14:04:33, 7.80s/it] 35%|████████████████████▋ | 3500/10000 [7:41:15<14:04:33, 7.80s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:03:51,491 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 35%|████████████████████▋ | 3501/10000 [7:41:22<13:52:01, 7.68s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:03:58,615 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▋ | 3502/10000 [7:41:29<13:36:01, 7.53s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:04:07,175 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▋ | 3503/10000 [7:41:38<14:08:47, 7.84s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:04:14,309 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▋ | 3504/10000 [7:41:45<13:44:42, 7.62s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:04:21,192 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▋ | 3505/10000 [7:41:52<13:20:48, 7.40s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:04:28,055 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▋ | 3506/10000 [7:41:59<13:04:39, 7.25s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:04:34,881 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▋ | 3507/10000 [7:42:06<12:51:19, 7.13s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:04:42,105 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▋ | 3508/10000 [7:42:13<12:54:58, 7.16s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:04:52,626 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▋ | 3509/10000 [7:42:23<14:41:58, 8.15s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:05:00,903 >> `use_cache = True` is incompatible with gradient checkpointing. 
Setting `use_cache = False`... + 35%|████████████████████▋ | 3510/10000 [7:42:32<14:45:38, 8.19s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:05:08,348 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▋ | 3511/10000 [7:42:39<14:23:25, 7.98s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:05:16,789 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▋ | 3512/10000 [7:42:47<14:34:48, 8.09s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:05:24,224 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▋ | 3513/10000 [7:42:55<14:15:46, 7.92s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:05:31,937 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▋ | 3514/10000 [7:43:03<14:06:58, 7.84s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:05:41,078 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▋ | 3515/10000 [7:43:12<14:49:07, 8.23s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:05:48,660 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▋ | 3516/10000 [7:43:19<14:25:11, 8.01s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:05:56,301 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▊ | 3517/10000 [7:43:27<14:18:15, 7.94s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:06:03,935 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 35%|████████████████████▊ | 3518/10000 [7:43:35<14:08:44, 7.86s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:06:11,432 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▊ | 3519/10000 [7:43:42<13:53:59, 7.72s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:06:19,395 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▊ | 3520/10000 [7:43:50<14:04:20, 7.82s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:06:27,099 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▊ | 3521/10000 [7:43:58<13:56:37, 7.75s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:06:34,643 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▊ | 3522/10000 [7:44:05<13:53:38, 7.72s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:06:43,194 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▊ | 3523/10000 [7:44:14<14:21:19, 7.98s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:06:51,565 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▊ | 3524/10000 [7:44:22<14:33:25, 8.09s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:06:59,986 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▊ | 3525/10000 [7:44:31<14:42:27, 8.18s/it] 35%|████████████████████▊ | 3525/10000 [7:44:31<14:42:27, 8.18s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:07:08,314 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 35%|████████████████████▊ | 3526/10000 [7:44:39<14:48:53, 8.24s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:07:20,278 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▊ | 3527/10000 [7:44:51<16:46:51, 9.33s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:07:27,738 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▊ | 3528/10000 [7:44:58<15:49:09, 8.80s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:07:36,214 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▊ | 3529/10000 [7:45:07<15:37:15, 8.69s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:07:44,941 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▊ | 3530/10000 [7:45:15<15:34:46, 8.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:07:53,378 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▊ | 3531/10000 [7:45:24<15:29:25, 8.62s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:08:03,258 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▊ | 3532/10000 [7:45:34<16:11:12, 9.01s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:08:10,755 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▊ | 3533/10000 [7:45:41<15:23:47, 8.57s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:08:18,926 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▊ | 3534/10000 [7:45:50<15:08:43, 8.43s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:08:26,381 >> `use_cache = True` is incompatible with gradient checkpointing. 
Setting `use_cache = False`... + 35%|████████████████████▊ | 3535/10000 [7:45:57<14:35:44, 8.13s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:08:34,130 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▊ | 3536/10000 [7:46:05<14:23:47, 8.02s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:08:42,550 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▊ | 3537/10000 [7:46:13<14:38:39, 8.16s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:08:50,253 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▊ | 3538/10000 [7:46:21<14:23:46, 8.02s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:08:57,976 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▉ | 3539/10000 [7:46:29<14:13:37, 7.93s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:09:05,831 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▉ | 3540/10000 [7:46:36<14:09:21, 7.89s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:09:14,085 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▉ | 3541/10000 [7:46:45<14:22:55, 8.02s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:09:21,695 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▉ | 3542/10000 [7:46:52<14:07:29, 7.87s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:09:30,453 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 35%|████████████████████▉ | 3543/10000 [7:47:01<14:35:25, 8.13s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:09:37,389 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▉ | 3544/10000 [7:47:08<13:58:26, 7.79s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:09:44,597 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▉ | 3545/10000 [7:47:15<13:39:49, 7.62s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:09:52,417 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▉ | 3546/10000 [7:47:23<13:48:36, 7.70s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:10:00,346 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▉ | 3547/10000 [7:47:31<13:54:19, 7.76s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:10:09,337 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▉ | 3548/10000 [7:47:40<14:34:02, 8.13s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:10:17,026 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 35%|████████████████████▉ | 3549/10000 [7:47:48<14:18:12, 7.98s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:10:24,781 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|████████████████████▉ | 3550/10000 [7:47:55<14:07:42, 7.89s/it] 36%|████████████████████▉ | 3550/10000 [7:47:55<14:07:42, 7.89s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:10:31,927 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 36%|████████████████████▉ | 3551/10000 [7:48:03<13:44:50, 7.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:10:39,222 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|████████████████████▉ | 3552/10000 [7:48:10<13:33:32, 7.57s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:10:45,988 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|████████████████████▉ | 3553/10000 [7:48:17<13:05:36, 7.31s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:10:52,380 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|████████████████████▉ | 3554/10000 [7:48:23<12:39:30, 7.07s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:10:59,104 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|████████████████████▉ | 3555/10000 [7:48:30<12:27:43, 6.96s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:11:05,771 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|████████████████████▉ | 3556/10000 [7:48:36<12:15:59, 6.85s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:11:12,290 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|████████████████████▉ | 3557/10000 [7:48:43<12:05:42, 6.76s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:11:18,836 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|████████████████████▉ | 3558/10000 [7:48:49<11:58:14, 6.69s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:11:25,833 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|████████████████████▉ | 3559/10000 [7:48:56<12:07:53, 6.78s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:11:32,453 >> `use_cache = True` is incompatible with gradient checkpointing. 
Setting `use_cache = False`... + 36%|█████████████████████ | 3560/10000 [7:49:03<12:03:34, 6.74s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:11:39,150 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████ | 3561/10000 [7:49:10<12:03:37, 6.74s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:11:45,876 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████ | 3562/10000 [7:49:17<12:03:23, 6.74s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:11:52,390 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████ | 3563/10000 [7:49:23<11:55:13, 6.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:11:58,496 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████ | 3564/10000 [7:49:29<11:36:21, 6.49s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:12:05,112 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████ | 3565/10000 [7:49:36<11:41:27, 6.54s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:12:12,154 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████ | 3566/10000 [7:49:43<11:55:08, 6.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:12:18,858 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████ | 3567/10000 [7:49:50<11:58:17, 6.70s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:12:26,457 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 36%|█████████████████████ | 3568/10000 [7:49:57<12:24:38, 6.95s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:12:32,724 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████ | 3569/10000 [7:50:03<12:05:42, 6.77s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:12:39,346 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████ | 3570/10000 [7:50:10<11:58:26, 6.70s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:12:45,585 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████ | 3571/10000 [7:50:16<11:43:15, 6.56s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:12:53,484 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████ | 3572/10000 [7:50:24<12:27:54, 6.98s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:13:02,619 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████ | 3573/10000 [7:50:33<13:34:07, 7.60s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:13:10,699 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████ | 3574/10000 [7:50:41<13:49:49, 7.75s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:13:18,938 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████ | 3575/10000 [7:50:50<14:05:16, 7.89s/it] 36%|█████████████████████ | 3575/10000 [7:50:50<14:05:16, 7.89s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:13:27,099 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 36%|█████████████████████ | 3576/10000 [7:50:58<14:10:56, 7.95s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:13:33,943 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████ | 3577/10000 [7:51:05<13:38:57, 7.65s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:13:40,021 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████ | 3578/10000 [7:51:11<12:48:14, 7.18s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:13:46,808 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████ | 3579/10000 [7:51:17<12:37:39, 7.08s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:13:55,495 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████ | 3580/10000 [7:51:26<13:28:06, 7.55s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:14:03,398 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▏ | 3581/10000 [7:51:34<13:38:42, 7.65s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:14:09,637 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▏ | 3582/10000 [7:51:40<12:54:25, 7.24s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:14:16,617 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▏ | 3583/10000 [7:51:47<12:43:17, 7.14s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:14:23,260 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▏ | 3584/10000 [7:51:54<12:29:19, 7.01s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:14:31,104 >> `use_cache = True` is incompatible with gradient checkpointing. 
Setting `use_cache = False`... + 36%|█████████████████████▏ | 3585/10000 [7:52:02<12:55:57, 7.26s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:14:37,208 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▏ | 3586/10000 [7:52:08<12:20:07, 6.92s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:14:43,434 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▏ | 3587/10000 [7:52:14<11:57:26, 6.71s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:15:13,745 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▏ | 3588/10000 [7:52:44<24:30:52, 13.76s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:15:20,715 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▏ | 3589/10000 [7:52:51<20:55:33, 11.75s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:15:27,039 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▏ | 3590/10000 [7:52:58<17:58:13, 10.09s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:15:33,078 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▏ | 3591/10000 [7:53:04<15:49:06, 8.89s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:15:39,297 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▏ | 3592/10000 [7:53:10<14:24:23, 8.09s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:15:46,120 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 36%|█████████████████████▏ | 3593/10000 [7:53:17<13:40:27, 7.68s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:15:52,604 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▏ | 3594/10000 [7:53:23<13:07:10, 7.37s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:15:59,235 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▏ | 3595/10000 [7:53:30<12:43:11, 7.15s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:16:06,071 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▏ | 3596/10000 [7:53:37<12:32:30, 7.05s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:16:12,855 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▏ | 3597/10000 [7:53:43<12:23:08, 6.96s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:16:19,855 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▏ | 3598/10000 [7:53:51<12:25:05, 6.98s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:16:26,818 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▏ | 3599/10000 [7:53:57<12:23:26, 6.97s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:16:33,737 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▏ | 3600/10000 [7:54:04<12:19:08, 6.93s/it] 36%|█████████████████████▏ | 3600/10000 [7:54:04<12:19:08, 6.93s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:16:40,365 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 36%|█████████████████████▏ | 3601/10000 [7:54:11<12:11:18, 6.86s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:16:47,033 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▎ | 3602/10000 [7:54:18<12:05:45, 6.81s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:16:54,311 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▎ | 3603/10000 [7:54:25<12:22:17, 6.96s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:17:01,536 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▎ | 3604/10000 [7:54:32<12:31:13, 7.05s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:17:09,709 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▎ | 3605/10000 [7:54:40<13:03:48, 7.35s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:17:16,742 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▎ | 3606/10000 [7:54:47<12:54:44, 7.27s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:17:23,778 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▎ | 3607/10000 [7:54:54<12:48:26, 7.21s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:17:32,175 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▎ | 3608/10000 [7:55:03<13:24:34, 7.55s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:17:38,590 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 36%|█████████████████████▎ | 3609/10000 [7:55:09<12:47:31, 7.21s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:17:44,763 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▎ | 3610/10000 [7:55:15<12:15:49, 6.91s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:17:52,684 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▎ | 3611/10000 [7:55:23<12:48:29, 7.22s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:18:00,251 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▎ | 3612/10000 [7:55:31<12:59:08, 7.32s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:18:11,683 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▎ | 3613/10000 [7:55:42<15:11:00, 8.56s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:18:24,820 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▎ | 3614/10000 [7:55:56<17:37:55, 9.94s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:18:31,181 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▎ | 3615/10000 [7:56:02<15:44:03, 8.87s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:18:37,740 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▎ | 3616/10000 [7:56:08<14:25:38, 8.14s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:18:44,207 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 36%|█████████████████████▎ | 3617/10000 [7:56:15<13:34:53, 7.66s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:18:50,902 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▎ | 3618/10000 [7:56:21<12:59:15, 7.33s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:18:57,435 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▎ | 3619/10000 [7:56:28<12:34:03, 7.09s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:19:03,454 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▎ | 3620/10000 [7:56:34<11:59:47, 6.77s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:19:09,639 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▎ | 3621/10000 [7:56:40<11:45:53, 6.64s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:19:15,989 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▎ | 3622/10000 [7:56:47<11:36:07, 6.55s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:19:22,824 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▍ | 3623/10000 [7:56:53<11:42:31, 6.61s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:19:29,469 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▍ | 3624/10000 [7:57:00<11:46:09, 6.65s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:19:40,535 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 36%|█████████████████████▍ | 3625/10000 [7:57:11<14:04:49, 7.95s/it] 36%|█████████████████████▍ | 3625/10000 [7:57:11<14:04:49, 7.95s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:19:48,504 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▍ | 3626/10000 [7:57:19<14:05:11, 7.96s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:19:57,879 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▍ | 3627/10000 [7:57:29<14:52:54, 8.41s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:20:05,238 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▍ | 3628/10000 [7:57:36<14:19:47, 8.10s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:20:12,756 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▍ | 3629/10000 [7:57:43<13:58:45, 7.90s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:20:20,690 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▍ | 3630/10000 [7:57:51<13:59:07, 7.90s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:20:27,848 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▍ | 3631/10000 [7:57:58<13:34:43, 7.68s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:20:34,619 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▍ | 3632/10000 [7:58:05<13:08:46, 7.43s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:20:41,560 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 36%|█████████████████████▍ | 3633/10000 [7:58:12<12:53:20, 7.29s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:20:48,637 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▍ | 3634/10000 [7:58:19<12:44:20, 7.20s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:20:55,534 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▍ | 3635/10000 [7:58:26<12:33:33, 7.10s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:21:03,169 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▍ | 3636/10000 [7:58:34<12:49:29, 7.25s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:21:11,491 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▍ | 3637/10000 [7:58:42<13:26:46, 7.61s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:21:20,398 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▍ | 3638/10000 [7:58:51<14:08:06, 8.00s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:21:28,808 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▍ | 3639/10000 [7:58:59<14:19:01, 8.10s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:21:36,451 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▍ | 3640/10000 [7:59:07<14:04:08, 7.96s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:21:44,070 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 36%|█████████████████████▍ | 3641/10000 [7:59:15<13:52:49, 7.86s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:21:51,734 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▍ | 3642/10000 [7:59:22<13:49:09, 7.82s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:21:58,531 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▍ | 3643/10000 [7:59:29<13:14:30, 7.50s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:22:06,280 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▍ | 3644/10000 [7:59:37<13:23:26, 7.58s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:22:13,722 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▌ | 3645/10000 [7:59:44<13:15:02, 7.51s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:22:20,883 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▌ | 3646/10000 [7:59:52<13:08:20, 7.44s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:22:29,663 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▌ | 3647/10000 [8:00:00<13:50:52, 7.85s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:22:38,633 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▌ | 3648/10000 [8:00:09<14:21:06, 8.13s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:22:46,019 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 36%|█████████████████████▌ | 3649/10000 [8:00:17<13:58:46, 7.92s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:22:53,675 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 36%|█████████████████████▌ | 3650/10000 [8:00:24<13:52:05, 7.86s/it] 36%|█████████████████████▌ | 3650/10000 [8:00:24<13:52:05, 7.86s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:23:06,220 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▌ | 3651/10000 [8:00:37<16:21:58, 9.28s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:23:14,855 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▌ | 3652/10000 [8:00:45<15:57:55, 9.05s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:23:21,959 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▌ | 3653/10000 [8:00:53<14:55:36, 8.47s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:23:29,162 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▌ | 3654/10000 [8:01:00<14:19:50, 8.13s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:23:36,143 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▌ | 3655/10000 [8:01:07<13:40:48, 7.76s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:23:43,187 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▌ | 3656/10000 [8:01:14<13:17:53, 7.55s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:23:50,759 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 37%|█████████████████████▌ | 3657/10000 [8:01:21<13:14:24, 7.51s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:23:57,556 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▌ | 3658/10000 [8:01:28<12:56:38, 7.35s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:24:04,363 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▌ | 3659/10000 [8:01:35<12:39:00, 7.18s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:24:11,328 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▌ | 3660/10000 [8:01:42<12:31:14, 7.11s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:24:18,291 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▌ | 3661/10000 [8:01:49<12:24:06, 7.04s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:24:25,286 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▌ | 3662/10000 [8:01:56<12:26:17, 7.06s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:24:32,628 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▌ | 3663/10000 [8:02:03<12:33:31, 7.13s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:24:40,402 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▌ | 3664/10000 [8:02:11<12:55:31, 7.34s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:24:48,131 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 37%|█████████████████████▌ | 3665/10000 [8:02:19<13:07:53, 7.46s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:24:56,461 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▋ | 3666/10000 [8:02:27<13:34:21, 7.71s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:25:05,721 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▋ | 3667/10000 [8:02:36<14:20:15, 8.15s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:25:13,617 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▋ | 3668/10000 [8:02:44<14:16:24, 8.12s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:25:23,508 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▋ | 3669/10000 [8:02:54<15:11:51, 8.64s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:25:30,905 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▋ | 3670/10000 [8:03:01<14:27:47, 8.23s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:25:37,750 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▋ | 3671/10000 [8:03:08<13:48:49, 7.86s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:25:45,076 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▋ | 3672/10000 [8:03:16<13:29:33, 7.68s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:25:51,932 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 37%|█████████████████████▋ | 3673/10000 [8:03:23<13:05:13, 7.45s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:25:58,926 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▋ | 3674/10000 [8:03:29<12:47:30, 7.28s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:26:05,827 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▋ | 3675/10000 [8:03:37<12:39:20, 7.20s/it] 37%|█████████████████████▋ | 3675/10000 [8:03:37<12:39:20, 7.20s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:26:12,775 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▋ | 3676/10000 [8:03:43<12:26:58, 7.09s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:26:19,414 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▋ | 3677/10000 [8:03:50<12:17:43, 7.00s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:26:26,869 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▋ | 3678/10000 [8:03:57<12:26:45, 7.09s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:26:33,740 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▋ | 3679/10000 [8:04:04<12:19:14, 7.02s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:26:40,510 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▋ | 3680/10000 [8:04:11<12:15:37, 6.98s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:26:47,546 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 37%|█████████████████████▋ | 3681/10000 [8:04:18<12:15:51, 6.99s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:26:54,473 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▋ | 3682/10000 [8:04:25<12:15:07, 6.98s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:27:00,939 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▋ | 3683/10000 [8:04:32<11:57:44, 6.82s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:27:07,197 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▋ | 3684/10000 [8:04:38<11:40:19, 6.65s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:27:13,595 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▋ | 3685/10000 [8:04:44<11:31:23, 6.57s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:27:20,310 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▋ | 3686/10000 [8:04:51<11:37:51, 6.63s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:27:27,867 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▊ | 3687/10000 [8:04:59<12:07:46, 6.92s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:27:34,256 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▊ | 3688/10000 [8:05:05<11:49:37, 6.75s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:27:40,354 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 37%|█████████████████████▊ | 3689/10000 [8:05:11<11:28:07, 6.54s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:27:46,481 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▊ | 3690/10000 [8:05:17<11:13:41, 6.41s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:27:53,094 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▊ | 3691/10000 [8:05:24<11:20:40, 6.47s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:27:59,475 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▊ | 3692/10000 [8:05:30<11:19:16, 6.46s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:28:07,354 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▊ | 3693/10000 [8:05:38<12:04:06, 6.89s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:28:14,027 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▊ | 3694/10000 [8:05:45<11:52:39, 6.78s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:28:20,317 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▊ | 3695/10000 [8:05:51<11:41:35, 6.68s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:28:30,579 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▊ | 3696/10000 [8:06:01<13:32:46, 7.74s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:28:37,581 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 37%|█████████████████████▊ | 3697/10000 [8:06:08<13:12:55, 7.55s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:28:45,499 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▊ | 3698/10000 [8:06:16<13:22:46, 7.64s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:28:52,807 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▊ | 3699/10000 [8:06:24<13:12:41, 7.55s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:28:59,446 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▊ | 3700/10000 [8:06:30<12:41:18, 7.25s/it] 37%|█████████████████████▊ | 3700/10000 [8:06:30<12:41:18, 7.25s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:29:06,342 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▊ | 3701/10000 [8:06:37<12:32:35, 7.17s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:29:14,961 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▊ | 3702/10000 [8:06:45<13:12:41, 7.55s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:29:21,191 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▊ | 3703/10000 [8:06:52<12:32:09, 7.17s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:29:27,263 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▊ | 3704/10000 [8:06:58<11:58:19, 6.85s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:29:33,338 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 37%|█████████████████████▊ | 3705/10000 [8:07:04<11:36:28, 6.64s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:29:44,617 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▊ | 3706/10000 [8:07:15<14:00:08, 8.01s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:29:51,069 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▊ | 3707/10000 [8:07:22<13:13:49, 7.57s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:29:58,100 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▉ | 3708/10000 [8:07:29<12:53:52, 7.38s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:30:04,657 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▉ | 3709/10000 [8:07:35<12:30:12, 7.16s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:30:11,302 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▉ | 3710/10000 [8:07:42<12:11:38, 6.98s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:30:17,903 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▉ | 3711/10000 [8:07:49<12:00:41, 6.88s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:30:24,167 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▉ | 3712/10000 [8:07:55<11:39:19, 6.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:30:30,300 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 37%|█████████████████████▉ | 3713/10000 [8:08:01<11:23:30, 6.52s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:30:36,355 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▉ | 3714/10000 [8:08:07<11:08:29, 6.38s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:30:42,616 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▉ | 3715/10000 [8:08:13<11:05:10, 6.35s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:30:49,248 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▉ | 3716/10000 [8:08:20<11:15:56, 6.45s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:30:55,847 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▉ | 3717/10000 [8:08:26<11:16:47, 6.46s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:31:01,920 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▉ | 3718/10000 [8:08:33<11:08:32, 6.39s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:31:08,731 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▉ | 3719/10000 [8:08:39<11:19:22, 6.49s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:31:15,055 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▉ | 3720/10000 [8:08:46<11:10:31, 6.41s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:31:21,859 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 37%|█████████████████████▉ | 3721/10000 [8:08:53<11:28:05, 6.58s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:31:30,744 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▉ | 3722/10000 [8:09:01<12:40:55, 7.27s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:31:36,917 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▉ | 3723/10000 [8:09:08<12:05:02, 6.93s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:31:43,224 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▉ | 3724/10000 [8:09:14<11:46:26, 6.75s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:31:49,381 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▉ | 3725/10000 [8:09:20<11:24:53, 6.55s/it] 37%|█████████████████████▉ | 3725/10000 [8:09:20<11:24:53, 6.55s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:31:58,322 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▉ | 3726/10000 [8:09:29<12:43:04, 7.30s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:32:04,636 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▉ | 3727/10000 [8:09:35<12:05:42, 6.94s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:32:10,541 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|█████████████████████▉ | 3728/10000 [8:09:41<11:39:20, 6.69s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:32:17,410 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 37%|██████████████████████ | 3729/10000 [8:09:48<11:42:33, 6.72s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:32:23,903 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|██████████████████████ | 3730/10000 [8:09:54<11:34:25, 6.65s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:32:30,952 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|██████████████████████ | 3731/10000 [8:10:02<11:46:13, 6.76s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:32:37,405 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|██████████████████████ | 3732/10000 [8:10:08<11:38:56, 6.69s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:32:45,732 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|██████████████████████ | 3733/10000 [8:10:16<12:30:56, 7.19s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:32:52,941 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|██████████████████████ | 3734/10000 [8:10:24<12:29:34, 7.18s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:33:01,513 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|██████████████████████ | 3735/10000 [8:10:32<13:15:10, 7.62s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:33:09,464 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|██████████████████████ | 3736/10000 [8:10:40<13:23:35, 7.70s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:33:16,439 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 37%|██████████████████████ | 3737/10000 [8:10:47<12:59:33, 7.47s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:33:23,473 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|██████████████████████ | 3738/10000 [8:10:54<12:46:28, 7.34s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:33:31,467 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|██████████████████████ | 3739/10000 [8:11:02<13:09:55, 7.57s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:33:38,963 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|██████████████████████ | 3740/10000 [8:11:10<13:06:42, 7.54s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:33:45,954 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|██████████████████████ | 3741/10000 [8:11:17<12:49:21, 7.38s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:33:53,221 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|██████████████████████ | 3742/10000 [8:11:24<12:45:43, 7.34s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:34:00,210 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|██████████████████████ | 3743/10000 [8:11:31<12:33:12, 7.22s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:34:07,267 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|██████████████████████ | 3744/10000 [8:11:38<12:30:22, 7.20s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:34:14,936 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 37%|██████████████████████ | 3745/10000 [8:11:46<12:41:57, 7.31s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:34:24,020 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|██████████████████████ | 3746/10000 [8:11:55<13:39:53, 7.87s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:34:32,282 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|██████████████████████ | 3747/10000 [8:12:03<13:51:56, 7.98s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:34:40,123 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|██████████████████████ | 3748/10000 [8:12:11<13:46:49, 7.93s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:34:47,710 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 37%|██████████████████████ | 3749/10000 [8:12:18<13:37:51, 7.85s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:34:56,074 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▏ | 3750/10000 [8:12:27<13:51:42, 7.98s/it] 38%|██████████████████████▏ | 3750/10000 [8:12:27<13:51:42, 7.98s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:35:04,737 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▏ | 3751/10000 [8:12:35<14:13:00, 8.19s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:35:11,939 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▏ | 3752/10000 [8:12:43<13:42:51, 7.90s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:35:19,382 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 38%|██████████████████████▏ | 3753/10000 [8:12:50<13:28:34, 7.77s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:35:26,990 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▏ | 3754/10000 [8:12:58<13:19:03, 7.68s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:35:34,218 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▏ | 3755/10000 [8:13:05<13:09:13, 7.58s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:35:41,745 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▏ | 3756/10000 [8:13:12<13:07:10, 7.56s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:35:49,885 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▏ | 3757/10000 [8:13:21<13:26:14, 7.75s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:35:57,451 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▏ | 3758/10000 [8:13:28<13:17:30, 7.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:36:04,866 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▏ | 3759/10000 [8:13:36<13:11:40, 7.61s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:36:14,504 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▏ | 3760/10000 [8:13:45<14:14:05, 8.21s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:36:22,215 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 38%|██████████████████████▏ | 3761/10000 [8:13:53<13:58:00, 8.06s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:36:29,972 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▏ | 3762/10000 [8:14:01<13:49:24, 7.98s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:36:38,217 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▏ | 3763/10000 [8:14:09<13:57:12, 8.05s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:36:46,695 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▏ | 3764/10000 [8:14:17<14:12:06, 8.20s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:36:59,222 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▏ | 3765/10000 [8:14:30<16:25:11, 9.48s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:37:08,811 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▏ | 3766/10000 [8:14:40<16:28:46, 9.52s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:37:15,723 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▏ | 3767/10000 [8:14:46<15:04:02, 8.70s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:37:24,033 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▏ | 3768/10000 [8:14:55<14:54:11, 8.61s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:37:31,426 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 38%|██████████████████████▏ | 3769/10000 [8:15:02<14:17:05, 8.25s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:37:38,859 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▏ | 3770/10000 [8:15:09<13:48:42, 7.98s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:37:46,746 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▏ | 3771/10000 [8:15:17<13:47:30, 7.97s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:37:53,773 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▎ | 3772/10000 [8:15:24<13:18:02, 7.69s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:38:00,802 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▎ | 3773/10000 [8:15:31<12:52:17, 7.44s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:38:07,831 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 38%|██████████████████████▎ | 3774/10000 [8:15:38<12:42:02, 7.34s/it]{'loss': 0.2415, 'learning_rate': 2.053263157894737e-06, 'epoch': 0.35}
+{'loss': 0.1563, 'learning_rate': 2.0453684210526317e-06, 'epoch': 0.35}
+{'loss': 0.1455, 'learning_rate': 2.0374736842105265e-06, 'epoch': 0.35}
+{'loss': 0.1406, 'learning_rate': 2.0295789473684208e-06, 'epoch': 0.36}
+{'loss': 0.1358, 'learning_rate': 2.0216842105263155e-06, 'epoch': 0.36}
+{'loss': 0.1293, 'learning_rate': 2.0137894736842107e-06, 'epoch': 0.36}
+{'loss': 0.1599, 'learning_rate': 2.0058947368421054e-06, 'epoch': 0.36}
+{'loss': 0.1893, 'learning_rate': 1.998e-06, 'epoch': 0.37}
+{'loss': 0.1764, 'learning_rate': 1.990105263157895e-06, 'epoch': 0.37}
+{'loss': 0.1739, 'learning_rate': 1.9822105263157897e-06, 'epoch': 0.37}
+{'loss': 0.1658, 'learning_rate': 1.9743157894736844e-06, 'epoch': 0.38}
+
+ Reading metadata...: 0it [00:00, ?it/s]
+ Reading metadata...: 1it [00:00, 8.40it/s] Reading metadata...: 2165it [00:00, 15181.53it/s]
+[WARNING|modeling_whisper.py:902] 2022-12-16 17:38:15,537 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 38%|██████████████████████▎ | 3775/10000 [8:15:46<12:54:18, 7.46s/it] 38%|██████████████████████▎ | 3775/10000 [8:15:46<12:54:18, 7.46s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:38:23,443 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 38%|██████████████████████▎ | 3776/10000 [8:15:54<13:06:28, 7.58s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:38:31,162 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 38%|██████████████████████▎ | 3777/10000 [8:16:02<13:13:03, 7.65s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:38:39,142 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 38%|██████████████████████▎ | 3778/10000 [8:16:10<13:19:07, 7.71s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:38:53,743 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▎ | 3779/10000 [8:16:24<16:58:11, 9.82s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:39:01,036 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▎ | 3780/10000 [8:16:32<15:38:47, 9.06s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:39:08,860 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▎ | 3781/10000 [8:16:39<14:57:52, 8.66s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:39:16,059 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▎ | 3782/10000 [8:16:47<14:14:42, 8.25s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:39:22,953 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▎ | 3783/10000 [8:16:54<13:30:37, 7.82s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:39:31,199 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▎ | 3784/10000 [8:17:02<13:42:36, 7.94s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:39:41,654 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▎ | 3785/10000 [8:17:12<15:00:31, 8.69s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:39:49,299 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 38%|██████████████████████▎ | 3786/10000 [8:17:20<14:30:01, 8.40s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:39:56,887 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▎ | 3787/10000 [8:17:28<14:04:34, 8.16s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:40:04,732 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▎ | 3788/10000 [8:17:35<13:53:37, 8.05s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:40:12,268 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▎ | 3789/10000 [8:17:43<13:40:25, 7.93s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:40:19,727 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▎ | 3790/10000 [8:17:50<13:25:52, 7.79s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:40:27,080 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▎ | 3791/10000 [8:17:58<13:12:05, 7.65s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:40:35,113 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▎ | 3792/10000 [8:18:06<13:22:58, 7.76s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:40:43,765 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▍ | 3793/10000 [8:18:14<13:49:13, 8.02s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:40:52,792 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 38%|██████████████████████▍ | 3794/10000 [8:18:23<14:19:59, 8.31s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:41:00,086 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▍ | 3795/10000 [8:18:31<13:50:30, 8.03s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:41:07,503 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▍ | 3796/10000 [8:18:38<13:27:36, 7.81s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:41:17,373 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▍ | 3797/10000 [8:18:48<14:31:15, 8.43s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:41:24,794 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▍ | 3798/10000 [8:18:55<14:02:42, 8.15s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:41:32,200 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▍ | 3799/10000 [8:19:03<13:37:40, 7.91s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:41:39,724 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▍ | 3800/10000 [8:19:10<13:22:30, 7.77s/it] 38%|██████████████████████▍ | 3800/10000 [8:19:10<13:22:30, 7.77s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:41:47,005 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▍ | 3801/10000 [8:19:18<13:11:13, 7.66s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:41:55,215 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 38%|██████████████████████▍ | 3802/10000 [8:19:26<13:25:24, 7.80s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:42:03,911 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▍ | 3803/10000 [8:19:35<13:56:45, 8.10s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:42:11,932 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▍ | 3804/10000 [8:19:43<13:55:24, 8.09s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:42:21,823 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▍ | 3805/10000 [8:19:53<14:51:02, 8.63s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:42:29,420 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▍ | 3806/10000 [8:20:00<14:15:55, 8.29s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:42:38,193 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▍ | 3807/10000 [8:20:09<14:33:34, 8.46s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:42:45,563 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▍ | 3808/10000 [8:20:16<13:55:37, 8.10s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:42:53,380 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▍ | 3809/10000 [8:20:24<13:50:11, 8.05s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:43:00,010 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 38%|██████████████████████▍ | 3810/10000 [8:20:31<13:04:00, 7.60s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:43:07,484 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▍ | 3811/10000 [8:20:38<13:02:11, 7.58s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:43:15,071 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▍ | 3812/10000 [8:20:46<13:01:50, 7.58s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:43:21,591 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▍ | 3813/10000 [8:20:52<12:27:49, 7.25s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:43:28,800 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▌ | 3814/10000 [8:20:59<12:28:01, 7.26s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:43:35,718 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▌ | 3815/10000 [8:21:06<12:14:42, 7.13s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:43:42,149 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▌ | 3816/10000 [8:21:13<11:55:31, 6.94s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:43:48,689 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▌ | 3817/10000 [8:21:19<11:39:12, 6.79s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:43:55,381 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 38%|██████████████████████▌ | 3818/10000 [8:21:26<11:40:10, 6.80s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:44:01,802 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▌ | 3819/10000 [8:21:32<11:27:13, 6.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:44:07,910 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▌ | 3820/10000 [8:21:39<11:09:12, 6.50s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:44:14,542 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▌ | 3821/10000 [8:21:45<11:15:32, 6.56s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:44:21,623 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▌ | 3822/10000 [8:21:52<11:30:48, 6.71s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:44:28,648 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▌ | 3823/10000 [8:21:59<11:41:00, 6.81s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:44:35,343 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▌ | 3824/10000 [8:22:06<11:33:45, 6.74s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:44:43,515 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▌ | 3825/10000 [8:22:14<12:20:12, 7.19s/it] 38%|██████████████████████▌ | 3825/10000 [8:22:14<12:20:12, 7.19s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:44:50,642 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 38%|██████████████████████▌ | 3826/10000 [8:22:21<12:19:05, 7.18s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:45:01,424 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▌ | 3827/10000 [8:22:32<14:10:29, 8.27s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:45:07,664 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▌ | 3828/10000 [8:22:38<13:06:29, 7.65s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:45:14,394 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▌ | 3829/10000 [8:22:45<12:38:02, 7.37s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:45:22,588 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▌ | 3830/10000 [8:22:53<13:03:54, 7.62s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:45:29,800 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▌ | 3831/10000 [8:23:00<12:47:12, 7.46s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:45:37,453 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▌ | 3832/10000 [8:23:08<12:55:00, 7.54s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:45:44,345 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▌ | 3833/10000 [8:23:15<12:38:55, 7.38s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:45:51,530 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 38%|██████████████████████▌ | 3834/10000 [8:23:22<12:30:08, 7.30s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:45:58,160 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▋ | 3835/10000 [8:23:29<12:07:56, 7.08s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:46:04,694 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▋ | 3836/10000 [8:23:35<11:48:52, 6.90s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:46:11,388 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▋ | 3837/10000 [8:23:42<11:47:52, 6.89s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:46:18,334 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▋ | 3838/10000 [8:23:49<11:48:21, 6.90s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:46:25,260 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▋ | 3839/10000 [8:23:56<11:47:11, 6.89s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:46:31,845 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▋ | 3840/10000 [8:24:03<11:40:37, 6.82s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:46:39,965 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▋ | 3841/10000 [8:24:11<12:18:17, 7.19s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:46:46,806 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 38%|██████████████████████▋ | 3842/10000 [8:24:17<12:07:39, 7.09s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:46:53,334 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▋ | 3843/10000 [8:24:24<11:49:33, 6.91s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:46:59,532 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▋ | 3844/10000 [8:24:30<11:29:15, 6.72s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:47:08,131 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▋ | 3845/10000 [8:24:39<12:25:01, 7.26s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:47:14,147 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▋ | 3846/10000 [8:24:45<11:44:28, 6.87s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:47:20,171 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▋ | 3847/10000 [8:24:51<11:19:35, 6.63s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:47:26,262 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▋ | 3848/10000 [8:24:57<11:04:25, 6.48s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:47:32,993 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 38%|██████████████████████▋ | 3849/10000 [8:25:04<11:13:12, 6.57s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:47:40,702 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 38%|██████████████████████▋ | 3850/10000 [8:25:11<11:44:16, 6.87s/it] 38%|██████████████████████▋ | 3850/10000 [8:25:11<11:44:16, 6.87s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:47:47,025 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|██████████████████████▋ | 3851/10000 [8:25:18<11:31:26, 6.75s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:47:53,240 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|██████████████████████▋ | 3852/10000 [8:25:24<11:10:56, 6.55s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:48:00,049 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|██████████████████████▋ | 3853/10000 [8:25:31<11:20:59, 6.65s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:48:12,463 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|██████████████████████▋ | 3854/10000 [8:25:43<14:19:36, 8.39s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:48:19,705 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|██████████████████████▋ | 3855/10000 [8:25:50<13:44:26, 8.05s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:48:26,596 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|██████████████████████▊ | 3856/10000 [8:25:57<13:08:12, 7.70s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:48:34,171 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|██████████████████████▊ | 3857/10000 [8:26:05<13:04:10, 7.66s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:48:41,245 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 39%|██████████████████████▊ | 3858/10000 [8:26:12<12:44:00, 7.46s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:48:48,141 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|██████████████████████▊ | 3859/10000 [8:26:19<12:26:32, 7.29s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:48:55,939 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|██████████████████████▊ | 3860/10000 [8:26:27<12:41:08, 7.44s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:49:03,005 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|██████████████████████▊ | 3861/10000 [8:26:34<12:32:39, 7.36s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:49:10,110 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|██████████████████████▊ | 3862/10000 [8:26:41<12:22:50, 7.26s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:49:16,932 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|██████████████████████▊ | 3863/10000 [8:26:48<12:10:13, 7.14s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:49:28,978 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|██████████████████████▊ | 3864/10000 [8:27:00<14:40:53, 8.61s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:49:37,099 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|██████████████████████▊ | 3865/10000 [8:27:08<14:26:55, 8.48s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:49:44,089 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 39%|██████████████████████▊ | 3866/10000 [8:27:15<13:41:18, 8.03s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:49:50,984 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|██████████████████████▊ | 3867/10000 [8:27:22<13:03:31, 7.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:49:57,914 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|██████████████████████▊ | 3868/10000 [8:27:29<12:41:40, 7.45s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:50:04,683 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|██████████████████████▊ | 3869/10000 [8:27:35<12:21:49, 7.26s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:50:11,705 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|██████████████████████▊ | 3870/10000 [8:27:42<12:15:16, 7.20s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:50:23,682 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|██████████████████████▊ | 3871/10000 [8:27:54<14:38:19, 8.60s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:50:30,596 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|██████████████████████▊ | 3872/10000 [8:28:01<13:46:56, 8.10s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:50:38,395 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|██████████████████████▊ | 3873/10000 [8:28:09<13:35:34, 7.99s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:50:46,556 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 39%|██████████████████████▊ | 3874/10000 [8:28:17<13:43:08, 8.06s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:50:54,014 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|██████████████████████▊ | 3875/10000 [8:28:25<13:23:14, 7.87s/it] 39%|██████████████████████▊ | 3875/10000 [8:28:25<13:23:14, 7.87s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:51:01,558 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|██████████████████████▊ | 3876/10000 [8:28:32<13:15:12, 7.79s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:51:09,105 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|██████████████████████▊ | 3877/10000 [8:28:40<13:04:43, 7.69s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:51:16,924 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|██████████████████████▉ | 3878/10000 [8:28:47<13:09:16, 7.74s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:51:24,301 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|██████████████████████▉ | 3879/10000 [8:28:55<13:00:52, 7.65s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:51:31,569 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|██████████████████████▉ | 3880/10000 [8:29:02<12:46:57, 7.52s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:51:38,596 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|██████████████████████▉ | 3881/10000 [8:29:09<12:34:14, 7.40s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:51:45,970 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 39%|██████████████████████▉ | 3882/10000 [8:29:17<12:30:09, 7.36s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:51:52,853 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|██████████████████████▉ | 3883/10000 [8:29:23<12:17:45, 7.24s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:52:02,077 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|██████████████████████▉ | 3884/10000 [8:29:33<13:17:28, 7.82s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:52:09,748 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|██████████████████████▉ | 3885/10000 [8:29:40<13:10:49, 7.76s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:52:18,129 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|██████████████████████▉ | 3886/10000 [8:29:49<13:34:11, 7.99s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:52:24,993 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|██████████████████████▉ | 3887/10000 [8:29:56<12:57:13, 7.63s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:52:32,423 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|██████████████████████▉ | 3888/10000 [8:30:03<12:52:35, 7.58s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:52:39,581 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|██████████████████████▉ | 3889/10000 [8:30:10<12:39:08, 7.45s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:52:46,692 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 39%|██████████████████████▉ | 3890/10000 [8:30:17<12:25:34, 7.32s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:52:54,046 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|██████████████████████▉ | 3891/10000 [8:30:25<12:30:04, 7.37s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:53:01,406 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|██████████████████████▉ | 3892/10000 [8:30:32<12:28:05, 7.35s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:53:08,625 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|██████████████████████▉ | 3893/10000 [8:30:39<12:26:01, 7.33s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:53:16,223 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|██████████████████████▉ | 3894/10000 [8:30:47<12:33:53, 7.41s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:53:23,792 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|██████████████████████▉ | 3895/10000 [8:30:54<12:38:41, 7.46s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:53:31,492 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|██████████████████████▉ | 3896/10000 [8:31:02<12:44:37, 7.52s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:53:38,575 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|██████████████████████▉ | 3897/10000 [8:31:09<12:28:08, 7.36s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:53:45,544 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 39%|██████████████████████▉ | 3898/10000 [8:31:16<12:21:15, 7.29s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:53:52,832 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████ | 3899/10000 [8:31:23<12:18:29, 7.26s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:54:01,307 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████ | 3900/10000 [8:31:32<12:53:39, 7.61s/it] 39%|███████████████████████ | 3900/10000 [8:31:32<12:53:39, 7.61s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:54:08,737 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████ | 3901/10000 [8:31:39<12:52:17, 7.60s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:54:16,460 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████ | 3902/10000 [8:31:47<12:55:39, 7.63s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:54:24,485 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████ | 3903/10000 [8:31:55<13:07:23, 7.75s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:54:37,282 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████ | 3904/10000 [8:32:08<15:41:40, 9.27s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:54:44,380 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████ | 3905/10000 [8:32:15<14:34:53, 8.61s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:54:52,399 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 39%|███████████████████████ | 3906/10000 [8:32:23<14:16:47, 8.44s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:55:00,021 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████ | 3907/10000 [8:32:31<13:51:55, 8.19s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:55:07,082 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████ | 3908/10000 [8:32:38<13:18:33, 7.87s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:55:15,549 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████ | 3909/10000 [8:32:46<13:34:51, 8.03s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:55:23,805 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████ | 3910/10000 [8:32:54<13:39:56, 8.08s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:55:31,787 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████ | 3911/10000 [8:33:02<13:39:07, 8.07s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:55:40,523 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████ | 3912/10000 [8:33:11<13:59:27, 8.27s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:55:48,154 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████ | 3913/10000 [8:33:19<13:39:15, 8.08s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:55:55,867 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 39%|███████████████████████ | 3914/10000 [8:33:26<13:26:19, 7.95s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:56:03,260 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████ | 3915/10000 [8:33:34<13:10:40, 7.80s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:56:10,490 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████ | 3916/10000 [8:33:41<12:53:41, 7.63s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:56:18,302 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████ | 3917/10000 [8:33:49<13:00:28, 7.70s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:56:25,834 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████ | 3918/10000 [8:33:56<12:52:23, 7.62s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:56:33,852 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████ | 3919/10000 [8:34:05<13:05:15, 7.75s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:56:40,813 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████▏ | 3920/10000 [8:34:11<12:38:42, 7.49s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:56:48,037 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████▏ | 3921/10000 [8:34:19<12:30:27, 7.41s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:56:55,776 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 39%|███████████████████████▏ | 3922/10000 [8:34:26<12:43:34, 7.54s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:57:03,732 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████▏ | 3923/10000 [8:34:34<12:56:54, 7.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:57:13,546 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████▏ | 3924/10000 [8:34:44<14:00:33, 8.30s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:57:21,231 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████▏ | 3925/10000 [8:34:52<13:40:00, 8.10s/it] 39%|███████████████████████▏ | 3925/10000 [8:34:52<13:40:00, 8.10s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:57:28,658 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████▏ | 3926/10000 [8:34:59<13:21:54, 7.92s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:57:37,985 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████▏ | 3927/10000 [8:35:09<14:01:06, 8.31s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:57:45,352 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████▏ | 3928/10000 [8:35:16<13:34:46, 8.05s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:57:53,313 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████▏ | 3929/10000 [8:35:24<13:32:10, 8.03s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:58:01,750 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 39%|███████████████████████▏ | 3930/10000 [8:35:32<13:43:35, 8.14s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:58:09,461 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████▏ | 3931/10000 [8:35:40<13:32:01, 8.03s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:58:17,079 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████▏ | 3932/10000 [8:35:48<13:15:55, 7.87s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:58:24,228 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████▏ | 3933/10000 [8:35:55<12:55:43, 7.67s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:58:32,168 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████▏ | 3934/10000 [8:36:03<13:04:05, 7.76s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:58:39,344 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████▏ | 3935/10000 [8:36:10<12:48:59, 7.61s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:58:46,839 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████▏ | 3936/10000 [8:36:17<12:41:30, 7.53s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:58:54,280 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████▏ | 3937/10000 [8:36:25<12:41:46, 7.54s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:59:01,861 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 39%|███████████████████████▏ | 3938/10000 [8:36:33<12:42:35, 7.55s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:59:10,943 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████▏ | 3939/10000 [8:36:42<13:28:57, 8.01s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:59:18,542 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████▏ | 3940/10000 [8:36:49<13:13:56, 7.86s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:59:26,970 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████▎ | 3941/10000 [8:36:58<13:32:07, 8.04s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:59:34,724 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████▎ | 3942/10000 [8:37:05<13:23:45, 7.96s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:59:42,130 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████▎ | 3943/10000 [8:37:13<13:05:17, 7.78s/it][WARNING|modeling_whisper.py:902] 2022-12-16 17:59:57,678 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████▎ | 3944/10000 [8:37:28<17:02:58, 10.14s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:00:06,559 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████▎ | 3945/10000 [8:37:37<16:22:20, 9.73s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:00:17,547 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 39%|███████████████████████▎ | 3946/10000 [8:37:48<17:00:55, 10.12s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:00:25,097 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████▎ | 3947/10000 [8:37:56<15:44:00, 9.36s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:00:32,549 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████▎ | 3948/10000 [8:38:03<14:44:09, 8.77s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:00:40,082 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 39%|███████████████████████▎ | 3949/10000 [8:38:11<14:07:26, 8.40s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:00:49,286 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▎ | 3950/10000 [8:38:20<14:31:14, 8.64s/it] 40%|███████████████████████▎ | 3950/10000 [8:38:20<14:31:14, 8.64s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:00:57,609 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▎ | 3951/10000 [8:38:28<14:21:29, 8.55s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:01:06,368 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▎ | 3952/10000 [8:38:37<14:27:30, 8.61s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:01:13,284 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▎ | 3953/10000 [8:38:44<13:34:23, 8.08s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:01:20,482 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 40%|███████████████████████▎ | 3954/10000 [8:38:51<13:11:51, 7.86s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:01:28,135 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▎ | 3955/10000 [8:38:59<13:01:16, 7.75s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:01:35,429 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▎ | 3956/10000 [8:39:06<12:49:48, 7.64s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:01:44,590 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▎ | 3957/10000 [8:39:15<13:34:32, 8.09s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:01:52,485 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▎ | 3958/10000 [8:39:23<13:30:22, 8.05s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:02:00,403 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▎ | 3959/10000 [8:39:31<13:24:22, 7.99s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:02:11,043 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▎ | 3960/10000 [8:39:42<14:46:55, 8.81s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:02:18,928 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▎ | 3961/10000 [8:39:50<14:16:55, 8.51s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:02:26,451 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 40%|███████████████████████▍ | 3962/10000 [8:39:57<13:46:54, 8.22s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:02:38,852 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▍ | 3963/10000 [8:40:10<15:54:22, 9.49s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:02:45,657 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▍ | 3964/10000 [8:40:16<14:33:34, 8.68s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:02:53,116 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▍ | 3965/10000 [8:40:24<13:57:11, 8.32s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:02:59,761 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▍ | 3966/10000 [8:40:30<13:05:38, 7.81s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:03:06,309 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▍ | 3967/10000 [8:40:37<12:28:07, 7.44s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:03:14,144 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▍ | 3968/10000 [8:40:45<12:37:03, 7.53s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:03:20,615 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▍ | 3969/10000 [8:40:51<12:07:14, 7.24s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:03:27,426 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 40%|███████████████████████▍ | 3970/10000 [8:40:58<11:50:45, 7.07s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:03:33,819 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▍ | 3971/10000 [8:41:04<11:29:25, 6.86s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:03:40,073 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▍ | 3972/10000 [8:41:11<11:16:33, 6.73s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:03:46,280 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▍ | 3973/10000 [8:41:17<10:59:50, 6.57s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:03:52,624 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▍ | 3974/10000 [8:41:23<10:49:58, 6.47s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:03:59,530 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▍ | 3975/10000 [8:41:30<11:05:33, 6.63s/it] 40%|███████████████████████▍ | 3975/10000 [8:41:30<11:05:33, 6.63s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:04:06,313 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▍ | 3976/10000 [8:41:37<11:11:52, 6.69s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:04:13,356 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▍ | 3977/10000 [8:41:44<11:18:34, 6.76s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:04:20,450 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 40%|███████████████████████▍ | 3978/10000 [8:41:51<11:31:10, 6.89s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:04:27,281 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▍ | 3979/10000 [8:41:58<11:29:01, 6.87s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:04:33,912 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▍ | 3980/10000 [8:42:05<11:20:51, 6.79s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:04:40,431 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▍ | 3981/10000 [8:42:11<11:12:25, 6.70s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:04:47,729 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▍ | 3982/10000 [8:42:18<11:31:40, 6.90s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:04:56,252 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▍ | 3983/10000 [8:42:27<12:21:17, 7.39s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:05:02,836 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▌ | 3984/10000 [8:42:33<11:54:06, 7.12s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:05:09,125 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... + 40%|███████████████████████▌ | 3985/10000 [8:42:40<11:30:23, 6.89s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:05:15,608 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`... 
+ 40%|███████████████████████▌ | 3986/10000 [8:42:46<11:17:51, 6.76s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:05:49,193 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 40%|███████████████████████▌ | 3987/10000 [8:43:20<24:44:49, 14.82s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:05:56,384 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 40%|███████████████████████▌ | 3988/10000 [8:43:27<20:52:26, 12.50s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:06:02,365 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 40%|███████████████████████▌ | 3989/10000 [8:43:33<17:35:19, 10.53s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:06:08,331 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 40%|███████████████████████▌ | 3990/10000 [8:43:39<15:20:10, 9.19s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:06:14,361 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 40%|███████████████████████▌ | 3991/10000 [8:43:45<13:46:25, 8.25s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:06:20,506 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 40%|███████████████████████▌ | 3992/10000 [8:43:51<12:42:56, 7.62s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:06:27,608 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 40%|███████████████████████▌ | 3993/10000 [8:43:58<12:25:56, 7.45s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:06:34,089 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 40%|███████████████████████▌ | 3994/10000 [8:44:05<11:55:18, 7.15s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:06:40,115 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 40%|███████████████████████▌ | 3995/10000 [8:44:11<11:24:30, 6.84s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:06:47,130 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 40%|███████████████████████▌ | 3996/10000 [8:44:18<11:29:24, 6.89s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:06:54,763 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 40%|███████████████████████▌ | 3997/10000 [8:44:25<11:49:54, 7.10s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:07:01,962 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 40%|███████████████████████▌ | 3998/10000 [8:44:33<11:54:41, 7.14s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:07:09,535 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 40%|███████████████████████▌ | 3999/10000 [8:44:40<12:04:30, 7.24s/it][WARNING|modeling_whisper.py:902] 2022-12-16 18:07:17,577 >> `use_cache = True` is incompatible with gradient checkpointing. Setting `use_cache = False`...
+ 40%|███████████████████████▌ | 4000/10000 [8:44:48<12:31:38, 7.52s/it] 40%|███████████████████████▌ | 4000/10000 [8:44:48<12:31:38, 7.52s/it][INFO|trainer.py:2955] 2022-12-16 18:07:19,714 >> ***** Running Evaluation *****
+[INFO|trainer.py:2959] 2022-12-16 18:07:19,714 >> Num examples: Unknown
+[INFO|trainer.py:2960] 2022-12-16 18:07:19,714 >> Batch size = 32
+{'loss': 0.1672, 'learning_rate': 1.966421052631579e-06, 'epoch': 0.38}
+{'loss': 0.1834, 'learning_rate': 1.958526315789474e-06, 'epoch': 0.38}
+{'loss': 0.1743, 'learning_rate': 1.9506315789473682e-06, 'epoch': 0.38}
+{'loss': 0.1768, 'learning_rate': 1.942736842105263e-06, 'epoch': 0.39}
+{'loss': 0.1898, 'learning_rate': 1.9348421052631577e-06, 'epoch': 0.39}
+{'loss': 0.1878, 'learning_rate': 1.9269473684210525e-06, 'epoch': 0.39}
+{'loss': 0.1691, 'learning_rate': 1.9190526315789472e-06, 'epoch': 0.39}
+{'loss': 0.1713, 'learning_rate': 1.911157894736842e-06, 'epoch': 0.4}
+{'loss': 0.1281, 'learning_rate': 1.9032631578947371e-06, 'epoch': 0.4}
+{'loss': 0.1335, 'learning_rate': 1.8953684210526317e-06, 'epoch': 0.4}
+
+ Reading metadata...: 0it [00:00, ?it/s]
+ Reading metadata...: 1it [00:00, 4.12it/s] Reading metadata...: 1704it [00:00, 6473.43it/s]
+[INFO|trainer_utils.py:689] 2022-12-16 18:07:23,209 >> The following columns in the evaluation set don't have a corresponding argument in `WhisperForConditionalGeneration.forward` and have been ignored: up_votes, segment, age, client_id, down_votes, input_length, locale, accent, path, gender. If up_votes, segment, age, client_id, down_votes, input_length, locale, accent, path, gender are not expected by `WhisperForConditionalGeneration.forward`, you can safely ignore this message.
+ 40%|███████████████████████▌ | 4000/10000 [8:50:38<12:31:38, 7.52s/it][INFO|trainer.py:2700] 2022-12-16 18:13:09,814 >> Saving model checkpoint to ./checkpoint-4000
+[INFO|configuration_utils.py:447] 2022-12-16 18:13:09,815 >> Configuration saved in ./checkpoint-4000/config.json
+[INFO|modeling_utils.py:1680] 2022-12-16 18:13:10,901 >> Model weights saved in ./checkpoint-4000/pytorch_model.bin
+[INFO|feature_extraction_utils.py:368] 2022-12-16 18:13:10,920 >> Feature extractor saved in ./checkpoint-4000/preprocessor_config.json
+[INFO|feature_extraction_utils.py:368] 2022-12-16 18:13:15,300 >> Feature extractor saved in ./preprocessor_config.json